An example of controlled technical debt

When I started working to help with Ceph backports, I was not familiar with the workflow (who does what, when and why) or the conventions (referencing commits from redmine issues, the redmine backport field, …). I felt the need for scripts to help me cross-reference information (from git, github and redmine) and consolidate it into an inventory I could use as a central point to measure progress and find what needed to be done. But I was not able to formulate this in so many words and, at the beginning, it was little more than a vague feeling that I would quickly be lost if I did not write down my findings. I chose to write a script, with no tests and no structure, to do things like matching a pull request with a redmine issue when the only clue was a Fixes: #XXX embedded in the comment of one of the commits.
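
For illustration, here is a hedged sketch of that kind of matching (not the original script; the helper name and the example issue number are hypothetical):

import re

def issue_from_commits(commit_messages):
    # return the redmine issue number referenced by a Fixes: marker, if any
    for message in commit_messages:
        m = re.search(r'Fixes:\s*#(\d+)', message)
        if m:
            return int(m.group(1))
    return None

issue_from_commits(['jerasure: fix decoding\n\nFixes: #9025'])  # returns 9025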

After a few weeks the script grew into a 500-line monstrosity: extremely useful and quite impossible to maintain in the long run. My excuse was that I had no clue what I needed to begin with and that I could not have understood the backport workflow without this script. After the first backport release was declared ready, I stopped adding features and rewrote it from scratch as what became the ceph-workbench backport sub-command.

This refactor was done without modifying the behavior of the original script (there were only a few cases where preserving it was impossible). The architecture, however, is completely new: the original script was a near-linear sequence of operations relying only on global variables. In short, the new script pulls information from a few sources (one class for redmine, one for gitlab, one for git), cross-references it with ad-hoc methods and renders the result as rdoc pages to be displayed in the wiki.
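
As a rough sketch of that shape (all names here are illustrative; the actual ceph-workbench classes are more elaborate):

class Redmine(object):
    def issues(self):
        pass  # fetch backport issues from the redmine REST API

class GitLab(object):
    def merge_requests(self):
        pass  # fetch merge requests from the gitlab REST API

class Git(object):
    def commits(self):
        pass  # list commits and their messages from the local clone

class Inventory(object):
    def __init__(self, redmine, gitlab, git):
        self.redmine, self.gitlab, self.git = redmine, gitlab, git

    def cross_reference(self):
        pass  # ad-hoc matching of issues, merge requests and commits

    def to_rdoc(self):
        pass  # render the consolidated inventory as rdoc pages for the wiki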

Writing unit tests helped proceed incrementally, pulling in one code snippet after the other and checking they were not broken by the refactor. Instead of unit testing the top-level command, integration tests were written and run via tox, using real gitlab and redmine instances as fixtures running in docker containers. This test infrastructure will help when adding new use cases, such as scraping the ceph-qa mailing list to match teuthology job failures with the corresponding redmine issue, or interpreting the Backport: field in commit messages.
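
For illustration, a minimal sketch of such a docker-backed fixture (the image name, port mapping and readiness delay are assumptions, not the actual ceph-workbench test code):

import contextlib
import subprocess
import time

@contextlib.contextmanager
def redmine_container():
    # boot a throwaway redmine instance for the duration of the tests
    cid = subprocess.check_output(
        ['docker', 'run', '-d', '-p', '10083:3000', 'redmine']).strip()
    try:
        time.sleep(60)  # crude wait for the web UI to come up
        yield 'http://127.0.0.1:10083'
    finally:
        subprocess.check_call(['docker', 'rm', '-f', cid])

A test running within with redmine_container() as url: talks to a real REST API instead of a mock, at the price of a slower run.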

retrieve github pull requests in JSON

The following Python function returns a map associating each pull request number with its JSON description for the given repo. The OAuth token is needed so that github allows more requests to be processed during a given time frame. The result is cached in a file and refreshed every 24 hours.

import urllib2
import json
import re
import os
import time

def get_pull_requests(repo, token):
    # https://developer.github.com/v3/pulls/#list-pull-requests
    pulls_file = "/tmp/pulls.json"
    if ( not os.path.exists(pulls_file) or
         time.time() - os.stat(pulls_file).st_mtime > 24 * 60 * 60 ):
        pulls = {}
        url = ("https://api.github.com/repos/" + repo +
               "/pulls?state=all&access_token=" + token )
        while url:
            github = urllib2.Request(url=url)
            f = urllib2.urlopen(github)
            for pull in json.loads(f.read()):
                # keys are strings so the mapping is the same whether it
                # is built here or reloaded from the JSON cache below
                pulls[str(pull['number'])] = pull
            # follow the rel="next" pagination link, if any; the last
            # page (or a single page) may carry no Link header at all
            url = None
            for link in f.info().get('Link', '').split(','):
                if 'rel="next"' in link:
                    m = re.search('<(.*)>', link)
                    if m:
                        url = m.group(1)
        with open(pulls_file, 'w') as f:
            json.dump(pulls, f)
    else:
        with open(pulls_file, 'r') as f:
            pulls = json.load(f)
    return pulls

For instance

pulls = get_pull_requests('ceph/ceph', '64933d355fda984108b4aad2c5cd4c4a224aad')

The same pagination logic applies to all API calls (see the Web Linking RFC 5988 for more information) and the parsing could use the LinkHeader module instead of rudimentary regexp matching.
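
As an illustration, the loop can be factored into a generator that applies to any endpoint (a sketch: the helper name is hypothetical and error handling is omitted):

import json
import re
import urllib2

def paginate(url):
    # yield every item of a paginated github API collection
    while url:
        f = urllib2.urlopen(urllib2.Request(url=url))
        for item in json.loads(f.read()):
            yield item
        url = None  # stop unless a rel="next" link points to the next page
        for link in f.info().get('Link', '').split(','):
            if 'rel="next"' in link:
                m = re.search('<(.*)>', link)
                if m:
                    url = m.group(1)

for pull in paginate('https://api.github.com/repos/ceph/ceph/pulls?state=all'):
    print pull['number']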

Testing CPU features with Qemu

The Ceph erasure code plugin must run on Intel CPUs that have no SSE4.2 support. A Qemu virtual machine is run without SSE4.2 support (the default qemu64 CPU model does not expose it):

qemu-system-x86_64 -machine accel=kvm:tcg -m 2048 \
  -drive file=server.img -boot c \
  -display sdl \
  -net nic -net user,hostfwd=tcp::2222-:22 \
  -fsdev local,security_model=passthrough,id=fsdev0,path=~/ceph \
  -device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=hostshare

The qemu CPU has no SSE4.2 although the native CPU has it:

$ grep sse4.2 /proc/cpuinfo | wc -l
4
$ ssh -p 2222 loic@127.0.0.1 grep sse4.2 /proc/cpuinfo | wc -l
0

The local development directory is a Plan 9 folder shared over Virtio and mounted inside the VM:

sudo mount -t 9p -o trans=virtio,version=9p2000.L hostshare /home/loic/ceph

and the functional test is run to assert that encoding and decoding an object works:

$ cd /home/loic/ceph/src
$ ./unittest_erasure_code_jerasure
...
[----------] Global test environment tear-down
[==========] 16 tests from 8 test cases ran. (30 ms total)
[  PASSED  ] 16 tests.

A subjective view of the birth of Erasure Code in Ceph

Erasure code is also RAID5, which makes it possible to lose a hard disk without losing your data. From the user's point of view the concept is simple and useful, but for the person in charge of designing the software that does the work, it is a headache. Three-disk RAID5 enclosures can be found in any shop: when one of them stops working, it is replaced and the files are still there. One could imagine the same with six disks, two of which stop working simultaneously. But no: instead of resorting to a XOR operation, which can be grasped in five minutes, it takes Galois fields, a solid mathematical background and a lot of computation. To make matters worse, in a distributed storage system such as Ceph, disks are often temporarily disconnected because the network is unavailable.
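
To make the five-minute claim concrete, here is a toy XOR parity round trip (a Python sketch, unrelated to the actual Ceph code):

a = 0b10110100          # data block on disk 1
b = 0b01101001          # data block on disk 2
parity = a ^ b          # parity block on disk 3
assert b == a ^ parity  # disk 2 dies: its block is rebuilt from the survivors

Lose two of the three blocks, however, and XOR alone cannot recover them: this is where Galois fields come into play.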

random read disk stress test

When a GNU/Linux machine exhibits a high iowait (i.e. more than 20% of the processor time is spent waiting for IO to complete), it does not always mean a lot of bytes are read or written. Reading randomly from the disk can create conditions in which less than 5MB/s are read while the disk is busy most of the time, which translates into an iowait greater than 20%. A workaround is to give more RAM to the (virtual) machine so that pages are cached in memory and read from disk only once, reducing the probability of random reads.
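
A minimal sketch of such a workload (the path and sizes are assumptions; the file must be much larger than the available RAM for the effect to show):

import os
import random

def random_reads(path, block_size=4096, count=100000):
    # read small blocks at random offsets, defeating sequential readahead
    size = os.path.getsize(path)
    with open(path, 'rb') as f:
        for _ in range(count):
            f.seek(random.randrange(0, size - block_size))
            f.read(block_size)

random_reads('/var/tmp/big.img')  # watch iowait climb with iostat or vmstat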