Semi-reliable GitHub scripting

The githubpy python library provides a thin layer on top of the GitHub V3 API, thin enough that the official GitHub documentation applies directly. The undocumented behavior of GitHub is outside the scope of this library and must be handled by the caller.
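
A minimal sketch of how such a client could be created, assuming the GitHub class accepts an access_token keyword argument (username/password authentication also exists); the token value is a placeholder:

    import github  # githubpy: pip install githubpy

    # hypothetical setup; the resulting object supports chained calls
    # such as g.user('repos').get() in the examples below
    g = github.GitHub(access_token='...')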

For instance, creating a repository is asynchronous: checking for its existence immediately after creation may fail. A function similar to the following should be used to wait until it exists:

    def project_exists(self, name):
        # the repository may not show up immediately after creation:
        # retry a few times before giving up
        retry = 10
        while retry > 0:
            try:
                for repo in self.github.g.user('repos').get():
                    if repo['name'] == name:
                        return True
                return False
            except github.ApiError:
                # transient API error: wait before asking again
                time.sleep(5)
            retry -= 1
        raise Exception('error getting the list of repos')

    def add_project(self):
        r = self.github.g.user('repos').post(
            name=GITHUB['repo'],
            auto_init=True)
        assert r['full_name'] == GITHUB['username'] + '/' + GITHUB['repo']
        # creation is asynchronous: poll until the repository shows up
        while not self.project_exists(GITHUB['repo']):
            time.sleep(1)

Another example is merging a pull request. It sometimes fails (with a 503 or a cannot be merged error) although it succeeds in the background. To cope with that, the state of the pull request should be checked immediately after the merge failed: it can show up as either merged or closed (the GitHub web interface displays it as merged in both cases). The following function can be used to cope with that behavior:

    def merge(self, pr, message):
        retry = 10
        while retry > 0:
            try:
                current = self.github.repos().pulls(pr).get()
                # a previous attempt may have succeeded in the background
                if current['state'] in ('merged', 'closed'):
                    return
                logging.info('state = ' + current['state'])
                self.github.repos().pulls(pr).merge().put(
                    commit_message=message)
            except github.ApiError as e:
                logging.error(str(e.response))
                logging.exception('merging ' + str(pr) + ' ' + message)
                time.sleep(5)
            retry -= 1
        # all retries failed
        raise Exception('failed to merge ' + str(pr) + ' ' + message)

These two examples are implemented as part of the ceph-workbench integration tests. The behavior described above can be reproduced by running the tests in a loop for a few hours.

Write-only ssh-based rsync server

A write-only rsync server can be used by anyone to upload content with no risk of deleting existing files. Assuming access to the rsync server is handled via ssh, the following line can be added to the ~/.ssh/authorized_keys file:

command="rrsync /usr/share/nginx/html" ssh-rsa AAAAB3NzaC1y...

The rrsync script is shipped with the rsync package documentation and can be installed with:

gzip -d < /usr/share/doc/rsync/scripts/rrsync.gz > /usr/bin/rrsync
chmod +x /usr/bin/rrsync
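
Content can then be uploaded with an ordinary rsync invocation over ssh. A minimal sketch via subprocess (the upload user and host are hypothetical; the destination path is interpreted relative to the directory given to rrsync):

    import subprocess

    # upload public/ into /usr/share/nginx/html on the server: rrsync
    # maps the remote path '/' to the directory it was given
    subprocess.check_call(
        ['rsync', '-av', 'public/', 'upload@www.example.org:/'])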

DNS spoofing with RPZ and bind9

When two web services reside on the same LAN, it may be convenient to spoof DNS entries so that the LAN IP is used instead of the public IP. This can be done with a response policy zone (RPZ) and bind9. For instance workbench.dachary.org can be mapped to 10.0.2.21 with:

$ cat /etc/bind/rpz.db
$TTL 60
@            IN    SOA  localhost. root.localhost.  (
                          2   ; serial
                          3H  ; refresh
                          1H  ; retry
                          1W  ; expiry
                          1H) ; minimum
                  IN    NS    localhost.

workbench.dachary.org        A    10.0.2.21

The zone is declared in

$ cat /etc/bind/named.conf.local
zone "rpz" {
      type master;
      file "/etc/bind/rpz.db";
      allow-query {none;};
};

and the response-policy is set in the options file with

$ cat /etc/bind/named.conf.options
...
	response-policy { zone "rpz"; };
};

When bind9 is restarted with /etc/init.d/bind9 restart, the mapping can be verified with

$ dig @127.0.0.1 workbench.dachary.org
workbench.dachary.org.	5	IN	A	10.0.2.21
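
The same check can be scripted. A minimal sketch using the dnspython module (an assumption: any resolver library that can be pointed at 127.0.0.1 would do):

    import dns.resolver  # pip install dnspython

    # query the spoofing server directly, bypassing the system resolver
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = ['127.0.0.1']
    answer = resolver.query('workbench.dachary.org', 'A')
    print(answer[0].address)  # expected: 10.0.2.21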

If the bind9 server runs on a docker host, it can be used by docker containers with

docker run  ... --dns=172.17.42.1 ...

Using a cloud image with kvm

It would be convenient to have a virt-builder one-liner such as

$ virt-builder --arch i386 --ssh-inject ~/.ssh/id_rsa.pub fedora-21

to get an image suitable to run and login with

$ qemu-kvm -m 1024 -net nic -net user,hostfwd=tcp::2222-:22 \
  -drive file=fedora-21.qcow2 &
$ ssh -p 2222 localhost grep PRETTY /etc/os-release
PRETTY_NAME="Fedora 21 (Twenty One)"

Docker users have it simpler because there is no need to ssh into the container:

$ docker run fedora:21 grep PRETTY /etc/os-release
PRETTY_NAME="Fedora 21 (Twenty One)"

It is not currently possible to use virt-builder as described above because

  • the set of images available by default is limited (no i386 architecture for instance)
  • the --ssh-inject option is only available in the development version

The libguestfs.org toolbox can, however, be used to implement a script modifying images prepared for the cloud (see the ubuntu cloud images for instance):

  • wget the image
    wget -O my.img http://cloud-images.ubuntu.com/trusty/current/trusty-server-cloudimg-i386-disk1.img
    
  • create a config-drive for cloud-init to feed it the ssh public key.
    mkdir -p config-drive
    cat > config-drive/user-data <<EOF
    #cloud-config
    ssh_authorized_keys:
     - $(cat ~/.ssh/id_rsa.pub)
    chpasswd: { expire: False }
    EOF
    cat > config-drive/meta-data <<EOF
    instance-id: iid-123459
    local-hostname: testhost
    EOF
    ( cd config-drive ; LIBGUESTFS_BACKEND=direct virt-make-fs \
      --type=msdos --label=cidata .  ../config-drive.img )
    
  • launch the image with the config drive attached and it will be auto-detected by cloud-init (a login check follows below)
    qemu-kvm -m 1024 -net nic -net user,hostfwd=tcp::2222-:22 \
      -drive file=my.img -drive file=config-drive.img
    

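Once the VM boots, cloud-init installs the key and login works as in the virt-builder example above. A minimal check (assuming the default ubuntu user of the Ubuntu cloud images):

    import subprocess

    # the 'ubuntu' user is the default account of Ubuntu cloud images
    print(subprocess.check_output(
        ['ssh', '-p', '2222', 'ubuntu@localhost',
         'grep', 'PRETTY', '/etc/os-release']))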

Upgrade nodejs on Ubuntu 14.04

Running gh requires a more recent version of nodejs than the one packaged by default on Ubuntu 14.04:

$ apt-cache policy nodejs
nodejs:
  Installed: 0.10.25~dfsg2-2ubuntu1
  Candidate: 0.10.25~dfsg2-2ubuntu1
  Version table:
 *** 0.10.25~dfsg2-2ubuntu1 0
        500 http://fr.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages
        100 /var/lib/dpkg/status
$ gh watch
fatal: Please update your NodeJS version: http://nodejs.org/download

The recommended way to upgrade is currently broken; the following can be used instead:

sudo add-apt-repository 'deb https://deb.nodesource.com/node trusty main'
sudo apt-get update
sudo apt-get install nodejs

If either apt-get update or apt-get install fails with a message like SSL: certificate subject name:

...
Err https://deb.nodesource.com trusty/main amd64 Packages
  SSL: certificate subject name (login.meteornetworks.com) does not match target host name 'deb.nodesource.com'
Ign http://ceph.com trusty/main Translation-en
Err https://deb.nodesource.com trusty/main i386 Packages
  SSL: certificate subject name (login.meteornetworks.com) does not match target host name 'deb.nodesource.com'
Ign https://deb.nodesource.com trusty/main Translation-en_US
Ign https://deb.nodesource.com trusty/main Translation-en
Ign http://get.docker.io docker/main Translation-en_US
Ign http://get.docker.io docker/main Translation-en
W: Failed to fetch https://deb.nodesource.com/node/dists/trusty/main/binary-amd64/Packages  SSL: certificate subject name (login.meteornetworks.com) does not match target host name 'deb.nodesource.com'
W: Failed to fetch https://deb.nodesource.com/node/dists/trusty/main/binary-i386/Packages  SSL: certificate subject name (login.meteornetworks.com) does not match target host name 'deb.nodesource.com'
E: Some index files failed to download. They have been ignored, or old ones used instead.

The following disables certificate verification for deb.nodesource.com and works around the problem:

echo 'Acquire::https::deb.nodesource.com::Verify-Peer "false";' > /etc/apt/apt.conf.d/99verify

Alternatively, a version of gh that does not require a recent nodejs can be installed with

sudo npm install -g gh@1.9.4

An example of controlled technical debt

When I started working to help with Ceph backports, I was not familiar with the workflow (who does what, when and why) or the conventions (referencing commits from redmine issues, the redmine backport field, …). I felt the need for scripts to help me cross-reference information (from git, github and redmine) and consolidate it into an inventory which I could use as a central point to measure progress and find what needed to be done. But I was not able to formulate this in so many words and at the beginning it was little more than a vague feeling that I would quickly be lost if I did not write down my findings. I chose to write a script, with no tests and no structure, to do things like matching a pull request with a redmine issue when the only clue was a Fixes: #XXX embedded in the comment of one of the commits.

After a few weeks the script grew into a 500-line monstrosity: extremely useful and quite impossible to maintain in the long run. My excuse was that I had no clue what I needed to begin with, and that I could not have understood the backport workflow without this script. After the first backport release was declared ready, I stopped adding functionality and restarted from scratch what became the ceph-workbench backport sub-command.

This refactor was done without modifying the behavior of the original script (there were only a few cases where preserving it was impossible). The architecture, however, is completely new: the original script was a near linear sequence of operations with only global variables. In short, the new script pulls information from a few sources (one class for redmine, one for gitlab, one for git), cross-references it with ad hoc methods and renders it into rdoc pages to be displayed in the wiki.

Writing unit tests helped proceed incrementally, pulling one code snippet after the other and checking that they were not broken by the refactor. Instead of unit testing the top level command, integration tests were written and run via tox, using real gitlab and redmine instances as fixtures running in docker containers. They will help when adding new use cases such as scraping the ceph-qa mailing list to match teuthology job failures with the corresponding redmine issue, or interpreting the Backport: field in commit messages.

Retrieve GitHub pull requests in JSON

The following python function returns a map associating each pull request number with its JSON description for the given repo. The OAuth token is needed so that GitHub allows more requests to be processed in a given time frame. The result is cached in a file and refreshed every 24 hours.

import urllib2
import json
import re
import os
import time

def get_pull_requests(repo, token):
    # https://developer.github.com/v3/pulls/#list-pull-requests
    pulls_file = "/tmp/pulls.json"
    if ( not os.path.exists(pulls_file) or
         time.time() - os.stat(pulls_file).st_mtime > 24 * 60 * 60 ):
        pulls = {}
        url = ("https://api.github.com/repos/" + repo +
               "/pulls?state=all&access_token=" + token )
        while url:
            github = urllib2.Request(url=url)
            f = urllib2.urlopen(github)
            for pull in json.loads(f.read()):
                # string keys survive the JSON cache round trip below,
                # integer keys would not (JSON object keys are strings)
                pulls[str(pull['number'])] = pull
            # follow the rel="next" target of the Link header, if any
            url = None
            for link in f.info().get('Link', '').split(','):
                if 'rel="next"' in link:
                    m = re.search('<(.*)>', link)
                    if m:
                        url = m.group(1)
        with open(pulls_file, 'w') as f:
            json.dump(pulls, f)
    else:
        with open(pulls_file, 'r') as f:
            pulls = json.load(f)
    return pulls

For instance

pulls = get_pull_requests('ceph/ceph', '64933d355fda984108b4aad2c5cd4c4a224aad')
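
The map can then be mined locally. For instance, counting merged pull requests (merged_at is part of each pull request description and is null when the pull request was not merged):

    merged = [p for p in pulls.values() if p['merged_at']]
    print('%d merged out of %d' % (len(merged), len(pulls)))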

The same pagination logic applies to all API calls (see the Web Linking RFC 5988 for more information) and parsing could use the LinkHeader module instead of the rudimentary regexp parsing above.

Testing CPU features with Qemu

The Ceph erasure code plugin must also run on Intel CPUs that have no SSE4.2 support. A Qemu VM is run without SSE4.2 support (the default qemu64 CPU model does not expose it):

qemu-system-x86_64 -machine accel=kvm:tcg -m 2048 \
  -drive file=server.img -boot c \
  -display sdl \
  -net nic -net user,hostfwd=tcp::2222-:22 \
  -fsdev local,security_model=passthrough,id=fsdev0,path=~/ceph \
  -device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=hostshare

The qemu CPU has no SSE4.2 although the native CPU has it:

$ grep sse4.2 /proc/cpuinfo | wc -l
4
$ ssh -p 2222 loic@127.0.0.1 grep sse4.2 /proc/cpuinfo | wc -l
0
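
Inside the VM, the absence of the flag can also be asserted programmatically before running the tests, so that a misconfigured guest fails loudly. A minimal sketch (the flag is spelled sse4_2 in /proc/cpuinfo):

    # run inside the guest: abort if SSE4.2 is unexpectedly present
    with open('/proc/cpuinfo') as cpuinfo:
        assert 'sse4_2' not in cpuinfo.read(), 'SSE4.2 is present'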

The local development directory is shared with the guest as a Plan 9 filesystem over virtio and mounted inside the VM:

sudo mount -t 9p -o trans=virtio,version=9p2000.L hostshare /home/loic/ceph

and the functional test is run to assert that encoding and decoding an object works:

$ cd /home/loic/ceph/src
$ ./unittest_erasure_code_jerasure
...
[----------] Global test environment tear-down
[==========] 16 tests from 8 test cases ran. (30 ms total)
[  PASSED  ] 16 tests.