Retrieve GitHub pull requests in JSON

The following Python function (Python 2, since it relies on urllib2) returns a map associating each pull request number with its JSON description for the given repository. The OAuth token is needed because GitHub allows authenticated clients more requests within a given time frame. The result is cached in a file and refreshed every 24 hours.

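import urllib2
import json
import re
import os
import time

def get_pull_requests(repo, token):
    # https://developer.github.com/v3/pulls/#list-pull-requests
    # Note: a single cache file is used, whatever the repo.
    pulls_file = "/tmp/pulls.json"
    # Refresh the cache when it is missing or more than 24 hours old.
    if (not os.path.exists(pulls_file) or
            time.time() - os.stat(pulls_file).st_mtime > 24 * 60 * 60):
        pulls = {}
        url = "https://api.github.com/repos/" + repo + "/pulls?state=all"
        while url:
            # The token is sent in the Authorization header: GitHub no
            # longer accepts it as an access_token query parameter.
            github = urllib2.Request(
                url=url, headers={'Authorization': 'token ' + token})
            f = urllib2.urlopen(github)
            for pull in json.loads(f.read()):
                pulls[pull['number']] = pull
            url = None
            # The Link header is absent when the result fits in one page.
            for link in f.info().get('Link', '').split(','):
                if 'rel="next"' in link:
                    m = re.search('<(.*)>', link)
                    if m:
                        url = m.group(1)
        with open(pulls_file, 'w') as f:
            json.dump(pulls, f)
    else:
        with open(pulls_file, 'r') as f:
            # JSON object keys are strings: convert them back to integers
            # so that both branches return the same key type.
            pulls = dict((int(k), v) for k, v in json.load(f).items())
    return pulls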

For instance:

pulls = get_pull_requests('ceph/ceph', '64933d355fda984108b4aad2c5cd4c4a224aad')
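Once loaded, the map can be queried like any dictionary. Here is a minimal sketch (written for Python 3), assuming each entry carries the `state` and `title` fields of the GitHub pull request JSON. The helper name `open_pull_titles` and the sample data are made up for illustration:

```python
def open_pull_titles(pulls):
    """Return {number: title} for the pull requests that are still open."""
    return {number: pull['title']
            for number, pull in pulls.items()
            if pull.get('state') == 'open'}

# Hand-written sample standing in for the result of get_pull_requests().
sample = {
    1: {'number': 1, 'state': 'closed', 'title': 'first'},
    2: {'number': 2, 'state': 'open', 'title': 'second'},
}
print(open_pull_titles(sample))
```

Because the function is called with state=all, closed and merged pull requests are in the map too, hence the filter.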

The same pagination logic applies to all API calls that return paginated results (see RFC 5988, Web Linking, for more information). Parsing the Link header could use the LinkHeader module instead of a rudimentary regular expression.
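To make the pagination loop reusable, it can be separated from the HTTP details. The following is only a sketch (written for Python 3) and does not use the LinkHeader module. The names `parse_link_header` and `fetch_all` are hypothetical, and `fetch` is assumed to be a callable returning the decoded items of a page together with the raw Link header. The parser is slightly more careful than the regexp above, but it still splits on commas, so a URL containing a comma would confuse it:

```python
import re

def parse_link_header(value):
    """Map each rel value to its URL, e.g. {'next': 'https://...'}."""
    links = {}
    for part in (value or '').split(','):
        # Each part looks like: <url>; rel="next"
        m = re.match(r'\s*<([^>]*)>(.*)', part)
        if not m:
            continue
        url, params = m.groups()
        rel = re.search(r'rel="?([^";]+)"?', params)
        if rel:
            # A single link may carry several space-separated rel values.
            for name in rel.group(1).split():
                links[name] = url
    return links

def fetch_all(fetch, url):
    """Follow rel="next" links, gathering the items of every page.

    fetch is a hypothetical callable returning (items, link_header)
    for a URL; it hides the HTTP and JSON details.
    """
    items = []
    while url:
        page, link_header = fetch(url)
        items.extend(page)
        # Stop when the response advertises no next page.
        url = parse_link_header(link_header).get('next')
    return items

# Fake three-page API, so the loop can be tried without network access.
pages = {
    'p1': ([1, 2], '<p2>; rel="next", <p3>; rel="last"'),
    'p2': ([3, 4], '<p3>; rel="next", <p1>; rel="first"'),
    'p3': ([5], None),
}
print(fetch_all(pages.get, 'p1'))
```

With a real API, `fetch` would open the URL, decode the JSON body and return it along with the value of the Link response header, or None when the header is missing.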
