The following Python function returns a map associating each pull request number with its JSON description for the given repo. The OAuth token is needed so that GitHub accepts more requests within a given time frame: authenticated requests have a higher rate limit. The result is cached in a file and refreshed every 24 hours.
import urllib2
import json
import re
import os
import time

def get_pull_requests(repo, token):
    # https://developer.github.com/v3/pulls/#list-pull-requests
    pulls_file = "/tmp/pulls.json"
    if (not os.access(pulls_file, os.F_OK) or
            time.time() - os.stat(pulls_file).st_mtime > 24 * 60 * 60):
        pulls = {}
        url = ("https://api.github.com/repos/" + repo +
               "/pulls?state=all&access_token=" + token)
        while url:
            github = urllib2.Request(url=url)
            f = urllib2.urlopen(github)
            for pull in json.loads(f.read()):
                # use a string key so the result is the same whether it
                # comes fresh from the API or from the JSON cache file
                # (json.dump turns integer keys into strings)
                pulls[str(pull['number'])] = pull
            # follow the rel="next" URL from the Link header, if any
            url = None
            for link in f.info().get('Link', '').split(','):
                if 'rel="next"' in link:
                    m = re.search('<(.*)>', link)
                    if m:
                        url = m.group(1)
        with open(pulls_file, 'w') as f:
            json.dump(pulls, f)
    else:
        with open(pulls_file, 'r') as f:
            pulls = json.load(f)
    return pulls
For instance
pulls = get_pull_requests('ceph/ceph', '64933d355fda984108b4aad2c5cd4c4a224aad')
The same pagination logic applies to all GitHub API calls that return collections (see RFC 5988, Web Linking, for more information), and the Link header could be parsed with the LinkHeader module instead of the rudimentary regexp used above.
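If pulling in a third-party module is not worth it, the Link header parsing can also be factored out into a small helper. This is only a sketch of the regexp approach used above, generalized to return every rel found in the header, not a full RFC 5988 parser (it would break on URLs containing commas, which GitHub's pagination URLs do not):

```python
import re

def parse_link_header(value):
    # Map each rel name to its target URL, e.g.
    # {'next': 'https://...?page=2', 'last': 'https://...?page=9'}
    links = {}
    for part in value.split(','):
        m = re.search(r'<([^>]*)>\s*;\s*rel="([^"]*)"', part)
        if m:
            links[m.group(2)] = m.group(1)
    return links
```

With such a helper, advancing to the next page reduces to `url = parse_link_header(f.info().get('Link', '')).get('next')`.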