The following Python function returns a dict mapping each pull request number to its JSON description for the given repo. The OAuth token is needed so that GitHub allows more requests within a given time frame. The result is cached in a file and refreshed every 24 hours.
import urllib2
import json
import re
import os
import time

def get_pull_requests(repo, token):
    # https://developer.github.com/v3/pulls/#list-pull-requests
    pulls_file = "/tmp/pulls.json"
    # refresh the cache if it is missing or older than 24 hours
    if ( not os.access(pulls_file, os.F_OK) or
         time.time() - os.stat(pulls_file).st_mtime > 24 * 60 * 60 ):
        pulls = {}
        url = ("https://api.github.com/repos/" + repo +
               "/pulls?state=all&access_token=" + token)
        while url:
            github = urllib2.Request(url=url)
            f = urllib2.urlopen(github)
            for pull in json.loads(f.read()):
                pulls[pull['number']] = pull
            # follow the rel="next" Link header, if any, to fetch the next page
            # (the header is absent when everything fits in a single page)
            url = None
            for link in f.info().get('Link', '').split(','):
                if 'rel="next"' in link:
                    m = re.search('<(.*)>', link)
                    if m:
                        url = m.group(1)
        with open(pulls_file, 'w') as f:
            json.dump(pulls, f)
    else:
        with open(pulls_file, 'r') as f:
            # JSON object keys are strings: restore the integer pull request numbers
            pulls = dict((int(number), pull)
                         for number, pull in json.load(f).items())
    return pulls
For instance:
pulls = get_pull_requests('ceph/ceph', '64933d355fda984108b4aad2c5cd4c4a224aad')
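The resulting dict can then be iterated directly. A minimal sketch, assuming the standard number, title and state fields returned by the pulls API:

for pull in pulls.values():
    # each value is the JSON description returned by the GitHub API
    print("{0}: {1} ({2})".format(pull['number'], pull['title'], pull['state']))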
The same pagination logic applies to all GitHub API calls (see RFC 5988, Web Linking, for more information), and the parsing could use the LinkHeader module instead of the rudimentary regexp used above.
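Since the pagination is identical everywhere, it could also be factored into a reusable helper. The sketch below is not part of the original function: github_get_all is a hypothetical name and it keeps the same rudimentary regexp parsing shown above.

def github_get_all(url):
    # collect the concatenated JSON results of a paginated GitHub API call
    results = []
    while url:
        f = urllib2.urlopen(urllib2.Request(url=url))
        results.extend(json.loads(f.read()))
        # follow the rel="next" Link header, if any
        url = None
        for link in f.info().get('Link', '').split(','):
            if 'rel="next"' in link:
                m = re.search('<(.*)>', link)
                if m:
                    url = m.group(1)
    return results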

