Mirror github pull requests locally

Each GitHub pull request is associated with a reference in the target repository. For instance the commit sent to pull request 3948 is the reference refs/pull/3948/head. If GitHub successfully merges the pull request in the target branch, another reference is associated with it, for instance refs/pull/3948/merge.
It is convenient to mirror pull requests in local branches, for instance to trigger GitLab CI or jenkins git-plugin because they only react to commits that belong to branches, not orphaned references. The tip of the branch can be named pull/XXX and its tip reset to the matching merge reference.
The tip of the branch could be set to the head reference but it would be less effective than using the merge because it is based on an older version of the target branch. Whatever test runs on the head, it may fail although it could succeed after the merge.
GitHub does not set the merge and the head reference atomically: the head can be set before GitHub even tries to merge. Similarly, when a pull request is rebased and forced push, there is a window of opportunity for the merge reference to still be about the previous head.
The following shell function takes care of all these border cases and keeps an up to date set of branches accurately reflecting all pull requests from a GitHub repository:

function import_pull_requests() {
    local remote=$1

    local ref
    local remote_head

    git fetch $remote +refs/pull/*:refs/remotes/$remote/pull/*

    git for-each-ref \
        --sort='-committerdate' \
        --format='%(refname) %(objectname)' \
        refs/remotes/$remote/pull/*/head | \
        while read ref remote_head ; do

        local pr=$(echo $ref | perl -pe 's:.*/pull/(.*)/head:$1:')

        # ignore pull requests that cannot merge
        local merge=$(git rev-parse --quiet --verify
        test -z "$merge" && continue

        # ignore pull requests for which the merge does not match the
        # remote head, most likely because it has not been updated yet
        # after a rebase was pushed
        local merged_head=$(git rev-parse
        test "$merged_head" != "$remote_head" && continue

        # nothing to do if the head did not change since we last saw
        # it
        local local_head=$(git rev-parse --quiet --verify
        test "$remote_head" = "$local_head" && continue

        # remember the head for the next round
        git update-ref refs/pull/$pr/head $remote_head

        # create/update a branch with the successfull merge of the
        # head
        git update-ref refs/heads/pull/$pr $merge
        echo branch pull/$pr

Download the function and the associated tests

How was a cherry-pick conflict resolved ?

When a git cherry-pick fails because of a conflict, it can be resolved and committed. The reviewer is reminded that a conflict had to be resolved by the Conflicts section at the end of the message body:

commit 7b8e5c99a4a40ae788ad29e36b0d714f529b12eb
Author: John Spray 
Date:   Tue May 20 16:25:19 2014 +0100
    Signed-off-by: John Spray 
    (cherry picked from commit 1d9e4ac2e2bedfd40ee2d91a4a6098150af9b5df)

The difference between the original commit and the cherry-picked commit including the conflict resolution can be displayed with:

picked_from=$(git show --no-patch --pretty=%b $commit  |
  perl -ne 'print if(s/.*cherry picked from commit (\w+).*/$1/)')
diff -u --ignore-matching-lines '^[^+-]' \
   < (git show $picked_from) <(git show $commit)

Continue reading "How was a cherry-pick conflict resolved ?"

How to display the commits from a merged branch ?

A branch gmock was proposed as pull request 483 on GitHub, accepted, merged into master and deleted. It had two commits:

  • bf05ec1 tests: replace existing gtest 1.5.0
  • 5cbe0c5 gmock: use Google C++ Mocking

In GitHub, the reference pull/483/head is preserved and points to the last commit of the branch that no longer exists.

    master +
    849afe + merged and deleted branch gmock
    d5cb91 + \
           |  + bf05ec1 tests: replace existing gtest 1.5.0 (pull/483/head)
    5301b2 +  |
           |  + 5cbe0c5 gmock: use Google C++ Mocking
           | /
    5a6549 +

To list the commits that were merged we need to find the commit in master that is immediately after the former branch started.

base=$(git rev-list --topo-order master ^pull/483/head | tail -1)

The base variable contains 5301b2 which is the first commit of master that is not reachable from pull/483/head, in topological order instead of the default chronological order.

$ git rev-list --oneline $base^..pull/483/head
bf05ec1 tests: replace existing gtest 1.5.0
5cbe0c5 gmock: use Google C++ Mocking

Displays the commits of the former gmock branch. Note the ^ after $base that means the first parent of $base. If the graph is as shown above, it does not make a difference. But if it is as follows:

    master +
    849afe + merged and deleted branch gmock
           | \
           |  + bf05ec1 tests: replace existing gtest 1.5.0 (pull/483/head)
           |  |
           |  + 5cbe0c5 gmock: use Google C++ Mocking
           | /
    5a6549 +

Then $base would be 849afe and git rev-list –oneline $base..pull/483/head (without the ^) would display nothing because pull/483/head is reachable from $base. Since $base^ is the first parent (i.e. the left parent on the graph above), it is 5a6549 and we get the desired result.

Using git bisect with Ceph

When investingating a a problem using the latest Ceph sources, it was discovered that the problem only shows in the master branch and appeared after the v0.85 tag. The following script reproduces the problem and logs the result:

$ cat try.sh
cd src
log=$(git describe)
echo $log.log
make -j4 >& $log.log
rm -fr dev out ;  mkdir -p dev
MDS=1 MON=1 OSD=3 timeout 120 ./vstart.sh \
  -o 'paxos propose interval = 0.01' \
  -n -l mon osd mds >> $log.log 2>&1
exit $status

It can be used with git bisect to find the revision in which it first appeared.

$ git bisect start # initialize the search
$ git bisect bad origin/master # the problem happens
$ git bisect good tags/v0.85 # the problem does not happen
$ git bisect skip $(git log --format='%H' --no-merges tags/v0.85..origin/master)
$ git bisect run try.sh # binary search in tags/v0.85..origin/master
running try.sh
Bisecting: 339 revisions left to test after this (roughly 8 steps)
[ef006ae] Merge pull request #2658 from athanatos/wip-9625
running try.sh
Bisecting: 169 revisions left to test after this (roughly 7 steps)
[fa0bd06] ceph-disk: bootstrap-osd keyring ignores --statedir
running try.sh
d15ecafea4 is the first bad commit
commit d15eca
Author: John Spray 
Date:   Fri Sep 26 17:24:12 2014 +0100
    vstart: create fewer pgs for fs pools
:040000 040000 f42a324a8
 aa64cdc1ed3 M	src
bisect run success

The git bisect skip excludes all non merge commits from the search. The branches are carefully tested before being merged and are, at least, known to pass make check successfully. The individual commits within a branch are unlikely to pass make check and some of them may not even compile.
The information displayed by git bisect run is terse when it ends with skipped commits:

Bisecting: 39 revisions left to test after this (roughly 5 steps)
[083c2e42c663229ce505f74c40d8261ca530a79b] Merge pull request #6565 from chenji-kael/patch-1
running /home/loic/ceph-centos-7-loic/try.sh
There are only 'skip'ped commits left to test.
The first bad commit could be any of:

To get more detailed, git bisect log can be used:

# good: [b584388ce9ce998c99e219ec144725beaf09ab28] Merge pull request #6489 from xiexingguo/xxg-wip-13715
git bisect good b584388ce9ce998c99e219ec144725beaf09ab28
# bad: [5135292d9557269bab5cefc98d39606174aa6ebe] Merge branch 'wip-bigbang'
git bisect bad 5135292d9557269bab5cefc98d39606174aa6ebe
# good: [f3e88ace74c896c72f6e8485c44c7432f298d887] Merge remote-tracking branch 'gh/jewel'
git bisect good f3e88ace74c896c72f6e8485c44c7432f298d887
# good: [083c2e42c663229ce505f74c40d8261ca530a79b] Merge pull request #6565 from chenji-kael/patch-1
git bisect good 083c2e42c663229ce505f74c40d8261ca530a79b
# only skipped commits left to test
# possible first bad commit: [5135292d9557269bab5cefc98d39606174aa6ebe] Merge branch 'wip-bigbang'
# possible first bad commit: [9aabc8a9b8d7775337716c4e0fa3cc53938acb45] test/mon/osd-crush.sh: escape ceph tell mon.*
# possible first bad commit: [72edab282343e8509b387f92d05fc4d6ae96b25b] osd: make some of the pg_temp methods/fields private
# possible first bad commit: [987f68a8df292668ad241f4769d82792644454dd] osdc/Objecter: call notify completion only once
# possible first bad commit: [d201c6d93f40affe72d940605c8786247451d3e5] mon: change mon_osd_min_down_reporters from 1 -> 2

The footprints of 192 Ceph developers

Gource is run on the Ceph git repository for each of the 192 developers who contributed to its development over the past six years. Their footprint is the last image of a video clip created from all the commits they authored.

video clip
Sage Weil

video clip
Yehuda Sadeh

video clip
Greg Farnum

video clip
Samuel Just

video clip
Colin P. McCabe

video clip
Danny Al-Gaaf

video clip
Josh Durgin

video clip
John Wilkins

video clip
Loic Dachary

video clip
Dan Mick

Continue reading “The footprints of 192 Ceph developers”

Organization mapping and Reviewed-by statistics with git

shortlog is convenient to print a leader board counting contributions. For instance to display the top ten commiters of Ceph over the past year:

$ git shortlog --since='1 year' --no-merges -nes | nl | head -10
     1	  1890	Sage Weil <sage@inktank.com>
     2	   805	Danny Al-Gaaf <danny.al-gaaf@bisect.de>
     3	   491	Samuel Just <sam.just@inktank.com>
     4	   462	Yehuda Sadeh <yehuda@inktank.com>
     5	   443	John Wilkins <john.wilkins@inktank.com>
     6	   303	Greg Farnum <greg@inktank.com>
     7	   288	Dan Mick <dan.mick@inktank.com>
     8	   274	Loic Dachary <loic@dachary.org>
     9	   219	Yan, Zheng <zheng.z.yan@intel.com>
    10	   214	João Eduardo Luís <joao.luis@inktank.com>

To get the same output for reviewers over the past year, assuming the Reviewed-by is set consistently in the commit messages, the following can be used:

git log  --since='1 year' --pretty=%b | \
 perl -n -e 'print "$_\n" if(s/^\s*Reviewed-by:\s*(.*<.*>)\s*$/\1/)'  | \
 git check-mailmap --stdin | \
 sort | uniq -c | sort -rn | nl | head -10
     1	    652 Sage Weil <sage@inktank.com>
     2	    265 Greg Farnum <greg@inktank.com>
     3	    185 Samuel Just <sam.just@inktank.com>
     4	    106 Josh Durgin <josh.durgin@inktank.com>
     5	     95 João Eduardo Luís <joao.luis@inktank.com>
     6	     95 Dan Mick <dan.mick@inktank.com>
     7	     69 Yehuda Sadeh <yehuda@inktank.com>
     8	     46 David Zafman <david.zafman@inktank.com>
     9	     36 Loic Dachary <loic@dachary.org>
    10	     21 Gary Lowell <gary.lowell@inktank.com>

The body of the commit messages ( –pretty=%b ) is displayed for commits from the past year ( –since=’1 year’ ). perl reads an does not print anything ( -n ) unless it finds a Reviewed-by: string followed by what looks like First Last <mail@dot.com> ( ^\s*Reviewed-by:\s*(.*<.*>)\s*$ ). The authors found are remapped to fix typos ( git check-mailmap –stdin ).
The authors can further be remapped into the organization to which they are affiliated using the .organizationmap file which has the same format as the .mailmap file, only remapping normalized author names to organization names with git -c mailmap.file=.organizationmap check-mailmap –stdin

git log  --since='1 year' --pretty=%b | \
 perl -n -e 'print "$_\n" if(s/^\s*Reviewed-by:\s*(.*<.*>)\s*$/\1/)'  | \
 git check-mailmap --stdin | \
 git -c mailmap.file=.organizationmap check-mailmap --stdin | \
 sort | uniq -c | sort -rn | nl | head -10
     1	   1572 Inktank <contact@inktank.com>
     2	     39 Cloudwatt <libre.licensing@cloudwatt.com>
     3	      7 Intel <contact@intel.com>
     4	      4 University of California, Santa Cruz <contact@cs.ucsc.edu>
     5	      4 Roald van Loon Consultancy <roald@roaldvanloon.nl>
     6	      2 CERN <contact@cern.ch>
     7	      1 SUSE <contact@suse.com>
     8	      1 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
     9	      1 IWeb <contact@iweb.com>
    10	      1 Gaudenz Steinlin <gaudenz@debian.org>

tip to review a large patch with magit and ediff

When reviewing a large changeset with magit, it can be difficult to separate meaningfull changes from purely syntactic substitutions. Using ediff to navigate the patch highlights the words changed between two lines instead of just showing a line removed and another added.

In the above screenshot the oid change to oid.hobj is a syntactic change where the new block after oid.generation != ghobject_t::NO_GEN deserves more attention.
Continue reading “tip to review a large patch with magit and ediff”