Teuthology docker targets hack (1/5)

teuthology runs Ceph integration jobs on targets that can be either virtual machines or bare metal. The container hack adds support for docker containers as a replacement target.

...
Running task exec...
Executing custom commands...
Running commands on role mon.a host container002
running sudo 'TESTDIR=/home/ubuntu/cephtest' bash '-c' '/bin/true'
running docker exec container002 bash /tmp/tmp/tmptJ7hxa
Duration was 0.088931 seconds
...


Ceph make -j8 check in less than 3mn

The Ceph sources contain tests that can be run with make check. As of v0.85 they can only be run sequentially because some tests bind the same ports and use the same files. A run takes around 18 minutes on a spinner and 12 minutes on an SSD because some of the tests are I/O intensive. That is problematic: it is a long time to wait when validating new code, it keeps increasing as more tests are added, and it is a recurring frustration because the tests conflict with a vstart.sh cluster running for manual testing.
The tests have been reworked to ensure that none of them use the same ports or the same files. This reduces the time to 12mn on a spinner and around 2mn on an SSD with make -j8 check.

[loic@rex001 src]$ time make -j8 check
make[4]: Entering directory `/home/loic/ceph/src'
./check_version ./.git_version
...
make[1]: Leaving directory `/home/loic/ceph/src'

real    2m21.907s
user    5m45.958s
sys     1m50.431s
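The rework boils down to giving each test its own ports and files. A minimal sketch of the idea, using a hypothetical unique_port helper (not the actual mechanism used in the Ceph tree):

```shell
# Hash the test name into a fixed port range so that tests started
# concurrently never bind the same port. The helper name is hypothetical.
unique_port() {
    echo $(( 7000 + $(printf '%s' "$1" | cksum | cut -d' ' -f1) % 1000 ))
}
unique_port osd-crush.sh          # deterministic for a given test name
unique_port mon-handle-forward.sh
```

Because the port only depends on the test name, reruns of the same test reuse the same port while distinct tests almost never collide.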

A number of tests such as qa/workunits/cephtool/test.sh take a long time but do not require much CPU or disk I/O. On a 4 core machine, setting -j8 gives these tests a chance to run while more CPU intensive tests use most of the CPU power.
Using larger values (for instance -j16) does not help much because a few tests take around 3mn to run anyway.
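The same reasoning can be turned into a rule of thumb: use roughly twice the core count so that I/O-bound tests overlap with CPU-bound ones (a sketch, not a hard rule):

```shell
# Twice the online core count keeps the CPUs busy while slow, mostly
# idle tests (such as cephtool/test.sh) run in the background.
jobs=$(( $(nproc) * 2 ))
echo "make -j$jobs check"
```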

Using git bisect with Ceph

When investigating a problem using the latest Ceph sources, it was discovered that the problem only shows in the master branch and appeared after the v0.85 tag. The following script reproduces the problem and logs the result:

$ cat try.sh
#!/bin/bash
cd src
log=$(git describe)
echo $log.log
make -j4 >& $log.log
rm -fr dev out ;  mkdir -p dev
MDS=1 MON=1 OSD=3 timeout 120 ./vstart.sh \
  -o 'paxos propose interval = 0.01' \
  -n -l mon osd mds >> $log.log 2>&1
status=$?
./stop.sh
exit $status

It can be used with git bisect to find the revision in which it first appeared.

$ git bisect start # initialize the search
$ git bisect bad origin/master # the problem happens
$ git bisect good tags/v0.85 # the problem does not happen
$ git bisect skip $(git log --format='%H' --no-merges tags/v0.85..origin/master)
$ git bisect run try.sh # binary search in tags/v0.85..origin/master
running try.sh
v0.85-679-g8d3f135.log
Bisecting: 339 revisions left to test after this (roughly 8 steps)
[ef006ae] Merge pull request #2658 from athanatos/wip-9625
running try.sh
v0.86-27-gef006ae.log
Bisecting: 169 revisions left to test after this (roughly 7 steps)
[fa0bd06] ceph-disk: bootstrap-osd keyring ignores --statedir
running try.sh
v0.85-1116-gfa0bd06.log
...
v0.86-263-g5f6589c.log
d15ecafea4 is the first bad commit
commit d15eca
Author: John Spray 
Date:   Fri Sep 26 17:24:12 2014 +0100
    vstart: create fewer pgs for fs pools
:040000 040000 f42a324a8 aa64cdc1ed3 M	src
bisect run success

The git bisect skip excludes all non-merge commits from the search. The branches are carefully tested before being merged and are, at least, known to pass make check successfully. The individual commits within a branch are unlikely to pass make check and some of them may not even compile.
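An alternative to skipping all non-merge commits up front relies on the exit code conventions of git bisect run: 0 means good, 125 means the commit cannot be tested and should be skipped, and any other value below 128 means bad. A wrapper can therefore return 125 when the build fails (a sketch; the build and reproducer commands are placeholders):

```shell
# Sketch of a bisect wrapper: "git bisect run" skips (rather than blames)
# any commit for which this script exits 125.
cat > bisect-wrap.sh <<'EOF'
#!/bin/bash
make -j4 >& build.log || exit 125  # does not compile: skip this commit
exec ./try.sh                      # reproducer: 0 is good, non-zero is bad
EOF
chmod +x bisect-wrap.sh
```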
The information displayed by git bisect run is terse when it ends with skipped commits:

Bisecting: 39 revisions left to test after this (roughly 5 steps)
[083c2e42c663229ce505f74c40d8261ca530a79b] Merge pull request #6565 from chenji-kael/patch-1
running /home/loic/ceph-centos-7-loic/try.sh
v9.2.0-920-g083c2e4.log
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
536c70281a8952358e8d88a6ff8d7cd9b8db5a76

To get more detail, git bisect log can be used:

# good: [b584388ce9ce998c99e219ec144725beaf09ab28] Merge pull request #6489 from xiexingguo/xxg-wip-13715
git bisect good b584388ce9ce998c99e219ec144725beaf09ab28
# bad: [5135292d9557269bab5cefc98d39606174aa6ebe] Merge branch 'wip-bigbang'
git bisect bad 5135292d9557269bab5cefc98d39606174aa6ebe
# good: [f3e88ace74c896c72f6e8485c44c7432f298d887] Merge remote-tracking branch 'gh/jewel'
git bisect good f3e88ace74c896c72f6e8485c44c7432f298d887
# good: [083c2e42c663229ce505f74c40d8261ca530a79b] Merge pull request #6565 from chenji-kael/patch-1
git bisect good 083c2e42c663229ce505f74c40d8261ca530a79b
# only skipped commits left to test
# possible first bad commit: [5135292d9557269bab5cefc98d39606174aa6ebe] Merge branch 'wip-bigbang'
# possible first bad commit: [9aabc8a9b8d7775337716c4e0fa3cc53938acb45] test/mon/osd-crush.sh: escape ceph tell mon.*
# possible first bad commit: [72edab282343e8509b387f92d05fc4d6ae96b25b] osd: make some of the pg_temp methods/fields private
# possible first bad commit: [987f68a8df292668ad241f4769d82792644454dd] osdc/Objecter: call notify completion only once
# possible first bad commit: [d201c6d93f40affe72d940605c8786247451d3e5] mon: change mon_osd_min_down_reporters from 1 -> 2

Manual bootstrap of a Ceph MON on Ubuntu 14.04

A Ceph MON can be created and run manually for test purposes on Ubuntu 14.04 with:

$ sudo apt-get install ceph
$ sudo tee /etc/ceph/ceph.conf <<EOF
[global]
fsid = $(uuidgen)
mon_host = 127.0.0.1
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
filestore_xattr_use_omap = true
EOF
$ sudo ceph-mon --cluster ceph --mkfs -i a --keyring /dev/null
ceph-mon: mon.noname-a 127.0.0.1:6789/0 is local, renaming to mon.a
ceph-mon: set fsid to 80562a76-f13e-4b1e-8fd1-de8f774f2683
ceph-mon: created monfs at /var/lib/ceph/mon/ceph-a for mon.a
$ sudo ceph-mon -i a

The cluster is not healthy because it has no OSDs, but it is available:

    cluster 1b5ef3ac-be8c-4658-8568-bd090b534b19
     health HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds
     monmap e1: 1 mons at {a=127.0.0.1:6789/0}, election epoch 2, quorum 0 a
     osdmap e1: 0 osds: 0 up, 0 in
      pgmap v2: 192 pgs, 3 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 192 creating

Manual bootstrap of a Ceph MON on RHEL7

A Ceph MON can be created and run manually for test purposes on RHEL7 with:

$ sudo yum install ceph
$ sudo tee /etc/ceph/ceph.conf <<EOF
[global]
fsid = $(uuidgen)
mon_host = 127.0.0.1
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
filestore_xattr_use_omap = true
EOF
$ sudo ceph-mon --cluster ceph --mkfs -i a --keyring /dev/null
ceph-mon: mon.noname-a 127.0.0.1:6789/0 is local, renaming to mon.a
ceph-mon: set fsid to 80562a76-f13e-4b1e-8fd1-de8f774f2683
ceph-mon: created monfs at /var/lib/ceph/mon/ceph-a for mon.a
$ sudo touch /var/lib/ceph/mon/ceph-a/sysvinit
$ sudo service ceph start mon.a
=== mon.a ===
Starting Ceph mon.a on mira042...
Running as unit run-7661.service.
Starting ceph-create-keys on mira042...

The cluster is not healthy because it has no OSDs, but it is available:

$ ceph -s
    cluster 80562a76-f13e-4b1e-8fd1-de8f774f2683
     health HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds
     monmap e1: 1 mons at {a=127.0.0.1:6789/0}, election epoch 2, quorum 0 a
     osdmap e1: 0 osds: 0 up, 0 in
      pgmap v2: 192 pgs, 3 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 192 creating

Testing ceph-disk with block devices in docker

The Ceph command to set up a block device (ceph-disk) needs to call partprobe after zapping a disk. The patch adding the partprobe call needs a block device to verify that it works as expected. The body of the test requires root privileges:

 dd if=/dev/zero of=vde.disk bs=1024k count=200
 losetup --find vde.disk
 local disk=$(losetup --associated vde.disk | cut -f1 -d:)
 ./ceph-disk zap $disk

which is potentially dangerous for the developer machine. The test run is therefore delegated to a docker container, so that accidentally removing /var/run has no consequence. Although the test recompiles Ceph the first time:

main_docker "$@" --compile
main_docker "$@" --user root --dev test/ceph-disk.sh test_activate_dev

it will reuse the compilation results if run a second time, unless there is a new commit, in which case it will recompile whatever make decides. The ceph-disk-root.sh script is added to the list of scripts that are run on make check, but it will only be considered if --enable-docker has been given to ./configure and docker is available. Otherwise it is silently ignored:

if ENABLE_DOCKER
check_SCRIPTS += \
 test/ceph-disk-root.sh
endif


Ceph make check in docker

After Ceph is built from sources, unit and functional tests can be run with make check. Delegating the execution to a container makes it possible to:

  • keep working on the sources without disrupting the run
  • run functional tests that require root privileges without modifying the development environment
  • check various operating systems

The src/test/docker-test-helper.sh library can be used from the command line:

$ test/docker-test.sh --os-type ubuntu --os-version 14.04 make check &
$ test/docker-test.sh --os-type centos --os-version 7 make check &

Each run uses a clone of the current repository and pulls from origin before executing the command. For instance, if running from /srv/ceph, the centos run will run make check in /srv/ceph-centos-7 which is bind mounted in the container. A possible workflow is:

  • work
  • commit
  • test/docker-test.sh make check which pulls the latest commits
  • keep working
  • check the make check output
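The two runs at the top of this section can be combined into a helper that fails when either distribution fails. A sketch of the job-control pattern, with the command passed in so it can be exercised with any stand-in:

```shell
# Launch both distributions in parallel and propagate the first failure.
# The options mirror the docker-test.sh examples above.
run_checks() {
    "$@" --os-type ubuntu --os-version 14.04 make check & local pid1=$!
    "$@" --os-type centos --os-version 7 make check & local pid2=$!
    wait $pid1 && wait $pid2
}
# Example: run_checks test/docker-test.sh
```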

When an error happens, debugging starts by running a shell in the container:

$ test/docker-test.sh --os-type ubuntu --os-version 14.04 --shell
remote: Counting objects: 10, done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 10 (delta 8), reused 0 (delta 0)
Unpacking objects: 100% (10/10), done.
From /home/loic/software/ceph/ceph
 + 15046fe...8a39cad wip-9665 -> origin/wip-9665
HEAD is now at 8a39cad autotools: add --enable-docker
loic@203c085f3dc1:/srv/ceph-ubuntu-14.04$

The first time test/docker-test.sh runs, it creates a docker image populated with the packages necessary to compile and run Ceph. This lowers the overhead of running a test in the container:

$ time test/docker-test.sh --os-type ubuntu --os-version 14.04 unittest_str_map
HEAD is now at 8a39cad autotools: add --enable-docker
Running main() from gtest_main.cc
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from str_map
[ RUN      ] str_map.json
[       OK ] str_map.json (1 ms)
[ RUN      ] str_map.plaintext
[       OK ] str_map.plaintext (0 ms)
[----------] 2 tests from str_map (1 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (1 ms total)
[  PASSED  ] 2 tests.

real	0m3.340s
user	0m0.071s
sys	0m0.046s

Lowering Ceph scrub I/O priority

Note: the following does not currently work in Firefly because of http://tracker.ceph.com/issues/9677 . The fix has been backported to Firefly and will likely be in 0.80.8.

By default the disk I/O of a Ceph OSD scrubbing thread has the same priority as all other threads. It can be lowered for all OSDs with the ioprio options:

ceph tell osd.* injectargs '--osd_disk_thread_ioprio_priority 7'
ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle'

All other threads in the OSD will be in the be (best effort) class with priority 4, which is the default for daemons. The disk thread will show as idle:

$ sudo iotop --batch --iter 1 | grep 'ceph-osd -i 0' | grep -v be/4
 4156 idle loic        0.00 B/s    0.00 B/s  0.00 %  0.00 % ./ceph-osd -i 0 ..
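To make the lower scrub priority persistent across OSD restarts, the same two options can be placed in ceph.conf (option names as injected above; a sketch of the [osd] section, not a full configuration file):

```
[osd]
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 7
```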


Running Ceph with the tcmalloc heap profiler

When running a Ceph cluster from sources, the tcmalloc heap profiler can be started for all daemons with:

CEPH_HEAP_PROFILER_INIT=true \
  CEPH_NUM_MON=1 CEPH_NUM_OSD=3 \
  ./vstart.sh -n -X -l mon osd

The osd.0 stats can be displayed with

$ ceph tell osd.0 heap stats
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
osd.0tcmalloc heap stats:------------------------------------------------
MALLOC:        6084984 (    5.8 MiB) Bytes in use by application
MALLOC: +       180224 (    0.2 MiB) Bytes in page heap freelist
MALLOC: +      1430776 (    1.4 MiB) Bytes in central cache freelist
MALLOC: +      7402112 (    7.1 MiB) Bytes in transfer cache freelist
MALLOC: +      5873424 (    5.6 MiB) Bytes in thread cache freelists
MALLOC: +      1290392 (    1.2 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =     22261912 (   21.2 MiB) Actual memory used (physical + swap)
MALLOC: +            0 (    0.0 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =     22261912 (   21.2 MiB) Virtual address space used
MALLOC:
MALLOC:           1212              Spans in use
MALLOC:             65              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
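Beyond stats, the running daemon exposes the other heap commands documented in the Ceph memory profiling page. A sketch, written to a helper script here, assuming osd.0 was started with the profiler enabled as above:

```shell
# Drive the tcmalloc profiler at runtime; dumped profiles land next to
# the osd logs and can be inspected later with pprof.
cat > heap-profile.sh <<'EOF'
#!/bin/bash
ceph tell osd.0 heap start_profiler  # begin collecting samples
ceph tell osd.0 heap dump            # write the current profile to disk
ceph tell osd.0 heap stop_profiler   # stop collecting
EOF
chmod +x heap-profile.sh
```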

See the Ceph memory profiling documentation for more information.