HOWTO valgrind Ceph with teuthology

Teuthology can run a designated daemon under valgrind and preserve the report for analysis. The notcmalloc flavor is preferred because it avoids valgrind errors that are unrelated to Ceph itself.

- install:
   project: ceph
   branch: wip-5510
   flavor: notcmalloc

A daemon running under valgrind is much slower, so the logs will contain warnings that are not relevant in this context. They should be whitelisted:

    log-whitelist:
    - slow request
    - clocks
    - wrongly marked me down
    - objects unfound and apparently lost
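
The effect of the whitelist can be pictured with a short Python sketch. It is an illustration only, not teuthology's actual code: it assumes that each whitelist entry is treated as a regular expression and that only cluster log lines tagged [ERR], [WRN] or [SEC] are considered. The function name unexpected_lines and the sample log lines are made up for the example.

```python
import re

# Whitelist entries copied from the job description above.
WHITELIST = [
    "slow request",
    "clocks",
    "wrongly marked me down",
    "objects unfound and apparently lost",
]


def unexpected_lines(log_lines, whitelist):
    """Return the error/warning lines that no whitelist entry matches.

    Assumption: entries are regular expressions searched anywhere in the line.
    """
    severity = re.compile(r"\[(ERR|WRN|SEC)\]")
    ignored = [re.compile(pattern) for pattern in whitelist]
    return [
        line
        for line in log_lines
        if severity.search(line)
        and not any(pattern.search(line) for pattern in ignored)
    ]


# Hypothetical log lines, invented for this illustration.
sample = [
    "osd.0 [WRN] slow request 30.5 seconds old",
    "mon.a [INF] osdmap e12: 2 osds: 2 up, 2 in",
    "osd.1 [ERR] scrub mismatch",
]

# Only the line that is an error and is not whitelisted remains.
print(unexpected_lines(sample, WHITELIST))
```

With the whitelist in place, the slow request warning caused by valgrind is ignored, while an unrelated error would still make the job fail.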

The first OSD, osd.0, is configured to run under valgrind:

- ceph:
    valgrind:
      osd.0: --tool=memcheck

After running teuthology with

./virtualenv/bin/teuthology -v --archive /tmp/wip-5510-valgrind \
  --owner loic@dachary.org \
  ~/private/ceph/targets.yaml \
  ~/private/ceph/wip-5510-valgrind.yaml

valgrind errors, if any, are reported at the end of the run:

DEBUG:teuthology.run_tasks:Exception was not quenched, exiting: Exception: saw valgrind issues
INFO:teuthology.run:Summary data:
{duration: 344.2433888912201, failure_reason: saw valgrind issues, flavor: notcmalloc,
  owner: loic@dachary.org, success: false}
INFO:teuthology.run:FAIL

The valgrind XML report containing the details of the error can be retrieved from /tmp/wip-5510-valgrind/remote/ubuntu@target1/log/valgrind/osd.0.log.gz
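
To get a quick overview of the report, the errors can be counted by kind. The following Python sketch assumes that the file is a gzip-compressed report in valgrind's standard XML format, where each `<error>` element has a `<kind>` child. The function name summarize_valgrind_report is hypothetical, and the sketch tolerates a truncated report on the assumption that the daemon may have been stopped before valgrind closed the document.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_valgrind_report(path):
    """Count the <error> elements of a gzipped valgrind XML report by <kind>."""
    kinds = Counter()
    with gzip.open(path, "rb") as report:
        try:
            # Parse incrementally so that a large report is not loaded at once.
            for _event, element in ET.iterparse(report, events=("end",)):
                if element.tag == "error":
                    kinds[element.findtext("kind", default="unknown")] += 1
                    element.clear()  # release the children that were just read
        except ET.ParseError:
            # Assumption: the report may be truncated or otherwise malformed.
            # Keep the errors that were read before the parser gave up.
            pass
    return kinds
```

For instance, summarize_valgrind_report("/tmp/wip-5510-valgrind/remote/ubuntu@target1/log/valgrind/osd.0.log.gz").most_common() would list the error kinds found in the report above, most frequent first.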

The complete teuthology job file (wip-5510-valgrind.yaml):

roles:
- - mon.a
  - osd.0
- - osd.1
  - client.0
tasks:
- install:
   project: ceph
   branch: wip-5510
   flavor: notcmalloc
- ceph:
    valgrind:
      osd.0: --tool=memcheck
    log-whitelist:
    - slow request
    - clocks
    - wrongly marked me down
    - objects unfound and apparently lost
- rados:
    clients: [client.0]
    objects: 500
    op_weights:
      delete: 10
      read: 45
      write: 45
      rollback: 50
      snap_create: 50
      snap_remove: 50
    ops: 40