Ceph use case: proteomic analysis

The UMR de Génétique Végétale is a state-funded French research facility housed in a historical monument: a large farm surrounded by fields cultivated for experiments. Johann Joets was given a most unusual mission: set up a state-of-the-art datacenter in a former pigsty. His colleague Olivier Langella, who handles system administration for PAPPSO, looked for an extensible storage solution that could transparently survive the loss of any hardware component. A simple Ceph setup was chosen and has been in use since 2012.

The roof above the pigsty turned into a machine room
R515 Ceph node with the switch on top


In early 2012, Olivier Langella set up a 45TB Ceph cluster there. It stores the results produced daily by a mass spectrometer. The data is served through CephFS mounted with FUSE, and legacy software relies on its POSIX interface to transform the raw input, compare it with known protein databases, and run the compute jobs that researchers submit through Condor. The cluster is configured to survive the loss of any one of its three machines. Processed data is archived in a read-only btrfs file system backed by an 8TB RBD volume.
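The requirement that the cluster survive the loss of one of its three machines can be checked with simple arithmetic. The Python sketch below is illustrative only: the replica counts are assumptions (the text does not say how many copies the pools keep), it treats the 45TB figure as raw disk capacity split evenly across the three hosts, and it assumes each replica of an object sits on a different host. The helper names are hypothetical.

```python
# Illustrative sizing sketch. Assumptions (not from the original setup):
# the 45TB figure is raw capacity, the three hosts are equally sized,
# and each replica of an object lives on a different host.

RAW_TB = 45.0  # cluster size quoted in the text, treated here as raw capacity
HOSTS = 3      # number of storage machines in the cluster


def usable_tb(raw_tb: float, replicas: int) -> float:
    """Logical capacity when every object is stored `replicas` times."""
    return raw_tb / replicas


def data_survives(replicas: int, failed_hosts: int = 1) -> bool:
    """With one replica per host, at least one copy remains
    as long as fewer hosts fail than there are replicas."""
    return failed_hosts < replicas


def can_restore_redundancy(used_tb: float, raw_tb: float, hosts: int,
                           replicas: int, failed_hosts: int = 1) -> bool:
    """True if the surviving hosts can hold a full set of replicas again:
    enough distinct hosts, and enough raw space for used_tb * replicas."""
    surviving = hosts - failed_hosts
    surviving_raw = raw_tb * surviving / hosts  # assumes equally sized hosts
    return surviving >= replicas and used_tb * replicas <= surviving_raw


if __name__ == "__main__":
    for replicas in (2, 3):
        print(f"replicas={replicas}: usable {usable_tb(RAW_TB, replicas):.1f}TB, "
              f"survives one host loss: {data_survives(replicas)}, "
              f"re-replicates 10TB after loss: "
              f"{can_restore_redundancy(10.0, RAW_TB, HOSTS, replicas)}")
```

Under these assumptions, two replicas let the two surviving hosts rebuild full redundancy as long as the stored data fits on them; with three replicas and only three hosts, the cluster keeps serving data after a failure but stays degraded until the machine is replaced. Ceph also stops accepting writes before disks are completely full, so real headroom is lower than these figures.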

Mass spectrometer generating data stored in Ceph

Each of the three machines has a few free disk slots, which will be used to add new OSDs and increase the capacity of the Ceph cluster when needed. The 42U rack is half full and has room for more machines. The spectrometer produces about 500MB of raw data per hour, which is converted into 200MB of XML files. About 3TB of archived data accumulates every year.
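These growth figures translate directly into a capacity horizon. The Python sketch below is a back-of-the-envelope estimate, not a measurement: it uses decimal units (1TB = 10^6 MB), assumes the spectrometer runs around the clock to get an upper bound on yearly output, and applies the stated 3TB per year to the 8TB RBD archive volume starting from an empty volume. The function names are hypothetical.

```python
import math

MB_PER_TB = 1_000_000  # decimal units (1TB = 10^6 MB)


def annual_tb(mb_per_hour: float, hours_per_day: float = 24.0) -> float:
    """Yearly volume for a given hourly rate; 24h/day gives an upper bound."""
    return mb_per_hour * hours_per_day * 365 / MB_PER_TB


def years_until_full(capacity_tb: float, used_tb: float,
                     growth_tb_per_year: float) -> float:
    """Years before a volume fills at a constant growth rate."""
    if growth_tb_per_year <= 0:
        return math.inf  # no growth: the volume never fills
    return max(capacity_tb - used_tb, 0.0) / growth_tb_per_year


if __name__ == "__main__":
    print(f"raw output, 24/7 upper bound: {annual_tb(500):.2f}TB/year")
    print(f"XML output, 24/7 upper bound: {annual_tb(200):.2f}TB/year")
    print(f"8TB archive at 3TB/year: {years_until_full(8, 0, 3):.1f} years")
```

At the stated 3TB per year, an empty 8TB archive volume would fill in under three years.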