Resizeable and resilient mail storage with Ceph

A common use case for Ceph is to store mail as objects using the S3 API. Although most mail servers do not yet support such a storage backend, deploying them on Ceph block devices is a beneficial first step: the disk can be resized live, while the mail server is running, and remains available even when a machine goes down. In the following example, the file system gains ~100GB every 30 seconds:

$ df -h /data/bluemind
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd2       1.9T  196M  1.9T   1% /data/bluemind
$ sleep 30 ; df -h /data/bluemind
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd2       2.0T  199M  2.0T   1% /data/bluemind
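Such growth can be driven by a simple loop around rbd resize and resize2fs; a minimal sketch, assuming the image is named bluemind, maps to /dev/rbd2, and that rbd resize takes sizes in MB (the starting size is illustrative):

(( size = 2 * 1024 * 1024 ))       # hypothetical starting size in MB (~2TB)
while sleep 30 ; do
    (( size = size + 100 * 1024 )) # add ~100GB, sizes in MB
    rbd resize --size $size bluemind
    resize2fs /dev/rbd2            # grow the mounted ext4 file system live
done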

When the mail system is upgraded to an S3-capable mail storage backend, it will be able to use the Ceph S3 API right away: Ceph uses the same underlying servers for both block and object storage.
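For instance, once a Ceph Object Gateway (radosgw) runs on the same cluster, any standard S3 client can store mail objects; a minimal sketch with s3cmd already configured against the gateway, where the bucket and object names are hypothetical:

s3cmd mb s3://mail                           # create a bucket for mail objects
s3cmd put message.eml s3://mail/user/INBOX/1 # store a message as an object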

As an example, Ceph provides a 256GB RBD disk to store BlueMind mails. The same example is described on the BlueMind blog.

/dev/rbd2   256G  188M  256G   1% /data/bluemind
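The disk itself could have been created, mapped, and mounted along these lines; a hedged sketch assuming the default rbd pool and that the image ends up at /dev/rbd2:

rbd create bluemind --size 262144    # 256GB image, size given in MB
rbd map bluemind                     # expose the image as a kernel block device
mkfs.ext4 /dev/rbd2                  # ext4 can be grown online with resize2fs
mkdir -p /data/bluemind
mount /dev/rbd2 /data/bluemind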

A daemon could notice when the file system gets over 80% full and resize it live to double its size with:

(( size = 2 * 256 * 1024 )) ; rbd resize --size $size bluemind
resize2fs /dev/rbd2
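A minimal sketch of such a daemon, assuming the image name and mount point from above; the 80% threshold, the polling interval, and the starting size are illustrative:

#!/bin/bash
# Poll the mount point and double the RBD image when usage crosses 80%.
size=262144                               # assumed current image size in MB
while sleep 60 ; do
    use=$(df --output=pcent /data/bluemind | tail -1 | tr -dc '0-9')
    if (( use > 80 )) ; then
        (( size = 2 * size ))
        rbd resize --size $size bluemind
        resize2fs /dev/rbd2               # grow the ext4 file system online
    fi
done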

The live file system grows at a rate of 3GB/s (on a set of machines supporting Ceph, located within the same rack and connected by 10Gb/s cluster and public networks).
Growing from 1TB to 10TB takes approximately 1h30, as the timed resize2fs run below shows:

Filesystem at /dev/rbd2 is mounted on /data/bluemind;
   on-line resizing required
old_desc_blocks = 174, new_desc_blocks = 640
Performing an on-line resize of /dev/rbd2 to 2684354560 (4k) blocks.
The filesystem on /dev/rbd2 is now 2684354560 blocks long.
1.13user 555.98system 1:14:59elapsed 12%CPU
   (0avgtext+0avgdata 1708816maxresident)k
477664inputs+16outputs (0major+107567minor)pagefaults 0swaps
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd2        10T  165M   10T   1% /data/bluemind

The Ceph RBD pool is configured with three replicas: the /data/bluemind block device will remain available even if two machines go down.
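A minimal sketch of that configuration, assuming the image lives in the default rbd pool:

ceph osd pool set rbd size 3    # keep three replicas of each object
ceph osd pool get rbd size      # verify the replica count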