Hadoop-like computing with Ceph

Computation can be co-located with the data: code running on the machine where a Ceph object resides can read the object from the local disk instead of fetching it over the network. Noah Watkins explains this in great detail, and it can be tried out with a Hello World example that calls the hello plugin (the cls_hello object class) included in the Emperor release.

After compiling Ceph from source, start a test cluster from the source directory:

$ cd src
$ rm -fr dev out ;  mkdir -p dev
$ LC_ALL=C MON=1 OSD=3 bash -x ./vstart.sh -d -n -X -l mon osd

Check that it works:

$ ./ceph -s
    cluster 091a6854-924b-405c-ac6e-7fe05baaeb63
     health HEALTH_WARN too few pgs per osd (8 < min 10)
     monmap e1: 1 mons at {a=}, election epoch 2, quorum 0 a
     osdmap e9: 3 osds: 3 up, 3 in
      pgmap v49: 24 pgs, 3 pools, 0 bytes data, 0 objects
            463 GB used, 85400 MB / 547 GB avail
                  24 active+clean
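
For scripted runs, the same check can be automated. The following is a minimal sketch, assuming the status output keeps the `osdmap eN: X osds: Y up, Z in` shape shown above. `osds_all_up` is a hypothetical helper name, not something shipped with Ceph.

```shell
#!/bin/sh
# Hypothetical helper: reads `ceph -s` style text on stdin and succeeds
# only if the osdmap line reports every OSD as both up and in.
# Assumes the line format "osdmap e9: 3 osds: 3 up, 3 in".
osds_all_up() {
    awk '
        /osdmap/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "osds:")  total    = $(i - 1)
                if ($i ~ /^up,?$/)  up       = $(i - 1)
                if ($i == "in")     in_count = $(i - 1)
            }
            found = 1
        }
        # Exit 0 only when an osdmap line was seen and all counts agree.
        END { exit !(found && total == up && total == in_count) }
    '
}

# Usage against the test cluster (run from src/):
#   ./ceph -s | osds_all_up && echo "all OSDs are up and in"
```

The helper only inspects the OSD counts; it deliberately ignores the health line, since a fresh vstart cluster may report HEALTH_WARN while still being usable.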

Then go to the example directory and modify the Makefile so that it points to the sources just compiled instead of relying on an installed version:

$ cd ../examples/librados/
$ cat > Makefile <<'EOF'
# recipe lines below must start with a tab character, not spaces
all: hello_world.cc
	g++ -I../../src/include -g -c hello_world.cc -o hello_world.o
	libtool --mode=link g++ -L../../src -g hello_world.o -lrados -o librados_hello_world
EOF
$ make

Run it from the src directory, where vstart.sh wrote ceph.conf:

$ cd ../../src
$ ../examples/librados/librados_hello_world --conf ceph.conf
we just set up a rados cluster object
we just parsed our config options
we just connected to the rados cluster
we just created a new pool named hello_world_pool
we just created an ioctx for our pool
we just wrote new object hello_object, with contents
hello world!
we read our object hello_object, and got back 0 bytes with contents
hello world!
we set the xattr 'version' on our object!
we overwrote our object hello_object with contents
hello world!v2
we just failed a write because the xattr wasn't as specified
we overwrote our object hello_object following an xattr test with contents
hello world!v3

The hello world example can then be adapted and tested locally. When ready, the plugin can be installed on each OSD of the actual Ceph cluster at


It is loaded the next time the OSD is restarted and is then ready to process data locally.
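
Rolling the plugin out to several OSD hosts can be scripted. The following is a dry-run sketch that only prints the copy commands. `plan_install`, the host names and `$CLS_DIR` are hypothetical, and the file name `libcls_hello.so` is an assumption about what the build produces. The destination directory is a required argument because it depends on how Ceph was installed on the cluster.

```shell
#!/bin/sh
# Dry-run sketch: prints one scp command per OSD host instead of running it.
# plan_install is a hypothetical helper, not part of Ceph.
plan_install() {
    if [ "$#" -lt 3 ]; then
        echo "usage: plan_install PLUGIN DEST_DIR HOST..." >&2
        return 2
    fi
    plugin=$1   # path to the compiled object class (assumed: a libcls_*.so file)
    dest=$2     # object class directory on the OSD hosts (site-specific)
    shift 2
    for host in "$@"; do
        printf 'scp %s %s:%s/\n' "$plugin" "$host" "$dest"
    done
}

# Example (host names are placeholders, CLS_DIR must be set by the operator):
#   plan_install ./libcls_hello.so "$CLS_DIR" osd-host-1 osd-host-2
```

Printing the commands first makes it easy to review them before piping the output to `sh`. Restarting the OSDs is left out on purpose, since the restart command depends on the init system in use.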

One Reply to “Hadoop-like computing with Ceph”

  1. The creation of cls_hello is a really great thing to have in Ceph. For the next set of Ceph blueprints I think it’s time to start thinking about how to make the out-of-tree construction and installation of object classes easier. At the same time, I would also like to see RADOS support for capabilities. For instance, some users may manage _all_ objects in a pool via an object class interface. However, errant writes that do not go through that interface will cause issues. A pool should carry a set of capabilities that say “this pool rejects operations X, Y, Z, allows cls_blah”, etc.
