Sharing hard drives with Ceph

A group of users give hard drives to the system administrator of the Ceph cluster. In exchange, each of them gets credentials to access a dedicated pool of a given size on the Ceph cluster.

The system administrator runs:

# ceph-authtool jean-keyring --create-keyring --name client.jean \
   --gen-key --set-uid 458 \
   --cap mon 'allow profile simple-rados-client' \
   --cap osd 'allow rwx pool=jean-pool'
creating jean-keyring
# ceph auth import --in-file jean-keyring
imported keyring
# ceph auth get client.jean
exported keyring for client.jean
[client.jean]
        key = AQCziVZT6EJoIRAA/HVxueyPmRGjvqQeOR40hQ==
        auid = 458
        caps mon = "allow profile simple-rados-client"
        caps osd = "allow rwx pool=jean-pool"

which creates the user client.jean in the Ceph cluster, with limited access to the monitors (the simple-rados-client profile) and read/write access to the OSDs, restricted to the (not yet existing) pool jean-pool. The pool is then created with:

# cat > create-pool-auid.py <<EOF
import rados
import sys
cluster = rados.Rados(conffile = '/etc/ceph/ceph.conf')
cluster.connect()
cluster.create_pool(sys.argv[1], int(sys.argv[2]))
EOF
# python create-pool-auid.py jean-pool 458
# ceph osd pool set-quota jean-pool max_bytes $((1024 * 1024 * 1024))
set-quota max_bytes = 1073741824 for pool jean-pool

The Python API is used to set jean as the owner of the pool, via the auid value 458 that was associated with the user when it was created. The quota of the pool is set to 1GB and writes fail once it is reached:

# rados put --pool jean-pool GROUP /etc/group
error putting jean-pool/GROUP: No space left on device

The user is provided with the keyring that was just created and a ceph.conf file listing the monitors needed to access the cluster:

[global]
auth_service_required = cephx
fsid = 8790ab57-f06f-4b27-8507-55c8d59e1327
auth_supported = cephx
auth_cluster_required = cephx
mon_host = 10.89.0.2
auth_client_required = cephx
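
With these two files, the user can also talk to the pool directly through the librados Python bindings, without going through rbd. The following is only a sketch, assuming ceph.conf and jean-keyring sit in the current directory and using GROUP as an arbitrary object name:

import rados

# connect as client.jean with the keyring and ceph.conf handed to the user
cluster = rados.Rados(conffile = 'ceph.conf',
                      name = 'client.jean',
                      conf = dict(keyring = 'jean-keyring'))
cluster.connect()
ioctx = cluster.open_ioctx('jean-pool')
try:
    # once the 1GB quota is reached, the write is rejected,
    # just like the rados put shown above
    ioctx.write_full('GROUP', open('/etc/group').read())
except rados.Error as e:
    print("write failed: %s" % e)
finally:
    ioctx.close()
    cluster.shutdown()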

The user can then create an RBD volume with:

# rbd --name client.jean --keyring jean-keyring --pool jean-pool create --size 100 vda
# rbd --name client.jean --keyring jean-keyring --pool jean-pool info vda
rbd image 'vda':
        size 102400 kB in 25 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.10f5.74b0dc51
        format: 1

It is then mapped as a block device with:

# rbd --name client.jean --keyring jean-keyring --pool jean-pool map vda
# dmesg
...
[  232.099642] Key type ceph registered
[  232.099695] libceph: loaded (mon/osd proto 15/24)
[  232.100879] rbd: loaded rbd (rados block device)
[  232.102434] libceph: client4399 fsid 8790ab57-f06f-4b27-8507-55c8d59e1327
[  232.102971] libceph: mon0 10.89.0.2:6789 session established
[  232.159177]  rbd1: unknown partition table
[  232.159230] rbd: rbd1: added with size 0x6400000
# ls -l /dev/rbd1
brw-rw---- 1 root disk 251, 0 Apr 22 17:49 /dev/rbd1

and can be formatted and mounted as a file system with:

# mkfs /dev/rbd1
mke2fs 1.42.9 (4-Feb-2014)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=4096 blocks, Stripe width=4096 blocks
25688 inodes, 102400 blocks
5120 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
13 block groups
8192 blocks per group, 8192 fragments per group
1976 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729
Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
# mount /dev/rbd1 /mnt
# df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd1        97M  1.6M   91M   2% /mnt

Note: The proposed change to automatically set the auid when the pool is created is too intrusive. Alternatively, a ceph osd pool set auid command is proposed to provide a way to set the auid from the shell command line instead of Python code.
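
Until such a command exists, the owner of an existing pool can also be changed from Python. A minimal sketch, assuming the installed python-rados bindings expose Ioctx.change_auid (a thin wrapper around rados_ioctx_pool_set_auid) and that it is run with admin credentials:

import rados

cluster = rados.Rados(conffile = '/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('jean-pool')
# hand the pool over to auid 458, i.e. client.jean
ioctx.change_auid(458)
ioctx.close()
cluster.shutdown()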

6 Replies to “Sharing hard drives with Ceph”

  1. Hi, nice to see this being possible and working.

    However, this worries me at the same time because I see Ceph following the same path as all other major software vendors. At their core are good intentions to move IT forward with new, refined technology. But in the end, they fail again to have computers manage simple tasks like the one above without becoming “admin bound”.

    The job is quite simple:

    “A group of users give hard drives to the system administrator of the Ceph cluster. In exchange, each of them get credentials to access a dedicated pool of a given size from the Ceph cluster.”

    However, one needs an expert to achieve a rather simple goal.

    Working with an Oracle database is quite similar. You need some simple job done and end up calling the database administrator to solve the problem with cryptic scripts.

    Linux is just the same: what one wants to achieve is simple, but the knowledge required to get the job done is skyrocketing.

    So, we go from CPU bound > IO bound > admin bound.

    My big hope is that Ceph, which has all my support, keeps us from becoming “admin bound”.

    1. What about this: “A user gives a hard drive and an email address to someone. (S)He receives a mail with instructions to access an online Ceph pool of X bytes.” This does not remove the need for someone to plug the drive in. But from the user’s point of view, it would be a simple exchange: give a hard drive, get an online Ceph pool.

    2. If you integrate everything in one tool, you get a tool that is bad at everything.

      I’d rather have a tool that does one thing well (data storage) and manage tasks like adding new devices or permissions via tools that do that task well, like Puppet or Chef, plus some simple frontend for registering new nodes.

      Yes, the admin still has to do it, but only once, not per user.

      1. This could indeed be supported by Puppet, Chef, etc. modules. However, it is not in the scope of the existing Ceph Puppet, Chef, etc. modules.

  2. I installed Ceph on two nodes (2 mon, 2 osd on XFS), used RBD and mounted the pool on two different Ceph hosts. When I write data through one of the hosts, I do not see the data on the other. What’s wrong?
