Ceph disk requirements will be lower: a new backend is coming

When evaluating Ceph to run a new storage service, the replication factor only matters once the hardware provisioned at the start is almost full. That may happen months after the first user starts to store data. In the meantime, a new erasure coded storage backend that reduces hardware requirements by up to 50% is being developed in Ceph.

Saving disk space from the beginning does not matter: the space is not used anyway. The real question is when the erasure coded backend will be ready to double the usable capacity of the storage already in place.

When looking for a new storage solution, hardware requirements are an important factor. If Ceph is configured with three replicas, 1PB of usable storage requires 3PB of raw storage. Users are expected to occupy an increasing amount of disk space over time:


            ^
       10PB |
            |
            |
        6PB |
            |                                          /--
            |                                     /----
        4PB |                                /----
            |                           /----   usage
            |                      /----
        2PB |                 /----
            |             /---
            |        /----
            |   /----
            +----------------+----------------+------------>
                          A months          B months
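
The replication arithmetic can be written out as a minimal sketch. The function name `raw_storage_pb` is made up for illustration and is not part of Ceph; it only multiplies the usable capacity by the number of replicas.

```python
def raw_storage_pb(usable_pb: float, replicas: int = 3) -> float:
    """Raw capacity (in PB) needed to hold usable_pb of data when
    every object is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return usable_pb * replicas


if __name__ == "__main__":
    # With three replicas, each PB of user data occupies 3PB of disk.
    for usable in (1, 2):
        print(f"{usable}PB usable -> {raw_storage_pb(usable)}PB raw")
```

Read the other way around, the 4PB of raw storage in the charts holds a little over 1.3PB of user data when three replicas are kept.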

Hardware provisioning is expected to follow the usage curve. In the following, 4PB are provisioned initially, an additional 2PB after A months of operation, and so on.

            ^
       10PB |                                 +-----------
            |                                 |
            |                                 |
        6PB |                +----------------+
            |                |    provisioning         /---
            |                |                    /----
        4PB +----------------+               /----
            |                           /----  usage
            |                      /----
        2PB |                 /----
            |             /---
            |        /----
            |   /----
            +----------------+----------------+------------>
                          A months          B months

An erasure coded Ceph backend could reduce the requirements for raw storage: 1PB of usable storage fits in 1.5PB of raw storage. If it were available, the usage curve would not grow as fast and more hardware would need to be provisioned only at a later time.


            ^
       10PB |
            |
            |
        6PB |                                    +---------
            |                                    |
            |                                    |
        4PB +------------------------------------+
            |                  provisioning
            |                                    /---------
        2PB |                          /---------  usage
            |                /---------
            |        /-------
            |   /----
            +----------------+----------------+------------>
                          A months          B months
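
To make the comparison concrete, here is a rough sketch under stated assumptions. The article does not say which erasure code parameters lead to the 1.5PB figure; two data chunks plus one coding chunk (k=2, m=1) is one combination that gives that ratio, and it is used here only as an example. The growth rate of 0.1PB per month and all function names are hypothetical.

```python
def erasure_overhead(k: int, m: int) -> float:
    """Raw bytes stored per usable byte when each object is split into
    k data chunks and m coding chunks."""
    return (k + m) / k


def usable_capacity_pb(raw_pb: float, overhead: float) -> float:
    """Usable capacity (in PB) that fits in raw_pb of raw storage."""
    return raw_pb / overhead


def months_until_full(raw_pb: float, overhead: float,
                      growth_pb_per_month: float) -> float:
    """Months before raw_pb is full, assuming user data grows linearly."""
    return usable_capacity_pb(raw_pb, overhead) / growth_pb_per_month


if __name__ == "__main__":
    provisioned_pb = 4.0   # initial provisioning from the charts
    growth = 0.1           # hypothetical: PB of user data added per month
    replicated = 3.0       # three replicas: 3 raw bytes per usable byte
    erasure = erasure_overhead(k=2, m=1)   # assumed parameters, gives 1.5

    for name, overhead in (("3 replicas", replicated),
                           ("erasure coded", erasure)):
        months = months_until_full(provisioned_pb, overhead, growth)
        print(f"{name}: full after about {months:.1f} months")
```

Whatever growth rate is assumed, halving the overhead from 3 to 1.5 doubles the time before the same hardware is full, which is why the provisioning step moves to the right in the chart.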

The implementation of an erasure coded backend for Ceph started in May 2012. When it is released, it will progressively lower the disk space requirements. In the example above, it will save money if that happens before A months. Even if it happens later, it will still save money by reducing the storage footprint and making better use of the existing hardware.


            ^
       10PB |
            |
            |
        6PB |
            |
            |
        4PB +-----------------
            |
            |
        2PB |
            |
            |
            |
            +----------------+
                          A months

In any case, having erasure coding from the start does not save any money because the provisioned hardware is completely empty at that point. Up to A months, the investment to provision 4PB has to be made anyway.

6 Replies to “Ceph disk requirements will be lower: a new backend is coming”

  1. Hi Loic,
    when will the “new backend” be available? Does the Dumpling version have it?

    1. It will not be available for Dumpling. It is too soon to make any promise 🙂

      1. Thanks for the reply. Can you give me a probable time frame? My project's requirements are sensitive to storage capacity, and I think this feature would be very useful.

  2. Thanks Loic, I found this blog to be very useful.
    I have a question: can we take advantage of erasure coding with all the ways of using Ceph, e.g. RGW, CephFS, and RBD?

    I am keen to know how I can use erasure coding with the Ceph RBD that we use for OpenStack.

    Regards
    Karan

    1. Erasure Code and tiering can be used in all cases. Direct usage of an erasure coded pool is less straightforward because some operations such as partial writes are not implemented.
