After a crushmap is changed (e.g. addition or removal of devices, modification of weights or tunables), objects may move from one device to another. The crush compare command shows what would happen for a given rule and replication count. In the following example, two new OSDs are added to the crushmap, causing about 23% of the objects (229 out of 1000) to move from the existing OSDs to the new ones.
    $ crush compare --rule firstn \
                    --replication-count 1 \
                    --origin before.json --destination after.json
    There are 1000 objects.

    Replacing the crushmap specified with --origin with the crushmap
    specified with --destination will move 229 objects (22.9% of the total)
    from one item to another.

    The rows below show the number of objects moved from the given
    item to each item named in the columns. The objects% at the end of
    the rows shows the percentage of the total number of objects that
    is moved away from this particular item. The last row shows the
    percentage of the total number of objects that is moved to the item
    named in the column.

                 osd.8    osd.9    objects%
    osd.0            3        4       0.70%
    osd.1            1        3       0.40%
    osd.2           16       16       3.20%
    osd.3           19       21       4.00%
    osd.4           17       18       3.50%
    osd.5           18       23       4.10%
    osd.6           14       23       3.70%
    osd.7           14       19       3.30%
    objects%    10.20%   12.70%      22.90%
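To make the objects% figures concrete, the short Python sketch below recomputes them from the counts in the table above. It is only an illustration of the arithmetic, not how crush compare is implemented; the variable and function names (moved, percent, and so on) are made up for this example.

```python
# Counts copied from the table above: origin OSD -> (moved to osd.8, moved to osd.9).
moved = {
    "osd.0": (3, 4),
    "osd.1": (1, 3),
    "osd.2": (16, 16),
    "osd.3": (19, 21),
    "osd.4": (17, 18),
    "osd.5": (18, 23),
    "osd.6": (14, 23),
    "osd.7": (14, 19),
}
destinations = ("osd.8", "osd.9")
total_objects = 1000  # "There are 1000 objects."


def percent(count, total=total_objects):
    """Express a number of objects as a percentage of all objects."""
    return 100.0 * count / total


# Last column: share of all objects that leaves each origin OSD (row sums).
moved_away = {src: percent(sum(counts)) for src, counts in moved.items()}

# Last row: share of all objects that lands on each new OSD (column sums).
moved_to = {
    dst: percent(sum(counts[i] for counts in moved.values()))
    for i, dst in enumerate(destinations)
}

# Bottom-right cell: everything that moves, whatever its origin or destination.
total_moved = sum(sum(counts) for counts in moved.values())

for src, pct in moved_away.items():
    print(f"{src:<8} sends    {pct:5.2f}%")
for dst, pct in moved_to.items():
    print(f"{dst:<8} receives {pct:5.2f}%")
print(f"total    {total_moved} objects, {percent(total_moved):.2f}%")
```

The row sums give the objects% column, the column sums give the objects% row, and both add up to the same grand total.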
The crush compare command can also show the impact of a change in one or more “tunables”, such as setting chooseleaf_stable to 1.
    $ diff -u original.json destination.json
    --- original.json	2017-03-14 23:41:47.334740845 +0100
    +++ destination.json	2017-03-04 18:36:00.817610217 +0100
    @@ -608,7 +608,7 @@
             "choose_local_tries": 0,
             "choose_total_tries": 50,
             "chooseleaf_descend_once": 1,
    -        "chooseleaf_stable": 0,
    +        "chooseleaf_stable": 1,
             "chooseleaf_vary_r": 1,
             "straw_calc_version": 1
         }
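A change like the one in this diff can be made by hand in an editor, or scripted. The Python sketch below is one possible way to do it, under the assumption that the tunables sit in a top-level "tunables" object of the JSON file (the diff only shows the keys, not their parent, so check your own file first). The helper names set_tunable and write_modified are hypothetical and not part of python-crush.

```python
import json


def set_tunable(crushmap, name, value):
    """Return a copy of the crushmap dict with one tunable changed.

    Assumption: tunables are stored under a top-level "tunables" key.
    The input dict is left untouched.
    """
    updated = dict(crushmap)
    tunables = dict(updated.get("tunables", {}))
    tunables[name] = value
    updated["tunables"] = tunables
    return updated


def write_modified(origin_path, destination_path, name, value):
    """Read origin_path, change one tunable, and write destination_path."""
    with open(origin_path) as f:
        crushmap = json.load(f)
    with open(destination_path, "w") as f:
        json.dump(set_tunable(crushmap, name, value), f, indent=4)
```

For the change shown in the diff, the call would be write_modified("original.json", "destination.json", "chooseleaf_stable", 1).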
In the following example, some columns were removed for brevity and replaced with dots. It shows that about 33% of the objects will move after chooseleaf_stable is changed from 0 to 1. Each device sends and receives more than 1% and less than 3% of the total number of objects.
    $ crush compare --origin original.json --destination destination.json \
                    --rule replicated_ruleset --replication-count 3
    There are 300000 objects.

    Replacing the crushmap specified with --origin with the crushmap
    specified with --destination will move 99882 objects (33.294% of the total)
    from one item to another.

    The rows below show the number of objects moved from the given
    item to each item named in the columns. The objects% at the end of
    the rows shows the percentage of the total number of objects that
    is moved away from this particular item. The last row shows the
    percentage of the total number of objects that is moved to the item
    named in the column.

                 osd.0    osd.1   osd.11   osd.13   osd.20  ...    osd.8    osd.9  objects%
    osd.0            0      116      180        0     3972  ...      138      211     1.89%
    osd.1          121        0      129       64      116  ...      112      137     1.29%
    osd.11         194      126        0       12        0  ...      168      222     1.94%
    osd.13           0       75       19        0      211  ...        0     4552     2.06%
    osd.20        4026      120        0      197        0  ...       90        0     1.92%
    osd.21         120     2181       65      130      116  ...       85       75     1.29%
    osd.24         176      150      265       63        0  ...      160      258     2.29%
    osd.25         123       99      190      198       99  ...       92      182     2.19%
    osd.26          54       83       62      258      254  ...       51       69     2.27%
    osd.27         124      109        0       90       73  ...     1840        0     1.55%
    osd.29          43       54        0       98      123  ...     1857        0     1.60%
    osd.3           74       82     2112      137      153  ...       61       44     1.62%
    osd.37          65      108        0        0      166  ...       67        0     1.66%
    osd.38         163      119        0        0       73  ...       58        0     1.68%
    osd.44          56       73     2250      148      173  ...       77       43     1.68%
    osd.46          60       71      132       67        0  ...       39      125     1.31%
    osd.47           0       51       70      126       70  ...        0       73     1.35%
    osd.8          151      112      163        0       76  ...        0      175     1.67%
    osd.9          197      130      202     4493        0  ...      188        0     2.03%
    objects%     1.92%    1.29%    1.95%    2.03%    1.89%  ...    1.69%    2.06%    33.29%
Comparing Ceph crushmaps
Sometimes it is useful to compare two Ceph crushmaps (one real and one hypothetical), for example to model what will happen if OSDs are added or other parameters (such as weights and tunables) are changed. The crushmap of a running Ceph cluster can be exported in JSON format by issuing the command ceph osd crush dump > origin.json. The “origin.json” file can then be copied to a new file, “destination.json”, which is edited to reflect the proposed changes (addition of OSDs, etc.). Please note that the JSON produced by Ceph is different from the JSON consumed by python-crush, so an extra conversion step (see the “crush ceph --convert” command) is required. Then use the following command to compare the two crushmaps:
    crush compare --rule firstn --replication-count 1 \
                  --origin <(crush ceph --convert origin.json) \
                  --destination <(crush ceph --convert destination.json)
Caveat
Movement of objects in a real Ceph cluster may be influenced by parameters that are not stored in the crushmap, such as device weight (stored in the OSDMap) and primary affinity.
Thanks
Many thanks to Nathan Cutler for proofreading part of this post. The well written parts are from him, the rest is my doing.