node resurrection in the FSF France ganeti cluster

The z2-8 node of the FSF France ganeti cluster was resurrected on rented hardware with 24GB of RAM, a 2TB disk and an i7 processor.

backup policy

Each node of the FSF France ganeti cluster is backed up on a dedicated LVM volume on a USB disk behind a fiber link at lawomatic, with the following command:

if [ -d /mnt/$vm/lost+found ] ; then last=$(ls -t /mnt/$vm | grep -v lost+found | grep -v EXCLUDE | grep -v $(date +%Y-%m-%d)-$vm | head -1) ; rsync -avzH --numeric-ids --delete-excluded --delete $bwlimit $exclude --exclude=/srv --exclude=/tmp --exclude=/proc --exclude=/mnt --exclude=/sys --exclude=*.raw --exclude=*.iso --exclude-from=/mnt/$vm/EXCLUDE --link-dest=/mnt/$vm/${last:-unknown}/ root@$vm:/ /mnt/$vm/$(date +%Y-%m-%d)-$vm/ ; touch /mnt/$vm/$(date +%Y-%m-%d)-$vm/ ; else echo mount /dev/*/$vm /mnt/$vm ; fi
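The snapshot logic packed into that one-liner can be read more easily when split into two small functions. The following is only a sketch: the function names latest_snapshot and snapshot_command are made up for illustration, most of the --exclude options are left out, and the rsync call is printed rather than run. It shows how the most recent earlier snapshot is picked and handed to --link-dest, so that unchanged files are hard-linked instead of copied again.

```shell
#!/bin/sh
# Illustrative sketch, not the script used on the cluster.

# latest_snapshot DIR VM [TODAY]
# Print the most recently modified entry of DIR that is neither
# lost+found, EXCLUDE nor today's snapshot, or "unknown" if none exists.
# (The original pipeline filters with grep -v, which matches substrings;
# this version compares whole names.)
latest_snapshot() {
    dir=$1
    vm=$2
    today=${3:-$(date +%Y-%m-%d)}
    for entry in $(ls -t "$dir"); do
        case $entry in
            lost+found|EXCLUDE|"$today-$vm") continue ;;
        esac
        echo "$entry"
        return 0
    done
    echo unknown
}

# snapshot_command DIR VM [TODAY]
# Print (do not run) the rsync call that would create today's snapshot,
# hard-linking unchanged files against the previous one.
snapshot_command() {
    dir=$1
    vm=$2
    today=${3:-$(date +%Y-%m-%d)}
    last=$(latest_snapshot "$dir" "$vm" "$today")
    echo rsync -avzH --numeric-ids --delete \
        "--link-dest=$dir/$last/" "root@$vm:/" "$dir/$today-$vm/"
}
```

Calling snapshot_command /mnt/z2-8 z2-8 prints the command for review. When no earlier snapshot exists, --link-dest points at a directory named unknown, which rsync reports as missing before falling back to a full copy.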

The command is run manually whenever a sysadmin does something on the machine. Back in May 2010 the hardware hosting the z2-8 node was decommissioned and the VMs it hosted were migrated to other nodes. The last remaining backup was:

migration plan

When the z2-8 node was shut down, it was marked offline but not actually removed from the cluster:

gnt-node modify --offline=yes --auto-promote

When fredix pointed out an attractive hardware rental offer, a year later, the z2-8 node could be resurrected to replace the z2-9 node. The z2-9 node currently runs on hardware with 12GB of RAM, a 2TB disk and an Intel(R) Core(TM) i7 CPU 950 @ 2.67GHz, rented for 99 euros per month. The z2-8 node would run on hardware with 24GB of RAM, a 2TB disk and an Intel(R) Core(TM) i7 CPU 920 @ 3.07GHz, rented for 60 euros per month.
Once the resurrection is done, the IPs going to z2-9 would be migrated to z2-8, as well as all the virtual machines. The IP bound to the z2-9 hardware cannot be migrated. For that reason it is never used to address services and can be shut down without inconvenience. All services are bound to IPs that can be migrated from one OVH/kimsufi machine to another. Once the node is ready to be back, it is re-inserted in the cluster:

gnt-node modify --offline=no


The backup of z2-8 dated May 2010 is the full root file system of the operating system. Copying it to the disk and adjusting for the new hardware is enough to resurrect it. Once the hardware was delivered by OVH, it was rebooted in rescue mode, i.e. not using the disk. The former z2-8 host had two 1TB disks, whereas the new hardware has a single 2TB disk. The disk was partitioned as follows:

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00021db5

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1        2491    20008926   83  Linux
/dev/sda2            2492        5410    23446867+  82  Linux swap / Solaris
/dev/sda3            5411      243201  1910056207+  8e  Linux LVM

The sda1 partition was formatted as ext4 and mounted on /mnt. The file system was restored from the machine hosting the backups:

rsync -avzH --numeric-ids /mnt/
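The format, mount and restore steps can be summed up as a dry run that only prints the commands, which is a convenient way to double check the device name before anything is written to disk. This is a sketch under assumptions: the function name restore_plan, the host backup.example.org and the SNAPSHOT path component are placeholders, not the names used on the cluster.

```shell
#!/bin/sh
# Illustrative sketch: print the restore commands instead of running them.

# restore_plan SOURCE [DEVICE] [MOUNTPOINT]
# SOURCE is an rsync source such as user@host:/path/ (hypothetical here).
# DEVICE defaults to /dev/sda1 and MOUNTPOINT to /mnt, as in the text.
restore_plan() {
    source_path=$1
    device=${2:-/dev/sda1}
    mountpoint=${3:-/mnt}
    echo "mkfs.ext4 $device"
    echo "mount $device $mountpoint"
    echo "rsync -avzH --numeric-ids $source_path $mountpoint/"
}

# Example with a made-up backup host and snapshot path.
restore_plan root@backup.example.org:/mnt/z2-8/SNAPSHOT/
```

The trailing slash on the rsync source matters: it copies the content of the snapshot directory into the mount point rather than creating a subdirectory there.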

The generic instructions to resurrect a node were run and updated. After proceeding it was possible to log in again on the ancient node:

Last login: Mon May 31 18:18:36 2010 from

There were a number of changes to pull for the shorewall and dhcp configurations. In the cluster, all nodes duplicate the same configuration and share it with mercurial. This keeps the copies synchronized while letting each node use its own copy independently should the other hosts be unreachable.

z2-8:/etc/dhcp3# hg pull
pulling from ssh://dhcp.fsffrance.vm.gnt//etc/dhcp3/
searching for changes
adding changesets
adding manifests
adding file changes
added 55 changesets with 48 changes to 2 files

Once the node was successfully inserted in the cluster, the proxy dispatching all incoming HTTP requests directed to the node was restored from the secondary device that still held a copy of its disk.

gnt-instance replace-disks -p proxy8.vm.gnt