Although it is extremely unlikely to lose an object stored in Ceph, it is not impossible. When it happens to a Cinder volume based on RBD, knowing which volume is missing an object will help with disaster recovery.
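As background, RBD stores a volume as many RADOS objects that share a per-image block name prefix, with the block offset appended (for instance rb.0.6226.74b0dc51.000000000e8d for a format 1 image; the object name below is a made-up example). A minimal sketch, assuming that naming scheme, of recovering the prefix from a lost object name so it can be matched against the block_name_prefix line of rbd info for each volume:

```shell
# Sketch: given the name of a lost RADOS object belonging to an RBD image,
# strip the trailing block offset to recover the image's block name prefix.
object_to_prefix() {
    local object=$1
    echo "${object%.*}"    # drop everything after the last dot (the offset)
}

object_to_prefix rb.0.6226.74b0dc51.000000000e8d
```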
Continue reading “What cinder volume is missing an RBD object ?”
Enable a secondary network interface and ignore the default route
When two network interfaces are associated to an OpenStack instance, the Ubuntu precise guest will only configure the first one. Assuming the second can be configured via DHCP, it can be added with:
cat > /etc/network/interfaces.d/eth1.cfg <<EOF
auto eth1
iface eth1 inet dhcp
EOF
If the DHCP server answering on eth1 provides a default gateway, it will override the gateway provided by the DHCP server answering on eth0. The routers request can be removed from the list of default requests in /etc/dhcp/dhclient.conf:
request subnet-mask, broadcast-address, time-offset, domain-name,
        domain-name-servers, domain-search, host-name,
        netbios-name-servers, netbios-scope, interface-mtu,
        rfc3442-classless-static-routes, ntp-servers,
        dhcp6.domain-search, dhcp6.fqdn, dhcp6.name-servers,
        dhcp6.sntp-servers;
It can then be re-activated for eth0 only:
echo 'interface "eth0" { request routers; }' >> /etc/dhcp/dhclient.conf
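After the instance picks up its leases, the effect can be verified on the routing table: there should be exactly one default route, and it should go through eth0. A small sketch of that check (the function name is made up; it reads ip route show output on stdin):

```shell
# Verify the routing table contains exactly one default route and that it
# goes through eth0; exits 0 when the table looks right, 1 otherwise.
check_single_default() {
    awk '/^default / { n++; if ($0 !~ / dev eth0( |$)/) bad = 1 }
         END { exit (n == 1 && !bad) ? 0 : 1 }'
}

# Intended use on the instance: ip route show | check_single_default
```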
Recovering from a cinder RBD host failure
OpenStack Havana Cinder volumes associated with an RBD Ceph pool are bound to a host.
cinder service-list --host bm0014.the.re@rbd-ovh
+---------------+-----------------------+------+---------+-------+
|     Binary    |          Host         | Zone |  Status | State |
+---------------+-----------------------+------+---------+-------+
| cinder-volume | bm0014.the.re@rbd-ovh | ovh  | enabled |   up  |
+---------------+-----------------------+------+---------+-------+
A volume created on this host is permanently associated with it:
$ mysql -e "select host from volumes where deleted = 0 and display_name = 'nerrant.fr'" cinder
+-----------------------+
| host                  |
+-----------------------+
| bm0014.the.re@rbd-ovh |
+-----------------------+
If the host fails, any attempt to detach the volume will fail because the cinder-api cannot reach the host:
/var/log/cinder/cinder-api.log
2014-05-04 17:50:59.928 15128 TRACE cinder.api.middleware.fault Timeout: Timeout while waiting on RPC response - topic: "cinder-volume:bm0014.the.re@rbd-ovh", RPC method: "terminate_connection" info: ""
The failed cinder host is first disabled so the scheduler will no longer try to access it:
cinder service-disable bm0014.the.re cinder-volume
The database is updated with another host configured with access to the same Ceph pool.
$ mysql -e "update volumes set host = 'bm0015.the.re@rbd-ovh' \
   where deleted = 0 and display_name = 'nerrant.fr'" cinder
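When several volumes need to be moved off the failed host, the statement can be generated by a small helper. This is just a convenience sketch (the function name is made up); it only prints the SQL shown above:

```shell
# Print the SQL statement that rebinds a cinder volume to a new host.
# $1: the new host (host@backend), $2: the volume display name.
rebind_volume_sql() {
    printf "update volumes set host = '%s' where deleted = 0 and display_name = '%s'" \
        "$1" "$2"
}

# The statement used above to move nerrant.fr to bm0015.the.re@rbd-ovh,
# to be fed to: mysql -e "$(rebind_volume_sql <host> <volume>)" cinder
rebind_volume_sql bm0015.the.re@rbd-ovh nerrant.fr
```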
Non profit OpenStack & Ceph cluster distributed over five datacenters
A few non profit organizations (April, FSF France, tetaneutral.net…) and volunteers constantly research how to get compute, storage and bandwidth that are:
- 100% Free Software
- Content neutral
- Low maintenance
- Reliable
- Cheap
The latest setup, in use since October 2013, is based on a Ceph and OpenStack cluster spread over five datacenters. It has been designed for the following use cases:
- Free Software development and continuous integration
- Hosting low activity web sites, mail servers etc.
- Keeping backups
- Sharing movies and music
Continue reading “Non profit OpenStack & Ceph cluster distributed over five datacenters”
Fixing OpenVSwitch and GRE asymmetric performances
OpenStack Havana is configured to use OpenVSwitch 1.10.2 as packaged for Ubuntu precise, with a linux-3.11 kernel. The cluster is connected to a 100Mb/s link. When sending data from an instance to the internet (using iperf), it shows ~90Mb/s. When receiving data from the internet to the instance, it is down to ~1Mb/s. After capturing the packets on the interface used by the default route on the hypervisor running the neutron router with
tcpdump -i eth0 host 91.224.149.132 -w /tmp/bad.cap
wireshark /tmp/bad.cap shows a lot of retransmissions.
A similar problem was reported back in October 2013 and hints that it may be a kernel problem. Upgrading the kernel of the hypervisor running the neutron router to linux-3.13 indeed fixes the problem. The compute nodes running the instances do not need their kernel updated, they can keep using the linux-3.11 kernel with the 1.10.2 version of the OpenVSwitch datapath kernel module. The OpenVSwitch kernel part is in the linux-3.13 tree and the openvswitch-datapath-dkms is not used any longer. It will fail to compile against the linux-3.13 headers but the error can be ignored and the package uninstalled.
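Since only the node running the neutron router needs the newer kernel, a quick look at uname -r tells whether a given machine is affected. A minimal sketch of that comparison (the function name is made up; it assumes version strings starting with major.minor, as uname -r produces):

```shell
# Return success (0) when a kernel version string is older than 3.13,
# i.e. when the node would exhibit the asymmetric GRE performance problem.
kernel_older_than_3_13() {
    local major minor
    major=${1%%.*}
    minor=${1#*.}; minor=${minor%%.*}
    [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -lt 13 ]; }
}

# Intended use: kernel_older_than_3_13 "$(uname -r)" && echo "upgrade this kernel"
```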
Two minor pitfalls when upgrading Havana stable
When upgrading an OpenStack compute or l3 agent node from 1:2013.2 to 1:2013.2.3 on Ubuntu precise 12.04.4:
- The nova-compute version 1:2013.2 is expected to fail with
/var/log/nova/nova-compute.log
IncompatibleObjectVersion: Version 1.9 of Instance is not supported
when interfaced with 1:2013.2.3 services. It will not disrupt the running instances but will prevent operations on them until the upgrade is complete.
- neutron-l3-agent will fail with:
/var/log/neutron/metadata-agent.log
AttributeError: 'HTTPClient' object has no attribute 'auth_tenant_id'
because the python-neutronclient package must also be upgraded. This only happens when upgrading with apt-get install neutron-common; upgrading with apt-get dist-upgrade is fine.
Resetting an instance {power,vm,task}_state in Havana
Sometimes, after a hypervisor crash or a nova-compute error, an OpenStack instance can be left in a state that cannot be conveniently fixed with nova reset-state.
$ nova list
+--------------------------------------+----------------+---------+...
| ID                                   | Name           | Status  |
+--------------------------------------+----------------+---------+...
| ca9496e9-0bd2-4734-9cf9-eb4e264628f7 | www            | SHUTOFF |
+--------------------------------------+----------------+---------+...
...-------------+-------------+----------------------------------+
... Task State  | Power State | Networks                         |
...-------------+-------------+----------------------------------+
... powering-on | Shutdown    | fsf-lan=10.0.3.18, 93.20.168.177 |
...-------------+-------------+----------------------------------+
Setting the fields for the instance directly in the database will allow operations on the instance (nova start or nova volume-detach, for instance):
$ mysql -e "update instances set task_state = NULL, \
   vm_state = 'stopped', \
   power_state = 4 \
   where deleted = 0 and hostname = 'www' and \
   uuid = 'ca9496e9-0bd2-4734-9cf9-eb4e264628f7'" nova
Using the uuid is necessary to avoid modifying an unrelated instance with the same name. This should be done only after verifying that the instance does not exist on the hypervisor with:
$ ps -fauwwwx | grep ca9496e9-0bd2-4734-9cf9-eb4e264628f7
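The power_state = 4 used in the update is nova's numeric code for SHUTDOWN. A small sketch of the name-to-code mapping (values as found in nova.compute.power_state in Havana; the helper name is made up):

```shell
# Map a nova power state name to the numeric code stored in the database.
# Prints the code, or returns 1 for an unknown name.
power_state_code() {
    case $1 in
        nostate)   echo 0 ;;
        running)   echo 1 ;;
        paused)    echo 3 ;;
        shutdown)  echo 4 ;;
        crashed)   echo 6 ;;
        suspended) echo 7 ;;
        *) return 1 ;;
    esac
}

power_state_code shutdown   # the value used in the update above
```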
HOWTO migrate an AMI from Essex to a bootable volume on Havana
A snapshot of an Essex OpenStack instance contains an AMI ext3 file system. It is rsync’ed to a partitioned volume in the Havana cluster. After installing grub from chroot, a new instance can be booted from the volume.
Continue reading “HOWTO migrate an AMI from Essex to a bootable volume on Havana”
wget on an OpenStack instance hangs? Try lowering the MTU
Why would an OpenStack instance fail to wget some URLs while others work perfectly? For instance:
$ wget -O - 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/autobuild.asc'
Connecting to ceph.com (ceph.com)|208.113.241.137|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: `STDOUT'

 [<=>                                ] 0           --.-K/s
^C
If it can be fixed by lowering the MTU from the default of 1500 to 1400 with:
$ sudo ip link set mtu 1400 dev eth0
$ sudo ip link show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:85:ee:a5 brd ff:ff:ff:ff:ff:ff
it means the underlying OpenStack DHCP should be fixed to set the MTU to 1400.
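With neutron's default dnsmasq based DHCP agent, this is usually done by pointing dhcp_agent.ini at an extra dnsmasq configuration file that forces DHCP option 26 (interface-mtu) to 1400. A sketch of the two fragments (the file path is a common convention, not mandated):

```
# /etc/neutron/dhcp_agent.ini
[DEFAULT]
dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf

# /etc/neutron/dnsmasq-neutron.conf
dhcp-option-force=26,1400
```

The DHCP agent must be restarted for the option to take effect on renewed leases.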
Continue reading “wget on an OpenStack instance hangs? Try lowering the MTU”
Mixing Ceph and LVM volumes in OpenStack
Ceph pools are defined to collocate volumes and instances in OpenStack Havana. For volumes that do not need the resilience provided by Ceph, an LVM cinder backend is defined in /etc/cinder/cinder.conf:
[lvm]
volume_group=cinder-volumes
volume_driver=cinder.volume.drivers.lvm.LVMISCSIDriver
volume_backend_name=LVM
and appended to the list of existing backends:
enabled_backends=rbd-default,rbd-ovh,rbd-hetzner,rbd-cloudwatt,lvm
A cinder volume type is created and associated with it:
# cinder type-create lvm
+--------------------------------------+------+
|                  ID                  | Name |
+--------------------------------------+------+
| c77552ff-e513-4851-a5e6-2c83d0acb998 | lvm  |
+--------------------------------------+------+
# cinder type-key lvm set volume_backend_name=LVM
# cinder extra-specs-list
+--------------------------------------+-----------+--------------------------------------------+
| ID                                   | Name      | extra_specs                                |
+--------------------------------------+-----------+--------------------------------------------+
...
| c77552ff-e513-4851-a5e6-2c83d0acb998 | lvm       | {u'volume_backend_name': u'LVM'}           |
...
+--------------------------------------+-----------+--------------------------------------------+
To reduce the network overhead, a backend availability zone is defined for each bare metal by adding to /etc/cinder/cinder.conf:
storage_availability_zone=bm0015
and restarting cinder-volume:
# restart cinder-volume
# sleep 5
# cinder-manage host list
host                            zone
...
bm0015.the.re@lvm               bm0015
...
where bm0015 is the hostname of the machine. To create an LVM-backed volume located on bm0015:
cinder create --availability-zone bm0015 --volume-type lvm --display-name test 1
In order for the allocation of RBD volumes to keep working without specifying an availability zone, there must be at least one cinder-volume service running in the default availability zone (presumably nova) and configured with the expected RBD backends. This can be checked with:
# cinder-manage host list | grep nova
...
bm0017.the.re@rbd-cloudwatt     nova
bm0017.the.re@rbd-ovh           nova
bm0017.the.re@lvm               nova
bm0017.the.re@rbd-default       nova
bm0017.the.re@rbd-hetzner       nova
...
In the above, the lvm volume type is also available in the nova availability zone and is used as a catch-all when an LVM volume is preferred but collocating it on the same machine as the instance does not matter.