OpenStack script to pre-allocate fixed IPs

The create-ports.py script allocates ports and indirectly gets fixed IPs from the DHCP server. The ports are named openstack000, openstack001 etc. and they are displayed in a format suitable for dnsmasq:

$ python create-ports.py --count 2 --net fsf-lan |  \
   sudo tee /etc/dnsmasq.d/openstack
host-record=openstack000,10.0.3.32
host-record=openstack001,10.0.3.33
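
The output format can be illustrated with a minimal sketch. It is not the code of create-ports.py: the function name format_host_records is hypothetical, and the port dictionaries are assumed to be shaped like Neutron port resources (a name plus a fixed_ips list whose entries carry an ip_address).

```python
def format_host_records(ports):
    """Return one dnsmasq host-record line per (port name, fixed IP) pair."""
    lines = []
    for port in ports:
        for fixed_ip in port["fixed_ips"]:
            lines.append("host-record=%s,%s" % (port["name"],
                                                fixed_ip["ip_address"]))
    return "\n".join(lines)


# Assumed input: the two ports from the example above, as plain dictionaries.
ports = [
    {"name": "openstack000", "fixed_ips": [{"ip_address": "10.0.3.32"}]},
    {"name": "openstack001", "fixed_ips": [{"ip_address": "10.0.3.33"}]},
]
print(format_host_records(ports))
```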

If fsf-lan is a network shared with other tenants, creating the ports makes sure the IPs are reserved, although they are not yet bound to an instance.

$ neutron port-list
+--------------------------------------+--------------+...
| id                                   | name         |...
+--------------------------------------+--------------+...
| 1d1a05b1-383d-49ef-ae75-5ddcb5c714db | openstack001 |...
...
+--------------------------------------+--------------+...

A new instance can then be given a known IP with:

$ openstack server create --image ubuntu-trusty-14.04 \
  --flavor 1cpu-1G \
  --key-name teuthology \
  --nic net-id=d936f445-5d68-485a-94f2-b852fd6b7d0c,v4-fixed-ip=10.0.3.33 \
  --wait openstack001
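
When many instances are created this way, the command line can be derived from a dnsmasq host-record line. The following is only a sketch: the helper name server_create_command is hypothetical, and it uses no option beyond those shown in the command above.

```python
def server_create_command(host_record, net_id, image, flavor, key_name):
    """Build the argument list creating the instance named in a host-record line."""
    prefix = "host-record="
    if not host_record.startswith(prefix):
        raise ValueError("not a host-record line: %r" % host_record)
    # host-record=<name>,<ip>
    name, ip = host_record[len(prefix):].strip().split(",", 1)
    return [
        "openstack", "server", "create",
        "--image", image,
        "--flavor", flavor,
        "--key-name", key_name,
        "--nic", "net-id=%s,v4-fixed-ip=%s" % (net_id, ip),
        "--wait", name,
    ]


command = server_create_command(
    "host-record=openstack001,10.0.3.33",
    net_id="d936f445-5d68-485a-94f2-b852fd6b7d0c",
    image="ubuntu-trusty-14.04",
    flavor="1cpu-1G",
    key_name="teuthology",
)
print(" ".join(command))
```

The list can be passed as is to subprocess.check_call, which avoids shell quoting issues.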

In the case of teuthology this is useful because the DNS can be configured once and for all: instances are created dynamically with IPs that are already known to the DNS, instead of relying on allocation by the OpenStack DHCP server.

create / delete an OpenStack instance with python-openstackclient

The python-openstackclient library has an example that provides the basic structure for a new command (the auth_url problem workaround may be needed). To create a virtual machine with 1GB of RAM and 1 CPU, running ubuntu-14.04, using the teuthology keypair on the fsf-lan network, the matching flavor, image, keypair and network objects can be found with:

    for flavor in client_manager.compute.flavors.list():
        if flavor.ram == 1024 and flavor.vcpus == 1:
            break
    for network in client_manager.compute.networks.list():
        if network.label == 'fsf-lan':
            break
    for image in client_manager.compute.images.list():
        if 'ubuntu' in image.name and '14.04' in image.name:
            break
    for keypair in client_manager.compute.keypairs.list():
        if keypair.name == 'teuthology':
            break
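
Note that with the for / break pattern above, a loop that finds no match silently leaves its variable set to the last element of the list. A small helper can make that case explicit. This is a sketch only: first_match is a hypothetical name that is not part of python-openstackclient, and the namedtuple stands in for the objects returned by client_manager.compute.flavors.list().

```python
from collections import namedtuple


def first_match(items, predicate, what="item"):
    """Return the first item satisfying predicate, or raise LookupError."""
    for item in items:
        if predicate(item):
            return item
    raise LookupError("no matching %s found" % what)


# Stand-in objects with made-up values, so the sketch runs without a cloud.
Flavor = namedtuple("Flavor", "name ram vcpus")
flavors = [Flavor("m1.tiny", 512, 1), Flavor("m1.small", 1024, 1)]

flavor = first_match(flavors,
                     lambda f: f.ram == 1024 and f.vcpus == 1,
                     "flavor")
print(flavor.name)
```

The same helper would apply to the network, image and keypair lookups by changing the predicate.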

The test instance can then be created

    server = client_manager.compute.servers.create(
        'test',
        image, flavor,
        key_name=keypair.name,
        nics=[{'net-id': network.id}])

but it won’t be immediately active, and utils.wait_for_status can be used to block until it is:

from openstackclient.common import utils
...
    utils.wait_for_status(
        client_manager.compute.servers.get,
        server.id)
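
What such a helper does can be sketched as a polling loop. This is an illustration, not the implementation found in openstackclient: the function wait_for, its parameters and the FakeServer class are all hypothetical.

```python
import time


def wait_for(get, resource_id, success=("active",), failure=("error",),
             sleep_time=5, timeout=300):
    """Poll get(resource_id).status until it reaches a final state.

    Returns True on a success status, False on a failure status and
    raises RuntimeError when the timeout expires.
    """
    deadline = time.time() + timeout
    while True:
        status = get(resource_id).status.lower()
        if status in success:
            return True
        if status in failure:
            return False
        if time.time() >= deadline:
            raise RuntimeError("timeout waiting for %s" % resource_id)
        time.sleep(sleep_time)


class FakeServer(object):
    """Stand-in for a server object: BUILD twice, then ACTIVE."""
    _statuses = iter(["BUILD", "BUILD", "ACTIVE"])

    def __init__(self):
        self.status = next(FakeServer._statuses)


print(wait_for(lambda server_id: FakeServer(), "test", sleep_time=0))
```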

Deleting the instance is simpler:

    client_manager.compute.servers.delete(server.id)
    utils.wait_for_delete(client_manager.compute.servers.get, server.id)

See create-delete.py for a standalone script including the above lines that can be run as:

$ python create-server.py --help
usage: create-server.py [-h] [--os-compute-api-version ]
...
$ python create-server.py
FLAVOR: {'name': u'm1.small', ...
NETWORK: {'cidr_v6': None, 'dns2': None, 'dns1': None, 'netmask': None, 'label': u'fsf-lan',...
IMAGE: {'status': u'ACTIVE', 'updated': u'2014-05-19T11:43:00Z', 'name': u'ubuntu-trusty-14.04',...
KEYPAIR: {'public_key': u'ssh-rsa AAAAB3...


OpenStack Upstream Training challenges

The OpenStack Upstream Training scheduled for November 1st, 2014 in Paris will have an unprecedented number of participants and, for the first time, there is a shortage of Lego. In addition to the 80 pounds of spare parts (picture foreground), six new buildings have been acquired today (Tower Bridge, Sydney Opera, Parisian Restaurant, Pet Shop, Palace Cinema and Grand Emporium). They will be at Lawomatic for assembly from October 1st to October 31st. Anyone willing to participate, please send me an email.

Once this first challenge is complete, the buildings will have to be transported to the Hyatt conference rooms where the training will take place. The rendezvous is at 8am at Lawomatic on Saturday, November 1st, 2014. Each of us will carefully transport a building (or part of it in the case of the Tower Bridge) in the subway. There will be coffee and croissants upon arrival 🙂

enable secondary network interface and ignore the default route

When two network interfaces are attached to an OpenStack instance, the Ubuntu precise guest only configures the first one. Assuming the second can be configured via DHCP, it can be added with:

cat > /etc/network/interfaces.d/eth1.cfg <<EOF
auto eth1
iface eth1 inet dhcp
EOF

If the DHCP server answering on eth1 provides a default gateway, it will override the gateway provided by the DHCP server answering on eth0. To avoid this, the routers option can be removed from the list of default requests in /etc/dhcp/dhclient.conf, which then reads:

request subnet-mask, broadcast-address, time-offset,
        domain-name, domain-name-servers, domain-search, host-name,
        netbios-name-servers, netbios-scope, interface-mtu,
        rfc3442-classless-static-routes, ntp-servers,
        dhcp6.domain-search, dhcp6.fqdn,
        dhcp6.name-servers, dhcp6.sntp-servers;

It can then be re-activated for eth0 only:

echo 'interface "eth0" { request routers; }' >> /etc/dhcp/dhclient.conf
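
Both changes to dhclient.conf can be scripted. The sketch below works on the text of the file and is only an illustration: the function name is hypothetical, and it assumes the global request statement starts at the beginning of a line and ends at the first semicolon.

```python
import re


def ignore_default_route_except(conf_text, interface="eth0"):
    """Drop routers from the global request and request it for one interface."""
    def strip_routers(match):
        options = [opt.strip() for opt in match.group(1).split(",")]
        kept = [opt for opt in options if opt != "routers"]
        return "request " + ", ".join(kept) + ";"

    # Only the first request statement starting a line is rewritten.
    new_text = re.sub(r"(?m)^request\s+([^;]*);", strip_routers,
                      conf_text, count=1)
    stanza = 'interface "%s" { request routers; }' % interface
    if stanza not in new_text:
        new_text = new_text.rstrip("\n") + "\n" + stanza + "\n"
    return new_text


# Shortened, made-up request statement used as sample input.
sample = "request subnet-mask, routers,\n        domain-name;\n"
print(ignore_default_route_except(sample))
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to apply it repeatedly.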

Recovering from a cinder RBD host failure

OpenStack Havana Cinder volumes associated with an RBD Ceph pool are bound to a host.

cinder service-list --host bm0014.the.re@rbd-ovh
+---------------+-----------------------+------+---------+-------+
|     Binary    |          Host         | Zone |  Status | State |
+---------------+-----------------------+------+---------+-------+
| cinder-volume | bm0014.the.re@rbd-ovh | ovh  | enabled |   up  |
+---------------+-----------------------+------+---------+-------+

A volume created on this host is permanently associated with it:

$ mysql -e "select host from volumes where deleted = 0 and display_name = 'nerrant.fr'" cinder
+-----------------------+
| host                  |
+-----------------------+
| bm0014.the.re@rbd-ovh |
+-----------------------+

If the host fails, any attempt to detach the volume will fail because the cinder-api cannot reach the host:

/var/log/cinder/cinder-api.log
2014-05-04 17:50:59.928 15128 TRACE cinder.api.middleware.fault Timeout: Timeout while
   waiting on RPC response - topic: "cinder-volume:bm0014.the.re@rbd-ovh",
   RPC method: "terminate_connection" info: ""

The failed cinder host is first disabled so the scheduler will no longer try to access it:

cinder service-disable bm0014.the.re cinder-volume

The database is updated with another host configured with access to the same Ceph pool.

$ mysql -e "update volumes set host = 'bm0015.the.re@rbd-ovh' \
   where deleted = 0 and display_name = 'nerrant.fr'" cinder
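
The effect of this update can be tried safely before touching the real database. The sketch below uses an in-memory SQLite database as a stand-in for the MySQL cinder database, with a volumes table reduced to the three columns used above. The function name rehome_volume and the extra rows are made up for the illustration.

```python
import sqlite3


def rehome_volume(db, display_name, new_host):
    """Bind the non-deleted volumes named display_name to new_host."""
    cursor = db.execute(
        "update volumes set host = ? "
        "where deleted = 0 and display_name = ?",
        (new_host, display_name))
    db.commit()
    return cursor.rowcount


db = sqlite3.connect(":memory:")
db.execute("create table volumes "
           "(display_name text, host text, deleted integer)")
db.executemany("insert into volumes values (?, ?, ?)", [
    ("nerrant.fr", "bm0014.the.re@rbd-ovh", 0),
    ("nerrant.fr", "bm0014.the.re@rbd-ovh", 1),  # deleted: must not change
    ("other", "bm0014.the.re@rbd-ovh", 0),       # other volume: must not change
])

changed = rehome_volume(db, "nerrant.fr", "bm0015.the.re@rbd-ovh")
print(changed)
```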

Non profit OpenStack & Ceph cluster distributed over five datacenters

A few non-profit organizations (April, FSF France, tetaneutral.net…) and volunteers constantly research how to get compute, storage and bandwidth that are:

  • 100% Free Software
  • Content neutral
  • Low maintenance
  • Reliable
  • Cheap

The latest setup, in use since October 2013, is based on a Ceph and OpenStack cluster spread over five datacenters. It has been designed for the following use cases:

  • Free Software development and continuous integration
  • Hosting low activity web sites, mail servers etc.
  • Keeping backups
  • Sharing movies and music


Fixing OpenVSwitch and GRE asymmetric performance

OpenStack Havana is configured to use OpenVSwitch 1.10.2 as packaged for Ubuntu precise, with a linux-3.11 kernel. The cluster is connected to a 100Mb/s link. Sending data from an instance to the internet (measured with iperf) reaches ~90Mb/s, but receiving data from the internet on the instance drops to ~1Mb/s. After capturing the packets on the interface used by the default route on the hypervisor running the neutron router with

tcpdump -i eth0 host 91.224.149.132 -w /tmp/bad.cap

wireshark /tmp/bad.cap shows a lot of retransmissions.

A similar problem was reported back in October 2013, with hints that it may be a kernel problem. Upgrading the kernel of the hypervisor running the neutron router to linux-3.13 indeed fixes the problem. The compute nodes running the instances do not need their kernel updated: they can keep using the linux-3.11 kernel with the 1.10.2 version of the OpenVSwitch datapath kernel module. The OpenVSwitch kernel part is included in the linux-3.13 tree, so the openvswitch-datapath-dkms package is no longer needed. It will fail to compile against the linux-3.13 headers, but the error can be ignored and the package uninstalled.

Two minor pitfalls when upgrading Havana stable

When upgrading an OpenStack compute or l3 agent node from 1:2013.2 to 1:2013.2.3 on Ubuntu precise 12.04.4:

  • The nova-compute version 1:2013.2 is expected to fail with

    /var/log/nova/nova-compute.log
    IncompatibleObjectVersion: Version 1.9 of Instance is not supported

    when interfaced with version 1:2013.2.3. This does not disrupt the running instances but prevents operations on them until the upgrade is complete.

  • neutron-l3-agent will fail with:

    /var/log/neutron/metadata-agent.log
    AttributeError: 'HTTPClient' object has no attribute 'auth_tenant_id'

    because the python-neutronclient package must also be upgraded. This only happens when upgrading with apt-get install neutron-common; upgrading with apt-get dist-upgrade avoids it.

Resetting an instance {power,vm,task}_state in Havana

Sometimes, after a hypervisor crash or a nova-compute error, an OpenStack instance can be left in a state that cannot be conveniently fixed with nova reset-state.

$ nova list
+--------------------------------------+----------------+---------+...
| ID                                   | Name           | Status  |
+--------------------------------------+----------------+---------+...
| ca9496e9-0bd2-4734-9cf9-eb4e264628f7 | www            | SHUTOFF |
+--------------------------------------+----------------+---------+...
...  -------------+-------------+----------------------------------+
      Task State  | Power State | Networks                         |
...  -------------+-------------+----------------------------------+
      powering-on | Shutdown    | fsf-lan=10.0.3.18, 93.20.168.177 |
...  -------------+-------------+----------------------------------+

Setting the fields for the instance directly in the database will allow operations on the instance (nova start or nova volume-detach, for example):

$ mysql -e "update instances set task_state = NULL, \
           vm_state = 'stopped', \
           power_state = 4 \
           where deleted = 0 and hostname = 'www' and \
           uuid = 'ca9496e9-0bd2-4734-9cf9-eb4e264628f7'" nova

Using the uuid is necessary to avoid modifying an unrelated instance with the same name. This should be done only after verifying that the instance does not exist on the hypervisor with:

$ ps -fauwwwx | grep ca9496e9-0bd2-4734-9cf9-eb4e264628f7
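
The grep process itself shows up in the output of this pipeline, which is easy to misread as a running instance. A minimal sketch of the same check that filters it out is shown below. The function name and the sample ps lines are made up for the illustration; the second line assumes the hypervisor process carries the instance uuid on its command line.

```python
def instance_processes(ps_output, uuid):
    """Return the ps lines mentioning uuid, ignoring grep processes."""
    return [line for line in ps_output.splitlines()
            if uuid in line and "grep" not in line.split()]


UUID = "ca9496e9-0bd2-4734-9cf9-eb4e264628f7"

# Hypothetical ps output: only the grep process mentions the uuid.
only_grep = (
    "root  4242  0.0  0.0  8108  936 pts/0  S+  10:00  0:00 grep " + UUID
)
# Hypothetical ps output: a hypervisor process still runs the instance.
still_running = only_grep + (
    "\nlibvirt+  1234  2.0  5.0  999  999 ?  Sl  09:00  1:00 "
    "qemu-system-x86_64 -uuid " + UUID
)

print(len(instance_processes(only_grep, UUID)))
print(len(instance_processes(still_running, UUID)))
```

The database should only be modified when the returned list is empty on the hypervisor that hosted the instance.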