nova-network debugging tips

A single machine is installed with OpenStack Folsom on Debian GNU/Linux. Four instances are created and it turns out that nova-network is configured with the wrong public interface. This can be fixed without shutting down the instances. Each instance is first suspended, for example:

nova suspend target1

The instance is suspended to disk (as if it were a laptop) and the corresponding KVM process is killed. While the instance is suspended, nova-network can be stopped:

/etc/init.d/nova-network stop

The source of the problem was a typo in the public interface name, which led to a VLAN interface attached to the wrong device:

13: vlan100@eth2:  mtu 1500 qdisc noqueue state DOWN mode DEFAULT
    link/ether fa:16:3e:54:5b:57 brd ff:ff:ff:ff:ff:ff

It can be fixed in the /etc/nova/nova.conf configuration file at the line:

public_interface = eth3

The incorrect VLAN interface is manually deleted and nova-network can be restarted. The instance is then resumed with:

nova resume target1

and nova-network will automatically re-create the VLAN interface.

fixing nova-network configuration and restarting the service

When the public IP of the bare metal host is not configured properly in /etc/nova/nova.conf:

my_ip = 192.168.20.10

nova-network will create an incorrect SNAT iptables rule:

0 0 SNAT all  --  any tun0 10.20.0.0/16 anywhere to:192.168.20.10

When the my_ip line is fixed, /etc/init.d/nova-network can safely be restarted, even when instances are running on the bare metal. It will not disrupt their connections and the iptables rule will be updated as expected.
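One way to confirm the update after the restart is to compare the SNAT target with the my_ip value. This is only a sketch: the helper names are hypothetical, and it assumes that my_ip is set as "my_ip = ADDRESS" in the file and that the rule shows up in the output of iptables -t nat -S as "-j SNAT --to-source ADDRESS".

```shell
# Hypothetical helpers, not part of nova.

# Print the value of my_ip from the nova.conf-style file given as $1.
conf_my_ip() {
    sed -n 's/^[[:space:]]*my_ip[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Read `iptables -t nat -S` output on stdin and print each SNAT target address.
snat_targets() {
    sed -n 's/.*-j SNAT --to-source \([^[:space:]]*\).*/\1/p'
}

# Succeed if at least one SNAT rule on stdin targets the address given as $1.
snat_has_target() {
    snat_targets | grep -qxF "$1"
}
```

It could be used as: iptables -t nat -S | snat_has_target "$(conf_my_ip /etc/nova/nova.conf)" && echo "SNAT rule matches my_ip"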

modifying the interfaces and restarting the service

Some problems cannot be fixed by simply modifying the /etc/nova/nova.conf file and the VLAN interface must be deleted manually. When the public interface is wrongly configured:

public_interface = eth2

and an instance has been created on the bare metal, a VLAN interface and a bridge are created:

# ip link show vlan100
13: vlan100@eth2:  mtu 1500 qdisc noqueue state DOWN mode DEFAULT
    link/ether fa:16:3e:54:5b:57 brd ff:ff:ff:ff:ff:ff
# brctl show br100
bridge name     bridge id               STP enabled     interfaces
br100           8000.fa163e6e08de       no              vlan100
                                                        vnet0
                                                        vnet1
                                                        vnet2
                                                        vnet3
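The device a VLAN interface is attached to is the name after the "@" in the ip link output. A small sketch for extracting it in a script follows; vlan_parent is a hypothetical helper name, and the parsing assumes the output layout shown above.

```shell
# Hypothetical helper: read `ip link show DEV` (or `ip -o link show DEV`)
# output on stdin and print the parent device named after the "@",
# for example eth2 for vlan100@eth2. Lines without an "@" are ignored.
vlan_parent() {
    sed -n 's/^[0-9]*: [^@:]*@\([^:]*\):.*/\1/p'
}
```

For example, ip -o link show vlan100 | vlan_parent prints eth2 for the interface above, which can then be compared with the public_interface value in /etc/nova/nova.conf.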

If the configuration file is fixed to use eth3 instead of eth2:

public_interface = eth3

restarting nova-network will not change the interface to which vlan100 is attached. Assuming the instances bound to the bridge are as follows:

# nova list
+--------------------------------------+------------+--------+---------------------+
| ID                                   | Name       | Status | Networks            |
+--------------------------------------+------------+--------+---------------------+
| 5e263310-a578-4653-bb48-697cca589297 | target1    | ACTIVE | private_0=10.20.0.6 |
| b108007d-d7e4-4289-83a8-a72280541eb2 | target2    | ACTIVE | private_0=10.20.0.7 |
| 8b97f260-f888-49a2-8f80-065ea49ea3b6 | target3    | ACTIVE | private_0=10.20.0.8 |
| 585ad852-2d34-45a5-8036-607fa0087511 | teuthology | ACTIVE | private_0=10.20.0.5 |
+--------------------------------------+------------+--------+---------------------+

they can be temporarily suspended with:

# nova suspend target1
# nova suspend target2
# nova suspend target3
# nova suspend teuthology
# nova list
+--------------------------------------+------------+-----------+---------------------+
| ID                                   | Name       | Status    | Networks            |
+--------------------------------------+------------+-----------+---------------------+
| 5e263310-a578-4653-bb48-697cca589297 | target1    | SUSPENDED | private_0=10.20.0.6 |
| b108007d-d7e4-4289-83a8-a72280541eb2 | target2    | SUSPENDED | private_0=10.20.0.7 |
| 8b97f260-f888-49a2-8f80-065ea49ea3b6 | target3    | SUSPENDED | private_0=10.20.0.8 |
| 585ad852-2d34-45a5-8036-607fa0087511 | teuthology | SUSPENDED | private_0=10.20.0.5 |
+--------------------------------------+------------+-----------+---------------------+

Suspending an instance kills the KVM process running it and saves its state to disk, the equivalent of a laptop suspend to disk. The bridge shows that the instances are no longer attached to it:

# brctl show br100
bridge name     bridge id               STP enabled     interfaces
br100           8000.fa163e6e08de       no              vlan100
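Before deleting the bridge, the same check can be done from a script. The sketch below parses the brctl show output for a single bridge; both function names are hypothetical, and it assumes the instance ports are named vnet*, as in the listing above.

```shell
# Hypothetical helper: read `brctl show BRIDGE` output on stdin and print one
# attached interface per line. Assumes the single-bridge layout shown above:
# a header line, the first port in column 4, then one port per line.
bridge_ports() {
    awk 'NR == 2 && NF >= 4 { print $4 }
         NR > 2 && NF >= 1  { print $1 }'
}

# Succeed if stdin lists no vnet* port, i.e. no instance is still attached.
no_instance_ports() {
    ! bridge_ports | grep -q '^vnet'
}
```

For example: brctl show br100 | no_instance_ports && echo "safe to delete br100"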

The bridge and the VLAN interface are manually deleted:

# ip link set br100 down
# brctl delbr br100
# ip link delete vlan100

When nova-network is stopped, the dnsmasq process persists and it will not notice when a new bridge is created. It must be killed so that nova-network restarts it after re-creating the bridge.

# pkill dnsmasq
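To make sure no dnsmasq process is left before nova-network is started again, a script could wait for it to exit. This is a sketch only; wait_until_gone is a hypothetical helper built on pgrep, and the default timeout is an arbitrary choice.

```shell
# Hypothetical helper: wait up to $2 seconds (default 5) for every process
# named exactly $1 to exit. Returns 0 once none is left, 1 on timeout.
wait_until_gone() {
    name=$1
    tries=${2:-5}
    while pgrep -x "$name" >/dev/null 2>&1; do
        [ "$tries" -gt 0 ] || return 1
        tries=$((tries - 1))
        sleep 1
    done
    return 0
}
```

For example: pkill dnsmasq; wait_until_gone dnsmasq && /etc/init.d/nova-network start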

If dnsmasq is not killed, the instance will resume properly but will lose its IP after trying to renew the DHCP lease. The instances can be resumed after starting nova-network:

# /etc/init.d/nova-network start
# nova resume target1
# nova resume target2
# nova resume target3
# nova resume teuthology

The VLAN interface is created as a side effect of starting the first instance:

# ip link show vlan100
13: vlan100@eth3:  mtu 1500 qdisc noqueue state DOWN mode DEFAULT
    link/ether fa:16:3e:54:5b:57 brd ff:ff:ff:ff:ff:ff

and the instances will not notice the difference.
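Put together, the steps above might look like the following sketch. It assumes that /etc/nova/nova.conf has already been corrected, that nova-network is stopped right after the instances are suspended, and that each suspend has completed before the bridge is removed (it does not poll nova list for the SUSPENDED status). The names run, move_vlan and DRY_RUN are hypothetical.

```shell
# Hypothetical wrapper: with DRY_RUN=1, print each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: move_vlan BRIDGE VLAN INSTANCE...
# Suspends the instances, removes the stale bridge and VLAN interface,
# kills dnsmasq, restarts nova-network and resumes the instances.
move_vlan() {
    bridge=$1
    vlan=$2
    shift 2
    for i in "$@"; do run nova suspend "$i" || return 1; done
    run /etc/init.d/nova-network stop
    run ip link set "$bridge" down
    run brctl delbr "$bridge"
    run ip link delete "$vlan"
    run pkill dnsmasq
    run /etc/init.d/nova-network start
    for i in "$@"; do run nova resume "$i" || return 1; done
}
```

Running DRY_RUN=1 move_vlan br100 vlan100 target1 target2 target3 teuthology first prints the commands in order, so they can be reviewed before anything is changed on the host.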