Network boot an OpenStack instance

When an OpenStack instance is defined as a raw disk, the default libvirt XML description is set to boot from it, assuming it contains a boot record. The libvirt.xml.template file is modified to add an attempt to boot from the network before booting from the disk:

<boot dev="network" />

The ebtables and iptables network filtering rules are relaxed by removing the filterref element from the libvirt.xml.template, so that DHCP requests from a new instance can be answered by a DHCP server running on another instance in the same VLAN.
The dnsmasq run by OpenStack is configured to ignore DHCP requests issued by iPXE:

dhcp-userclass=set:ignore,iPXE
dhcp-ignore=tag:ignore

As a result, DHCP requests issued by clients such as ISC DHCP get an IP address from the OpenStack-provided dnsmasq, while instances booting from the network get their IP address and kernel from the DHCP server running on an instance created by a user.

booting from network

When an OpenStack image format is either qcow2 or raw, it is assumed to contain a boot record. No attempt is made to boot from the network, although iPXE is installed by default. kvm could use

-boot order=nc

instead of the default, which is to try booting from the disk only. OpenStack is instructed to use an alternate libvirt XML template by adding the following line to /etc/nova/nova.conf:

--libvirt_xml_template=/etc/nova/libvirt.xml.template

It can also be specified using puppet, as explained in OpenStack nested virtual machines. The /etc/nova/libvirt.xml.template file is then modified to change

<boot dev="hd" />

into

<boot dev="network" />
<boot dev="hd" />

The boot element with the dev="network" attribute is placed before the one with the dev="hd" attribute, so that an attempt to boot from the network is made before booting from the disk.
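The template edit can be scripted. The following is a minimal Python sketch, not part of OpenStack: the function name add_network_boot is hypothetical, and it assumes the template spells the element as <boot dev="hd" /> on a line of its own, as shown above. The file is handled as text rather than parsed as XML, because the template contains template syntax and is not guaranteed to be well-formed XML before nova renders it.

```python
import re

# A <boot dev="hd" /> element on its own line, with its indentation captured.
# The template is treated as text, not XML (assumption: it may not parse as XML).
BOOT_HD = re.compile(r'^(?P<indent>[ \t]*)<boot dev=["\']hd["\']\s*/>', re.MULTILINE)
BOOT_NETWORK = re.compile(r'<boot dev=["\']network["\']')


def add_network_boot(template: str) -> str:
    """Insert <boot dev="network" /> right before each <boot dev="hd" /> line.

    A template that already asks for a network boot is returned unchanged,
    so the function can safely be run more than once.
    """
    if BOOT_NETWORK.search(template):
        return template
    return BOOT_HD.sub(
        lambda m: m.group("indent") + '<boot dev="network" />\n' + m.group(0),
        template,
    )


if __name__ == "__main__":
    sample = '    <os>\n        <boot dev="hd" />\n    </os>\n'
    print(add_network_boot(sample), end="")
```

Applied to the sample, it prints the os element with the network boot line inserted above the hd line, at the same indentation. Keeping the function idempotent means it can be rerun after a package upgrade restores the original template.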

The screenshot above shows iPXE trying to acquire an IP address from a DHCP server. iPXE is run by kvm when it is instructed to boot from the network, and it supports the same network cards as kvm.
When an AMI image is used, libvirt will not attempt to boot from the network, even if instructed to, and iPXE is never called: the kernel element takes precedence and the boot order is ignored.
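These precedence rules can be summarized with a small model. This Python sketch is an illustration, not how libvirt is implemented: effective_boot is a hypothetical name, the letters are those of kvm's -boot order= option used earlier (a for floppy, c for disk, d for CD-ROM, n for network), and libvirt versions may hand the boot order to kvm in a different form.

```python
import xml.etree.ElementTree as ET

# Letters used by kvm's "-boot order=" option for each libvirt boot device.
QEMU_BOOT_LETTER = {"fd": "a", "hd": "c", "cdrom": "d", "network": "n"}


def effective_boot(domain_xml: str) -> str:
    """Summarize how a libvirt domain boots, following the rules described above."""
    os_elem = ET.fromstring(domain_xml).find("os")
    if os_elem is None:
        raise ValueError("domain XML has no <os> element")
    kernel = os_elem.find("kernel")
    if kernel is not None:
        # AMI-style direct kernel boot: <boot> elements are ignored, iPXE never runs.
        return "kernel " + (kernel.text or "").strip()
    # <boot> elements are kept in document order, which is the boot order.
    devices = [boot.get("dev") for boot in os_elem.findall("boot")]
    return "-boot order=" + "".join(QEMU_BOOT_LETTER[dev] for dev in devices)


if __name__ == "__main__":
    disk_image = """<domain type='kvm'><os><type>hvm</type>
        <boot dev="network" /><boot dev="hd" /></os></domain>"""
    print(effective_boot(disk_image))  # -boot order=nc
```

Adding a kernel element to the same os element makes the function report a direct kernel boot regardless of the boot elements, which is the AMI behavior described above.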

Relaxing network firewall

The OpenStack network has been set up with VLANs and the users are trusted. The administrator intends to protect against accidents rather than against intrusions by tenant users.
The ebtables and iptables network filtering rules are relaxed by removing the filterref element from the libvirt.xml.template. When instantiated from the default libvirt.xml.template, it looks like this:

<filterref filter='nova-instance-instance-0000006c-fa163e5043cd'>
 <parameter name='DHCPSERVER' value='10.145.6.4'/>
 <parameter name='IP' value='10.145.6.5'/>
</filterref>

It refers to a series of nwfilter rules located in the /etc/libvirt/nwfilter/nova-instance-instance-0000006c-fa163e5043cd.xml file on the OpenStack node running the instance.
It means that only DHCP answers sent by the server at 10.145.6.4 will reach the instance, and that any packet leaving the instance with a source IP other than 10.145.6.5 will be discarded. The filterref element is removed to allow the tenant to run its own DHCP server. That server will assign the instance an IP different from 10.145.6.5, and without the filter, packets sent from that IP are no longer discarded.
Since the tenant has its own VLAN, its DHCP server can hand out addresses from any subnet without risking a conflict with the networks assigned to other tenants. For the same reason, there is no risk that the DHCP server run by the tenant answers requests from instances run by another tenant.
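The reasoning in the previous paragraph can be checked with a toy model. This Python sketch is a deliberate simplification, not libvirt's nwfilter implementation: the Packet class and the filter_allows function are hypothetical, and they only encode the two effects of the filterref parameters described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    outgoing: bool            # True when the instance sends the packet
    src_ip: str
    dhcp_reply: bool = False  # True for a DHCP answer sent to the instance


def filter_allows(packet: Packet, params: Optional[dict]) -> bool:
    """Simplified model of the two filterref effects described above.

    params holds the filterref parameters (DHCPSERVER, IP), or None when the
    filterref element has been removed. Traffic governed by other nwfilter
    rules is outside this model and is treated as allowed.
    """
    if params is None:
        return True
    if packet.outgoing:
        return packet.src_ip == params["IP"]
    if packet.dhcp_reply:
        return packet.src_ip == params["DHCPSERVER"]
    return True


if __name__ == "__main__":
    params = {"DHCPSERVER": "10.145.6.4", "IP": "10.145.6.5"}
    print(filter_allows(Packet(outgoing=True, src_ip="10.145.6.107"), params))  # False
    print(filter_allows(Packet(outgoing=True, src_ip="10.145.6.107"), None))    # True
```

With the parameters of the example, a DHCP reply from any server other than 10.145.6.4 is dropped, and so is traffic sent from an address handed out by the tenant's server; with params set to None, both pass.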
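Removing the filterref element from the template can also be scripted. This is a minimal Python sketch with a hypothetical function name, remove_filterref; it assumes the element in the template has the shape of the instantiated example above, and everything between the opening and closing tags is removed along with them.

```python
import re

# Either a self-closing <filterref .../> or everything from <filterref ...> to the
# first </filterref>, together with the line's leading indentation and newline.
FILTERREF = re.compile(
    r"^[ \t]*(?:<filterref\b[^>]*/>|<filterref\b.*?</filterref>)[ \t]*\n?",
    re.DOTALL | re.MULTILINE,
)


def remove_filterref(template: str) -> str:
    """Return the template with every filterref element removed."""
    return FILTERREF.sub("", template)


if __name__ == "__main__":
    interface = (
        "<interface type='bridge'>\n"
        "  <filterref filter='nova-instance-instance-0000006c-fa163e5043cd'>\n"
        "   <parameter name='DHCPSERVER' value='10.145.6.4'/>\n"
        "   <parameter name='IP' value='10.145.6.5'/>\n"
        "  </filterref>\n"
        "</interface>\n"
    )
    print(remove_filterref(interface), end="")
```

The non-greedy match stops at the first closing tag, so two filterref elements in the same file are removed separately rather than together with everything in between.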

In the screenshot above, iPXE has been interrupted with Ctrl-B to display a command prompt. The dhcp command acquires the IP 10.145.6.107, as shown by the route command. It was obtained from a DHCP server running on an instance of the same tenant, configured as follows:

subnet 10.145.6.0 netmask 255.255.255.0 {
  pool
  {
    range 10.145.6.100 10.145.6.150;
  }
  option subnet-mask 255.255.255.0;
  option routers 10.145.6.4;
}
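A subnet declaration like this one can be sanity-checked before dhcpd is restarted. This Python sketch uses only the standard ipaddress module; the function check_subnet and the list of problems it reports are hypothetical choices for illustration, not something dhcpd itself provides.

```python
import ipaddress


def check_subnet(subnet: str, netmask: str, first: str, last: str, router: str) -> list:
    """Return the problems found in an ISC dhcpd subnet declaration (empty if none)."""
    net = ipaddress.IPv4Network(f"{subnet}/{netmask}")
    start, end = ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
    gateway = ipaddress.IPv4Address(router)
    problems = []
    if start not in net or end not in net:
        problems.append("range is outside the subnet")
    if start > end:
        problems.append("range start is after range end")
    if gateway not in net:
        problems.append("router is outside the subnet")
    elif start <= gateway <= end:
        # dhcpd could hand the router's address to an instance.
        problems.append("router address is inside the dynamic range")
    return problems


if __name__ == "__main__":
    # Values taken from the subnet declaration above.
    print(check_subnet("10.145.6.0", "255.255.255.0",
                       "10.145.6.100", "10.145.6.150", "10.145.6.4"))  # []
```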

resolving DHCP server race conditions

The DHCP server running on the tenant's instance competes with the DHCP server provided by OpenStack. The following rules resolve the race condition:

  • DHCP requests from iPXE clients are ignored by the DHCP server provided by OpenStack
  • DHCP requests from all other clients are ignored by the DHCP server running on an instance of the same tenant

The selection relies on the user class iPXE, which iPXE sets on every DHCP request it issues.
Assuming the instance runs ISC DHCP, IP allocation is restricted to iPXE DHCP requests with the following conditional:

if exists user-class and option user-class = "iPXE" {
...
}
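To make the selection concrete, the following Python sketch extracts the user class from a raw DHCP message and applies the same test as the ISC conditional above. It is illustrative only: the function names are hypothetical, option overloading is ignored, and it assumes, as the conditional does, that iPXE sends the plain string iPXE as the value of DHCP option 77 (User Class).

```python
MAGIC_COOKIE = b"\x63\x82\x53\x63"
OPTIONS_OFFSET = 236  # length of the fixed BOOTP header that precedes the cookie
USER_CLASS = 77       # DHCP User Class option code


def dhcp_options(payload: bytes) -> dict:
    """Map option code -> raw value for the options of a DHCP message."""
    if payload[OPTIONS_OFFSET:OPTIONS_OFFSET + 4] != MAGIC_COOKIE:
        raise ValueError("missing DHCP magic cookie")
    options = {}
    i = OPTIONS_OFFSET + 4
    while i < len(payload):
        code = payload[i]
        if code == 0:    # Pad option: a single byte with no length field
            i += 1
            continue
        if code == 255:  # End option
            break
        length = payload[i + 1]
        options[code] = payload[i + 2:i + 2 + length]
        i += 2 + length
    return options


def tenant_dhcpd_answers(payload: bytes) -> bool:
    """Mirror of: if exists user-class and option user-class = "iPXE" { ... }"""
    return dhcp_options(payload).get(USER_CLASS) == b"iPXE"


def make_request(user_class: bytes = b"") -> bytes:
    """Build a minimal DHCPDISCOVER payload with a zeroed header, for testing only."""
    options = bytes([53, 1, 1])  # option 53 (message type) = 1 (DHCPDISCOVER)
    if user_class:
        options += bytes([USER_CLASS, len(user_class)]) + user_class
    return bytes(OPTIONS_OFFSET) + MAGIC_COOKIE + options + bytes([255])


if __name__ == "__main__":
    print(tenant_dhcpd_answers(make_request(b"iPXE")))  # True
    print(tenant_dhcpd_answers(make_request()))         # False
```

A request from an ordinary client such as ISC dhclient carries no iPXE user class, so the conditional is false and the tenant's server stays silent.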

Within OpenStack, dnsmasq is configured by adding the following line to /etc/nova/nova.conf:

--dnsmasq_config_file=/etc/nova/dnsmasq.conf

and adding the following lines to /etc/nova/dnsmasq.conf:

dhcp-userclass=set:ipxe,iPXE
dhcp-ignore=tag:ipxe

The dhcp-userclass=set:ipxe,iPXE line sets the ipxe tag when the user class string iPXE is detected, and dhcp-ignore=tag:ipxe ignores any incoming request with that tag set. If it does not work, adding the log-dhcp option makes dnsmasq write verbose output to /var/log/daemon.log.
When OpenStack is deployed using puppet, this translates into the following snippet:

  ##################################################

  nova_config { 'dnsmasq_config_file': value => '/etc/nova/dnsmasq.conf' }

  file { '/etc/nova/dnsmasq.conf':
    ensure  => present,
    owner   => 'root',
    group   => 'root',
    mode    => '0444',
    content => "dhcp-userclass=set:ipxe,iPXE\ndhcp-ignore=tag:ipxe\n",
  }
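The two rules listed under resolving DHCP server race conditions are meant to be complementary, so that every request is answered by exactly one server. A Python sketch can model both decisions side by side. It is a simplification, not dnsmasq or dhcpd code: the function names are hypothetical, the user class is treated as a plain string, and dnsmasq's dhcp-userclass matching is modeled as a substring test, as its manual describes.

```python
# dnsmasq settings from /etc/nova/dnsmasq.conf, as (tag, user class) pairs for
# dhcp-userclass and the set of tags listed in dhcp-ignore.
USERCLASS_RULES = [("ipxe", "iPXE")]
IGNORED_TAGS = {"ipxe"}


def dnsmasq_answers(user_class: str) -> bool:
    """OpenStack dnsmasq: tag requests by user class substring, ignore tagged ones."""
    tags = {tag for tag, needle in USERCLASS_RULES if needle in user_class}
    return not (tags & IGNORED_TAGS)


def isc_answers(user_class: str) -> bool:
    """Tenant ISC dhcpd: answer only when the user class is exactly iPXE."""
    return user_class == "iPXE"


def responders(user_class: str) -> list:
    """DHCP servers answering a request with this user class ("" when absent)."""
    servers = []
    if dnsmasq_answers(user_class):
        servers.append("OpenStack dnsmasq")
    if isc_answers(user_class):
        servers.append("tenant dhcpd")
    return servers


if __name__ == "__main__":
    print(responders("iPXE"))  # ['tenant dhcpd']
    print(responders(""))      # ['OpenStack dnsmasq']
```

Checking both predicates in one place makes it easy to see that an iPXE request and an ordinary request each reach exactly one server.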


In the screenshot above, a razor server running on a tenant instance stands ready to deploy OpenStack within OpenStack.

caveats

The IP addresses allocated by the DHCP server running in an instance are not known to OpenStack. OpenStack allocates an IP to every instance regardless, recording it in a static configuration file specific to the tenant, such as /var/lib/nova/networks/nova-br2006.conf:

fa:16:3e:12:8b:2f,razor.novalocal,10.145.6.3
fa:16:3e:36:02:72,cir.novalocal,10.145.6.5
fa:16:3e:0d:1e:6b,cir2.novalocal,10.145.6.6
fa:16:3e:18:f1:d2,abc.novalocal,10.145.6.7
fa:16:3e:51:b0:1f,def.novalocal,10.145.6.8

and dnsmasq is run with it as follows:

dnsmasq \
...
  --dhcp-range=10.145.6.3,static,120s \
  --dhcp-lease-max=256 \
  --dhcp-hostsfile=/var/lib/nova/networks/nova-br2006.conf \
...

The DHCP server run by the user must allocate IP addresses from a range that does not conflict with the range used by OpenStack. Choosing the range 10.145.6.150 to 10.145.6.250 will work in the example above with OpenStack Essex, but it is not guaranteed to carry over to future versions of OpenStack, and it will conflict if more than 150 instances get their IP from the OpenStack-provided dnsmasq DHCP server. It might be better to choose a completely different subnet, such as 192.168.6.0/24.
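A quick check against the allocations OpenStack has already made can be scripted. This minimal Python sketch assumes the mac,hostname,ip line format of the hosts file shown above; the function names are hypothetical.

```python
import ipaddress


def openstack_addresses(hosts_file: str) -> set:
    """IP addresses found in a nova dnsmasq hosts file (mac,hostname,ip per line)."""
    addresses = set()
    for line in hosts_file.splitlines():
        fields = line.strip().split(",")
        if len(fields) == 3:
            addresses.add(ipaddress.IPv4Address(fields[2]))
    return addresses


def conflicts(hosts_file: str, first: str, last: str) -> list:
    """Addresses already given out by OpenStack inside the range first..last."""
    start, end = ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
    return [str(a) for a in sorted(openstack_addresses(hosts_file)) if start <= a <= end]


HOSTS = """\
fa:16:3e:12:8b:2f,razor.novalocal,10.145.6.3
fa:16:3e:36:02:72,cir.novalocal,10.145.6.5
fa:16:3e:0d:1e:6b,cir2.novalocal,10.145.6.6
fa:16:3e:18:f1:d2,abc.novalocal,10.145.6.7
fa:16:3e:51:b0:1f,def.novalocal,10.145.6.8
"""

if __name__ == "__main__":
    print(conflicts(HOSTS, "10.145.6.150", "10.145.6.250"))  # []
    print(conflicts(HOSTS, "10.145.6.5", "10.145.6.6"))      # ['10.145.6.5', '10.145.6.6']
```

The check only sees the addresses allocated so far. The list grows as instances are created, which is why a separate subnet remains the more robust choice.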

iPXE kernel

Using the ipxe.lkrn kernel as an AKI and an iPXE script as an ARI could work. However, it is not compatible with the presence of a command line such as

     <cmdline>root=/dev/vda1 console=ttyS0</cmdline>

which cannot be controlled by the user. The iPXE kernel will complain that it does not understand root=/dev/vda1 and will ignore the script provided in the ARI. There may be a simple solution to this problem.

2 Replies to “Network boot an OpenStack instance”

  1. Loic, thank you very much! This is exactly what I am attempting to do, but in OpenStack Grizzly. Should this approach be followed in Grizzly?

    1. Hi,

      Things have changed significantly since Essex and, to be honest, I don’t know how that should be done in Grizzly or Havana 🙂 I would love to know though.

      Cheers
