OpenStack Havana is configured to use OpenVSwitch 1.10.2 as packaged for Ubuntu precise, with a linux-3.11 kernel. The cluster is connected to a 100Mb/s link. When sending data from an instance to the internet (measured with iperf), throughput is ~90Mb/s. When receiving data from the internet to the instance, it drops to ~1Mb/s. After capturing the packets on the interface used by the default route on the hypervisor running the neutron router with
tcpdump -i eth0 host 91.224.149.132 -w /tmp/bad.cap
wireshark /tmp/bad.cap shows a lot of retransmissions.
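For reference, a minimal sketch of the iperf runs (assuming iperf2 and that the 91.224.149.132 host from the capture above is the remote endpoint; adjust to your setup):

# On the remote internet host, start the server:
iperf -s
# From the instance, measure upload throughput (~90Mb/s here):
iperf -c 91.224.149.132 -t 30
# Swap the roles (server on the instance, client on the remote host)
# to measure download throughput (~1Mb/s here).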
A similar problem was reported back in October 2013, hinting that it may be a kernel problem. Upgrading the kernel of the hypervisor running the neutron router to linux-3.13 indeed fixes the problem. The compute nodes running the instances do not need their kernel updated; they can keep using the linux-3.11 kernel with the 1.10.2 version of the OpenVSwitch datapath kernel module. The OpenVSwitch kernel part is in the linux-3.13 tree, so the openvswitch-datapath-dkms package is no longer needed. It fails to compile against the linux-3.13 headers, but the error can be ignored and the package uninstalled.
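For the record, a sketch of the upgrade on the hypervisor running the neutron router, assuming the 3.13 kernel comes from the linux-generic-lts-trusty HWE meta package (any linux-3.13 build should do):

apt-get install linux-generic-lts-trusty
# The openvswitch module is in the linux-3.13 tree, so the DKMS package
# (which fails to build against the 3.13 headers) can simply be removed:
apt-get remove openvswitch-datapath-dkms
reboot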
It seems like OpenVSwitch 2.0, available in the Ubuntu Cloud Archive repository for precise, has much better performance and is more stable.
Thanks for the hint! The best I could find is
am I missing something?
Indeed, that is a bit strange. Version 2.0.1 of the package is available at http://ubuntu-cloud.archive.canonical.com/ubuntu/pool/main/o/openvswitch/, but doesn’t appear in any Packages file.
I found version 2.0.1 of the package in the icehouse-proposed and icehouse-updates sections. I am pretty sure OpenVSwitch 2.0 works with Havana, but we don't use the packages from the Cloud Archive.
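For reference, a sketch of how the Icehouse Cloud Archive could be enabled on precise to get these packages (we did not take this route):

apt-get install ubuntu-cloud-keyring
echo "deb http://ubuntu-cloud.archive.canonical.com/ubuntu precise-updates/icehouse main" \
  > /etc/apt/sources.list.d/cloud-archive-icehouse.list
apt-get update
# Show which openvswitch versions the configured repositories publish:
apt-cache madison openvswitch-switch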
Have you tried this test with TSO and GRO turned off on the VMs' interfaces?
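A sketch of what that would look like with ethtool, assuming eth0 is the relevant interface inside the VM:

ethtool -K eth0 gro off   # disable generic receive offload
ethtool -K eth0 tso off   # disable TCP segmentation offload
# Verify the new settings:
ethtool -k eth0 | egrep 'generic-receive-offload|tcp-segmentation-offload'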
I did not try this. Could you expand on why that could fix this kind of problem?
Hi Loic, there are some bug reports about GRO and tunnels:
https://bugs.launchpad.net/neutron/+bug/1252900
(I think it could be related to MTU fragmentation.)
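A quick way to check whether fragmentation is involved, as a sketch: ping the remote host from the instance with the don't-fragment bit set.

# 1472 bytes of ICMP payload + 28 bytes of IP/ICMP headers = a 1500-byte packet:
ping -M do -s 1472 91.224.149.132
# If this fails while smaller payloads go through, packets above the path MTU
# are being fragmented or dropped somewhere on the tunnel path.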
GRO is indeed active on the hypervisor running the L3 agent, but the linux-3.13 kernel seems to handle the situation well so far.
root@bm0015# ethtool -k eth0 | grep generic-receive-offload:
generic-receive-offload: on
root@bm0501:~# ethtool -k eth0 | grep generic-receive-offload:
generic-receive-offload: on