Subject: Re: [virtio-dev] [PATCH v4] content: Introduce VIRTIO_NET_F_STANDBY feature

Hi all,

It's been a while since the last discussion here. I have been working on implementing the standby feature in QEMU. I tried several approaches and in the end decided to build on the hotplug/unplug infrastructure, for reasons I'll go over when I send the patches. For now you can find the implementation here:

https://github.com/sameehj/qemu/tree/failover_hidden_opts (the full command line I used can be found at the end of the email)

I have tested my implementation in QEMU with a Fedora 29 guest. I can see the failover interface and assign an IP to it. The feature is acked and the primary device is plugged in with no issues.

I have created a setup with two hosts (host A and host B) with X710 10G cards connected back to back. On host A I have configured a bridge with both the PF interface and virtio-net's interface (standby) attached to it. I ran the guest with the patched QEMU on host A and pinged the bridge successfully. Ping between host A and host B also works. However, I can't ping host B from the VM or vice versa. This only happens when the feature is enabled, for a reason I have yet to figure out.

I haven't tested migration yet, but I am on my way to doing so.

Since I couldn't ping host B from the VM, I ran an iperf test between the VM and host A with the feature enabled, and during the test I unplugged the SR-IOV device. The device was unplugged successfully and no drops were observed, as you can see in the results below:

[root@dhcp156-44 ~]# ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.19.156.44  netmask 255.255.248.0  broadcast 10.19.159.255
        inet6 fe80::d306:561f:9f43:ff77  prefixlen 64  scopeid 0x20<link>
        inet6 2620:52:0:1398:9699:325b:25f9:e7bb  prefixlen 64  scopeid 0x0<global>
        ether 56:cc:c1:01:cc:21  txqueuelen 1000  (Ethernet)
        RX packets 12258  bytes 870822 (850.4 KiB)
        RX errors 11  dropped 0  overruns 0  frame 11
        TX packets 294  bytes 32432 (31.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.17  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::bc87:86b8:bc86:be4e  prefixlen 64  scopeid 0x20<link>
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 41052  bytes 2775833 (2.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 47468  bytes 15629 (15.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 214  bytes 14966 (14.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 163  bytes 26498 (25.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens4nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 41052  bytes 2775833 (2.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 47468  bytes 2889827541 (2.6 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 176  bytes 19712 (19.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 176  bytes 19712 (19.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@dhcp156-44 ~]# iperf -c 192.168.1.117 -t 100 -i 1
------------------------------------------------------------
Client connecting to 192.168.1.117, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.17 port 40368 connected with 192.168.1.117 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  3.47 GBytes  29.8 Gbits/sec
[  3]  1.0- 2.0 sec  4.35 GBytes  37.4 Gbits/sec
[  3]  2.0- 3.0 sec  4.10 GBytes  35.2 Gbits/sec
[  3]  3.0- 4.0 sec  4.20 GBytes  36.1 Gbits/sec
[  3]  4.0- 5.0 sec  4.20 GBytes  36.1 Gbits/sec
[  3]  5.0- 6.0 sec  4.07 GBytes  34.9 Gbits/sec
[  3]  6.0- 7.0 sec  4.53 GBytes  38.9 Gbits/sec
[  3]  7.0- 8.0 sec  4.38 GBytes  37.6 Gbits/sec
[  3]  8.0- 9.0 sec  4.60 GBytes  39.5 Gbits/sec
[  3]  9.0-10.0 sec  4.60 GBytes  39.5 Gbits/sec
[  3] 10.0-11.0 sec  4.56 GBytes  39.2 Gbits/sec
[  3] 11.0-12.0 sec  4.70 GBytes  40.4 Gbits/sec
[  3] 12.0-13.0 sec  4.65 GBytes  39.9 Gbits/sec
[  3] 13.0-14.0 sec  4.51 GBytes  38.7 Gbits/sec
[  3] 14.0-15.0 sec  4.48 GBytes  38.5 Gbits/sec
[  3] 15.0-16.0 sec  4.67 GBytes  40.2 Gbits/sec
[  3] 16.0-17.0 sec  4.37 GBytes  37.5 Gbits/sec
[  3] 17.0-18.0 sec  4.68 GBytes  40.2 Gbits/sec
[  3] 18.0-19.0 sec  4.99 GBytes  42.9 Gbits/sec
[  3] 19.0-20.0 sec  5.00 GBytes  42.9 Gbits/sec
[  3] 20.0-21.0 sec  4.90 GBytes  42.1 Gbits/sec
[  3] 21.0-22.0 sec  4.72 GBytes  40.5 Gbits/sec
[  3] 22.0-23.0 sec  4.60 GBytes  39.5 Gbits/sec
[  3] 23.0-24.0 sec  4.72 GBytes  40.6 Gbits/sec
[  3] 24.0-25.0 sec  4.42 GBytes  38.0 Gbits/sec
[  3] 25.0-26.0 sec  4.44 GBytes  38.2 Gbits/sec
[  3] 26.0-27.0 sec  4.18 GBytes  35.9 Gbits/sec
[  3] 27.0-28.0 sec  4.20 GBytes  36.1 Gbits/sec
[  3] 28.0-29.0 sec  4.27 GBytes  36.7 Gbits/sec
[  3] 29.0-30.0 sec  4.16 GBytes  35.7 Gbits/sec
[  3] 30.0-31.0 sec  4.14 GBytes  35.6 Gbits/sec
[  3] 31.0-32.0 sec  4.13 GBytes  35.4 Gbits/sec
[  3] 32.0-33.0 sec  4.16 GBytes  35.7 Gbits/sec
[  3] 33.0-34.0 sec  4.33 GBytes  37.2 Gbits/sec
[  3] 34.0-35.0 sec  4.31 GBytes  37.0 Gbits/sec
[  3] 35.0-36.0 sec  4.26 GBytes  36.6 Gbits/sec
[  3] 36.0-37.0 sec  4.36 GBytes  37.5 Gbits/sec
[  3] 37.0-38.0 sec  4.11 GBytes  35.3 Gbits/sec
[  3] 38.0-39.0 sec  4.00 GBytes  34.4 Gbits/sec
[  3] 39.0-40.0 sec  4.53 GBytes  38.9 Gbits/sec
[  3] 40.0-41.0 sec  4.06 GBytes  34.9 Gbits/sec
[  3] 41.0-42.0 sec  4.17 GBytes  35.8 Gbits/sec
[  3] 42.0-43.0 sec  4.14 GBytes  35.6 Gbits/sec
[  3] 43.0-44.0 sec  4.07 GBytes  34.9 Gbits/sec
^C[  3]  0.0-44.5 sec   195 GBytes  37.5 Gbits/sec
[root@dhcp156-44 ~]# ifconfig
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.19.156.44  netmask 255.255.248.0  broadcast 10.19.159.255
        inet6 fe80::d306:561f:9f43:ff77  prefixlen 64  scopeid 0x20<link>
        inet6 2620:52:0:1398:9699:325b:25f9:e7bb  prefixlen 64  scopeid 0x0<global>
        ether 56:cc:c1:01:cc:21  txqueuelen 1000  (Ethernet)
        RX packets 12547  bytes 889713 (868.8 KiB)
        RX errors 11  dropped 0  overruns 0  frame 11
        TX packets 373  bytes 45723 (44.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.17  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::bc87:86b8:bc86:be4e  prefixlen 64  scopeid 0x20<link>
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 2862498  bytes 192898865 (183.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3414905  bytes 209192841687 (194.8 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens4nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 8a:f7:20:29:3b:cb  txqueuelen 1000  (Ethernet)
        RX packets 2862498  bytes 192898865 (183.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3414905  bytes 212082653599 (197.5 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 176  bytes 19712 (19.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 176  bytes 19712 (19.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
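
One way to check the per-second iperf intervals for a dip around the unplug is to scan them programmatically. The following is a minimal sketch, not part of the patches: the helper names and the "below half the median" threshold are assumptions chosen for illustration, and a throughput dip is only a proxy for packet loss. The sample lines are copied from the run above.

```python
import re

# Matches iperf2 interval lines such as
# "[  3]  9.0-10.0 sec  4.60 GBytes  39.5 Gbits/sec".
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+(\d+\.\d+)-\s*(\d+\.\d+)\s+sec\s+[\d.]+\s+\w+\s+([\d.]+)\s+Gbits/sec"
)


def parse_intervals(text):
    """Return (start, end, gbits_per_sec) for each interval line found."""
    rows = []
    for line in text.splitlines():
        m = INTERVAL_RE.search(line)
        if m:
            rows.append((float(m.group(1)), float(m.group(2)), float(m.group(3))))
    return rows


def find_stalls(rows, fraction=0.5):
    """Return intervals slower than `fraction` of the median rate (assumed threshold)."""
    rates = sorted(r[2] for r in rows)
    median = rates[len(rates) // 2]
    return [r for r in rows if r[2] < fraction * median]


SAMPLE = """\
[  3]  0.0- 1.0 sec  3.47 GBytes  29.8 Gbits/sec
[  3]  9.0-10.0 sec  4.60 GBytes  39.5 Gbits/sec
[  3] 18.0-19.0 sec  4.99 GBytes  42.9 Gbits/sec
[  3] 43.0-44.0 sec  4.07 GBytes  34.9 Gbits/sec
"""

rows = parse_intervals(SAMPLE)
stalls = find_stalls(rows)
print(len(rows), "intervals,", len(stalls), "below half the median rate")
```

Note that the final summary line of an iperf run has the same shape as an interval line, so it would be parsed as one more row if the full output were fed in.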

__________________________________________________________________________________________________________________

The command line I used:

/root/qemu/x86_64-softmmu/qemu-system-x86_64 \
-netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname=cc17 \
-device e1000,netdev=hostnet0,mac=56:cc:c1:01:cc:21,id=cc17 \
-netdev tap,vhost=on,id=hostnet1,script=test_bridge_standalone.sh,downscript=no,ifname=cc1_72,queues=4 \
-device virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,id=cc1_72,vectors=10,mq=on,primary=cc1_71 \
-device vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72 \
-enable-kvm \
-name netkvm \
-m 3000M \
-drive file=/dev/shm/fedora_29.qcow2,if=ide,id=drivex \
-smp 4 \
-vga qxl \
-spice port=6110,disable-ticketing \
-device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x7 \
-chardev spicevmc,name=vdagent,id=vdagent \
-device virtserialport,nr=1,bus=virtio-serial0.0,chardev=vdagent,name=com.redhat.spice.0 \
-chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
-device virtio-serial \
-device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
-monitor stdio
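
In the command line above, the two devices refer to each other by id: the virtio-net device carries `primary=cc1_71` and the vfio-pci device carries `standby=cc1_72`. A minimal sketch of a sanity check for that cross-reference follows. It assumes `primary=` and `standby=` behave as plain id references, as they appear to in the linked branch (they are not claimed to be upstream QEMU options), and the helper names are hypothetical.

```python
def parse_device(arg):
    """Split a QEMU -device argument into (driver, {key: value})."""
    driver, _, rest = arg.partition(",")
    props = {}
    for item in rest.split(","):
        if item:
            key, _, value = item.partition("=")
            props[key] = value
    return driver, props


def check_pairing(device_args):
    """Return a list of problems with primary=/standby= references (empty if consistent)."""
    by_id = {}
    for arg in device_args:
        _driver, props = parse_device(arg)
        if "id" in props:
            by_id[props["id"]] = props
    problems = []
    for dev_id, props in by_id.items():
        for key, back in (("primary", "standby"), ("standby", "primary")):
            peer = props.get(key)
            if peer is None:
                continue
            if peer not in by_id:
                problems.append(f"{dev_id}: {key}={peer} names no device")
            elif by_id[peer].get(back) != dev_id:
                problems.append(f"{dev_id}: {peer} does not point back via {back}=")
    return problems


devices = [
    "virtio-net,host_mtu=1500,netdev=hostnet1,mac=8a:f7:20:29:3b:cb,"
    "id=cc1_72,vectors=10,mq=on,primary=cc1_71",
    "vfio-pci,host=65:02.1,id=cc1_71,standby=cc1_72",
]
print(check_pairing(devices))
```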

On Fri, Oct 19, 2018 at 6:45 AM Michael S. Tsirkin <mst@redhat.com> wrote:

On Wed, Oct 10, 2018 at 06:26:50PM -0700, Siwei Liu wrote:
> On Fri, Oct 5, 2018 at 12:18 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Oct 04, 2018 at 05:03:14PM -0700, Siwei Liu wrote:
> > > On Tue, Oct 2, 2018 at 5:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Oct 02, 2018 at 01:42:09AM -0700, Siwei Liu wrote:
> > > > > The VF's MAC can be updated by PF/host on the fly at any time. One can
> > > > > start with a random MAC but use group ID to pair device instead. And
> > > > > only update MAC address to the real one when moving MAC filter around
> > > > > after PV says OK to switch datapath.
> > > > >
> > > > > Do you see any problem with this design?
> > > >
> > > > Isn't this what I proposed:
> > > >         Maybe we can
> > > >         start VF with a temporary MAC, then change it to a final one when guest
> > > >         tries to use it. It will work but we run into fact that MACs are
> > > >         currently programmed by mgmnt - in many setups qemu does not have the
> > > >         rights to do it.
> > > >
> > > > ?
> > > >
> > > > If yes I don't see a problem with the interface design, even though
> > > > implementation wise it's more work as it will have to include management
> > > > changes.
> > >
> > > I thought we discussed this design a while back:
> > > https://www.spinics.net/lists/netdev/msg512232.html
> > >
> > > ... plug in a VF with a random MAC filter programmed in prior, and
> > > initially use that random MAC within guest. This would require:
> > > a) not relying on permanent MAC address to do pairing during the
> > > initial discovery, e.g. use the failover group ID as in this
> > > discussion
> > > b) host to toggle the MAC address filter: which includes taking down
> > > the tap device to return the MAC back to PF, followed by assigning
> > > that MAC to VF using "ip link ... set vf ..."
> > > c) notify guest to reload/reset VF driver for the change of hardware MAC address
> > > d) until VF reloads the driver it won't be able to use the datapath,
> > > so very short period of network outage is (still) expected
> > >
> > > though I still don't think this design can eliminate downtime.
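
Step (b) above maps onto two host commands. Here is a minimal sketch that only builds the command lines and does not run them, assuming the iproute2 form `ip link set dev <pf> vf <n> mac <addr>`. The PF name `pf0` and VF index `0` are placeholders rather than values from this setup, and `mac_handover_commands` is a hypothetical helper. Step (c) is left out because the thread does not say how the guest would be notified.

```python
def mac_handover_commands(pf, vf_index, tap, mac):
    """Build (not run) the host commands for step (b) of the quoted design."""
    return [
        # Take the tap device down so the MAC is returned to the PF.
        ["ip", "link", "set", "dev", tap, "down"],
        # Assign that MAC to the VF through the PF.
        ["ip", "link", "set", "dev", pf, "vf", str(vf_index), "mac", mac],
    ]


# "pf0" and VF index 0 are placeholders; the tap name and MAC are the ones
# used for the virtio-net device in the command line earlier in this mail.
commands = mac_handover_commands("pf0", 0, "cc1_72", "8a:f7:20:29:3b:cb")
for cmd in commands:
    print(" ".join(cmd))
```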
> >
> >
> > No, my idea is somewhat different. As you say there is a problem
> > of delay at point (c).
> That's true, I never say the downtime can be avoided because of this
> delay in the guest side. But with this the downtime gets to the bare
> minimum and in most situations packets won't be lost on reception as
> long as the PF sets up the filter in timely manner.

It's not really the bare minimum IMHO. E.g. fixing the PF to
defer filter update will give you less downtime.
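
As a rough model of what a deferred filter update could look like on the PF side, following the suggestion quoted further down (any command received from the VF enables the filter, until reset), here is a toy sketch. It is not driver code: `DeferredVfFilter` and its method names are hypothetical, and real PF drivers differ in when they program filters.

```python
class DeferredVfFilter:
    """Toy model: the PF holds back a VF's MAC filter until the VF driver is up."""

    def __init__(self, mac):
        self.mac = mac          # MAC configured up front by management
        self.enabled = False    # filter is not programmed yet

    def on_vf_command(self, command):
        """Any command from the VF driver (e.g. a set-MAC on init) enables the filter."""
        self.enabled = True

    def on_vf_reset(self):
        """A VF reset returns the filter to the deferred state."""
        self.enabled = False

    def accepts(self, dst_mac):
        """Would a frame with this destination MAC be steered to the VF?"""
        return self.enabled and dst_mac == self.mac


MAC = "8a:f7:20:29:3b:cb"
vf_filter = DeferredVfFilter(MAC)
before = vf_filter.accepts(MAC)        # VF driver has not spoken yet
vf_filter.on_vf_command("set_mac")
during = vf_filter.accepts(MAC)        # driver is up, filter is live
vf_filter.on_vf_reset()
after = vf_filter.accepts(MAC)         # back to deferred
print(before, during, after)
```

In this model the virtio path keeps receiving frames for the MAC until the VF driver is actually ready, which is where the reduced downtime would come from.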

> > Further, the need to poke at PF filters
> > with set vf does not match the current security model where
> > any security related configuration such as MAC filtering is done upfront.
>
> The security model belongs to the VM policy not the VF, right? I think
> same MAC address will always be used on the VM as it starts with
> virtio. Why it is a security issue that VF starts with an unused MAC
> before it's able to be used in the guest?

Basically if guest is able to trigger MAC changes,
it might be able to exploit some bug to escalate that to
full network access. Completely blocking configuration
changes after setup feels safer.

Case in point, with QEMU a typical selinux policy will block
attempts to change MACs, that task will have to be
delegated to a suitably privileged tool.

>
> >
> >
> > So I have two suggestions:
> >
> > 1. Teach pf driver not to program the filter until vf driver actually goes up.
> >
> >    How do we know it went up? For example, it is highly likely
> >    that driver will send some kind of command on init.
> >    E.g. linux seems to always try to set the mac address during init.
> >    We can have any kind of command received by the PF enable
> >    the filter, until reset.
>
> I'm not sure it's a valid assumption for any guest, say Windows. The
> VF can start with the MAC address advertised from PF in the first
> reset, and the MAC filter generally will be activated at that point.
> Some other PF/VF variants enable the filter after that until the VF is
> brought up in guest, while some others enable the filter even before
> the VF gets assigned to guest. Trying to assume the behaviour on
> specific guest or specific NIC device is a slippery slope.

Is all this just theoretical or do you observe any problems in practice?

> The only
> thing that's reliable is the semantics of ndo_vf_xxx interface for the
> PF.

ndo_vf_xxx is an internal Linux interface. That's not guaranteed to be
stable at all. I think you mean the netlink interface that triggers
that. That should be stable but if what you say above is true isn't
fully defined.

> You seem to overly assume too much on the specific PF behaviour
> which is not defined in the interface itself.

So IMHO it's something that we should fix in Linux,
making all devices behave consistently.

> >
> >    In absence of an appropriate command, QEMU can detect bus master
> >    enable and do that.
> >
> > 2. Create a variant of trusted VF where it starts out without a valid
> >    MAC, guest can set a softmac MAC but only can set it to the specific
> >    value that matches virtio.
> >    Alternatively - if it's preferred for some reason - allow
> >    guest to program just two MACs, the original one and the virtio one.
> >    Any other value is denied.
>
> I am getting confused, I don't know why that's even needed. The
> management tool can set any predefined MAC that is deemed safe for VF
> to start with. Why it needs to be that complicated? What is the
> purpose of another model for trusted VF and softmac? It's the PF that
> changes the MAC not the VF.

This will give us a simple solution without guest driver changes for
when VF is trusted. In particular it will work e.g. for PFs as well.
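
The restriction in suggestion 2 amounts to a small allowlist. The sketch below is illustrative only: `make_mac_policy` is a hypothetical name, the "original" MAC in the usage lines is made up, and nothing here reflects an existing PF driver interface.

```python
def make_mac_policy(virtio_mac, original_mac=None):
    """Return a predicate: may the guest program `requested` into the VF?"""
    allowed = {virtio_mac.lower()}
    if original_mac is not None:
        # The "two MACs" alternative: the original one and the virtio one.
        allowed.add(original_mac.lower())

    def may_set(requested):
        return requested.lower() in allowed

    return may_set


# Strict variant: only the MAC that matches virtio is accepted.
strict = make_mac_policy("8a:f7:20:29:3b:cb")
# Two-MAC variant; "02:00:00:00:00:01" is a made-up original MAC.
relaxed = make_mac_policy("8a:f7:20:29:3b:cb", "02:00:00:00:00:01")

print(strict("8A:F7:20:29:3B:CB"), strict("02:00:00:00:00:01"))
print(relaxed("02:00:00:00:00:01"), relaxed("de:ad:be:ef:00:01"))
```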

> >
> >
> >
> > > However,
> > > it looks like as of today the MAC matching still haven't addressed the
> > > datapath switching and error handling in a clean way. As said, for
> > > SR-IOV live migration on iSCSI root disk there will be a lot of
> > > dancing parts going along the way, reliable network connectivity and
> > > dedicated handshakes are critical to this kind of setup.
> > >
> > > -Siwei
> >
> > I think MAC matching removes downtime when device is removed but not
> > when it's re-added, yes. It has the advantage of an already present
> > linux driver support, but if you are prepared to work on
> > adding e.g. bridge based matching, that will go away.
>
> The removal order and consequence will be the same between MAC
> matching and group ID based matching. It's just the initial discovery
> that's slightly different. Why do you think the downtime will be
> different for the removal scenario? And why do you think it's needed
> to alter the current PF driver behavior to support bridge based
> matching? Sorry I'm really confused about your suggestion. Those PF
> driver model changes are not needed actually. The fact is that the
> bridge based matching is supposed to work quite well for any PF driver
> implementation no matter when the MAC address filters gets added or
> enabled.
>
> Thanks,
> -Siwei

It seems that it requires a bunch of changes for all VF drivers
though.

>
> >
> >
> > > >
> > > > --
> > > > MST
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > >

