
virtio-dev message



Subject: [Proposal] Relationship between XDP and rx-csum in virtio-net


Currently, the VIRTIO_NET_F_GUEST_CSUM (NETIF_F_RXCSUM) feature of the virtio-net
driver conflicts with loading an XDP program. The cause is the problem described
in [1][2]: XDP may corrupt the metadata of partially checksummed packets, which
results in packet drops. rx CHECKSUM_PARTIAL mainly exists in virtualized
environments, where its purpose is to save computing resources.

The *goal* of this proposal is to enable the coexistence of XDP and VIRTIO_NET_F_GUEST_CSUM.

1. We need to understand why the device driver receives rx CHECKSUM_PARTIAL packets.

Drivers related to the virtualized environment, such as virtio-net/veth/loopback,
etc., may receive partial csumed packets.

When the tx device finds that the destination rx device of a packet is
located on the same host, it knows that the packet will not pass through
a physical link. To save computational resources, the tx device sends the packet
with csum_{start, offset} directly to the rx side without computing a full csum
(this depends on the specific implementation; some virtio-net backend devices are
known to behave like this currently). As [3] shows, the stack trusts such packets.
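
To make csum_{start, offset} concrete, here is a minimal user-space sketch of
what "completing" a partially checksummed packet involves. It is an
illustration only, not the kernel's implementation: the function names
csum_fold_sum() and complete_partial_csum() are hypothetical, and it assumes
the checksum field already holds the pseudo-header sum, as is the convention
for CHECKSUM_PARTIAL. The UDP special case (a computed 0 is sent as 0xffff) is
omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement sum (RFC 1071 style) over len bytes, folded to 16 bits. */
uint16_t csum_fold_sum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;   /* pad a trailing odd byte */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);    /* fold carries back in */
    return (uint16_t)sum;
}

/*
 * Complete a partial checksum: sum every byte from csum_start to the end of
 * the packet (this includes the pseudo-header seed stored in the checksum
 * field) and write the complement at csum_start + csum_offset.
 */
void complete_partial_csum(uint8_t *pkt, size_t len,
                           size_t csum_start, size_t csum_offset)
{
    uint16_t csum;

    assert(csum_start + csum_offset + 2 <= len);
    csum = (uint16_t)~csum_fold_sum(pkt + csum_start, len - csum_start);
    pkt[csum_start + csum_offset] = (uint8_t)(csum >> 8);
    pkt[csum_start + csum_offset + 1] = (uint8_t)(csum & 0xff);
}
```

This per-byte pass over the payload is the work the tx side skips for
same-host traffic. It is also why the metadata matters: if XDP moves or
rewrites headers, csum_start and csum_offset no longer point at the right
bytes.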

However, veth still keeps NETIF_F_RXCSUM turned on when XDP is loaded. This may
cause packet drops, as stated in [1][2]. But the veth community does not seem to
have reported such problems so far. Can we infer that the coexistence of XDP and
rx CHECKSUM_PARTIAL has little negative impact?

2. About rx CHECKSUM_UNNECESSARY:

We have just seen that in a virtualized environment a packet may travel within a
single host, so not computing the complete csum for the packet saves some CPU resources.

The purpose of the checksum is to verify that packets passing through the
physical link are correct. Of course, it is also reasonable to compute a full csum
for packets in a virtualized environment, which is exactly what we need.

rx CHECKSUM_UNNECESSARY indicates that the checksum of the packet has been fully
verified, that is, the packet can be trusted. If such a packet is modified by the
XDP program, the user should recalculate the correct checksum using bpf_csum_diff()
and bpf_{l3,l4}_csum_replace().
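
The idea behind such a recalculation is the incremental update from RFC 1624:
when one 16-bit word of the packet changes, the checksum can be patched
without summing the whole packet again. The following is a plain user-space
sketch of that arithmetic, not the BPF helpers themselves; csum_fold32() and
csum_replace16() are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Fold a 32-bit ones'-complement accumulator down to 16 bits. */
uint16_t csum_fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * RFC 1624, equation 3: HC' = ~(~HC + ~m + m')
 *   check    - checksum currently stored in the header (HC)
 *   old_word - value of the 16-bit field before modification (m)
 *   new_word - value of the 16-bit field after modification (m')
 * Returns the updated checksum (HC').
 */
uint16_t csum_replace16(uint16_t check, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~csum_fold32(sum);
}
```

For example, for the words 0x1234, 0x0008, 0xdead the checksum is 0x0f16;
changing 0x0008 to 0x0050 and calling csum_replace16(0x0f16, 0x0008, 0x0050)
gives the same value as recomputing the checksum over the modified words.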

Therefore, for drivers such as atlantic/bnxt/mlx (physical NIC drivers?),
XDP and NETIF_F_RXCSUM coexist, because their packets are fully checksummed
at the tx side.

AWS's ena driver is also designed to work in this full-checksum mode, although
it runs in a virtualized environment. (As mentioned below, a feature bit could be
provided for virtio-net that tells the sender it must calculate a full checksum,
which would give behavior similar to the other drivers.)

3. To sum up:

It seems that only virtio-net sets XDP and VIRTIO_NET_F_GUEST_CSUM as mutually
exclusive, which may cause the following problems:

When XDP is loaded:

1) Packets that are fully checksummed by the sender would be marked as
CHECKSUM_UNNECESSARY by the rx csum hw offloading. Since the offload is disabled
while XDP is loaded, additional CPU resources are needed to compute the checksum
for every packet.

When testing with the following command in Aliyun ECS:
    qperf dst_ip -lp 8989 -m 64K -t 20 tcp_bw
    (mtu = 1500, dev layer GRO is on)

The csum-related overhead we tested on X86 is 11.7%, and on ARM is 15.8%.

2) One of the main uses of an XDP prog is monitoring, firewalling, etc., which
means that the XDP prog may not modify the packet at all. This applies to both
rx CHECKSUM_PARTIAL and rx CHECKSUM_UNNECESSARY, yet we give up the rx csum hw
offloading capability in both cases.

4. How we try to solve this:

1) Add a feature bit to the virtio specification to tell the sender that a fully
csumed packet must be sent. Then XDP can coexist with VIRTIO_NET_F_GUEST_CSUM when this
feature bit is negotiated. (similar to ENA behavior)

2) Modify the current virtio-net driver

No longer filter out the VIRTIO_NET_F_GUEST_CSUM feature in virtnet_xdp_set().
Then we immediately get the benefit of VIRTIO_NET_F_GUEST_CSUM and keep the CPU
resources saved by rx csum hw offloading.
(This approach is somewhat crude.)
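
The two options can be compared with a small decision sketch. This is a
user-space model only, not driver code: struct negotiated, the
full_csum_required field (standing in for the proposed, not yet existing
feature bit) and rx_csum_kept_with_xdp() are all hypothetical names invented
for illustration.

```c
#include <stdbool.h>

/* Model of the negotiated features relevant to this proposal. */
struct negotiated {
    bool guest_csum;          /* VIRTIO_NET_F_GUEST_CSUM negotiated */
    bool full_csum_required;  /* HYPOTHETICAL bit proposed in option 1 */
};

enum xdp_csum_policy {
    POLICY_CURRENT,   /* today: guest csum is disabled while XDP is set */
    POLICY_OPTION_1,  /* keep it only if the sender promises full csums */
    POLICY_OPTION_2,  /* never filter guest csum when XDP is set */
};

/* Does rx csum offload stay enabled once an XDP program is loaded? */
bool rx_csum_kept_with_xdp(enum xdp_csum_policy policy,
                           const struct negotiated *f)
{
    switch (policy) {
    case POLICY_OPTION_1:
        return f->guest_csum && f->full_csum_required;
    case POLICY_OPTION_2:
        return f->guest_csum;
    case POLICY_CURRENT:
    default:
        return false;
    }
}
```

Under option 1 the driver never sees rx CHECKSUM_PARTIAL while XDP is loaded,
so the concerns in [1][2] do not arise. Under option 2 partial packets can
still arrive, and an XDP prog that modifies them is exposed to those concerns.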

5. Conclusion

This is a proposal and does not represent a formal solution. Looking forward to feedback
from the community and exploring a possible/common solution to the problem described in
this proposal.

6. References

[1] 18ba58e1c234

    virtio-net: fail XDP set if guest csum is negotiated

    We don't support partial csumed packet since its metadata will be lost
    or incorrect during XDP processing. So fail the XDP set if guest_csum
    feature is negotiated.

[2] e59ff2c49ae1

    virtio-net: disable guest csum during XDP set

    We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we
    can receive partial csumed packets with metadata kept in the
    vnet_hdr. This may have several side effects:

    - It could be overridden by header adjustment, thus is might be not
      correct after XDP processing.
    - There's no way to pass such metadata information through
      XDP_REDIRECT to another driver.
    - XDP does not support checksum offload right now.

    So simply disable guest csum if possible in this the case of XDP.

[3] static inline int skb_csum_unnecessary(const struct sk_buff *skb)
    {
        return ((skb->ip_summed == CHECKSUM_UNNECESSARY) ||
            skb->csum_valid ||
            (skb->ip_summed == CHECKSUM_PARTIAL &&
            skb_checksum_start_offset(skb) >= 0));
    }


Thanks a lot!
-- 
2.19.1.6.gb485710b


