Subject: [Proposal] Relationship between XDP and rx-csum in virtio-net
Currently, the VIRTIO_NET_F_GUEST_CSUM (NETIF_F_RXCSUM) feature of the virtio-net driver conflicts with loading an XDP program. This is caused by the problem described in [1][2]: XDP may leave the partial-csum-related fields incorrect, which results in packet drops. rx CHECKSUM_PARTIAL mainly exists in virtualized environments, and its purpose is to save computing resources.

The *goal* of this proposal is to enable the coexistence of XDP and VIRTIO_NET_F_GUEST_CSUM.

1. We need to understand why the device driver receives rx CHECKSUM_PARTIAL packets.

Drivers related to virtualized environments, such as virtio-net/veth/loopback, may receive partially checksummed packets. When the tx device finds that the destination rx device of the packet is located on the same host, it is clear that the packet may not pass through a physical link, so the tx device sends the packet with csum_{start, offset} directly to the rx side to save computational resources, without computing a full csum (this depends on the specific implementation; some virtio-net backend devices are currently known to behave like this). From [3], the stack trusts such packets.

However, veth still has NETIF_F_RXCSUM turned on when loading XDP. This may cause packet drops as [1][2] state. But the veth community does not seem to have reported such problems so far. Can we guess that the coexistence of XDP and rx CHECKSUM_PARTIAL has little negative impact?

2. About rx CHECKSUM_UNNECESSARY:

We have just seen that in a virtualized environment a packet may flow within the same host, so not computing the complete csum for the packet saves some CPU resources. The purpose of the checksum is to verify that packets passing through a physical link are correct. Of course, it is also reasonable to compute a full csum for packets in a virtualized environment, which is exactly what we need.
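
To make the csum_{start, offset} metadata from section 1 more concrete, here is a minimal userspace sketch in C of how a receiver could finish a partially checksummed packet in software. It is only an illustration and not the virtio-net or kernel code: the helper names (csum_fold_sum, complete_partial_csum, l4_csum_ok) are hypothetical, the packet is a flat byte buffer of at most 64 KiB, and the checksum field is assumed to already hold the folded, non-inverted pseudo-header sum left there by the sender.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement sum over len bytes (big-endian words), starting
 * from an initial value. Assumes len <= 64 KiB so the 32-bit accumulator
 * cannot overflow before it is folded. */
static uint16_t csum_fold_sum(const uint8_t *data, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8; /* pad odd tail with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);  /* fold carries back in */
    return (uint16_t)sum;
}

/* Hypothetical helper: complete a CHECKSUM_PARTIAL-style packet.
 * csum_start  - offset where checksumming begins (the L4 header)
 * csum_offset - offset of the checksum field, relative to csum_start
 * Assumption: the field currently holds the pseudo-header sum. */
static void complete_partial_csum(uint8_t *pkt, size_t len,
                                  size_t csum_start, size_t csum_offset)
{
    uint16_t csum;

    assert(csum_start + csum_offset + 2 <= len);

    csum = (uint16_t)~csum_fold_sum(pkt + csum_start, len - csum_start, 0);
    if (csum == 0)
        csum = 0xffff; /* equivalent value; 0 can mean "no checksum" in UDP */

    pkt[csum_start + csum_offset] = (uint8_t)(csum >> 8);
    pkt[csum_start + csum_offset + 1] = (uint8_t)(csum & 0xff);
}

/* Hypothetical helper: verify a fully checksummed packet. The L4 bytes
 * plus the pseudo-header sum must fold to 0xffff. */
static int l4_csum_ok(const uint8_t *pkt, size_t len,
                      size_t csum_start, uint16_t pseudo_sum)
{
    return csum_fold_sum(pkt + csum_start, len - csum_start, pseudo_sum)
           == 0xffff;
}
```

The sketch also hints at why the metadata is fragile: complete_partial_csum() only produces a valid packet if csum_start and csum_offset still point at the right bytes, which is the property [1][2] say can be lost once the headers are adjusted.
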
rx CHECKSUM_UNNECESSARY indicates that the packet has been fully checked, that is, it is a credible packet. If such a packet is modified by the XDP program, the user should recalculate the correct checksum using bpf_csum_diff() and bpf_{l3,l4}_csum_replace(). Therefore, for those drivers (physical nic drivers?), such as atlantic/bnxt/mlx, XDP and NETIF_F_RXCSUM coexist, because their packets will be fully checksummed at the tx side. AWS's ena driver is also designed to work in this full checksum mode, although it runs in a virtualized environment (as mentioned below, a feature bit could be provided for virtio-net that tells the sender a full checksum must be calculated, to get behavior similar to other drivers).

3. To sum up:

It seems that only virtio-net treats XDP and VIRTIO_NET_F_GUEST_CSUM as mutually exclusive, which may cause the following problems when XDP is loaded:

1) For packets that are fully checksummed by the sender, rx csum hw offloading would mark them as CHECKSUM_UNNECESSARY. With the feature disabled, the virtio-net driver instead needs additional CPU resources to compute the checksum for every packet. When testing with the following command in Aliyun ECS:

    qperf dst_ip -lp 8989 -m 64K -t 20 tcp_bw
    (mtu = 1500, dev layer GRO is on)

the csum-related overhead we measured is 11.7% on X86 and 15.8% on ARM.

2) One of the main uses of an XDP prog is monitoring, firewalling, etc., which means that the XDP prog may not modify the packet at all. This applies to both rx CHECKSUM_PARTIAL and rx CHECKSUM_UNNECESSARY, but we give up the rx csum hw offloading capability in both cases.

4. How we try to solve:

1) Add a feature bit to the virtio specification to tell the sender that a fully checksummed packet must be sent. Then XDP can coexist with VIRTIO_NET_F_GUEST_CSUM when this feature bit is negotiated. (similar to ENA behavior)

2) Modify the current virtio-net driver: no longer filter the VIRTIO_NET_F_GUEST_CSUM feature in virtnet_xdp_set().
Then we immediately get the benefit of VIRTIO_NET_F_GUEST_CSUM and enjoy the CPU resources saved by rx csum hw offloading. (This method is a bit crude.)

5. Ending

This is a proposal and does not represent a formal solution. Looking forward to feedback from the community and to exploring a possible/common solution to the problem described in this proposal.

6. Quote

[1] 18ba58e1c234 virtio-net: fail XDP set if guest csum is negotiated

    We don't support partial csumed packet since its metadata will be lost
    or incorrect during XDP processing. So fail the XDP set if guest_csum
    feature is negotiated.

[2] e59ff2c49ae1 virtio-net: disable guest csum during XDP set

    We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we
    can receive partial csumed packets with metadata kept in the
    vnet_hdr. This may have several side effects:

    - It could be overridden by header adjustment, thus is might be not
      correct after XDP processing.
    - There's no way to pass such metadata information through
      XDP_REDIRECT to another driver.
    - XDP does not support checksum offload right now.

    So simply disable guest csum if possible in this the case of XDP.

[3] static inline int skb_csum_unnecessary(const struct sk_buff *skb)
    {
            return ((skb->ip_summed == CHECKSUM_UNNECESSARY) ||
                    skb->csum_valid ||
                    (skb->ip_summed == CHECKSUM_PARTIAL &&
                     skb_checksum_start_offset(skb) >= 0));
    }

Thanks a lot!

--
2.19.1.6.gb485710b