OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [PATCH v5] virtio-net: device does not deliver partially checksummed packet and may validate the checksum


On Mon, Dec 11, 2023 at 05:11:59PM +0800, Heng Qi wrote:
> virtio-net works in a virtualized system and is somewhat different from
> physical nics. One of the differences is that to save virtio device
> resources, rx may receive partially checksummed packets. However, XDP may
> cause partially checksummed packets to be dropped.
> So XDP loading currently conflicts with the feature VIRTIO_NET_F_GUEST_CSUM.
> 
> This patch lets the device to supply fully checksummed packets to the driver.
> Then XDP can coexist with VIRTIO_NET_F_GUEST_CSUM to enjoy the benefits of
> device validation checksum.
> 
> In addition, implementation of some performant devices always do not generate
> partially checksummed packets, but the standard driver still need to clear
> VIRTIO_NET_F_GUEST_CSUM when XDP is there.


> 
> A new feature VIRTIO_NET_F_GUEST_FULLY_CSUM is added to solve the above
> situation, which provides the driver with configurable offload.
> If the offload is enabled, then the device must deliver fully
> checksummed packets to the driver and may validate the checksum.
> 
> Use case example:
> If VIRTIO_NET_F_GUEST_FULLY_CSUM is negotiated and the offload is enabled,
> after XDP processes a fully checksummed packet, the VIRTIO_NET_HDR_F_DATA_VALID bit
> is retained if the device has validated its checksum, resulting in the guest
> not needing to validate the checksum again. This is useful for guests:
>   1. Bring the driver advantages such as cpu savings.
>   2. For devices that do not generate partially checksummed packets themselves,
>      XDP can be loaded in the driver without modifying the hardware behavior.
> 
> Several solutions have been discussed in the previous proposal[1].
> After historical discussion, we have tried the method proposed by Jason[2],
> but some complex scenarios and challenges are difficult to deal with.
> We now return to the method suggested in [1].
> 
> [1] https://lists.oasis-open.org/archives/virtio-dev/202305/msg00291.html
> [2] https://lore.kernel.org/all/20230628030506.2213-1-hengqi@linux.alibaba.com/
> 
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> v4->v5:
> - Remove the modification to the GUEST_CSUM.
> - The description of this feature has been reorganized for greater clarity.
> 
> v3->v4:
> - Streamline some repetitive descriptions. @Jason
> - Add how features should work, when to be enabled, and overhead. @Jason @Michael
> 
> v2->v3:
> - Add a section named "Driver Handles Fully Checksummed Packets"
>   and more descriptions. @Michael
> 
> v1->v2:
> - Modify full checksum functionality as a configurable offload
>   that is initially turned off. @Jason
> 
>  device-types/net/description.tex        | 74 +++++++++++++++++++++++--
>  device-types/net/device-conformance.tex |  1 +
>  device-types/net/driver-conformance.tex |  1 +
>  introduction.tex                        |  3 +
>  4 files changed, 73 insertions(+), 6 deletions(-)
> 
> diff --git a/device-types/net/description.tex b/device-types/net/description.tex
> index aff5e08..ab6c13d 100644
> --- a/device-types/net/description.tex
> +++ b/device-types/net/description.tex
> @@ -122,6 +122,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>      device with the same MAC address.
>  
>  \item[VIRTIO_NET_F_SPEED_DUPLEX(63)] Device reports speed and duplex.
> +
> +\item[VIRTIO_NET_F_GUEST_FULLY_CSUM (64)] Device delivers fully checksummed packets
> +    to the driver and may validate the checksum.
>  \end{description}

I propose
VIRTIO_NET_F_GUEST_CSUM_COMPLETE
instead.


>  \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device / Feature bits / Feature bit requirements}
> @@ -136,6 +139,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>  \item[VIRTIO_NET_F_GUEST_UFO] Requires VIRTIO_NET_F_GUEST_CSUM.
>  \item[VIRTIO_NET_F_GUEST_USO4] Requires VIRTIO_NET_F_GUEST_CSUM.
>  \item[VIRTIO_NET_F_GUEST_USO6] Requires VIRTIO_NET_F_GUEST_CSUM.
> +\item[VIRTIO_NET_F_GUEST_FULLY_CSUM] Requires VIRTIO_NET_F_GUEST_CSUM and VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.




>  \item[VIRTIO_NET_F_HOST_TSO4] Requires VIRTIO_NET_F_CSUM.
>  \item[VIRTIO_NET_F_HOST_TSO6] Requires VIRTIO_NET_F_CSUM.
> @@ -398,6 +402,58 @@ \subsection{Device Initialization}\label{sec:Device Types / Network Device / Dev
>  A truly minimal driver would only accept VIRTIO_NET_F_MAC and ignore
>  everything else.
>  
> +\subsubsection{Device Delivers Fully Checksummed Packets}\label{sec:Device Types / Network Device / Device Initialization / Device Delivers Fully Checksummed Packets}
> +
> +If the feature VIRTIO_NET_F_GUEST_FULLY_CSUM is negotiated, the driver can
> +benefit from the device's ability to calculate and validate the checksum.
> +
> +If the feature VIRTIO_NET_F_GUEST_FULLY_CSUM is negotiated,
> +the device behaves as follows:
> +\begin{itemize}
> +  \item The device delivers a fully checksummed packet to the driver rather than a partially checksummed packet.

where does "partially checksummed packet" come from?
I think it comes from:

   The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially
  checksummed packets can be received, and if it can do that then
  the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, 
  VIRTIO_NET_F_GUEST_UFO, VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_USO4
  and VIRTIO_NET_F_GUEST_USO6 are the input equivalents of the features described above.
  See \ref{sec:Device Types / Network Device / Device Operation /


so that one needs to be updated too.


> +Partially checksummed packets come from TCP/UDP protocols \ref{devicenormative:Device Types / Network Device / Device Operation / Processing of Packets}.
> +  \item The device may validate the packet checksum before delivering it.
> +If the packet checksum has been verified, the VIRTIO_NET_HDR_F_DATA_VALID bit
> +in \field{flags} is set: in case of multiple encapsulated protocols, one
> +level of checksums has been validated (Just like VIRTIO_NET_F_GUEST_CSUM does.).
> +  \item The device can not set the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in \field{flags}.
> +\end{itemize}
> +
> +Note that packet types that the driver or device can recognize and the device
> +may verify will not change due to the additional negotiated VIRTIO_NET_F_GUEST_FULLY_CSUM.
> +These remain consistent with VIRTIO_NET_F_GUEST_CSUM.

This part is confusing. "change" and "remain" makes no sense for someone reading
the spec text as opposed to reviewing the patch.
also it does not matter whether VIRTIO_NET_F_GUEST_FULLY_CSUM
is negotiated right? it only matters whether it is enabled.


> +Specific transport protocols that may have VIRTIO_NET_HDR_F_DATA_VALID set
> +in \field{flags} include TCP, UDP, GRE (Generic Routing Encapsulation),
> +and SCTP (Stream Control Transmission Protocol).
> +A fully checksummed packet's checksum field for each of the above protocols
> +is set to a calculated value that covers the transport header and payload
> +(TCP or UDP involves the additional pseudo header) of the packet.
> +
> +Delivering fully checksummed packets rather than partially
> +checksummed packets incurs additional overhead for the device.
> +The overhead varies from device to device, for example the overhead of
> +calculating and validating the packet checksum is a few microseconds
> +for a hardware device.

wow really is that standard? There are devices that deliver the whole
packet in a few microseconds. Maybe "for some hardware devices"?

> +
> +The feature VIRTIO_NET_F_GUEST_FULLY_CSUM has a corresponding offload \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration},
> +which when enabled means that the device delivers fully checksummed packets
> +to the driver and may validate the checksum.
> +The offload is disabled by default.

This is unusual, unlike any other offload. So needs to be stressed
more.  And what does "default" mean here?
E.g. "Note: unlike other offloads, this offloads is disabled
even after VIRTIO_NET_F_GUEST_FULLY_CSUM has been negotiation.
The offload has to be enabled ... "


> +
> +The driver can enable the offload by sending the
> +VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET command with the
> +VIRTIO_NET_F_GUEST_FULLY_CSUM bit set when, for example,
> +eXpress Data Path (XDP) \hyperref[intro:xdp]{[XDP]} is functioning.

It is not worth adding a spec link just to provide an example.
If you really want to provide it:
"eXpress Data Path (XDP) in Linux is active".

But this is the problem this patch does not solve in my opinion.
A device might actually provide a full checksum 
at negligeable extra cost and driver will still keep it off by default.
So it slows device down - when does it make sense to enable this feature? 
Just giving an example of XDP is not sufficient.



> +
> +\drivernormative{\subsubsection}{Device Delivers Fully Checksummed Packets}{sec:Device Types / Network Device / Device Initialization / Device Delivers Fully Checksummed Packets}
> +
> +The driver MUST NOT enable the offload for which VIRTIO_NET_F_GUEST_FULLY_CSUM has not been negotiated.

what does "the offload for which" mean here?
and how is this special for VIRTIO_NET_F_GUEST_FULLY_CSUM?

> +\devicenormative{\subsubsection}{Device Delivers Fully Checksummed Packets}{sec:Device Types / Network Device / Device Initialization / Device Delivers Fully Checksummed Packets}
> +
> +Upon the device reset, the device MUST disable the offload.
> +

reset has nothing to do with it I think. it's about feature negotiation.


>  \subsection{Device Operation}\label{sec:Device Types / Network Device / Device Operation}
>  
>  Packets are transmitted by placing them in the
> @@ -723,7 +779,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>    \field{num_buffers} is one, then the entire packet will be
>    contained within this buffer, immediately following the struct
>    virtio_net_hdr.
> -\item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
> +\item If the VIRTIO_NET_F_GUEST_CSUM feature (regardless of whether
> +  VIRTIO_NET_F_GUEST_FULLY_CSUM was negotiated) was negotiated, the
>    VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} can be
>    set: if so, device has validated the packet checksum.
>    In case of multiple encapsulated protocols, one level of checksums
> @@ -747,7 +804,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>    number of coalesced TCP segments in \field{csum_start} field and
>    number of duplicated ACK segments in \field{csum_offset} field
>    and sets bit VIRTIO_NET_HDR_F_RSC_INFO in \field{flags}.
> -\item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
> +\item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated but the
> +  VIRTIO_NET_F_GUEST_FULLY_CSUM feature was not negotiated, the
>    VIRTIO_NET_HDR_F_NEEDS_CSUM bit in \field{flags} can be
>    set: if so, the packet checksum at offset \field{csum_offset}
>    from \field{csum_start} and any preceding checksums
> @@ -805,8 +863,9 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>  device MUST set the VIRTIO_NET_HDR_GSO_ECN bit in
>  \field{gso_type}.
>  
> -If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the
> -device MAY set the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in
> +If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated but
> +the VIRTIO_NET_F_GUEST_FULLY_CSUM feature has not been negotiated,
> +the device MAY set the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in
>  \field{flags}, if so:
>  \begin{enumerate}
>  \item the device MUST validate the packet checksum at
> @@ -826,7 +885,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>  been negotiated, the device MUST set \field{gso_type} to
>  VIRTIO_NET_HDR_GSO_NONE.
>  
> -If \field{gso_type} differs from VIRTIO_NET_HDR_GSO_NONE, then
> +If the VIRTIO_NET_F_GUEST_FULLY_CSUM feature has not been negotiated and
> +\field{gso_type} differs from VIRTIO_NET_HDR_GSO_NONE, then
>  the device MUST also set the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in
>  \field{flags} MUST set \field{gso_size} to indicate the desired MSS.
>  If VIRTIO_NET_F_RSC_EXT was negotiated, the device MUST also
> @@ -842,7 +902,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>  not less than the length of the headers, including the transport
>  header.
>  
> -If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the
> +If the VIRTIO_NET_F_GUEST_CSUM feature (regardless of whether
> +VIRTIO_NET_F_GUEST_FULLY_CSUM has been negotiated) has been negotiated, the
>  device MAY set the VIRTIO_NET_HDR_F_DATA_VALID bit in
>  \field{flags}, if so, the device MUST validate the packet
>  checksum (in case of multiple encapsulated protocols, one level
> @@ -1633,6 +1694,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>  #define VIRTIO_NET_F_GUEST_UFO        10
>  #define VIRTIO_NET_F_GUEST_USO4       54
>  #define VIRTIO_NET_F_GUEST_USO6       55
> +#define VIRTIO_NET_F_GUEST_FULLY_CSUM 64
>  
>  #define VIRTIO_NET_CTRL_GUEST_OFFLOADS       5
>   #define VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET   0
> diff --git a/device-types/net/device-conformance.tex b/device-types/net/device-conformance.tex
> index 52526e4..43b3921 100644
> --- a/device-types/net/device-conformance.tex
> +++ b/device-types/net/device-conformance.tex
> @@ -16,4 +16,5 @@
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Header Hash}
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Device Statistics}
> +\item \ref{devicenormative:Device Types / Network Device / Device Initialization / Device Delivers Fully Checksummed Packets}
>  \end{itemize}
> diff --git a/device-types/net/driver-conformance.tex b/device-types/net/driver-conformance.tex
> index c693c4f..c9b6d1b 100644
> --- a/device-types/net/driver-conformance.tex
> +++ b/device-types/net/driver-conformance.tex
> @@ -16,4 +16,5 @@
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Header Hash}
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Device Statistics}
> +\item \ref{drivernormative:Device Types / Network Device / Device Initialization / Device Delivers Fully Checksummed Packets}
>  \end{itemize}
> diff --git a/introduction.tex b/introduction.tex
> index cfa6633..fc99597 100644
> --- a/introduction.tex
> +++ b/introduction.tex
> @@ -145,6 +145,9 @@ \section{Normative References}\label{sec:Normative References}
>      Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP
>      14, RFC 8174, DOI 10.17487/RFC8174, May 2017
>          \newline\url{http://www.ietf.org/rfc/rfc8174.txt}\\
> +	\phantomsection\label{intro:xdp}\textbf{[XDP]} &
> +    eXpress Data Path(XDP) provides a high performance, programmable network data path in the Linux kernel.
> +	\newline\url{https://prototype-kernel.readthedocs.io/en/latest/networking/XDP/}\\
>  \end{longtable}
>  
>  \section{Non-Normative References}
> -- 
> 2.19.1.6.gb485710b



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]