OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-dev message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [PATCH v4] virtio_net: support split header


On Wed, 22 Jun 2022 02:08:19 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Jun 22, 2022 at 11:52:30AM +0800, Jason Wang wrote:
> >
> > å 2022/6/21 18:54, Michael S. Tsirkin åé:
> > > On Mon, Jun 20, 2022 at 11:13:47AM +0800, Xuan Zhuo wrote:
> > > > On Fri, 17 Jun 2022 05:58:11 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Fri, Jun 17, 2022 at 03:16:01PM +0800, Xuan Zhuo wrote:
> > > > > > The purpose of this feature is to split the header and the payload of
> > > > > > the packet.
> > > > > >
> > > > > > |                    receive buffer                                    |
> > > > > > |                       0th descriptor             | 1th descriptor    |
> > > > > > | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
> > > > > >
> > > > > > We can use a buffer plus a separate page when allocating the receive
> > > > > > buffer. In this way, we can ensure that all payloads can be
> > > > > > independently in a page, which is very beneficial for the zerocopy
> > > > > > implemented by the upper layer.
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
> > > > > > Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
> > >
> > > Could you describe the zerocopy in more detail? I'd like to understand
> > > e.g. whether it mixes with mergeable buffers.
> > >
> > > I actually have an idea. Driver could specify an offset from start of buffer
> > > at which payload is written. This works with mergeable and without,
> >
> >
> > I don't see any difference compared to the current proposal. The only
> > difference is whether or not we should explicit tell the device about the
> > offset.
>
> The current proposal seems to rely on descriptor framing,
> this is using an offset instead.

Yes, we can use offset instead, and don't use descriptor in spec. But in
implementation, we must use multiple descriptors to combine hdr buf and payload
buf(page).

Thanks.

>
> >
> > > and if header is > offset then flag in packet will simply not be set.
> > > And I think we will want another knob to specify an offset for packets
> > > which can't be split.
> > >
> > > What do you think?
> > >
> > >
> > >
> > > > > > ---
> > > > > > v4:
> > > > > >      1. fix typo @Cornelia Huck @Jason Wang
> > > > > >      2. do not split header for IP fragmentation packet. @Jason Wang
> > > > > >
> > > > > > v3:
> > > > > >      1. Fix some syntax issues
> > > > > >      2. Fix some terminology issues
> > > > > >      3. It is not unified with ip alignment, so ip alignment is not included
> > > > > >      4. Make it clear that the device must support four types, in the case of
> > > > > >      successful negotiation.
> > > > > >
> > > > > >   conformance.tex |  2 ++
> > > > > >   content.tex     | 88 +++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > >   2 files changed, 90 insertions(+)
> > > > > >
> > > > > > diff --git a/conformance.tex b/conformance.tex
> > > > > > index 663e7c3..6f561fb 100644
> > > > > > --- a/conformance.tex
> > > > > > +++ b/conformance.tex
> > > > > > @@ -149,6 +149,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> > > > > >   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> > > > > >   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> > > > > >   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > > > > > +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> > > > > >   \end{itemize}
> > > > > >
> > > > > >   \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> > > > > > @@ -411,6 +412,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> > > > > >   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Gratuitous Packet Sending}
> > > > > >   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> > > > > >   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> > > > > > +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> > > > > >   \end{itemize}
> > > > > >
> > > > > >   \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> > > > > > diff --git a/content.tex b/content.tex
> > > > > > index 7508dd1..78e0ce8 100644
> > > > > > --- a/content.tex
> > > > > > +++ b/content.tex
> > > > > > @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> > > > > >   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > > > > >       channel.
> > > > > >
> > > > > > +\item[VIRTIO_NET_F_SPLIT_HEADER (55)] Device supports splitting the protocol
> > > > > > +    header and the payload.
> > > > > > +
> > > > >
> > > > > What is split exactly? The name makes it sound like header is split,
> > > > > and you repeat this everywhere. I suspect it's actually something like
> > > > > protocol split.
> > > >
> > > > This name comes from other network cards, if you think other names are better, I
> > > > also feel ok.
> > > So ethtool calls it "TCP header/data split" and the macro is
> > > ETHTOOL_TCP_DATA_SPLIT.
> >
> >
> > Just FYI, e810 called it "header split" and it support split at various
> > layers (l2/l3/l4 or even inner header).
> >
> >
> > >
> > > > > So something like "supports TCP/UDP protocol split
> > > > > - writing out protocol payload
> > > > > and header parts of incoming packets into separate buffers"?
> > > > I don't want to stress tcp/udp here I want to leave room for other protocols in
> > > > the future.
> > > Then we'll add more feature flags (we'll need to, anyway).
> > > Separating TCP and UDP is a good idea, e.g. ethtool only seems
> > > to support TCP ...
> > >
> > > > > I propose we just add 4 bits:
> > > > >
> > > > > VIRTIO_NET_F_SPLIT_TCP4
> > > > > VIRTIO_NET_F_SPLIT_TCP6
> > > > > VIRTIO_NET_F_SPLIT_UDP4
> > > > > VIRTIO_NET_F_SPLIT_UDP6
> > > > >
> > > > > Accordingly below we say everywhere:
> > > > > 	if one of
> > > > > VIRTIO_NET_F_SPLIT_TCP4,VIRTIO_NET_F_SPLIT_TCP6,VIRTIO_NET_F_SPLIT_UDP4,VIRTIO_NET_F_SPLIT_UDP6
> > > > >
> > > > >
> > > > > >   \item[VIRTIO_NET_F_HOST_USO (56)] Device can receive USO packets. Unlike UFO
> > > > > >    (fragmenting the packet) the USO splits large UDP packet
> > > > > >    to several segments when each of these smaller packets has UDP header.
> > > > > > @@ -3131,6 +3134,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> > > > > >   \item[VIRTIO_NET_F_CTRL_MAC_ADDR] Requires VIRTIO_NET_F_CTRL_VQ.
> > > > > >   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> > > > > >   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > > > > > +\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
> > > > > >   \end{description}
> > > > > >
> > > > > >   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> > > > > > @@ -3362,6 +3366,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
> > > > > >   #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
> > > > > >   #define VIRTIO_NET_HDR_F_DATA_VALID    2
> > > > > >   #define VIRTIO_NET_HDR_F_RSC_INFO      4
> > > > > > +#define VIRTIO_NET_HDR_F_SPLIT_HEADER  8
> > > > > >           u8 flags;
> > > > > >   #define VIRTIO_NET_HDR_GSO_NONE        0
> > > > > >   #define VIRTIO_NET_HDR_GSO_TCPV4       1
> > > > > > @@ -4463,6 +4468,89 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > > > > >   according to the native endian of the guest rather than
> > > > > >   (necessarily when not using the legacy interface) little-endian.
> > > > > >
> > > > > > +\paragraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> > > > > > +
> > > > > > +If the VIRTIO_NET_F_SPLIT_HEADER feature is negotiated,
> > > > > > +the device supports splitting the protocol header and the payload.
> > > > > > +The protocol header and the payload will be separated into different
> > > > > > +descriptors.
> > > > > Descriptors or buffers? I think for mergeable buffers you want
> > > > > buffers. For non mergeable trickier. Maybe just make these
> > > > > flags depend on mergeable?
> > > > >
> > > > >
> > > > > > +
> > > > > > +\subparagraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}
> > > > > > +
> > > > > > +To configure the split header, the following layout structure and definitions
> > > > > > +are used:
> > > > > > +
> > > > > > +\begin{lstlisting}
> > > > > > +struct virtio_net_split_header_config {
> > > > > > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4     (1 << 0)
> > > > > > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6     (1 << 1)
> > > > > > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4     (1 << 2)
> > > > > > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6     (1 << 3)
> > > > > > +    le64 type;
> > > > > > +};
> > > > > > +
> > > > > > +#define VIRTIO_NET_CTRL_SPLIT_HEADER       6
> > > > > > + #define VIRTIO_NET_CTRL_SPLIT_HEADER_SET   0
> > > > > > +\end{lstlisting}
> > > > > > +
> > > > > > +The class VIRTIO_NET_CTRL_SPLIT_HEADER has one command:
> > > > > > +VIRTIO_NET_CTRL_SPLIT_HEADER_SET applies the new split header configuration.
> > > > > > +
> > > > > > +The driver can enable or disable split header for different protocols by
> > > > > > +setting or clearing corresponding bits in \field{type}.
> > > > > > +
> > > > > > +\begin{itemize}
> > > > > > +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> > > > > > +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> > > > > > +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4: split after ipv4 udp header
> > > > > > +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6: split after ipv6 udp header
> > > > > > +\end{itemize}
> > > > > and what is the default?
> > > > >
> > > > > > +
> > > > > > +\devicenormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> > > > > > +
> > > > > > +A device MUST initialize \field{type} to 0, and MUST set it to 0
> > > > > > +upon device reset.
> > > > > > +
> > > > > > +If VIRTIO_NET_F_SPLIT_HEADER is negotiated, the device MUST support
> > > > > > +VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6,
> > > > > > +VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6.
> > > > > pls move or copy all MUST/SHALL etc to a conformance section.
> > > > >
> > > > >
> > > > >
> > > > > > +
> > > > > > +A device MUST NOT split the header in the following cases:
> > > > > if all of the following apply? if one of these applies?
> > > > >
> > > > > > +\begin{itemize}
> > > > > > +    \item the device does not recognize the protocol of the packet.
> > > > > > +    \item the packet is an IP fragmentation.
> > > > > > +    \item \field{type} does not include the protocol of the packet.
> > > > > > +    \item the buffer consists of only one descriptor.
> > > > > > +    \item the total size of the virtio net header and the protocol header exceeds
> > > > > > +        the size of the first descriptor.
> > > > > > +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
> > > > > > +        payload exceeds the size of the descriptor chain starting from the 2nd
> > > > > > +        descriptor.
> > > > > > +\end{itemize}
> > > > >
> > > > > Pls spell out what it should do in this case.
> > > > >
> > > > > > +
> > > > > > +If the header is split by the device, the \field{flags} of structure
> > > > > > +virtio_net_hdr MUST contain VIRTIO_NET_HDR_F_SPLIT_HEADER.
> > > > > Not contain. This bit must be set.
> > > > Yes, will fix.
> > > >
> > > >
> > > > > > The protocol header
> > > > > > +MUST be on the buffer specified by the first descriptor,
> > > > > buffers?descriptors? this uses both of them, we need clarity.
> > > > Yes.
> > > >
> > > > > > following the virtio
> > > > > > +net header. The payload MUST start from the second descriptor. The device MUST
> > > > > > +set \field{hdr_len} of structure virtio_net_hdr to the length of the protocol
> > > > > > +header.
> > > > > > +
> > > > > > +If the header is split by the device, VIRTIO_NET_F_MRG_RXBUF is negotiated,
> > > > > > +and the received packet is spread over multiple buffers, when the device uses a
> > > > > > +buffer other than the first, if the buffer consists of multiple descriptors, the
> > > > > > +device MUST skip the first descriptor of the buffer, because the first
> > > > > > +descriptor is used to carry the protocol header.
> > > > > This is a layering violation in that device suddenly cares how
> > > > > buffers are described in the ring structure. We don't even guarantee
> > > > > we will have descriptors going forward ...
> > > >
> > > > The problem with using merge to implement this function is that the size of the
> > > > hdr buf prepared by the driver and the payload buffer are inconsistent. For
> > > > example, the size of the hdr buf is 128 and the payload buf is 4k.
> > > >
> > > > If you use merge to implement it, the avali queue will look like the following,
> > > > so if you want to process a package that cannot be split, but there is an hdr
> > > > buf in the queue, it will be very inappropriate to process.
> > > >
> > > > | 128 buf | 4k buf | 128 buf | 4k buf | 128 buf | 4k buf | 128 buf | 4k buf |
> > > >
> > > > If the size of all bufs in the queue is 4k, using one page to store a small
> > > > protocol header is also a waste of memory.
> > > Right. However
> > > - we can use learning/statistics tricks like we do currently to guess
> > >    whether splitting is good or not and size buffers accordingly
> >
> >
> > I'm not sure this is good design. And it basically means if the driver can't
> > do learning stuffs, header split doesn't make too much sense for them.
> >
> > What's more, taking Linux driver as an example, if we learn the packet
> > should be 64K, do we want to allocate 64K buffer? (which is not sub-optimal
> > because of the high order allocation). And in the future we may want to
> > support big ipv6 TCP packets (larger than 64K).
> >
> > Thanks
> >
> > > - if buffer s too big headers can be copied and buffer released (they are hot in cache
> > >    anyway)
> > >
> > > > So using descriptor to concatenate hdr buf and payload buf is a better way.
> > > Well with options there's no real limit on the header size, is there?
> > >
> > > > Do you have other better ideas?
> > > Not at the moment
> > >
> > > > Or maybe I don't understand what you mean.
> > > >
> > > >
> > > > As for the problem of desciptors without guarantees, it should be a very big
> > > > reform. I don't know what it will look like.
> > > >
> > > > But I think that there are always scenarios where a large buffer is formed by
> > > > multiple small buffers.
> > > >
> > > > > Switching to next descriptor before first one is used is also
> > > > > problematic since it's unclear what will the length be.
> > > > Yes, this is a problem, I don't have a clear definition of this behavior.
> > > >
> > > >
> > > > > How does driver find out the header length?
> > > >   +The device MUST
> > > >   +set \field{hdr_len} of structure virtio_net_hdr to the length of the protocol
> > > >   +header.
> > > > >
> > > > >
> > > > > I propose we just assume VIRTIO_NET_F_MRG_RXBUF is set, and
> > > > > drop the multiple descriptors in the buffer idea.
> > > > >
> > > > >
> > > > > What happens if header does not fit in a single buffer?
> > > >   +    \item the total size of the virtio net header and the protocol header exceeds
> > > >   +        the size of the first descriptor.
> > > might be tricky to always guarantee. what's the plan?
> > >
> > >
> > > > > How does driver
> > > > > find out where does the header end?
> > > >   +The device MUST
> > > >   +set \field{hdr_len} of structure virtio_net_hdr to the length of the protocol
> > > >   +header.
> > > So we need to clarify virtio_net_hdr fits in the 1st buffer, and audit
> > > references to hdr_len to ensure there are no contradictions.
> > > E.g. we say "the driver MUST NOT rely on \field{hdr_len} to be correct."
> > >
> > > > >
> > > > > > +
> > > > > > +\drivernormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> > > > > > +
> > > > > > +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set, the driver
> > > > > > +MUST treat the contents of \field{hdr_len} as the length of the protocol header
> > > > > > +inside the first descriptor.
> > > > > > +
> > > > > > +If the split header is enabled, the buffers submitted to receiveq by the
> > > > > > +driver SHOULD be composed of at least two descriptors. The buffer specified by
> > > > > > +the first descriptor SHOULD be able to accommodate the virtio net header and the
> > > > > > +protocol header.
> > > > > What does this mean? How is driver supposed to know?
> > > > >
> > > > >
> > > > > >   \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
> > > > > >   Types / Network Device / Legacy Interface: Framing Requirements}
> > > > > > --
> > > > > > 2.31.0
>


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]