OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-dev message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-dev] [PATCH] message framing: make ANY_LAYOUT implicit


On Mon, Sep 16, 2013 at 05:12:34PM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > This resolves issue VIRTIO-10
> >
> > This also creates a new section for legacy feature
> > bits which will be handy for VIRTIO-13 if that is
> > accepted.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> This is very similar to the patch I came up with: I'm too slow!
> 
> Pasted below, in particular the 2.4.2.5.1 header needs to move up a few
> lines, too.
> 
> Cheers,
> Rusty.

Cool. A couple of comments below.

> > ---
> >  virtio-v1.0-wd01-part1-specification.txt | 38 ++++++++++++++++++++++----------
> >  1 file changed, 26 insertions(+), 12 deletions(-)
> >
> > diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
> > index 2679b5b..bb75e35 100644
> > --- a/virtio-v1.0-wd01-part1-specification.txt
> > +++ b/virtio-v1.0-wd01-part1-specification.txt
> > @@ -312,8 +312,8 @@ It is assumed that the host is already aware of the guest endian.
> >  
> >  2.1.4.2. Message Framing
> >  -----------------------
> > -The original intent of the specification was that message framing (the
> > -particular layout of descriptors) be independent of the contents of
> > +Generally, the intent of the specification is for message framing (the
> > +particular layout of descriptors) to be independent of the contents of
> >  the buffers. For example, a network transmit buffer consists of a 12
> >  byte header followed by the network packet. This could be most simply
> >  placed in the descriptor table as a 12 byte output descriptor followed
> > @@ -322,16 +322,21 @@ single 1526 byte output descriptor in the case where the header and
> >  packet are adjacent, or even three or more descriptors (possibly with
> >  loss of efficiency in that case).
> >  
> > -Regrettably, initial driver implementations used simple layouts, and
> > -devices came to rely on it, despite this specification wording[10]. It
> > -is thus recommended that drivers be conservative in their assumptions,
> > -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some
> > +In addition, some
> >  implementations may have large-but-reasonable restrictions on total
> >  descriptor size (such as based on IOV_MAX in the host OS). This has
> >  not been a problem in practice: little sympathy will be given to
> >  drivers which create unreasonably-sized descriptors such as by
> >  dividing a network packet into 1500 single-byte descriptors!
> >  
> > +2.1.4.2.1. Legacy Interfaces: A Note on Message Framing
> > +-----------------------
> > +Regrettably, initial driver implementations used simple layouts, and
> > +devices came to rely on it, despite this specification wording[10]. It
> > +is thus recommended that when using legacy interfaces,
> > +drivers should be conservative in their assumptions,
> > +unless the VIRTIO_F_ANY_LAYOUT feature is accepted.
> > +
> >  2.1.4.3. The Virtqueue Descriptor Table
> >  --------------------------------------
> >  
> > @@ -2980,9 +2985,6 @@ Currently there are five device-independent feature bits defined:
> >    using a timer if the device interrupts it when all the packets
> >    are transmitted.
> >  
> > -  VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
> > -  descriptor layouts, as described in Section "2.1.4.2. Message Framing".
> > -
> >    VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates
> >    that the driver can use descriptors with the VRING_DESC_F_INDIRECT
> >    flag set, as described in "2.1.4.3.1. Indirect Descriptors".
> > @@ -3002,9 +3004,21 @@ Currently there are five device-independent feature bits defined:
> >    compliant with this specification, and acknowledged by all device
> >    drivers.
> >  
> > -In addition, bit 30 is used by qemu's implementation to check for experimental
> > -early versions of virtio which did not perform correct feature negotiation,
> > -and should not be used.
> > +2.5.1 Legacy Interface: A Note on Reserved Feature Bits
> > +-------------------------
> > +
> > +When used through the legacy interface, transitional
> > +devices should advertise and recognize the following
> > +feature bits:
> > +
> > +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
> > +descriptor layouts, as described in Section
> > +"2.1.4.2.1. Legacy Interfaces: A Note on Message Framing".
> > +
> > +VIRTIO_F_BAD_FEATURE (30) This bit is used by devices
> > +to check for experimental early versions of virtio
> > +drivers which did not perform correct feature negotiation,
> > +and should not be supported or activated by drivers.
> >  
> >  2.6. virtio_ring.h
> >  =================
> > -- 
> > MST
> 
> commit 75f47c3fcc52192b7b6a7b6a77692cdb971dcb7e
> Author: Rusty Russell <rusty@au1.ibm.com>
> Date:   Mon Sep 9 17:40:32 2013 +0930
> 
>     Any layout is allowed (deprecate VIRTIO_F_ANY_LAYOUT)
>     
>     As per issue virtio-10.
>     
>     Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
> 
> diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
> index 1e424d0..ceac80a 100644
> --- a/virtio-v1.0-wd01-part1-specification.txt
> +++ b/virtio-v1.0-wd01-part1-specification.txt
> @@ -268,25 +268,33 @@ is not a significant issue.
>  
>  2.1.4.2. Message Framing
>  -----------------------
> -The original intent of the specification was that message framing (the
> -particular layout of descriptors) be independent of the contents of
> -the buffers. For example, a network transmit buffer consists of a 12
> -byte header followed by the network packet. This could be most simply
> -placed in the descriptor table as a 12 byte output descriptor followed
> -by a 1514 byte output descriptor, but it could also consist of a
> -single 1526 byte output descriptor in the case where the header and
> -packet are adjacent, or even three or more descriptors (possibly with
> -loss of efficiency in that case).
> +The message framing (the particular layout of descriptors) is
> +independent of the contents of the buffers. For example, a network
> +transmit buffer consists of a 12 byte header followed by the network
> +packet. This could be most simply placed in the descriptor table as a
> +12 byte output descriptor followed by a 1514 byte output descriptor,
> +but it could also consist of a single 1526 byte output descriptor in
> +the case where the header and packet are adjacent, or even three or
> +more descriptors (possibly with loss of efficiency in that case).
> +
> +Note that, some implementations may have large-but-reasonable
> +restrictions on total descriptor size (such as based on IOV_MAX in the
> +host OS). This has not been a problem in practice: little sympathy
> +will be given to drivers which create unreasonably-sized descriptors
> +such as by dividing a network packet into 1500 single-byte
> +descriptors!
> +
> +2.1.4.2.1. Legacy Interface: Message Framing
> +-----------------------
>  
>  Regrettably, initial driver implementations used simple layouts, and
> -devices came to rely on it, despite this specification wording[10]. It
> -is thus recommended that drivers be conservative in their assumptions,
> -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some
> -implementations may have large-but-reasonable restrictions on total
> -descriptor size (such as based on IOV_MAX in the host OS). This has
> -not been a problem in practice: little sympathy will be given to
> -drivers which create unreasonably-sized descriptors such as by
> -dividing a network packet into 1500 single-byte descriptors!
> +devices came to rely on it, despite this specification wording.  In
> +addition, the specification for virtio_blk SCSI commands required
> +intuiting field lengths from frame boundaries (see "2.4.2.5.1 Legacy
> +Interface: Device Operation")
> +
> +It is thus recommended that

Maybe add:
" when using legacy interfaces, transitional "

to stress that it does not need to be checked in non legacy
setting.

> drivers be conservative in their
> +assumptions, unless the VIRTIO_F_ANY_LAYOUT feature is accepted.

>  
>  2.1.4.3. The Virtqueue Descriptor Table
>  --------------------------------------
> @@ -1193,7 +1201,7 @@ features.
>    VIRTIO_NET_F_MAC (5) Device has given MAC address.
>  
>    VIRTIO_NET_F_GSO (6) (Deprecated) device handles packets with
> -    any GSO type.[12]
> +    any GSO type.[11]
>  
>    VIRTIO_NET_F_GUEST_TSO4 (7) Guest can receive TSOv4.
>  

This generated a huge diff because of renumbering.
What I generally do is take out a chunk and commit,
then make renumbering a separate
commit, it does not need review :)

If I add a chunk, and numbering does not fit naturally,
I add it with a random huge number like 123 - this makes
it possible to add cross-references directly.
Again commit then re-number separately.

This also makes rebasing easier for people:
they can see the conflict is with a re-numbering patch.

> @@ -1267,16 +1275,16 @@ features.
>    packets by negotating the VIRTIO_NET_F_CSUM feature. This “
>    checksum offload” is a common feature on modern network cards.
>  
> -7. If that feature is negotiated[13], a driver can use TCP or UDP
> +7. If that feature is negotiated[12], a driver can use TCP or UDP
>    segmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4
>    TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and VIRTIO_NET_F_HOST_UFO
>    (UDP fragmentation) features. It should not send TCP packets
>    requiring segmentation offload which have the Explicit Congestion
>    Notification bit set, unless the VIRTIO_NET_F_HOST_ECN feature is
> -  negotiated.[14]
> +  negotiated.[13]
>  
>  8. The converse features are also available: a driver can save
> -  the virtual device some work by negotiating these features.[15]
> +  the virtual device some work by negotiating these features.[14]
>     The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially
>    checksummed packets can be received, and if it can do that then
>    the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
> @@ -1328,7 +1336,7 @@ the different features the driver negotiated.
>      and
>  
>    • csum_offset indicates how many bytes after the csum_start the
> -    new (16 bit ones' complement) checksum should be placed.[16]
> +    new (16 bit ones' complement) checksum should be placed.[15]
>  
>  2. If the driver negotiated
>    VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO, and the packet requires
> @@ -1341,21 +1349,21 @@ the different features the driver negotiated.
>  
>    • hdr_len is a hint to the device as to how much of the header
>      needs to be kept to copy into each packet, usually set to the
> -    length of the headers, including the transport header.[17]
> +    length of the headers, including the transport header.[16]
>  
>    • gso_size is the maximum size of each packet beyond that
>      header (ie. MSS).
>  
>    • If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature,
>      the VIRTIO_NET_HDR_GSO_ECN bit may be set in “gso_type” as
> -    well, indicating that the TCP packet has the ECN bit set.[18]
> +    well, indicating that the TCP packet has the ECN bit set.[17]
>  
>  3. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature,
>    the num_buffers field is set to zero.
>  
>  4. The header and packet are added as one output buffer to the
>    transmitq, and the device is notified of the new entry (see "2.4.1.4.
> -  Notifying The Device").[19]
> +  Notifying The Device").[18]
>  
>  2.4.1.5.1.1. Packet Transmission Interrupt
>  -----------------------------------------
> @@ -1380,7 +1388,7 @@ VIRTIO_NET_F_GUEST_UFO features are used, the Guest will need to
>  accept packets of up to 65550 bytes long (the maximum size of a
>  TCP or UDP packet, plus the 14 byte ethernet header), otherwise
>  1514. bytes. So unless VIRTIO_NET_F_MRG_RXBUF is negotiated, every
> -buffer in the receive queue needs to be at least this length [20]
> +buffer in the receive queue needs to be at least this length [19]
>  
>  If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at
>  least the size of the struct virtio_net_hdr.
> @@ -1480,7 +1488,7 @@ off. The command-specific-data is one byte containing 0 (off) or
>  	 #define VIRTIO_NET_CTRL_MAC_TABLE_SET        0
>  
>  The device can filter incoming packets by any number of destination
> -MAC addresses.[21] This table is set using the class
> +MAC addresses.[20] This table is set using the class
>  VIRTIO_NET_CTRL_MAC and the command VIRTIO_NET_CTRL_MAC_TABLE_SET. The
>  command-specific-data is two variable length tables of 6-byte MAC
>  addresses. The first table contains unicast addresses, and the second
> @@ -1652,7 +1660,7 @@ the device (not necessarily in order). Each request is of form:
>  
>  The type of the request is either a read (VIRTIO_BLK_T_IN), a write
>  (VIRTIO_BLK_T_OUT), or a flush (VIRTIO_BLK_T_FLUSH or
> -VIRTIO_BLK_T_FLUSH_OUT[22]). If the device has VIRTIO_BLK_F_BARRIER
> +VIRTIO_BLK_T_FLUSH_OUT[21]). If the device has VIRTIO_BLK_F_BARRIER
>  feature the high bit (VIRTIO_BLK_T_BARRIER) indicates that this
>  request acts as a barrier and that all preceeding requests must be
>  complete before this one, and all following requests must not be
> @@ -1683,12 +1691,12 @@ error or VIRTIO_BLK_S_UNSUPP for a request unsupported by host:
>  	#define VIRTIO_BLK_S_IOERR     1
>  	#define VIRTIO_BLK_S_UNSUPP    2
>  
> +2.4.2.5.1 Legacy Interface: Device Operation
> +------------------------
>  Historically, devices assumed that the fields type, ioprio and sector
>  reside in a single, separate read-only buffer and the status field is
>  a separate read-only buffer of size 1 byte, by itself.
>  
> -2.4.2.5.1 Legacy Interface: Device Operation
> -------------------------
>  If the device has VIRTIO_BLK_F_SCSI feature, it can also support
>  scsi packet command requests, each of these requests is of form:
>  
> @@ -1826,7 +1834,7 @@ data and outgoing characters are placed in the transmit queue.
>  ------------------------
>  
>  1. For output, a buffer containing the characters is placed in
> -  the port's transmitq.[23]
> +  the port's transmitq.[22]
>  
>  2. When a buffer is used in the receiveq (signalled by an
>    interrupt), the contents is the input to the port associated
> @@ -1970,7 +1978,7 @@ configuration change interrupt.
>  2. To supply memory to the balloon (aka. inflate):
>  
>    (a) The driver constructs an array of addresses of unused memory
> -    pages. These addresses are divided by 4096[24] and the descriptor
> +    pages. These addresses are divided by 4096[23] and the descriptor
>      describing the resulting 32-bit array is added to the inflateq.
>  
>  3. To remove memory from the balloon (aka. deflate):
> @@ -1985,11 +1993,11 @@ configuration change interrupt.
>  
>    (c) Otherwise, the guest may begin to re-use pages previously
>      given to the balloon before the device has acknowledged their
> -    withdrawl. [25]
> +    withdrawl. [24]
>  
>  4. In either case, once the device has completed the inflation or
>    deflation, the “actual” field of the configuration should be
> -  updated to reflect the new number of pages in the balloon.[26]
> +  updated to reflect the new number of pages in the balloon.[25]
>  
>  2.4.5.6.1. Memory Statistics
>  ---------------------------
> @@ -2577,10 +2585,7 @@ contents of the event field. The following events are defined:
>  2.5. Reserved Feature Bits
>  =========================
>  
> -Currently there are four device-independent feature bits defined:
> -
> -  VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
> -  descriptor layouts, as described in Section "2.1.4.2. Message Framing".
> +Currently there are three device-independent feature bits defined:
>  
>    VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates
>    that the driver can use descriptors with the VRING_DESC_F_INDIRECT
> @@ -2621,6 +2626,10 @@ VIRTIO_F_NOTIFY_ON_EMPTY (24) Negotiating this feature
>    using a timer if the device interrupts it when all the packets
>    are transmitted.
>  
> +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device
> +  accepts arbitrary descriptor layouts, as described in Section
> +  "2.1.4.2.1. Legacy Interface: Message Framing".
> +

Is this on top of a patch making NOTIFY_ON_EMPTY legacy?

We should also move BAD_FEATURE here.

>  2.6. virtio_ring.h
>  =================
>  
> @@ -2851,7 +2860,7 @@ altogether.
>  
>  Any change to configuration space, or new virtqueues, or
>  behavioural changes, should be indicated by negotiation of a new
> -feature bit. This establishes clarity[11] and avoids future expansion problems.
> +feature bit. This establishes clarity[10] and avoids future expansion problems.
>  
>  Clusters of functionality which are always implemented together
>  can use a single bit, but if one feature makes sense without the
> @@ -2895,34 +2904,28 @@ of this expected condition is necessary.
>  
>  [9] https://lists.linux-foundation.org/mailman/listinfo/virtualization
>  
> -[10] It was previously asserted that framing should be independent of message
> -contents, yet invariably drivers layed out messages in reliable ways and
> -devices assumed it.
> -In addition, the specifications for virtio_blk and virtio_scsi require
> -intuiting field lengths from frame boundaries.
> -
> -[11] Even if it does mean documenting design or implementation
> +[10] Even if it does mean documenting design or implementation
>  mistakes!
>  
>  
> -[12] It was supposed to indicate segmentation offload support, but
> +[11] It was supposed to indicate segmentation offload support, but
>  upon further investigation it became clear that multiple bits
>  were required.
>  
> -[13] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
> +[12] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
>  dependent on VIRTIO_NET_F_CSUM; a dvice which offers the offload
>  features must offer the checksum feature, and a driver which
>  accepts the offload features must accept the checksum feature.
>  Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features
>  depending on VIRTIO_NET_F_GUEST_CSUM.
>  
> -[14] This is a common restriction in real, older network cards.
> +[13] This is a common restriction in real, older network cards.
>  
> -[15] For example, a network packet transported between two guests on
> +[14] For example, a network packet transported between two guests on
>  the same system may not require checksumming at all, nor segmentation,
>  if both guests are amenable.
>  
> -[16] For example, consider a partially checksummed TCP (IPv4) packet.
> +[15] For example, consider a partially checksummed TCP (IPv4) packet.
>  It will have a 14 byte ethernet header and 20 byte IP header
>  followed by the TCP header (with the TCP checksum field 16 bytes
>  into that header). csum_start will be 14+20 = 34 (the TCP
> @@ -2932,26 +2935,26 @@ of the TCP pseudo header, so that replacing it by the ones'
>  complement checksum of the TCP header and body will give the
>  correct result.
>  
> -[17] Due to various bugs in implementations, this field is not useful
> +[16] Due to various bugs in implementations, this field is not useful
>  as a guarantee of the transport header size.
>  
> -[18] This case is not handled by some older hardware, so is called out
> +[17] This case is not handled by some older hardware, so is called out
>  specifically in the protocol.
>  
> -[19] Note that the header will be two bytes longer for the
> +[18] Note that the header will be two bytes longer for the
>  VIRTIO_NET_F_MRG_RXBUF case.
>  
> -[20] Obviously each one can be split across multiple descriptor
> +[19] Obviously each one can be split across multiple descriptor
>  elements.
>  
> -[21] Since there are no guarentees, it can use a hash filter or
> +[20] Since there are no guarentees, it can use a hash filter or
>  silently switch to allmulti or promiscuous mode if it is given too
>  many addresses.
>  
> -[22] The FLUSH and FLUSH_OUT types are equivalent, the device does not
> +[21] The FLUSH and FLUSH_OUT types are equivalent, the device does not
>  distinguish between them
>  
> -[23] Because this is high importance and low bandwidth, the current
> +[22] Because this is high importance and low bandwidth, the current
>  Linux implementation polls for the buffer to be used, rather than
>  waiting for an interrupt, simplifying the implementation
>  significantly. However, for generic serial ports with the
> @@ -2959,9 +2962,9 @@ O_NONBLOCK flag set, the polling limitation is relaxed and the
>  consumed buffers are freed upon the next write or poll call or
>  when a port is closed or hot-unplugged.
>  
> -[24] This is historical, and independent of the guest page size
> +[23] This is historical, and independent of the guest page size
>  
> -[25] In this case, deflation advice is merely a courtesy
> +[24] In this case, deflation advice is merely a courtesy
>  
> -[26] As updates to configuration space are not atomic, this field
> +[25] As updates to configuration space are not atomic, this field
>  isn't particularly reliable, but can be used to diagnose buggy guests.


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]