OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-dev] [PATCH] message framing: make ANY_LAYOUT implicit


"Michael S. Tsirkin" <mst@redhat.com> writes:
> This resolves issue VIRTIO-10
>
> This also creates a new section for legacy feature
> bits which will be handy for VIRTIO-13 if that is
> accepted.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

This is very similar to the patch I came up with: I'm too slow!

Pasted below, in particular the 2.4.2.5.1 header needs to move up a few
lines, too.

Cheers,
Rusty.

> ---
>  virtio-v1.0-wd01-part1-specification.txt | 38 ++++++++++++++++++++++----------
>  1 file changed, 26 insertions(+), 12 deletions(-)
>
> diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
> index 2679b5b..bb75e35 100644
> --- a/virtio-v1.0-wd01-part1-specification.txt
> +++ b/virtio-v1.0-wd01-part1-specification.txt
> @@ -312,8 +312,8 @@ It is assumed that the host is already aware of the guest endian.
>  
>  2.1.4.2. Message Framing
>  -----------------------
> -The original intent of the specification was that message framing (the
> -particular layout of descriptors) be independent of the contents of
> +Generally, the intent of the specification is for message framing (the
> +particular layout of descriptors) to be independent of the contents of
>  the buffers. For example, a network transmit buffer consists of a 12
>  byte header followed by the network packet. This could be most simply
>  placed in the descriptor table as a 12 byte output descriptor followed
> @@ -322,16 +322,21 @@ single 1526 byte output descriptor in the case where the header and
>  packet are adjacent, or even three or more descriptors (possibly with
>  loss of efficiency in that case).
>  
> -Regrettably, initial driver implementations used simple layouts, and
> -devices came to rely on it, despite this specification wording[10]. It
> -is thus recommended that drivers be conservative in their assumptions,
> -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some
> +In addition, some
>  implementations may have large-but-reasonable restrictions on total
>  descriptor size (such as based on IOV_MAX in the host OS). This has
>  not been a problem in practice: little sympathy will be given to
>  drivers which create unreasonably-sized descriptors such as by
>  dividing a network packet into 1500 single-byte descriptors!
>  
> +2.1.4.2.1. Legacy Interfaces: A Note on Message Framing
> +-----------------------
> +Regrettably, initial driver implementations used simple layouts, and
> +devices came to rely on it, despite this specification wording[10]. It
> +is thus recommended that when using legacy interfaces,
> +drivers should be conservative in their assumptions,
> +unless the VIRTIO_F_ANY_LAYOUT feature is accepted.
> +
>  2.1.4.3. The Virtqueue Descriptor Table
>  --------------------------------------
>  
> @@ -2980,9 +2985,6 @@ Currently there are five device-independent feature bits defined:
>    using a timer if the device interrupts it when all the packets
>    are transmitted.
>  
> -  VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
> -  descriptor layouts, as described in Section "2.1.4.2. Message Framing".
> -
>    VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates
>    that the driver can use descriptors with the VRING_DESC_F_INDIRECT
>    flag set, as described in "2.1.4.3.1. Indirect Descriptors".
> @@ -3002,9 +3004,21 @@ Currently there are five device-independent feature bits defined:
>    compliant with this specification, and acknowledged by all device
>    drivers.
>  
> -In addition, bit 30 is used by qemu's implementation to check for experimental
> -early versions of virtio which did not perform correct feature negotiation,
> -and should not be used.
> +2.5.1 Legacy Interface: A Note on Reserved Feature Bits
> +-------------------------
> +
> +When used through the legacy interface, transitional
> +devices should advertise and recognize the following
> +feature bits:
> +
> +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
> +descriptor layouts, as described in Section
> +"2.1.4.2.1. Legacy Interfaces: A Note on Message Framing".
> +
> +VIRTIO_F_BAD_FEATURE (30) This bit is used by devices
> +to check for experimental early versions of virtio
> +drivers which did not perform correct feature negotiation,
> +and should not be supported or activated by drivers.
>  
>  2.6. virtio_ring.h
>  =================
> -- 
> MST

commit 75f47c3fcc52192b7b6a7b6a77692cdb971dcb7e
Author: Rusty Russell <rusty@au1.ibm.com>
Date:   Mon Sep 9 17:40:32 2013 +0930

    Any layout is allowed (deprecate VIRTIO_F_ANY_LAYOUT)
    
    As per issue virtio-10.
    
    Signed-off-by: Rusty Russell <rusty@au1.ibm.com>

diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
index 1e424d0..ceac80a 100644
--- a/virtio-v1.0-wd01-part1-specification.txt
+++ b/virtio-v1.0-wd01-part1-specification.txt
@@ -268,25 +268,33 @@ is not a significant issue.
 
 2.1.4.2. Message Framing
 -----------------------
-The original intent of the specification was that message framing (the
-particular layout of descriptors) be independent of the contents of
-the buffers. For example, a network transmit buffer consists of a 12
-byte header followed by the network packet. This could be most simply
-placed in the descriptor table as a 12 byte output descriptor followed
-by a 1514 byte output descriptor, but it could also consist of a
-single 1526 byte output descriptor in the case where the header and
-packet are adjacent, or even three or more descriptors (possibly with
-loss of efficiency in that case).
+The message framing (the particular layout of descriptors) is
+independent of the contents of the buffers. For example, a network
+transmit buffer consists of a 12 byte header followed by the network
+packet. This could be most simply placed in the descriptor table as a
+12 byte output descriptor followed by a 1514 byte output descriptor,
+but it could also consist of a single 1526 byte output descriptor in
+the case where the header and packet are adjacent, or even three or
+more descriptors (possibly with loss of efficiency in that case).
+
+Note that, some implementations may have large-but-reasonable
+restrictions on total descriptor size (such as based on IOV_MAX in the
+host OS). This has not been a problem in practice: little sympathy
+will be given to drivers which create unreasonably-sized descriptors
+such as by dividing a network packet into 1500 single-byte
+descriptors!
+
+2.1.4.2.1. Legacy Interface: Message Framing
+-----------------------
 
 Regrettably, initial driver implementations used simple layouts, and
-devices came to rely on it, despite this specification wording[10]. It
-is thus recommended that drivers be conservative in their assumptions,
-unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some
-implementations may have large-but-reasonable restrictions on total
-descriptor size (such as based on IOV_MAX in the host OS). This has
-not been a problem in practice: little sympathy will be given to
-drivers which create unreasonably-sized descriptors such as by
-dividing a network packet into 1500 single-byte descriptors!
+devices came to rely on it, despite this specification wording.  In
+addition, the specification for virtio_blk SCSI commands required
+intuiting field lengths from frame boundaries (see "2.4.2.5.1 Legacy
+Interface: Device Operation")
+
+It is thus recommended that drivers be conservative in their
+assumptions, unless the VIRTIO_F_ANY_LAYOUT feature is accepted.
 
 2.1.4.3. The Virtqueue Descriptor Table
 --------------------------------------
@@ -1193,7 +1201,7 @@ features.
   VIRTIO_NET_F_MAC (5) Device has given MAC address.
 
   VIRTIO_NET_F_GSO (6) (Deprecated) device handles packets with
-    any GSO type.[12]
+    any GSO type.[11]
 
   VIRTIO_NET_F_GUEST_TSO4 (7) Guest can receive TSOv4.
 
@@ -1267,16 +1275,16 @@ features.
   packets by negotating the VIRTIO_NET_F_CSUM feature. This “
   checksum offload” is a common feature on modern network cards.
 
-7. If that feature is negotiated[13], a driver can use TCP or UDP
+7. If that feature is negotiated[12], a driver can use TCP or UDP
   segmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4
   TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and VIRTIO_NET_F_HOST_UFO
   (UDP fragmentation) features. It should not send TCP packets
   requiring segmentation offload which have the Explicit Congestion
   Notification bit set, unless the VIRTIO_NET_F_HOST_ECN feature is
-  negotiated.[14]
+  negotiated.[13]
 
 8. The converse features are also available: a driver can save
-  the virtual device some work by negotiating these features.[15]
+  the virtual device some work by negotiating these features.[14]
    The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially
   checksummed packets can be received, and if it can do that then
   the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
@@ -1328,7 +1336,7 @@ the different features the driver negotiated.
     and
 
   • csum_offset indicates how many bytes after the csum_start the
-    new (16 bit ones' complement) checksum should be placed.[16]
+    new (16 bit ones' complement) checksum should be placed.[15]
 
 2. If the driver negotiated
   VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO, and the packet requires
@@ -1341,21 +1349,21 @@ the different features the driver negotiated.
 
   • hdr_len is a hint to the device as to how much of the header
     needs to be kept to copy into each packet, usually set to the
-    length of the headers, including the transport header.[17]
+    length of the headers, including the transport header.[16]
 
   • gso_size is the maximum size of each packet beyond that
     header (ie. MSS).
 
   • If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature,
     the VIRTIO_NET_HDR_GSO_ECN bit may be set in “gso_type” as
-    well, indicating that the TCP packet has the ECN bit set.[18]
+    well, indicating that the TCP packet has the ECN bit set.[17]
 
 3. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature,
   the num_buffers field is set to zero.
 
 4. The header and packet are added as one output buffer to the
   transmitq, and the device is notified of the new entry (see "2.4.1.4.
-  Notifying The Device").[19]
+  Notifying The Device").[18]
 
 2.4.1.5.1.1. Packet Transmission Interrupt
 -----------------------------------------
@@ -1380,7 +1388,7 @@ VIRTIO_NET_F_GUEST_UFO features are used, the Guest will need to
 accept packets of up to 65550 bytes long (the maximum size of a
 TCP or UDP packet, plus the 14 byte ethernet header), otherwise
 1514. bytes. So unless VIRTIO_NET_F_MRG_RXBUF is negotiated, every
-buffer in the receive queue needs to be at least this length [20]
+buffer in the receive queue needs to be at least this length [19]
 
 If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at
 least the size of the struct virtio_net_hdr.
@@ -1480,7 +1488,7 @@ off. The command-specific-data is one byte containing 0 (off) or
 	 #define VIRTIO_NET_CTRL_MAC_TABLE_SET        0
 
 The device can filter incoming packets by any number of destination
-MAC addresses.[21] This table is set using the class
+MAC addresses.[20] This table is set using the class
 VIRTIO_NET_CTRL_MAC and the command VIRTIO_NET_CTRL_MAC_TABLE_SET. The
 command-specific-data is two variable length tables of 6-byte MAC
 addresses. The first table contains unicast addresses, and the second
@@ -1652,7 +1660,7 @@ the device (not necessarily in order). Each request is of form:
 
 The type of the request is either a read (VIRTIO_BLK_T_IN), a write
 (VIRTIO_BLK_T_OUT), or a flush (VIRTIO_BLK_T_FLUSH or
-VIRTIO_BLK_T_FLUSH_OUT[22]). If the device has VIRTIO_BLK_F_BARRIER
+VIRTIO_BLK_T_FLUSH_OUT[21]). If the device has VIRTIO_BLK_F_BARRIER
 feature the high bit (VIRTIO_BLK_T_BARRIER) indicates that this
 request acts as a barrier and that all preceeding requests must be
 complete before this one, and all following requests must not be
@@ -1683,12 +1691,12 @@ error or VIRTIO_BLK_S_UNSUPP for a request unsupported by host:
 	#define VIRTIO_BLK_S_IOERR     1
 	#define VIRTIO_BLK_S_UNSUPP    2
 
+2.4.2.5.1 Legacy Interface: Device Operation
+------------------------
 Historically, devices assumed that the fields type, ioprio and sector
 reside in a single, separate read-only buffer and the status field is
 a separate read-only buffer of size 1 byte, by itself.
 
-2.4.2.5.1 Legacy Interface: Device Operation
-------------------------
 If the device has VIRTIO_BLK_F_SCSI feature, it can also support
 scsi packet command requests, each of these requests is of form:
 
@@ -1826,7 +1834,7 @@ data and outgoing characters are placed in the transmit queue.
 ------------------------
 
 1. For output, a buffer containing the characters is placed in
-  the port's transmitq.[23]
+  the port's transmitq.[22]
 
 2. When a buffer is used in the receiveq (signalled by an
   interrupt), the contents is the input to the port associated
@@ -1970,7 +1978,7 @@ configuration change interrupt.
 2. To supply memory to the balloon (aka. inflate):
 
   (a) The driver constructs an array of addresses of unused memory
-    pages. These addresses are divided by 4096[24] and the descriptor
+    pages. These addresses are divided by 4096[23] and the descriptor
     describing the resulting 32-bit array is added to the inflateq.
 
 3. To remove memory from the balloon (aka. deflate):
@@ -1985,11 +1993,11 @@ configuration change interrupt.
 
   (c) Otherwise, the guest may begin to re-use pages previously
     given to the balloon before the device has acknowledged their
-    withdrawl. [25]
+    withdrawl. [24]
 
 4. In either case, once the device has completed the inflation or
   deflation, the “actual” field of the configuration should be
-  updated to reflect the new number of pages in the balloon.[26]
+  updated to reflect the new number of pages in the balloon.[25]
 
 2.4.5.6.1. Memory Statistics
 ---------------------------
@@ -2577,10 +2585,7 @@ contents of the event field. The following events are defined:
 2.5. Reserved Feature Bits
 =========================
 
-Currently there are four device-independent feature bits defined:
-
-  VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary
-  descriptor layouts, as described in Section "2.1.4.2. Message Framing".
+Currently there are three device-independent feature bits defined:
 
   VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates
   that the driver can use descriptors with the VRING_DESC_F_INDIRECT
@@ -2621,6 +2626,10 @@ VIRTIO_F_NOTIFY_ON_EMPTY (24) Negotiating this feature
   using a timer if the device interrupts it when all the packets
   are transmitted.
 
+VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device
+  accepts arbitrary descriptor layouts, as described in Section
+  "2.1.4.2.1. Legacy Interface: Message Framing".
+
 2.6. virtio_ring.h
 =================
 
@@ -2851,7 +2860,7 @@ altogether.
 
 Any change to configuration space, or new virtqueues, or
 behavioural changes, should be indicated by negotiation of a new
-feature bit. This establishes clarity[11] and avoids future expansion problems.
+feature bit. This establishes clarity[10] and avoids future expansion problems.
 
 Clusters of functionality which are always implemented together
 can use a single bit, but if one feature makes sense without the
@@ -2895,34 +2904,28 @@ of this expected condition is necessary.
 
 [9] https://lists.linux-foundation.org/mailman/listinfo/virtualization
 
-[10] It was previously asserted that framing should be independent of message
-contents, yet invariably drivers layed out messages in reliable ways and
-devices assumed it.
-In addition, the specifications for virtio_blk and virtio_scsi require
-intuiting field lengths from frame boundaries.
-
-[11] Even if it does mean documenting design or implementation
+[10] Even if it does mean documenting design or implementation
 mistakes!
 
 
-[12] It was supposed to indicate segmentation offload support, but
+[11] It was supposed to indicate segmentation offload support, but
 upon further investigation it became clear that multiple bits
 were required.
 
-[13] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
+[12] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
 dependent on VIRTIO_NET_F_CSUM; a dvice which offers the offload
 features must offer the checksum feature, and a driver which
 accepts the offload features must accept the checksum feature.
 Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features
 depending on VIRTIO_NET_F_GUEST_CSUM.
 
-[14] This is a common restriction in real, older network cards.
+[13] This is a common restriction in real, older network cards.
 
-[15] For example, a network packet transported between two guests on
+[14] For example, a network packet transported between two guests on
 the same system may not require checksumming at all, nor segmentation,
 if both guests are amenable.
 
-[16] For example, consider a partially checksummed TCP (IPv4) packet.
+[15] For example, consider a partially checksummed TCP (IPv4) packet.
 It will have a 14 byte ethernet header and 20 byte IP header
 followed by the TCP header (with the TCP checksum field 16 bytes
 into that header). csum_start will be 14+20 = 34 (the TCP
@@ -2932,26 +2935,26 @@ of the TCP pseudo header, so that replacing it by the ones'
 complement checksum of the TCP header and body will give the
 correct result.
 
-[17] Due to various bugs in implementations, this field is not useful
+[16] Due to various bugs in implementations, this field is not useful
 as a guarantee of the transport header size.
 
-[18] This case is not handled by some older hardware, so is called out
+[17] This case is not handled by some older hardware, so is called out
 specifically in the protocol.
 
-[19] Note that the header will be two bytes longer for the
+[18] Note that the header will be two bytes longer for the
 VIRTIO_NET_F_MRG_RXBUF case.
 
-[20] Obviously each one can be split across multiple descriptor
+[19] Obviously each one can be split across multiple descriptor
 elements.
 
-[21] Since there are no guarentees, it can use a hash filter or
+[20] Since there are no guarentees, it can use a hash filter or
 silently switch to allmulti or promiscuous mode if it is given too
 many addresses.
 
-[22] The FLUSH and FLUSH_OUT types are equivalent, the device does not
+[21] The FLUSH and FLUSH_OUT types are equivalent, the device does not
 distinguish between them
 
-[23] Because this is high importance and low bandwidth, the current
+[22] Because this is high importance and low bandwidth, the current
 Linux implementation polls for the buffer to be used, rather than
 waiting for an interrupt, simplifying the implementation
 significantly. However, for generic serial ports with the
@@ -2959,9 +2962,9 @@ O_NONBLOCK flag set, the polling limitation is relaxed and the
 consumed buffers are freed upon the next write or poll call or
 when a port is closed or hot-unplugged.
 
-[24] This is historical, and independent of the guest page size
+[23] This is historical, and independent of the guest page size
 
-[25] In this case, deflation advice is merely a courtesy
+[24] In this case, deflation advice is merely a courtesy
 
-[26] As updates to configuration space are not atomic, this field
+[25] As updates to configuration space are not atomic, this field
 isn't particularly reliable, but can be used to diagnose buggy guests.



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]