[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [virtio-dev] [PATCH] message framing: make ANY_LAYOUT implicit
On Mon, Sep 16, 2013 at 05:12:34PM +0930, Rusty Russell wrote: > "Michael S. Tsirkin" <mst@redhat.com> writes: > > This resolves issue VIRTIO-10 > > > > This also creates a new section for legacy feature > > bits which will be handy for VIRTIO-13 if that is > > accepted. > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > This is very similar to the patch I came up with: I'm too slow! > > Pasted below, in particular the 2.4.2.5.1 header needs to move up a few > lines, too. > > Cheers, > Rusty. Cool. A couple of comments below. > > --- > > virtio-v1.0-wd01-part1-specification.txt | 38 ++++++++++++++++++++++---------- > > 1 file changed, 26 insertions(+), 12 deletions(-) > > > > diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt > > index 2679b5b..bb75e35 100644 > > --- a/virtio-v1.0-wd01-part1-specification.txt > > +++ b/virtio-v1.0-wd01-part1-specification.txt > > @@ -312,8 +312,8 @@ It is assumed that the host is already aware of the guest endian. > > > > 2.1.4.2. Message Framing > > ----------------------- > > -The original intent of the specification was that message framing (the > > -particular layout of descriptors) be independent of the contents of > > +Generally, the intent of the specification is for message framing (the > > +particular layout of descriptors) to be independent of the contents of > > the buffers. For example, a network transmit buffer consists of a 12 > > byte header followed by the network packet. This could be most simply > > placed in the descriptor table as a 12 byte output descriptor followed > > @@ -322,16 +322,21 @@ single 1526 byte output descriptor in the case where the header and > > packet are adjacent, or even three or more descriptors (possibly with > > loss of efficiency in that case). > > > > -Regrettably, initial driver implementations used simple layouts, and > > -devices came to rely on it, despite this specification wording[10]. It > > -is thus recommended that drivers be conservative in their assumptions, > > -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some > > +In addition, some > > implementations may have large-but-reasonable restrictions on total > > descriptor size (such as based on IOV_MAX in the host OS). This has > > not been a problem in practice: little sympathy will be given to > > drivers which create unreasonably-sized descriptors such as by > > dividing a network packet into 1500 single-byte descriptors! > > > > +2.1.4.2.1. Legacy Interfaces: A Note on Message Framing > > +----------------------- > > +Regrettably, initial driver implementations used simple layouts, and > > +devices came to rely on it, despite this specification wording[10]. It > > +is thus recommended that when using legacy interfaces, > > +drivers should be conservative in their assumptions, > > +unless the VIRTIO_F_ANY_LAYOUT feature is accepted. > > + > > 2.1.4.3. The Virtqueue Descriptor Table > > -------------------------------------- > > > > @@ -2980,9 +2985,6 @@ Currently there are five device-independent feature bits defined: > > using a timer if the device interrupts it when all the packets > > are transmitted. > > > > - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary > > - descriptor layouts, as described in Section "2.1.4.2. Message Framing". > > - > > VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates > > that the driver can use descriptors with the VRING_DESC_F_INDIRECT > > flag set, as described in "2.1.4.3.1. Indirect Descriptors". > > @@ -3002,9 +3004,21 @@ Currently there are five device-independent feature bits defined: > > compliant with this specification, and acknowledged by all device > > drivers. > > > > -In addition, bit 30 is used by qemu's implementation to check for experimental > > -early versions of virtio which did not perform correct feature negotiation, > > -and should not be used. > > +2.5.1 Legacy Interface: A Note on Reserved Feature Bits > > +------------------------- > > + > > +When used through the legacy interface, transitional > > +devices should advertise and recognize the following > > +feature bits: > > + > > +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary > > +descriptor layouts, as described in Section > > +"2.1.4.2.1. Legacy Interfaces: A Note on Message Framing". > > + > > +VIRTIO_F_BAD_FEATURE (30) This bit is used by devices > > +to check for experimental early versions of virtio > > +drivers which did not perform correct feature negotiation, > > +and should not be supported or activated by drivers. > > > > 2.6. virtio_ring.h > > ================= > > -- > > MST > > commit 75f47c3fcc52192b7b6a7b6a77692cdb971dcb7e > Author: Rusty Russell <rusty@au1.ibm.com> > Date: Mon Sep 9 17:40:32 2013 +0930 > > Any layout is allowed (deprecate VIRTIO_F_ANY_LAYOUT) > > As per issue virtio-10. > > Signed-off-by: Rusty Russell <rusty@au1.ibm.com> > > diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt > index 1e424d0..ceac80a 100644 > --- a/virtio-v1.0-wd01-part1-specification.txt > +++ b/virtio-v1.0-wd01-part1-specification.txt > @@ -268,25 +268,33 @@ is not a significant issue. > > 2.1.4.2. Message Framing > ----------------------- > -The original intent of the specification was that message framing (the > -particular layout of descriptors) be independent of the contents of > -the buffers. For example, a network transmit buffer consists of a 12 > -byte header followed by the network packet. This could be most simply > -placed in the descriptor table as a 12 byte output descriptor followed > -by a 1514 byte output descriptor, but it could also consist of a > -single 1526 byte output descriptor in the case where the header and > -packet are adjacent, or even three or more descriptors (possibly with > -loss of efficiency in that case). > +The message framing (the particular layout of descriptors) is > +independent of the contents of the buffers. For example, a network > +transmit buffer consists of a 12 byte header followed by the network > +packet. This could be most simply placed in the descriptor table as a > +12 byte output descriptor followed by a 1514 byte output descriptor, > +but it could also consist of a single 1526 byte output descriptor in > +the case where the header and packet are adjacent, or even three or > +more descriptors (possibly with loss of efficiency in that case). > + > +Note that, some implementations may have large-but-reasonable > +restrictions on total descriptor size (such as based on IOV_MAX in the > +host OS). This has not been a problem in practice: little sympathy > +will be given to drivers which create unreasonably-sized descriptors > +such as by dividing a network packet into 1500 single-byte > +descriptors! > + > +2.1.4.2.1. Legacy Interface: Message Framing > +----------------------- > > Regrettably, initial driver implementations used simple layouts, and > -devices came to rely on it, despite this specification wording[10]. It > -is thus recommended that drivers be conservative in their assumptions, > -unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some > -implementations may have large-but-reasonable restrictions on total > -descriptor size (such as based on IOV_MAX in the host OS). This has > -not been a problem in practice: little sympathy will be given to > -drivers which create unreasonably-sized descriptors such as by > -dividing a network packet into 1500 single-byte descriptors! > +devices came to rely on it, despite this specification wording. In > +addition, the specification for virtio_blk SCSI commands required > +intuiting field lengths from frame boundaries (see "2.4.2.5.1 Legacy > +Interface: Device Operation") > + > +It is thus recommended that Maybe add: " when using legacy interfaces, transitional " to stress that it does not need to be checked in non legacy setting. > drivers be conservative in their > +assumptions, unless the VIRTIO_F_ANY_LAYOUT feature is accepted. > > 2.1.4.3. The Virtqueue Descriptor Table > -------------------------------------- > @@ -1193,7 +1201,7 @@ features. > VIRTIO_NET_F_MAC (5) Device has given MAC address. > > VIRTIO_NET_F_GSO (6) (Deprecated) device handles packets with > - any GSO type.[12] > + any GSO type.[11] > > VIRTIO_NET_F_GUEST_TSO4 (7) Guest can receive TSOv4. > This generated a huge diff because of renumbering. What I generally do is take out a chunk and commit, then make renumbering a separate commit, it does not need review :) If I add a chunk, and numbering does not fit naturally, I add it with a random huge number like 123 - this makes it possible to add cross-references directly. Again commit then re-number separately. This also makes rebasing easier for people: they can see the conflict is with a re-numbering patch. > @@ -1267,16 +1275,16 @@ features. > packets by negotating the VIRTIO_NET_F_CSUM feature. This “ > checksum offload” is a common feature on modern network cards. > > -7. If that feature is negotiated[13], a driver can use TCP or UDP > +7. If that feature is negotiated[12], a driver can use TCP or UDP > segmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4 > TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and VIRTIO_NET_F_HOST_UFO > (UDP fragmentation) features. It should not send TCP packets > requiring segmentation offload which have the Explicit Congestion > Notification bit set, unless the VIRTIO_NET_F_HOST_ECN feature is > - negotiated.[14] > + negotiated.[13] > > 8. The converse features are also available: a driver can save > - the virtual device some work by negotiating these features.[15] > + the virtual device some work by negotiating these features.[14] > The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially > checksummed packets can be received, and if it can do that then > the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, > @@ -1328,7 +1336,7 @@ the different features the driver negotiated. > and > > • csum_offset indicates how many bytes after the csum_start the > - new (16 bit ones' complement) checksum should be placed.[16] > + new (16 bit ones' complement) checksum should be placed.[15] > > 2. If the driver negotiated > VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO, and the packet requires > @@ -1341,21 +1349,21 @@ the different features the driver negotiated. > > • hdr_len is a hint to the device as to how much of the header > needs to be kept to copy into each packet, usually set to the > - length of the headers, including the transport header.[17] > + length of the headers, including the transport header.[16] > > • gso_size is the maximum size of each packet beyond that > header (ie. MSS). > > • If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, > the VIRTIO_NET_HDR_GSO_ECN bit may be set in “gso_type” as > - well, indicating that the TCP packet has the ECN bit set.[18] > + well, indicating that the TCP packet has the ECN bit set.[17] > > 3. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature, > the num_buffers field is set to zero. > > 4. The header and packet are added as one output buffer to the > transmitq, and the device is notified of the new entry (see "2.4.1.4. > - Notifying The Device").[19] > + Notifying The Device").[18] > > 2.4.1.5.1.1. Packet Transmission Interrupt > ----------------------------------------- > @@ -1380,7 +1388,7 @@ VIRTIO_NET_F_GUEST_UFO features are used, the Guest will need to > accept packets of up to 65550 bytes long (the maximum size of a > TCP or UDP packet, plus the 14 byte ethernet header), otherwise > 1514. bytes. So unless VIRTIO_NET_F_MRG_RXBUF is negotiated, every > -buffer in the receive queue needs to be at least this length [20] > +buffer in the receive queue needs to be at least this length [19] > > If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at > least the size of the struct virtio_net_hdr. > @@ -1480,7 +1488,7 @@ off. The command-specific-data is one byte containing 0 (off) or > #define VIRTIO_NET_CTRL_MAC_TABLE_SET 0 > > The device can filter incoming packets by any number of destination > -MAC addresses.[21] This table is set using the class > +MAC addresses.[20] This table is set using the class > VIRTIO_NET_CTRL_MAC and the command VIRTIO_NET_CTRL_MAC_TABLE_SET. The > command-specific-data is two variable length tables of 6-byte MAC > addresses. The first table contains unicast addresses, and the second > @@ -1652,7 +1660,7 @@ the device (not necessarily in order). Each request is of form: > > The type of the request is either a read (VIRTIO_BLK_T_IN), a write > (VIRTIO_BLK_T_OUT), or a flush (VIRTIO_BLK_T_FLUSH or > -VIRTIO_BLK_T_FLUSH_OUT[22]). If the device has VIRTIO_BLK_F_BARRIER > +VIRTIO_BLK_T_FLUSH_OUT[21]). If the device has VIRTIO_BLK_F_BARRIER > feature the high bit (VIRTIO_BLK_T_BARRIER) indicates that this > request acts as a barrier and that all preceeding requests must be > complete before this one, and all following requests must not be > @@ -1683,12 +1691,12 @@ error or VIRTIO_BLK_S_UNSUPP for a request unsupported by host: > #define VIRTIO_BLK_S_IOERR 1 > #define VIRTIO_BLK_S_UNSUPP 2 > > +2.4.2.5.1 Legacy Interface: Device Operation > +------------------------ > Historically, devices assumed that the fields type, ioprio and sector > reside in a single, separate read-only buffer and the status field is > a separate read-only buffer of size 1 byte, by itself. > > -2.4.2.5.1 Legacy Interface: Device Operation > ------------------------- > If the device has VIRTIO_BLK_F_SCSI feature, it can also support > scsi packet command requests, each of these requests is of form: > > @@ -1826,7 +1834,7 @@ data and outgoing characters are placed in the transmit queue. > ------------------------ > > 1. For output, a buffer containing the characters is placed in > - the port's transmitq.[23] > + the port's transmitq.[22] > > 2. When a buffer is used in the receiveq (signalled by an > interrupt), the contents is the input to the port associated > @@ -1970,7 +1978,7 @@ configuration change interrupt. > 2. To supply memory to the balloon (aka. inflate): > > (a) The driver constructs an array of addresses of unused memory > - pages. These addresses are divided by 4096[24] and the descriptor > + pages. These addresses are divided by 4096[23] and the descriptor > describing the resulting 32-bit array is added to the inflateq. > > 3. To remove memory from the balloon (aka. deflate): > @@ -1985,11 +1993,11 @@ configuration change interrupt. > > (c) Otherwise, the guest may begin to re-use pages previously > given to the balloon before the device has acknowledged their > - withdrawl. [25] > + withdrawl. [24] > > 4. In either case, once the device has completed the inflation or > deflation, the “actual” field of the configuration should be > - updated to reflect the new number of pages in the balloon.[26] > + updated to reflect the new number of pages in the balloon.[25] > > 2.4.5.6.1. Memory Statistics > --------------------------- > @@ -2577,10 +2585,7 @@ contents of the event field. The following events are defined: > 2.5. Reserved Feature Bits > ========================= > > -Currently there are four device-independent feature bits defined: > - > - VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device accepts arbitrary > - descriptor layouts, as described in Section "2.1.4.2. Message Framing". > +Currently there are three device-independent feature bits defined: > > VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates > that the driver can use descriptors with the VRING_DESC_F_INDIRECT > @@ -2621,6 +2626,10 @@ VIRTIO_F_NOTIFY_ON_EMPTY (24) Negotiating this feature > using a timer if the device interrupts it when all the packets > are transmitted. > > +VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device > + accepts arbitrary descriptor layouts, as described in Section > + "2.1.4.2.1. Legacy Interface: Message Framing". > + Is this on top of a patch making NOTIFY_ON_EMPTY legacy? We should also move BAD_FEATURE here. > 2.6. virtio_ring.h > ================= > > @@ -2851,7 +2860,7 @@ altogether. > > Any change to configuration space, or new virtqueues, or > behavioural changes, should be indicated by negotiation of a new > -feature bit. This establishes clarity[11] and avoids future expansion problems. > +feature bit. This establishes clarity[10] and avoids future expansion problems. > > Clusters of functionality which are always implemented together > can use a single bit, but if one feature makes sense without the > @@ -2895,34 +2904,28 @@ of this expected condition is necessary. > > [9] https://lists.linux-foundation.org/mailman/listinfo/virtualization > > -[10] It was previously asserted that framing should be independent of message > -contents, yet invariably drivers layed out messages in reliable ways and > -devices assumed it. > -In addition, the specifications for virtio_blk and virtio_scsi require > -intuiting field lengths from frame boundaries. > - > -[11] Even if it does mean documenting design or implementation > +[10] Even if it does mean documenting design or implementation > mistakes! > > > -[12] It was supposed to indicate segmentation offload support, but > +[11] It was supposed to indicate segmentation offload support, but > upon further investigation it became clear that multiple bits > were required. > > -[13] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are > +[12] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are > dependent on VIRTIO_NET_F_CSUM; a dvice which offers the offload > features must offer the checksum feature, and a driver which > accepts the offload features must accept the checksum feature. > Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features > depending on VIRTIO_NET_F_GUEST_CSUM. > > -[14] This is a common restriction in real, older network cards. > +[13] This is a common restriction in real, older network cards. > > -[15] For example, a network packet transported between two guests on > +[14] For example, a network packet transported between two guests on > the same system may not require checksumming at all, nor segmentation, > if both guests are amenable. > > -[16] For example, consider a partially checksummed TCP (IPv4) packet. > +[15] For example, consider a partially checksummed TCP (IPv4) packet. > It will have a 14 byte ethernet header and 20 byte IP header > followed by the TCP header (with the TCP checksum field 16 bytes > into that header). csum_start will be 14+20 = 34 (the TCP > @@ -2932,26 +2935,26 @@ of the TCP pseudo header, so that replacing it by the ones' > complement checksum of the TCP header and body will give the > correct result. > > -[17] Due to various bugs in implementations, this field is not useful > +[16] Due to various bugs in implementations, this field is not useful > as a guarantee of the transport header size. > > -[18] This case is not handled by some older hardware, so is called out > +[17] This case is not handled by some older hardware, so is called out > specifically in the protocol. > > -[19] Note that the header will be two bytes longer for the > +[18] Note that the header will be two bytes longer for the > VIRTIO_NET_F_MRG_RXBUF case. > > -[20] Obviously each one can be split across multiple descriptor > +[19] Obviously each one can be split across multiple descriptor > elements. > > -[21] Since there are no guarentees, it can use a hash filter or > +[20] Since there are no guarentees, it can use a hash filter or > silently switch to allmulti or promiscuous mode if it is given too > many addresses. > > -[22] The FLUSH and FLUSH_OUT types are equivalent, the device does not > +[21] The FLUSH and FLUSH_OUT types are equivalent, the device does not > distinguish between them > > -[23] Because this is high importance and low bandwidth, the current > +[22] Because this is high importance and low bandwidth, the current > Linux implementation polls for the buffer to be used, rather than > waiting for an interrupt, simplifying the implementation > significantly. However, for generic serial ports with the > @@ -2959,9 +2962,9 @@ O_NONBLOCK flag set, the polling limitation is relaxed and the > consumed buffers are freed upon the next write or poll call or > when a port is closed or hot-unplugged. > > -[24] This is historical, and independent of the guest page size > +[23] This is historical, and independent of the guest page size > > -[25] In this case, deflation advice is merely a courtesy > +[24] In this case, deflation advice is merely a courtesy > > -[26] As updates to configuration space are not atomic, this field > +[25] As updates to configuration space are not atomic, this field > isn't particularly reliable, but can be used to diagnose buggy guests.
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]