OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-comment] virtio spec change proposal





From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Yuri Benditovich" <ybendito@redhat.com>
Cc: virtio-comment@lists.oasis-open.org, "Jason Wang" <jasowang@redhat.com>
Sent: Wednesday, October 10, 2018 10:21:34 PM
Subject: Re: [virtio-comment] virtio spec change proposal

On Wed, Oct 10, 2018 at 01:10:33PM -0400, Yuri Benditovich wrote:
>
>
> âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
>
>     From: "Michael S. Tsirkin" <mst@redhat.com>
>     To: "Yuri Benditovich" <ybendito@redhat.com>
>     Cc: virtio-comment@lists.oasis-open.org
>     Sent: Wednesday, October 10, 2018 5:28:13 PM
>     Subject: Re: [virtio-comment] virtio spec change proposal
>
>     On Wed, Oct 10, 2018 at 06:37:57AM -0400, Yuri Benditovich wrote:
>     >
>     >
>     >
>     âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
>     >
>     >     From: "Michael S. Tsirkin" <mst@redhat.com>
>     >     To: "Yuri Benditovich" <ybendito@redhat.com>
>     >     Cc: virtio-comment@lists.oasis-open.org
>     >     Sent: Wednesday, October 10, 2018 4:54:07 AM
>     >     Subject: Re: [virtio-comment] virtio spec change proposal
>     >
>     >     On Tue, Oct 09, 2018 at 06:26:53PM -0400, Yuri Benditovich wrote:
>     >     >
>     >     >
>     >     >
>     >    
>     âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
>     >     >
>     >     >     From: "Michael S. Tsirkin" <mst@redhat.com>
>     >     >     To: "Yuri Benditovich" <ybendito@redhat.com>
>     >     >     Cc: virtio-comment@lists.oasis-open.org
>     >     >     Sent: Tuesday, October 9, 2018 6:08:00 PM
>     >     >     Subject: Re: [virtio-comment] virtio spec change proposal
>     >     >
>     >     >     On Tue, Oct 02, 2018 at 09:43:25AM -0400, Yuri Benditovich
>     wrote:
>     >     >     > Hello all,
>     >     >     >
>     >     >     > We'd like to propose an extension of VIRTIO specification 1.0
>     >     >     > Please let us know if there are some objections.
>     >     >
>     >     >     I like the proposal. Some comments below, hope this helps with
>     >     >     preparing the spec patch.
>     >     >
>     >     >     > In case there is no objections we will prepare and send
>     respective
>     >     patch.
>     >     >     >
>     >     >     > Feature: RSC (receive side coalescing) compatibility with
>     Windows
>     >     HCK
>     >     >     (hardware
>     >     >     > compatibility kit) requiremenets.
>     >     >     >
>     >     >     > Description:
>     >     >     >
>     >     >     > 1. Current state:
>     >     >     > VIRTIO device supports RSC, i.e. able to coalesce TCP/UDP v4
>     and v6
>     >     >     fragments
>     >     >     > and build complete packet from them.
>     >     >     > This feature is controlled by VIRTIO_NET_F_GUEST_TSO4,
>     >     >     VIRTIO_NET_F_GUEST_TSO6,
>     >     >     > VIRTIO_NET_F_GUEST_UFO feature bits.
>     >     >     > When it is enabled and received packet was built from
>     fragments,
>     >     the
>     >     >     device
>     >     >     > indicates that by setting field 'gso_type' field to one of
>     >     >     > VIRTIO_NET_HDR_GSO_TCPV4, VIRTIO_NET_HDR_GSO_TCPV6 or
>     >     >     VIRTIO_NET_HDR_GSO_UDP.
>     >     >     > Functionally this is good enough, but such an implementation
>     is not
>     >     >     compatible
>     >     >     > with Windows certification requirements, resulting failures
>     in
>     >     Windows
>     >     >     > certification tests.
>     >     >     >
>     >     >     > 2. Expectations of Windows HCK testing software.
>     >     >     > Rules for coalescing are described in
>     >     >     > https://docs.microsoft.com/en-us/windows-hardware/drivers/
>     network/
>     >     >     > rules-for-coalescing-tcp-ip-packets
>     >     >     > Guidelines for indication of coalesced segments
>     >     >     > https://docs.microsoft.com/en-us/windows-hardware/drivers/
>     network/
>     >     >     > indicating-coalesced-segments
>     >     >     > These guidelines are applicable to the network driver, but
>     the
>     >     network
>     >     >     driver
>     >     >     > shall be able to extract
>     >     >     > the required information from the received packet. The
>     information
>     >     >     required for
>     >     >     > the driver defined in
>     >     >     > https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi
>     /
>     >     content/
>     >     >     ndis/
>     >     >     > ns-ndis-_ndis_rsc_nbl_info
>     >     >     > and includes following per-packet information:
>     >     >     > - number of coalesced segments that were combined in the
>     packet
>     >     >     > - number of duplicated ACKs that were encountered while
>     received
>     >     packet
>     >     >     was
>     >     >     > combined
>     >     >     >
>     >     >     > Current implementation of RSC does not provide such
>     information in
>     >     the
>     >     >     packet.
>     >     >     >
>     >     >     > 3. Existing workaround
>     >     >     > There is a workaround in the field for the problem described
>     above.
>     >     >     > It uses feature bits 41 and 42 for control of TCP v4 and TCP
>     v6 RSC
>     >     and
>     >     >     > provides required information in the virtio header.
>     >     >     > For that it extends existing structure of virtio header from
>     >     regular 12
>     >     >     bytes
>     >     >     > to 16 bytes in case one or both of these feature bits are
>     >     acknowledged.
>     >     >     > The disadvantage of this solution is that such an extension
>     of
>     >     packet
>     >     >     header
>     >     >     > may complicate further extensions of the specification.
>     >     >
>     >     >     So do we need to reserve bits 41 and 42 for now, as well?
>     >     >     Can be a separate patch.
>     >     >
>     >     > Yes, this would be good to reserve them (if this change
>     >     > is acceptable) or define them for the same RSC info
>     >     > purpose with extended header (as the production driver expects
>     >     >  from 2016). If the change will be accepted, we'll change
>     >     > the driver and this will take several years until we
>     >     > consider only obsolete drivers use it).
>     >     >
>     >     >
>     >     >
>     >     >     > 4. Proposal of specification change.
>     >     >     >
>     >     >     > a) Taking in account the workaround (3) declare feature bits
>     41 and
>     >     42 as
>     >     >     > reserved.
>     >     >     >
>     >     >     > b) Define feature bits 43 for extended RSC information in
>     coalesced
>     >     >     packets
>     >     >     > (VIRTIO_NET_F_GUEST_RSC)
>     >     >
>     >     >     Maybe VIRTIO_NET_F_GUEST_RSC_INFO? This does not control
>     whether to
>     >     do RSC.
>     >     >     Or maybe VIRTIO_NET_F_GUEST_RSC_EXTENDED? This allows
>     coalescing
>     >     >     of duplicate acks and variable sized packets ...
>     >     >
>     >     >
>     >     > VIRTIO_NET_F_GUEST_RSC_INFO seems
>     >
>     >     My point is it's not just info, it also controls whether card
>     >     will coalesce acks etc, right?
>     >
>     >
>     > OK, let's call it VIRTIO_NET_F_GUEST_RSC_EXTENDED.
>     >
>     >
>     >
>     >
>     >     >
>     >     >     > c) Define additional flag bit 2 (value = 4) that the device
>     can set
>     >     in
>     >     >     'flags'
>     >     >     > field of existing virtio header with meaning of
>     >     >     VIRTIO_NET_HDR_F_RSC_INFO. The
>     >     >     > device sets this bit only in coalesced packets and only if
>     feature
>     >     bits
>     >     >     43 was
>     >     >     > acknowledged.
>     >     >
>     >     >     And not valid for driver, right?
>     >     >
>     >     > Can you please clarify, what is not valid for driver?
>     >
>     >     I mean for when driver inits the header for the outgoing packets.
>     >
>     >     > I mean is case the feature (43) is reported by host/device
>     >     > and acked by the driver (for example, the driver of the day
>     >     > never acks this feature being unaware of it). But current
>     >     > drivers never look at mask 0x04 in flags.
>     >
>     >     I mean VIRTIO_NET_HDR_F_RSC_INFO is never set by driver.
>     >
>     > Of course the driver must not use this flag on TX and
>     > the device must ignore it on TX.
>     > I would not like to state 'if the driver sets this flag
>     > on TX the behaviour is unpredictable'.
>
>
>     We can state driver MUST NOT set it.
>
>     >
>     >
>     >
>     >     >
>     >     >
>     >     >     > If VIRTIO_NET_HDR_F_RSC_INFO is set in the packet, the
>     device:
>     >     >     >
>     >     >     > c.1) populates 16-bit value of 'number of coalesced segments'
>     in
>     >     >     'csum_start'
>     >     >     > field of virtio header
>     >     >     >
>     >     >     > c.2) populates 16-bits value of 'number of duplicated ACKs'
>     in
>     >     >     'csum_offset'
>     >     >     > field of virtio header
>     >     >
>     >     >     Maybe its a good time to document behaviour when the bit
>     >     >     is not set:  
>     >     >     - duplicate acks not coalesced (number = 0)
>     >     >     - only full MTU segments coalesced (number = round_up(len /
>     mtu))
>     >     >
>     >
>     > No problem.
>     >
>     >
>     >
>     >     >
>     >     > If duplicate acks even coalesced this is not reported and
>     >     > we can't guess what exactly happens.
>     >
>     >     I don't think current devices coalesce them.
>     >     We never specified the actual semantics but I think
>     >     it was also the assumption that it matches Linux GSO and
>     >     IIUC that implies behaviour that does not discard information.
>     >
>     >     I think if you can't keep count you should only coalesce
>     >     MTU sized packets. This ensures that
>     >     - acks are not coalesced
>     >     - packets can be re-fragmented without loss
>     >
>     >     and this in turn allows things like bridging within guest
>     >     so keep the feature enabled.
>     >
>     >
>     >
>     >
>     >     >     > d) The device should populate proper checksum in coalesced
>     packets
>     >     >     > (VIRTIO_NET_HDR_F_DATA_VALID should be set). If due to some
>     reason
>     >     the
>     >     >     checksum
>     >     >     > is not populated, VIRTIO_NET_HDR_F_DATA_VALID must be not
>     set. In
>     >     case
>     >     >     > VIRTIO_NET_HDR_F_DATA_VALID is not set and
>     >     VIRTIO_NET_HDR_F_RSC_INFO is
>     >     >     set in
>     >     >     > the packet, the driver shall compute packet checksum without
>     rely
>     >     on
>     >     >     checksum
>     >     >     > information in 'csum_start' and 'csum_offset'.
>     >     >
>     >     >     More specifically, please formulate exactly what should device
>     do.
>     >     >
>     >     > As in existing spec, the device should populate the checksum
>     >     > and report it is valid. But not must, IMO.
>     >
>     >     Confused.
>     >     This is an incoming packet with VIRTIO_NET_HDR_F_RSC_INFO.
>     >     Where do you want device to populate the checksum?
>     >
>     >
>     > Checksum (as always) shall be populated in the reported coalesced
>     > TCP packet (IP and TCP checksum) or part of it.
>     > Somebody shall do that anyway - or the device before
>     > completing RX buffer to the guest or driver if there
>     > is no indication that the checksum is good.
>
>     Well we can't very well have driver just overwrite the checksum without
>     anyone validating it, right? That would defeat it's purpose, won't it?
>     VIRTIO_NET_HDR_F_DATA_VALID means basically "checksum is wrong but
>     ignore that, data is fine". Without that, I guess there is the
>     assumption that device validated the checksums and dropped packets with
>     an invalid checksum so driver can overwrite it?
>
> VIRTIO_NET_HDR_F_DATA_VALID means basically "checksum is wrong but
> ignore that, data is fine"??????

Sorry I mean the checksum fields in the header.


> And if it not set, what does it mean then? That the data is bad?

Depends on VIRTIO_NET_HDR_F_NEEDS_CSUM.
If that is set then you can use csum_start/csum_offset.

If that is not set then you need to parse the packet to
find the checksum.


> 5.1.6.4 in existing spec:
>
> If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated,
>
> the VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set:
>
> if so, device has validated the packet checksum.

Yea :( Well IIRC an undocumented implied part here that hypervisors rely
on is that right now if you use this flag it's ok for device to corrupt
the checksum after validation, driver is assumed to ignore the checksum
in this case. If bridging, it re-calculates checksum on TX.
I dod not understand how can we know the checksum
is corrupted and expect the driver will ignore it.
This is the case for VIRTIO_NET_HDR_F_NEEDS_CSUM, IMO
And why this is good to corrupt the CS??

Jason implemented that part.  Right Jason?

We should I think get around to documenting this at some point.
And I wonder whether that extra bit can also be allowed with the
RSC flag. If yes then it's in a sense a reverse of what
you are writing. IOW with DATA_VALID off, can this mean
that packet is fully checksummed?
Inversion does not add any new information.
If this helps us to encode everything we need - no problem.

My intention was to introduce extra bit
only with extended RSC to make clear that checksum
fields in the header do not hold checksum info.
Such a way we maintain backward compatibility.
Otherwise this may create some problems to existing drivers.

>
>
> Of course, the driver can just overwrite the checksum
> without any validation (NOT today's driver).
> But this does not make sense to do the same work twice,
> this requires significant resources at 60Gbps.
> Just indicate whether the device is responsible
> for the checksum or not.
>
> If the device drops packets with invalid checksum, this
> present serious problem in certification. If the
> software wants to send packets with invalid checksum,
> it has a right to. If the device knows to check the checksum,
> it shall indicate the result of the check, that's all.
> More than that shall be configurable, IMO.

The virtio spec allows devices to pass in packets with
an invalid checksum. That is fine. But is there a need
to also *coalesce* such packets? It would be much
clearer in my mind if we ask that packets are
only coalesced after checksum is validate and
found to be valid.

According to
https://docs.microsoft.com/en-us/windows-hardware/drivers/network/exception-conditions-that-terminate-coalescing
Invalid checksum should terminate coalescing, deliver previous healthy coalesced segments,
one caused exception deliver separately as non-coalesced (GSO_NONE).
All checksum headers fields are clear indicating that there is nothing good about checksum
Let's just say it in virtio terms: the new flag must not be
set together with VIRTIO_NET_HDR_F_NEEDS_CSUM - does
this summarize it correctly?
No, I think.
AFAIU you want to make device checksumming of coalesced packet optional.
OK.
Then it shall indicate GSO_4 + RSC + DATA_VALID - checksum ok
Then it shall indicate GSO_4  + RSC + VIRTIO_NET_HDR_F_NEEDS_CSUM - packet ok, I did not checkum it, DIY
(driver will parse entire packet anyway)
GSO_NONE + no flags - invalid checksum in packet, delivered separately
GSO_NONE + no RSC + other combinations - as before 
 

 >     > https://docs.microsoft.com/en-us/windows-hardware/drivers/network/
>     > rules-for-coalescing-tcp-ip-packets
>     >
>     > Now it is less confusing?
>
>     That seems to describe an interface between driver and OS.
>     Not sure how it's relevant.
>
> At this link there is almost nothing about driver,
> there is formal rules how packets must be coalesced
> and this states that the packets shall be provided
> with properly calculated checksum.
>


Is there a problem if we just ask that devices don't coalesce
packets if checksum does not match?
Commented above, not a problem, real case


>
>     >
>     >     >
>     >     >
>     >     >     > e) new feature VIRTIO_NET_F_GUEST_RSC depends on
>     >     VIRTIO_NET_F_CTRL_VQ,
>     >     >     > VIRTIO_NET_F_CTRL_GUEST_OFFLOADS and (VIRTIO_NET_F_GUEST_TSO4
>     or
>     >     >     > VIRTIO_NET_F_GUEST_TSO6).
>
>     Wrt dependency, how does it interfact with dynamic guest offloads?
>     I guess we need to allow it anyway, and then have it be ignored
>     without respective offloads?
>
> No problem, the driver will not enable it without
> dynamic offloads as this is the OS requirement to
> control VIRTIO_NET_F_GUEST_TSO4 and and VIRTIO_NET_F_GUEST_TSO6
> dynamically.
> So, if one of them is turned off,
> this turns off also extended RSC for respective type of packets.
> (related to 5.1.6.5.6.1 Setting Offloads State)
>
> Are there still some problems or I can start with the patch?
>
> Thanks,
> Yuri

I think you can start with a github issue and a patch, these are all
minor things, that can be resolved during patch review.


>
>     >     >     >
>     >     >     > Thanks,
>     >     >     > Yuri Benditovich,
>     >     >     > Red Hat, Inc.
>     >     >     >
>     >     >
>     >     >     This publicly archived list offers a means to provide input to
>     the
>     >     >     OASIS Virtual I/O Device (VIRTIO) TC.
>     >     >
>     >     >     In order to verify user consent to the Feedback License terms
>     and
>     >     >     to minimize spam in the list archive, subscription is required
>     >     >     before posting.
>     >     >
>     >     >     Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>     >     >     Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>     >     >     List help: virtio-comment-help@lists.oasis-open.org
>     >     >     List archive: https://lists.oasis-open.org/archives/
>     virtio-comment/
>     >     >     Feedback License: https://www.oasis-open.org/who/ipr/
>     >     feedback_license.pdf
>     >     >     List Guidelines: https://www.oasis-open.org/policies-guidelines
>     /
>     >     >     mailing-lists
>     >     >     Committee: https://www.oasis-open.org/committees/virtio/
>     >     >     Join OASIS: https://www.oasis-open.org/join/
>     >     >
>     >     >
>     >     >
>     >
>     >
>
>     This publicly archived list offers a means to provide input to the
>     OASIS Virtual I/O Device (VIRTIO) TC.
>
>     In order to verify user consent to the Feedback License terms and
>     to minimize spam in the list archive, subscription is required
>     before posting.
>
>     Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>     Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>     List help: virtio-comment-help@lists.oasis-open.org
>     List archive: https://lists.oasis-open.org/archives/virtio-comment/
>     Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
>     List Guidelines: https://www.oasis-open.org/policies-guidelines/
>     mailing-lists
>     Committee: https://www.oasis-open.org/committees/virtio/
>     Join OASIS: https://www.oasis-open.org/join/
>
>
>



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]