
Subject: Re: [virtio] Re: [PATCH v8 08/16] packed virtqueues: more efficient virtqueue layout


[..]
>> On 02/16/2018 08:24 AM, Michael S. Tsirkin wrote:
>>> Performance analysis of this is in my kvm forum 2016 presentation.  The
>>> idea is to have a r/w descriptor in a ring structure, replacing the used
>>> and available ring, index and descriptor buffer.
>>>
>>> This is also easier for devices to implement than the 1.0 layout.
>>> Several more enhancements will be necessary to actually make this
>>> efficient for devices to use.
>>>
>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>> ---
>>>  content.tex     |  28 ++-
>>>  packed-ring.tex | 646 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 671 insertions(+), 3 deletions(-)
>>>  create mode 100644 packed-ring.tex
[..]
>>> +
>>> +\subsection{Element Address and Length}
>>> +\label{sec:Packed Virtqueues / Element Address and Length}
>>> +
>>> +In an available descriptor, Element Address corresponds to the
>>> +physical address of the buffer element. The length of the element assumed
>>> +to be physically contiguous is stored in Element Length.
>>> +
>>> +In a used descriptor, Element Address is unused. Element Length
>>> +specifies the length of the buffer that has been initialized
>>> +(written to) by the device.
>>> +
>>> +Element length is reserved for used descriptors without the
>>> +VIRTQ_DESC_F_WRITE flag, and is ignored by drivers.
>>> +
>>> +\subsection{Scatter-Gather Support}
>>
>> [Consistent wording] Both types of virtqueues support scatter-gather
>> but the term is used only for packed. Maybe we could unify the wording.
>> Or is this something for later?
> 
> I'll take a look but this can be safely done later too.
> 
> 

Agreed.

[..]
>>> +
>>> +\subsection{Multi-buffer requests}
>>> +\label{sec:Packed Virtqueues / Multi-descriptor batches}
>>> +Some devices combine multiple buffers as part of processing of a
>>> +single request.  These devices always mark the first descriptor
>>> +in the request used after the rest of the descriptors in the
>>> +request has been written out into the ring. This guarantees that
>>> +the driver will never observe a partial request in the ring.
>>> +
>>
>> I see you've changed s/in the request available/in the request used/.
>> But I still don't understand this paragraph. I will try to figure
>> it out later (and will come back to you if I fail).
> 
> FYI this applies to mergeable buffers for the network device.
> 
> 

Yeah, that was my understanding too, but I will have to look into the
details starting from there. I will come back to you if I can't
clear it up for myself.
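
For reference, one possible reading of the quoted paragraph, as a
minimal sketch in C. Everything here is an assumption for illustration
and not taken from the patch: the struct layout, the helper name
mark_request_used(), the single used_flags value, and the simplification
that the request does not cross the end of the ring (so no wrap counter
handling is shown).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor layout, for illustration only. */
struct desc {
        uint64_t addr;
        uint32_t len;
        uint16_t id;
        _Atomic uint16_t flags;
};

/*
 * Device side: mark a request of n descriptors starting at head as used.
 * Descriptors 1..n-1 are written first; the first descriptor is written
 * last with release ordering, so a driver that sees the first descriptor
 * as used also sees the rest of the request.
 * Assumption: head + n <= size, i.e. the request does not wrap.
 */
static void mark_request_used(struct desc *ring, uint16_t head,
                              uint16_t n, uint16_t used_flags)
{
        for (uint16_t i = 1; i < n; i++) {
                atomic_store_explicit(&ring[head + i].flags, used_flags,
                                      memory_order_relaxed);
        }
        atomic_store_explicit(&ring[head].flags, used_flags,
                              memory_order_release);
}
```

If that reading is right, a driver polling only the first descriptor of
a request would never act on a half-written request.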

[..]
>>> +
>>> +\devicenormative{\subsection}{The Virtqueue Descriptor Table}{Basic Facilities of a Virtio Device / Packed Virtqueues / The Virtqueue Descriptor Table}
>>> +A device MUST NOT write to a device-readable buffer, and a device SHOULD NOT
>>> +read a device-writable buffer.
>>> +A device MUST NOT use a descriptor unless it observes
>>> +VIRTQ_DESC_F_AVAIL bit in its \field{flags} being changed.
>>
>> I don't really understand this. How does the device observe
>> the VIRTQ_DESC_F_AVAIL bit being changed?
> 
> By reading the descriptor.
> 

:) My point is: to observe a change, one usually needs either at
least one reading before and at least one reading after the change,
or to know that a certain reading by itself implies the change. The latter
is possible if we know that at the beginning of the time frame under
consideration (t_0) only a certain set of values, let's say B (for "before"),
is possible, that after the change only a certain other set of values,
let's say A (for "after"), is possible, and that A and B are disjoint
($A \cap B = \emptyset$).

I guess here the latter is supposed to be the case. But then I think
we need a more detailed description here. Please see also my other email
(response to Jens).
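
To illustrate what such a description could pin down, here is a minimal
sketch in C of the "one reading is enough" case. It assumes that the
reader keeps a wrap counter and that the driver makes a descriptor
available by writing AVAIL equal to that counter and USED to its
inverse. The bit positions, the names and this encoding are assumptions
for illustration, not something the quoted text states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen for illustration only. */
#define DESC_F_AVAIL_BIT 7
#define DESC_F_USED_BIT  15

/*
 * Assumption: "available" is encoded as AVAIL == wrap and USED != wrap,
 * where wrap is the reader's current wrap counter. Any reading left over
 * from before the change (all zeroes at start, or AVAIL == USED from the
 * previous lap) fails this test, so the sets A and B are disjoint and a
 * single read of the flags is sufficient.
 */
static bool desc_is_avail(uint16_t flags, bool wrap)
{
        bool avail = flags & (1u << DESC_F_AVAIL_BIT);
        bool used  = flags & (1u << DESC_F_USED_BIT);

        return avail == wrap && used != wrap;
}
```

With a definition along these lines, "observes the bit being changed"
could be replaced by a statement about the value read.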

[..]
>>> +Suppression Structure Format}.
>>> +
>>> +\begin{note}
>>> +For optimal performance, a driver MAY disable interrupts while processing
>>> +the used buffers, but beware the problem of missing interrupts between
>>> +emptying the ring and reenabling interrupts.  This is usually handled by
>>> +re-checking for more used buffers after interrupts are re-enabled:
>>> +\end{note}
>>> +
>>> +\begin{lstlisting}
>>> +vq->driver_event.flags = 0x2;
>>> +
>>> +for (;;) {
>>> +        struct virtq_desc *d = vq->desc[vq->next_used];
>>> +
>>> +        flags = d->flags;
>>> +        bool avail = flags & (1 << VIRTQ_DESC_F_AVAIL);
>>> +        bool used = flags & (1 << VIRTQ_DESC_F_USED);
>>> +
>>> +        if (avail != used) {
>>
>> I don't understand the condition which is AFAIU supposed to
>> correspond to the descriptor *not* being used.
> 
> So avail == used means used. avail != used means available.
> 

Please see the follow up with Jens.

>>> +                vq->driver_event.flags = 0x1;
>>> +                memory_barrier();
>>> +
>>> +                flags = d->flags;
>>> +                bool avail = flags & (1 << VIRTQ_DESC_F_AVAIL);
>>> +                bool used = flags & (1 << VIRTQ_DESC_F_USED);
>>> +                if (avail != used) {
>>> +                        break;
>>> +                }
>>> +
>>> +                vq->driver_event.flags = 0x2;
>>> +        }
>>> +
>>> +	read_memory_barrier();
[..]
>> I'm pretty much confused on how this scheme with the available
>> and used wrap counters (or device and driver wrap counters) is
>> supposed to work. A working implementation in C would really help
>> me to understand this.
> 
> DPDK based implementation has been posted.
>

Thank you very much for the hint. Slipped past me unfortunately.

Regards,
Halil
 
>>> +        process_buffer(d);
>>> +        vq->next_used++;
>>> +        if (vq->next_used >= vq->size) {
>>> +                vq->next_used = 0;
>>> +        }
>>> +}
>>> +\end{lstlisting}
>>>
> 