
Subject: Re: [virtio-dev] Re: [virtio] [PATCH v7 08/11] packed virtqueues: more efficient virtqueue layout


On Tue, Jan 30, 2018 at 09:40:35PM +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 30, 2018 at 02:50:44PM +0100, Cornelia Huck wrote:
> > On Tue, 23 Jan 2018 02:01:07 +0200
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > Performance analysis of this is in my kvm forum 2016 presentation.  The
> > > idea is to have a r/w descriptor in a ring structure, replacing the used
> > > and available ring, index and descriptor buffer.
> > > 
> > > This is also easier for devices to implement than the 1.0 layout.
> > > Several more enhancements will be necessary to actually make this
> > > efficient for devices to use.
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >  content.tex     |  25 ++-
> > >  packed-ring.tex | 678 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 700 insertions(+), 3 deletions(-)
> > >  create mode 100644 packed-ring.tex
> > 
> > (...)
> > 
> > > +\subsubsection{Driver notifications}
> > > +\label{sec:Packed Virtqueues / Driver notifications}
> > > +Whenever not suppressed by Device Event Suppression,
> > > +the driver is required to notify the device after
> > > +making changes to the virtqueue.
> > > +
> > > +Some devices benefit from the ability to find out the
> > > +number of available descriptors in the ring and whether
> > > +to send interrupts to drivers without accessing the
> > > +virtqueue in memory: for efficiency or as a debugging aid.
> > > +
> > > +To help with these optimizations, driver notifications
> > > +to the device include the following information:
> > > +
> > > +\begin{itemize}
> > > +\item VQ number
> > > +\item Offset (in units of descriptor size) within the ring
> > > +      where the next available descriptor will be written
> > > +\item Wrap Counter referring to the next available
> > > +      descriptor
> > > +\end{itemize}
> > > +
> > > +Note that the driver can trigger multiple notifications
> > > +even without making any more changes to the ring.
> > > +These would then have identical \field{Offset} and
> > > +\field{Wrap Counter} values.
> > 
> > (...)
> > 
> > > +\subsection{Driver Notification Format}\label{sec:Basic
> > > +Facilities of a Virtio Device / Packed Virtqueues / Driver Notification Format}
> > > +
> > > +The following structure is used to notify the device of
> > > +device events - i.e. available descriptors:
> > > +
> > > +\begin{lstlisting}
> > > +__le16 vqn;
> > > +__le16 next_off : 15;
> > > +int    next_wrap : 1;
> > > +\end{lstlisting}
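
FWIW, on a transport that carries the notification data in a
single 32-bit write, the fields above could be packed as shown
below. This is an illustrative sketch only; the combined
layout is an assumption on my side, not something this patch
defines.

    #include <stdint.h>

    /* Hypothetical packing: vqn in bits 0-15, next_off in
     * bits 16-30, next_wrap in bit 31 of one le32 value. */
    static inline uint32_t pack_notification(uint16_t vqn,
                                             uint16_t next_off,
                                             uint16_t next_wrap)
    {
        return (uint32_t)vqn |
               ((uint32_t)(next_off & 0x7fff) << 16) |
               ((uint32_t)(next_wrap & 1) << 31);
    }
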
> > 
> > (...)
> > 
> > > +\subsubsection{Notifying The Device}\label{sec:Basic Facilities
> > > +of a Virtio Device / Packed Virtqueues / Supplying Buffers to The Device / Notifying The Device}
> > > +
> > > +The actual method of device notification is bus-specific, but generally
> > > +it can be expensive.  So the device MAY suppress such notifications if it
> > > +doesn't need them, using the Device Event Suppression structure
> > > +as detailed in section \ref{sec:Basic
> > > +Facilities of a Virtio Device / Packed Virtqueues / Event
> > > +Suppression Structure Format}.
> > > +
> > > +The driver has to be careful to expose the new \field{flags}
> > > +value before checking if notifications are suppressed.
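
To make the ordering requirement concrete: the store that
exposes the new flags value has to be visible to the device
before the driver loads the suppression state, so a memory
barrier sits between the two. A minimal sketch, where
smp_mb(), notify_device() and RING_EVENT_FLAGS_DISABLE are
assumed names rather than anything this patch defines:

    #include <stdint.h>

    #define RING_EVENT_FLAGS_DISABLE 0x1  /* assumed value */

    extern void smp_mb(void);                /* full memory barrier */
    extern void notify_device(uint16_t vqn); /* bus-specific notify */

    /* Make a descriptor available, then notify the device
     * unless it suppressed notifications. */
    void make_avail_and_notify(volatile uint16_t *desc_flags,
                               volatile uint16_t *dev_event_flags,
                               uint16_t new_flags, uint16_t vqn)
    {
        *desc_flags = new_flags;  /* expose the new flags value... */
        smp_mb();                 /* ...before checking suppression */
        if (*dev_event_flags != RING_EVENT_FLAGS_DISABLE)
            notify_device(vqn);
    }
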
> > 
> > This is all I could find regarding notifications, and it leaves me
> > puzzled how notifications are actually supposed to work; especially,
> > where that driver notification structure is supposed to be relayed.
> > 
> > I'm obviously coming from a ccw perspective, but I don't think that pci
> > is all that different (well, hopefully).
> > 
> > Up to now, we notified for a certain virtqueue -- i.e., the device
> > driver notified the device that there is something to process for a
> > certain queue. ccw uses the virtqueue number in a gpr for a hypercall,
> > pci seems to use a write to the config space IIUC. With the packed
> > layout, we have more payload per notification. We should be able to put
> > it in the same gpr as the virtqueue number for ccw (if needed, with some
> > compat magic, or with a new hypercall, which would be ugly but doable).
> > Not sure how this is supposed to work with pci.
> > 
> > Has there been any prototyping done to implement this in qemu + KVM?
> > I'm unsure how this will work with ioeventfds, which just trigger.
> 
> The PCI MMIO version would just trigger on access to a specific
> address, ignoring all data in there. PIO would need something
> like a data mask so it can ignore everything except the vq #.
> 
> This is helpful for hardware offloads but I'm open to
> making this PCI specific or deferring until we have
> explicit support for hardware offloads.
> 
> What do you think?
> 

Hi,

I prefer to keep it (at least for PCI) and refine it if
necessary.

One of the important goals of the packed ring is to be
hardware friendly, and supporting a tail pointer is an
important part of that. More details can be found in
Kully's mail below (I've done some slight reformatting):

----- START -----

Why a tail pointer is good for a hardware implementation:

Assuming no tail pointer:

1. Hardware would have to speculatively read descriptors
   and check their validity by testing that DESC_HW=1.

1.1 Yes, hardware could request a large number of
    descriptors at a time, making the PCIe read
    response transfer (i.e. the descriptor reads)
    an efficient PCIe transfer.

The problems are as follows:

2. Issue 1: Wasting PCIe bandwidth

2.1 Although the PCIe read responses may be efficient
    transfers, if they contain invalid descriptors
    (DESC_HW=0), we have wasted PCIe bandwidth. This
    can be a problem when trying to maximize the
    performance possible from a design.

3. Issue 2: Wasting Hardware memory resources

3.1 When issuing PCIe read requests for descriptors,
    the hardware must reserve memory in advance to
    store the descriptors.

3.2 Given PCIe read latencies can be on the order
    of 1us, this memory is reserved for that length
    of time.

3.3 For hardware, 1us is a very long time and for
    FPGAs, memory is not as plentiful/cheap as in
    a PC.

3.4 So reserving memory for descriptors that may
    end up being invalid is a waste. Ultimately,
    this could affect performance if a large number
    of invalid descriptors are being read.

So it is better for hardware to know which queues
(and hence guests) have descriptors available and
to fetch only those.

The argument above is biased towards Tx (transfer
of packets from guest to device) but also applies
to Rx.

The tail pointer resides in the hardware, so the
hardware always knows how many descriptors are
available for each queue (no need to waste PCIe
bandwidth to determine this) and can fetch only
the valid descriptors.

----- END -----
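
To make Kully's contrast concrete, a device-side fetch driven
by a tail pointer might look roughly like the sketch below.
desc_read(), process() and the ring bookkeeping are made up
for illustration; nothing here is from an actual
implementation.

    #include <stdint.h>

    #define RING_SIZE 256  /* assumed power-of-two ring size */

    struct desc {
        uint64_t addr;
        uint32_t len;
        uint16_t id;
        uint16_t flags;
    };

    /* DMA read of 'count' descriptors starting at index 'idx' */
    extern void desc_read(struct desc *dst, uint16_t idx,
                          uint16_t count);
    extern void process(struct desc *d, uint16_t count);

    static struct desc buf[RING_SIZE];

    /* head: next descriptor index to fetch; tail: as advanced
     * by driver notifications.  The device fetches exactly the
     * descriptors it knows are valid, so no PCIe bandwidth or
     * on-chip memory is spent on descriptors that might still
     * have DESC_HW=0. */
    void fetch_with_tail(uint16_t head, uint16_t tail)
    {
        uint16_t avail = (tail + RING_SIZE - head) % RING_SIZE;
        uint16_t first = avail;

        if (!avail)
            return;
        if (head + avail > RING_SIZE)     /* fetch wraps ring end */
            first = RING_SIZE - head;

        desc_read(buf, head, first);      /* one sized PCIe read */
        if (avail > first)
            desc_read(buf + first, 0, avail - first);
        process(buf, avail);
    }
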

Best regards,
Tiwei Bie

