OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: RE: [virtio-comment] [RFC PATCH] VIRTIO_F_USED_EVENT_AUTO_DISABLE: add new used buffer notification suppression mechanism


> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> open.org> On Behalf Of Xiaoguang Wang
> Sent: Thursday, January 25, 2024 8:00 AM
> 
> Currently for the PCI transport, used buffer notification suppression
> informations are located in guest memory, for example,
> pvirtq_event_suppress for packed ring and virtq_avail for split ring, then PCI
> devices simulated by hypervisors, such as QEMU, can vist these suppression
> informations directly before issuing used event, the only overheads may be
> just proper memory barriers, but for some hardware PCI devices offered by
> DPU(Data Processing Unit), DPU would need to submit DMAs to copy guest
> suppression informations to DPU internal cache or memory, then check
> whether used event should be notified after examing this internal cache or
> memory, which is obvious expensive and time-consuming. Worsely, if device
> has written some used descriptors, and the above DMA operation has not
> been completed yet, device can not determine whether to issue used event,
> until it gets newest used buffer notification suppression information.
> 
> To help improve this sitiuations, add a new feature:
> VIRTIO_F_USED_EVENT_AUTO_DISABLE. When this feature bit is negotiated,
> devices will place new notification area for used buffer event enable
> notification in PCI bar specfied by virtio_pci_notify_cap, also for better used
> buffer notification coalescing, the device will disable later used buffer
> notifications automatically after sending one used buffer notification to driver,
> and after the driver finishes handling current used buffer event, it needs to
> notify the device to enable used buffer notification again by writing above new
> notification area in PCI bar. Device will receive this notification reliably, and
> device knows that it can issue used event now, but it may choose appropriate
> timing to send used events according to its notification coalescing policy , now
> it also does not need to DMA guest memory to obtain notification
> suppression.
> 
Good description describing the problem and solution well.
Yes, we discussed this requirement already before at [1].

[1] https://lore.kernel.org/virtio-comment/1689820586.8879051-7-xuanzhuo@linux.alibaba.com/T/#mce485d2c7187b800575c5a63d83b2e9e77f474cc

> Signed-off-by: Xiaoguang Wang <lege.wang@jaguarmicro.com>
> ---
>  content.tex       | 34 +++++++++++++++++++++++++++++++++
>  transport-pci.tex | 48
> +++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 80 insertions(+), 2 deletions(-)
> 
> diff --git a/content.tex b/content.tex
> index 0a62dce..5374510 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -448,6 +448,7 @@ \section{Driver Notifications} \label{sec:Basic Facilities
> of a Virtio Device /  The driver is sometimes required to send an available
> buffer  notification to the device.
> 
> +\subsection{Available Buffer Notifications} \label{sec:Basic Facilities
> +of a Virtio Device / Driver notifications / Available Buffer
> +Notifications}
>  When VIRTIO_F_NOTIFICATION_DATA has not been negotiated,  this
> notification contains either a virtqueue index if
> VIRTIO_F_NOTIF_CONFIG_DATA is not negotiated or device supplied
> virtqueue @@ -488,6 +489,34 @@ \section{Driver Notifications}
> \label{sec:Basic Facilities of a Virtio Device /  has been negotiated, these
> notifications would then have  identical \field{next_off} and \field{next_wrap}
> values.
> 
> +\subsection{Used Buffer Event Enable Notifications} \label{sec:Basic
> +Facilities of a Virtio Device / Driver notifications / Used Buffer Event Enable
> Notifications} This kind of notification is only necessary when
> VIRTIO_F_USED_EVENT_AUTO_DISABLE is negotiated.
> +When this feature bit is negotiated, the device will disable later used
Remove the word "later".

> +buffer notification automatically after sending one used buffer
> +notification to the driver, and after the driver finishes handling
> +current used buffer event, it needs to notify the device to enable used buffer
> notification again.
> +

> +The notification method and supplying any such virtqueue notification
> +config data is transport specific, currently driver notifications to
> +the device include the following information:
> +
> +\begin{description}
> +\item [vq_index or vq_notif_config_data] Either virtqueue index or device
> +      supplied queue notification config data corresponding to a virtqueue.
> +\item [next_off] Offset
> +      within the ring where the next used ring entry will be written.
> +      When VIRTIO_F_RING_PACKED has not been negotiated this refers to the
> +      15 least significant bits of the used index.
> +      When VIRTIO_F_RING_PACKED has been negotiated this refers to the
> offset
> +      (in units of descriptor entries)
> +      within the descriptor ring where the next used descriptor will be written.
> +\item [next_wrap] Wrap Counter.
> +      With VIRTIO_F_RING_PACKED this is the wrap counter
> +      referring to the next used descriptor.
> +      Without VIRTIO_F_RING_PACKED this is the most significant bit
> +      (bit 15) of the used index.
> +\end{description}
> +
This forces the driver to process _all_ used buffer notifications before enabling them again.
There is a race condition that used ring updates are occurring from the device (without notification) and driver updates the value.
For example, 
1. driver posted descriptors from index 0 to 9.
2. Used buffer notification arrived for 0 to 3.
3. Driver processes used buffer ring from 0 to 3 and enables used buffer notification for index 4.
4. While #3 is happening, the device already updated the used buffer entries.
5. So device view of used ring entry is messed up as it already passed that point.

Hence a better driver used buffer notification should be,

1. index upto which it has processed
2. explicit enable/disable bit to enable the notification

These two fields clearly communicate to the device, upto how much used buffer is consumed and understand the driver consumption rate.
This way, there is no ambiguity for device on where to write next used ring entry. 
(where to write next used entry is not driven by the notification enablement).

And the explicit enable bit indicates notifications are enabled back.

Technically, only enable bit is needed, but doing #1 is helpful for the device to understand driver consumption pace.
And for that may be current notification can be made 64-bits or as suggested in your patch to have new location.

>  \input{shared-mem.tex}
> 
>  \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device /
> Exporting Objects} @@ -872,6 +901,11 @@ \chapter{Reserved Feature
> Bits}\label{sec:Reserved Feature Bits}
>  	\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits}
> for
>  	handling features reserved for future use.
> 
> +  \item[VIRTIO_F_USED_EVENT_AUTO_DISABLE(42)] This feature indicates
> + that the device  will disable later used buffer notification
> + automatically after sending one used buffer  notification to the driver.
> +  See \ref{sec:Basic Facilities of a Virtio Device / Driver
> + notifications / Used Buffer Event Enable Notifications}
The wording "disable later" is confusing, does not read proper.

May be to reword it as,

This feature indicates that the device will disable used buffer notification automatically after sending one used buffer notification.

> +
>  \end{description}
> 
>  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits} diff
> --git a/transport-pci.tex b/transport-pci.tex index a5c6719..21b80dc 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -325,6 +325,8 @@ \subsubsection{Common configuration structure
> layout}\label{sec:Virtio Transport
>          /* About the administration virtqueue. */
>          le16 admin_queue_index;         /* read-only for driver */
>          le16 admin_queue_num;         /* read-only for driver */
> +
> +        le16 queue_intr_notify_off;     /* read-only for driver */
>  };
I think it should be renamed to queue_used_buf_notify_off; 
This is because notifications are slightly worded differently than interrupts.

>  \end{lstlisting}
> 
> @@ -428,6 +430,14 @@ \subsubsection{Common configuration structure
> layout}\label{sec:Virtio Transport
>  	The value 0 indicates no supported administration virtqueues.
>  	This field is valid only if VIRTIO_F_ADMIN_VQ has been
>  	negotiated.
> +
> +\item[\field{queue_intr_notify_off}]
> +        The driver reads this to calculate the offset from start of Used Buffer
> Event Enable Notification structure at
> +        which this virtqueue is located.
> +        \begin{note} this is \em{not} an offset in bytes.
> +        See \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device
> Layout / Notification capability} below.
> +        \end{note}
> +
This is strange. The notification is in the PCI BAR, right?
it can be done simple as just the BAR offset.

>  \end{description}
> 
>  \devicenormative{\paragraph}{Common configuration structure
> layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> Common configuration structure layout} @@ -498,7 +508,7 @@
> \subsubsection{Common configuration structure layout}\label{sec:Virtio
> Transport  \drivernormative{\paragraph}{Common configuration structure
> layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> Common configuration structure layout}
> 
>  The driver MUST NOT write to \field{device_feature}, \field{num_queues}, -
> \field{config_generation}, \field{queue_notify_off} or
> +\field{config_generation}, \field{queue_notify_off},
> +\field{queue_intr_notify_off} or
>  \field{queue_notif_config_data}.
> 
>  If VIRTIO_F_RING_PACKED has been negotiated, @@ -541,6 +551,7 @@
> \subsubsection{Notification structure layout}\label{sec:Virtio Transport
> Options  struct virtio_pci_notify_cap {
>          struct virtio_pci_cap cap;
>          le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */
> +        le32 intr_notify_off_multiplier; /* Multiplier for
> + queue_intr_notify_off. Only if VIRTIO_F_USED_EVENT_AUTO_DISABLE is
> + negotiated. */
>  };
>  \end{lstlisting}
> 
I would really like to avoid extending this structure which is not in the PCI extended capability.
And we are almost full with 256B limit.
It is the time to have the PCI extended cap defined and place this field there.
If this multiplier is not useful, maybe it should be added only if needed. 
Did you add it because to match the current flexibility of the spec, or your hw actually needs this multiplier?

> @@ -560,10 +571,26 @@ \subsubsection{Notification structure
> layout}\label{sec:Virtio Transport Options  the same Queue Notify address for
> all queues.
>  \end{note}
> 
> +\field{intr_notify_off_multiplier} is combined with the
> +\field{queue_intr_notify_off} to derive the Queue Used Buffer Event Enable
> Notify address within a BAR for a virtqueue:
> +
> +\begin{lstlisting}
> +        cap.offset + queue_intr_notify_off * intr_notify_off_multiplier
> +\end{lstlisting}
> +
> +The \field{cap.offset} and \field{intr_notify_off_multiplier} are taken
> +from the notification capability structure above, and the
> +\field{queue_intr_notify_off} is taken from the common configuration
> structure.
> +
> +\begin{note}
> +For example, if \field{intr_notifier_off_multiplier} is 0, the device
> +uses the same Queue Used Buffer Event Enable Notify address for all queues.
> +\end{note}
> +
>  \devicenormative{\paragraph}{Notification capability}{Virtio Transport
> Options / Virtio Over PCI Bus / PCI Device Layout / Notification capability}  The
> device MUST present at least one notification capability.
> 
> -For devices not offering VIRTIO_F_NOTIFICATION_DATA:
> +For devices not offering VIRTIO_F_NOTIFICATION_DATA or
> VIRTIO_F_USED_EVENT_AUTO_DISABLE:
> 
>  The \field{cap.offset} MUST be 2-byte aligned.
> 
> @@ -596,6 +623,23 @@ \subsubsection{Notification structure
> layout}\label{sec:Virtio Transport Options  cap.length >= queue_notify_off *
> notify_off_multiplier + 4  \end{lstlisting}
> 
> +For devices offering VIRTIO_F_USED_EVENT_AUTO_DISABLE::
> +
> +The device MUST either present \field{intr_notify_off_multiplier} as a
> +number that is a power of 2 that is also a multiple 4, or present
> +\field{intr_notify_off_multiplier} as 0.
> +
> +The \field{cap.offset} MUST be 4-byte aligned.
> +
> +The value \field{cap.length} presented by the device MUST be at least 8
> +and MUST be large enough to support queue notification offsets for all
> +supported queues in all possible configurations.
> +
> +For all queues, the value \field{cap.length} presented by the device MUST
> satisfy:
> +\begin{lstlisting}
> +cap.length >= queue_intr_notify_off * intr_notify_off_multiplier + 4
> +\end{lstlisting}
> +
>  \subsubsection{ISR status capability}\label{sec:Virtio Transport Options /
> Virtio Over PCI Bus / PCI Device Layout / ISR status capability}
> 
>  The VIRTIO_PCI_CAP_ISR_CFG capability
> --
> 2.43.0
> 
> 
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
> 
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-
> open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]