OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.


Help: OASIS Mailing Lists Help | MarkMail Help

virtio message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]

Subject: Re: [virtio] [PATCH v2] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

On Thu, Jul 02, 2015 at 09:57:07AM +0200, Paolo Bonzini wrote:
> VIRTIO_BLK_F_CONFIG_WCE is important in order to achieve good performance
> (up to 2x, though more realistically +30-40%) in latency-bound workloads.
> However, it was removed by mistake together with VIRTIO_BLK_F_FLUSH.
> In addition, even removing VIRTIO_BLK_F_FLUSH was probably not a great
> idea, because it simplifies simple drivers (e.g. firmware) that are okay
> with a writethrough cache but still need data to persist after power loss.
> What really should have been removed is just the possibility that devices
> not propose VIRTIO_BLK_F_FLUSH, but even that only deserves a "SHOULD" in
> the new world of conformance statements.
> Restore these, with the following changes:
> * clarify and use conformance statements in order to define writeback
> and writethrough caching according to what is commonly done by high-end
> storage.
> * clarify (with conformance statements) the influence of the
> VIRTIO_BLK_F_FLUSH feature on caching and how to proceed if only one of
> * strengthen the requirement for persisting writes after a
> VIRTIO_BLK_T_FLUSH request.  This is now a MUST.
> This fixes VIRTIO-144.
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  content.tex | 76 ++++++++++++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 52 insertions(+), 24 deletions(-)
> diff --git a/content.tex b/content.tex
> index 8de905f..4e03844 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -3723,26 +3723,31 @@ device except where noted.
>  \item[VIRTIO_BLK_F_BLK_SIZE (6)] Block size of disk is in \field{blk_size}.
> +\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support.
> +
>  \item[VIRTIO_BLK_F_TOPOLOGY (10)] Device exports information on optimal I/O
>      alignment.
> +
> +\item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback
> +    and writethrough modes.
>  \end{description}
> +Devices SHOULD always propose VIRTIO_BLK_F_FLUSH, and MUST propose it
> +if they propose VIRTIO_BLK_F_CONFIG_WCE.  Drivers SHOULD only negotiate
> +these features if they are capable of sending VIRTIO_BLK_T_FLUSH commands.
> +

Sorry this is a non-normative section, so you can't say MUST and SHOULD
here. You'll need to add a normative one, and link from conformance
It's ok to have a bit of duplication between normative
and non-normative ones if this helps readability.

>  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits}
>  \begin{description}
>  \item[VIRTIO_BLK_F_BARRIER (0)] Device supports request barriers.
>  \item[VIRTIO_BLK_F_SCSI (7)] Device supports scsi packet commands.
> -
> -\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support.
> -

Did you mean to remove this one?

> -\item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback
> -    and writethrough modes.
>  \end{description}
> -VIRTIO_BLK_F_FLUSH was also called VIRTIO_BLK_F_WCE: Legacy drivers
> -MUST only negotiate this feature if they are capable of sending
> -VIRTIO_BLK_T_FLUSH commands.
> +\begin{note} Naming of features
> +  In the legacy interface, VIRTIO_BLK_F_FLUSH was also
> +  called VIRTIO_BLK_F_WCE.
> +\end{note}

This one is ok.

>  \subsection{Device configuration layout}\label{sec:Device Types / Block Device / Device configuration layout}
> @@ -3771,7 +3776,7 @@ struct virtio_blk_config {
>                  // optimal (suggested maximum) I/O size in blocks
>                  le32 opt_io_size;
>          } topology;
> -        u8 reserved;
> +        u8 writeback;
>  };
>  \end{lstlisting}
> @@ -3801,21 +3806,27 @@ according to the native endian of the guest rather than
>    \field{topology} struct can be read to determine the physical block size and optimal
>    I/O lengths for the driver to use. This also does not affect the units
>    in the protocol, only performance.
> -\end{enumerate}
> -
> -\subsubsection{Legacy Interface: Device Initialization}\label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization}
> -The \field{reserved} field used to be called \field{writeback}.  If the
> -VIRTIO_BLK_F_CONFIG_WCE feature is offered, the cache mode can be
> -read from \field{writeback}; the
> -driver can also write to the field in order to toggle the cache
> -between writethrough (0) and writeback (1) mode.  If the feature is
> -not available, the driver can instead look at the result of
> -negotiating VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode
> -after reset if and only if VIRTIO_BLK_F_FLUSH is negotiated.
> +\item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache
> +  mode can be read from \field{writeback}.  0 corresponds to a writethrough
> +  cache, 1 to a writeback cache\footnote{Consistent with
> +    \ref{sec:Device Types / Block Device / Device Operation},
> +    a writethrough cache can be defined broadly as a cache that commits
> +    writes to non-volatile storage before reporting their completion.
> +    For example, a battery-backed writeback cache actually counts as
> +    writethrough according to this definition.}.

This seems a bit too much text for a footnote, I'd still prefer it
as a simple text note.

> +
> +\item If the VIRTIO_BLK_F_CONFIG_WCE feature is not available, the driver
> +  can deduct the cache mode by looking at the result of negotiating
> +  VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode if and only if
> +  VIRTIO_BLK_F_FLUSH is negotiated.
> +\end{enumerate}
> -Some older legacy devices did not operate in writethrough mode even
> -after a driver announced lack of support for VIRTIO_BLK_F_FLUSH.
> +If the VIRTIO_BLK_F_FLUSH feature is not negotiated while
> +VIRTIO_BLK_F_CONFIG_WCE is, \field{writeback} MUST read 0 immediately
> +after feature negotiation.

This is confusing.
Previously you simply said writeback is only valid if
VIRTIO_BLK_F_CONFIG_WCE is negotiated.
So let's put the mention of VIRTIO_BLK_F_CONFIG_WCE in brackets
or remove.

Also, what's the reset value if it is negotiated?

Also, should we prohibit driver from writing the field

>  Alternatively, a device MAY simply fail
> +feature negotiation if the driver proposes VIRTIO_BLK_F_CONFIG_WCE but
> +does not propose VIRTIO_BLK_F_FLUSH.

Alternatively to what? We can't have a mandatory behaviour but also
make it optional. And driver does not propose features, it negotiates

Maybe driver must/should negotiate some features or do flushes?

Or maybe device must fail negotiation if it meets some criteria?

>  \subsection{Device Operation}\label{sec:Device Types / Block Device / Device Operation}
> @@ -3871,8 +3882,25 @@ A device MUST set the \field{status} byte to VIRTIO_BLK_S_IOERR
>  for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT
>  write any data.
> -Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure
> -that any writes which were completed are committed to non-volatile storage.
> +If the \field{VIRTIO_BLK_F_FLUSH} feature is not negotiated, the
> +device SHOULD ensure that all writes are committed to non-volatile
> +storage before reporting completion.  It MUST do so if it proposed
> +\field{VIRTIO_BLK_F_FLUSH}.  Failure to do so can cause data loss.

The 1st one is a SHOULD because not everyone cares about data loss.
But why is the second one a MUST? Same logic seems to apply.

Same below.

> +
> +If the \field{VIRTIO_BLK_F_FLUSH} feature is negotiated, upon receipt
> +of a VIRTIO_BLK_T_FLUSH request, the device MUST ensure that any writes
> +which were completed are committed to non-volatile storage.

Aren't there any driver requirements?
E.g. with one mode driver may assume data is safe
when complete, in another is should issue flush requests
to commit it to storage.

> +
> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated and the
> +\field{writeback} field in configuration space is 0, the device MUST
> +ensure that all writes are committed to non-volatile storage before
> +reporting completion.
> +
> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY
> +switch to writethrough or writeback mode by writing respectively 0 and
> +1 to the \field{writeback} field.  After writing a 0 to \field{writeback},
> +the driver MUST NOT assume that writes submitted while the field was 1

since the last VIRTIO_BLK_F_FLUSH completed, correct?

> have
> +been committed to non-volatile storage.

>  \subsubsection{Legacy Interface: Device Operation}\label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}
>  When using the legacy interface, transitional devices and drivers
> -- 
> 2.4.3
> ---------------------------------------------------------------------
> To unsubscribe from this mail list, you must leave the OASIS TC that 
> generates this mail.  Follow this link to all your TCs in OASIS at:
> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php 

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]