OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio] [PATCH v5] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE


On Mon, Aug 10, 2015 at 03:33:49PM +0200, Paolo Bonzini wrote:
> VIRTIO_BLK_F_CONFIG_WCE is important in order to achieve good performance
> (up to 2x, though more realistically +30-40%) in latency-bound workloads.
> However, it was removed by mistake together with VIRTIO_BLK_F_FLUSH.
> 
> In addition, even removing VIRTIO_BLK_F_FLUSH was probably not a great
> idea, because it simplifies simple drivers (e.g. firmware) that are okay
> with a writethrough cache but still need data to persist after power loss.
> What really should have been removed is just the possibility that devices
> not propose VIRTIO_BLK_F_FLUSH, but even that only deserves a "SHOULD" in
> the new world of conformance statements.
> 
> Restore these, with the following changes:
> 
> * clarify and use conformance statements in order to define writeback
> and writethrough caching according to what is commonly done by high-end
> storage.
> 
> * clarify (with conformance statements) the influence of the
> VIRTIO_BLK_F_FLUSH feature on caching and how to proceed if only one of
> VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE is negotiated.
> 
> * strengthen the requirement for persisting writes to MUST after
> a VIRTIO_BLK_T_FLUSH request (and in other cases too involving the
> new features).
> 
> The suggested behavior upon feature negotiation is okay for the Linux
> implementation of virtio1, even after the implementation is modified to
> support the two new features.
> 
> VIRTIO-144
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> 	v4->v5:
> 	- s/permanent device backend storage/persistent device backend storage/
> 	- consider case where device backend storage is not persistent
> 
> 	v3->v4:
> 	- do not mention FEATURES_OK wrt initialization of \field{writeback}
>         - s/non-volatile storage/permanent device backend storage/
>         - s/propose/offer/
>         - use MUST/SHOULD/MAY in the legacy interface sections
>         - clarify undefined behavior
>         - clarify workaround for lack of writethrough emulation in
>           older legacy devices
> 
> 	Remaining issues:
> 	- do we need to explicitly allow non-volatile storage
> 	(tmpfs, cache=unsafe, etc.)?
> 	- is it RECOMMENDED ("SHOULD") or OPTIONAL ("MAY") that devices
> 	keep data safe when they do not offer VIRTIO_BLK_F_FLUSH?  Note that
> 	the driver cannot expect anything even if we say "SHOULD".
> 
>  conformance.tex |   2 +
>  content.tex     | 136 +++++++++++++++++++++++++++++++++++++++++++++++---------
>  2 files changed, 117 insertions(+), 21 deletions(-)
> 
> diff --git a/conformance.tex b/conformance.tex
> index 7b7df32..f59e360 100644
> --- a/conformance.tex
> +++ b/conformance.tex
> @@ -105,6 +105,7 @@ A network driver MUST conform to the following normative statements:
>  A block driver MUST conform to the following normative statements:
>  
>  \begin{itemize}
> +\item \ref{drivernormative:Device Types / Block Device / Device Initialization}
>  \item \ref{drivernormative:Device Types / Block Device / Device Operation}
>  \end{itemize}
>  
> @@ -224,6 +225,7 @@ A network device MUST conform to the following normative statements:
>  A block device MUST conform to the following normative statements:
>  
>  \begin{itemize}
> +\item \ref{devicenormative:Device Types / Block Device / Device Initialization}
>  \item \ref{devicenormative:Device Types / Block Device / Device Operation}
>  \end{itemize}
>  
> diff --git a/content.tex b/content.tex
> index 3672588..93386d5 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -4041,8 +4041,13 @@ device except where noted.
>  
>  \item[VIRTIO_BLK_F_BLK_SIZE (6)] Block size of disk is in \field{blk_size}.
>  
> +\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support.
> +
>  \item[VIRTIO_BLK_F_TOPOLOGY (10)] Device exports information on optimal I/O
>      alignment.
> +
> +\item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback
> +    and writethrough modes.
>  \end{description}
>  
>  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits}
> @@ -4051,16 +4056,12 @@ device except where noted.
>  \item[VIRTIO_BLK_F_BARRIER (0)] Device supports request barriers.
>  
>  \item[VIRTIO_BLK_F_SCSI (7)] Device supports scsi packet commands.
> -
> -\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support.
> -
> -\item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback
> -    and writethrough modes.
>  \end{description}
>  
> -VIRTIO_BLK_F_FLUSH was also called VIRTIO_BLK_F_WCE: Legacy drivers
> -MUST only negotiate this feature if they are capable of sending
> -VIRTIO_BLK_T_FLUSH commands.
> +\begin{note}
> +  In the legacy interface, VIRTIO_BLK_F_FLUSH was also
> +  called VIRTIO_BLK_F_WCE.
> +\end{note}
>  
>  \subsection{Device configuration layout}\label{sec:Device Types / Block Device / Device configuration layout}
>  
> @@ -4089,7 +4090,7 @@ struct virtio_blk_config {
>                  // optimal (suggested maximum) I/O size in blocks
>                  le32 opt_io_size;
>          } topology;
> -        u8 reserved;
> +        u8 writeback;
>  };
>  \end{lstlisting}
>  
> @@ -4119,21 +4120,56 @@ according to the native endian of the guest rather than
>    \field{topology} struct can be read to determine the physical block size and optimal
>    I/O lengths for the driver to use. This also does not affect the units
>    in the protocol, only performance.
> +
> +\item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache
> +  mode can be read or set through the \field{writeback} field.  0 corresponds
> +  to a writethrough cache, 1 to a writeback cache\footnote{Consistent with
> +    \ref{devicenormative:Device Types / Block Device / Device Operation},
> +    a writethrough cache can be defined broadly as a cache that commits
> +    writes to persistent device backend storage before reporting their
> +    completion. For example, a battery-backed writeback cache actually
> +    counts as writethrough according to this definition.}.  The cache mode
> +  after reset is undefined.

Is it really undefined, or can it be either
writeback or writethrough, and can be determined by reading the
\field{writeback} field reset value?


>  \end{enumerate}
>  
> +\drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
> +
> +Drivers SHOULD NOT negotiate VIRTIO_BLK_F_FLUSH if they are incapable of
> +sending VIRTIO_BLK_T_FLUSH commands.
> +
> +If the VIRTIO_BLK_F_CONFIG_WCE feature is not negotiated, but
> +VIRTIO_BLK_F_FLUSH was offered by the device, the driver MAY deduce
> +the presence of a writethrough cache.  Otherwise, the driver SHOULD
> +assume presence of a writeback cache.
> +
> +\devicenormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
> +
> +Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it
> +if they offer VIRTIO_BLK_F_CONFIG_WCE.
> +
> +If VIRTIO_BLK_F_CONFIG_WCE is negotiated but the VIRTIO_BLK_F_FLUSH feature
> +is not, the device MUST initialize \field{writeback} to 0.
> +
>  \subsubsection{Legacy Interface: Device Initialization}\label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization}
>  
> -The \field{reserved} field used to be called \field{writeback}.  If the
> -VIRTIO_BLK_F_CONFIG_WCE feature is offered, the cache mode can be
> -read from \field{writeback}; the
> -driver can also write to the field in order to toggle the cache
> -between writethrough (0) and writeback (1) mode.  If the feature is
> -not available, the driver can instead look at the result of
> -negotiating VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode
> -after reset if and only if VIRTIO_BLK_F_FLUSH is negotiated.
> +Because legacy devices do not have FEATURES_OK, transitional devices
> +MUST implement slightly different behavior around feature negotiation
> +when used through the legacy interface.  In particular:
> +
> +\begin{itemize}
> +\item they MUST NOT modify \field{writeback} as a result
> +  of a driver's write to the status register.
> +  Some legacy drivers wrote to \field{writeback} between
> +  setting the DRIVER status bit and setting the DRIVER_OK status bit.
> +  Transitional devices MUST NOT overwrite \field{writeback} when
> +  the DRIVER_OK status bit is set.
> +
> +\item they MUST NOT modify \field{writeback} as a result of
> +  a driver's write to the guest features registers, for example if the
> +  driver sets the VIRTIO_BLK_F_CONFIG_WCE guest feature bit but does not
> +  set the VIRTIO_BLK_F_FLUSH bit.
> +\end{itemize}
>  
> -Some older legacy devices did not operate in writethrough mode even
> -after a driver announced lack of support for VIRTIO_BLK_F_FLUSH.
>  
>  \subsection{Device Operation}\label{sec:Device Types / Block Device / Device Operation}
>  
> @@ -4183,14 +4219,66 @@ A driver SHOULD accept the VIRTIO_BLK_F_RO feature if offered.
>  A driver MUST set \field{sector} to 0 for a VIRTIO_BLK_T_FLUSH request.
>  A driver SHOULD NOT include any data in a VIRTIO_BLK_T_FLUSH request.
>  
> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY
> +switch to writethrough or writeback mode by writing respectively 0 and
> +1 to the \field{writeback} field.  After writing a 0 to \field{writeback},
> +the driver MUST NOT assume that any volatile writes have been committed
> +to persistent device backend storage.
> +
>  \devicenormative{\subsubsection}{Device Operation}{Device Types / Block Device / Device Operation}
>  
>  A device MUST set the \field{status} byte to VIRTIO_BLK_S_IOERR
>  for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT
>  write any data.
>  
> -Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure
> -that any writes which were completed are committed to non-volatile storage.
> +A write is considered volatile when it is submitted; the contents of
> +sectors covered by a volatile write are undefined in persistent device
> +backend storage until the write becomes stable.  A write becomes stable
> +once it is completed and one or more of the following conditions is true:
> +
> +\begin{enumerate}
> +\item\label{item:flush1} neither VIRTIO_BLK_F_CONFIG_WCE nor
> +  VIRTIO_BLK_F_FLUSH feature were negotiated, but VIRTIO_BLK_F_FLUSH was
> +  offered by the device;
> +
> +\item\label{item:flush2} the VIRTIO_BLK_F_CONFIG_WCE feature was negotiated and the
> +  \field{writeback} field in configuration space was 0 \textbf{all the time between
> +  the submission of the write and its completion};
> +
> +\item\label{item:flush3} a VIRTIO_BLK_T_FLUSH request is sent \textbf{after the write is
> +  completed} and is completed itself.
> +\end{enumerate}
> +
> +If the device is backed by persistent storage, the device MUST ensure that
> +stable writes are committed to it, before reporting completion of the write
> +(cases~\ref{item:flush1} and~\ref{item:flush2}) or the flush
> +(case~\ref{item:flush3}).  Failure to do so can cause data loss
> +in case of a crash.
> +
> +If the driver changes \field{writeback} between the submission of the write
> +and its completion, the write could be either volatile or stable when
> +its completion is reported; in other words, the exact behavior is undefined.
> +
> +% According to the device requirements for device initialization:
> +%   Offer(CONFIG_WCE) => Offer(FLUSH).
> +%
> +% After reversing the implication:
> +%   not Offer(FLUSH) => not Offer(CONFIG_WCE).
> +
> +If VIRTIO_BLK_F_FLUSH was not offered by the
> +  device\footnote{Note that in this case, according to
> +  \ref{devicenormative:Device Types / Block Device / Device Initialization},
> +  the device will not have offered VIRTIO_BLK_F_CONFIG_WCE either.}, the
> +device MAY also commit writes to persistent device backend storage before
> +reporting their completion.  Unlike case~\ref{item:flush1}, however, this
> +is not an absolute requirement of the specification.
> +
> +\begin{note}
> +  An implementation that does not offer VIRTIO_BLK_F_FLUSH and does not commit
> +  completed writes will not be resilient to data loss in case of crashes.
> +  Not offering VIRTIO_BLK_F_FLUSH is an absolute requirement
> +  for implementations that do not wish to be safe against such data losses.
> +\end{note}
>  
>  \subsubsection{Legacy Interface: Device Operation}\label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}
>  When using the legacy interface, transitional devices and drivers
> @@ -4233,6 +4321,12 @@ serve as data consistency guarantee.  Only a VIRTIO_BLK_T_FLUSH request
>  does that.
>  \end{note}
>  
> +Some older legacy devices did not commit completed writes to persistent
> +device backend storage when VIRTIO_BLK_F_FLUSH was offered but not
> +negotiated.  In order to work around this, the driver MAY set the
> +\field{writeback} to 0 (if available) or it MAY send an explicit flush
> +request after every completed write.
> +
>  If the device has VIRTIO_BLK_F_SCSI feature, it can also support
>  scsi packet command requests, each of these requests is of form:
>  
> -- 
> 2.4.3
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe from this mail list, you must leave the OASIS TC that 
> generates this mail.  Follow this link to all your TCs in OASIS at:
> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php 


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]