[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [virtio] [PATCH v5] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE
On Mon, Aug 10, 2015 at 03:33:49PM +0200, Paolo Bonzini wrote: > VIRTIO_BLK_F_CONFIG_WCE is important in order to achieve good performance > (up to 2x, though more realistically +30-40%) in latency-bound workloads. > However, it was removed by mistake together with VIRTIO_BLK_F_FLUSH. > > In addition, even removing VIRTIO_BLK_F_FLUSH was probably not a great > idea, because it simplifies simple drivers (e.g. firmware) that are okay > with a writethrough cache but still need data to persist after power loss. > What really should have been removed is just the possibility that devices > not propose VIRTIO_BLK_F_FLUSH, but even that only deserves a "SHOULD" in > the new world of conformance statements. > > Restore these, with the following changes: > > * clarify and use conformance statements in order to define writeback > and writethrough caching according to what is commonly done by high-end > storage. > > * clarify (with conformance statements) the influence of the > VIRTIO_BLK_F_FLUSH feature on caching and how to proceed if only one of > VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE is negotiated. > > * strengthen the requirement for persisting writes to MUST after > a VIRTIO_BLK_T_FLUSH request (and in other cases too involving the > new features). > > The suggested behavior upon feature negotiation is okay for the Linux > implementation of virtio1, even after the implementation is modified to > support the two new features. > > VIRTIO-144 > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > v4->v5: > - s/permanent device backend storage/persistent device backend storage/ > - consider case where device backend storage is not persistent > > v3->v4: > - do not mention FEATURES_OK wrt initialization of \field{writeback} > - s/non-volatile storage/permanent device backend storage/ > - s/propose/offer/ > - use MUST/SHOULD/MAY in the legacy interface sections > - clarify undefined behavior > - clarify workaround for lack of writethrough emulation in > older legacy devices > > Remaining issues: > - do we need to explicitly allow non-volatile storage > (tmpfs, cache=unsafe, etc.)? > - is it RECOMMENDED ("SHOULD") or OPTIONAL ("MAY") that devices > keep data safe when they do not offer VIRTIO_BLK_F_FLUSH? Note that > the driver cannot expect anything even if we say "SHOULD". > > conformance.tex | 2 + > content.tex | 136 +++++++++++++++++++++++++++++++++++++++++++++++--------- > 2 files changed, 117 insertions(+), 21 deletions(-) > > diff --git a/conformance.tex b/conformance.tex > index 7b7df32..f59e360 100644 > --- a/conformance.tex > +++ b/conformance.tex > @@ -105,6 +105,7 @@ A network driver MUST conform to the following normative statements: > A block driver MUST conform to the following normative statements: > > \begin{itemize} > +\item \ref{drivernormative:Device Types / Block Device / Device Initialization} > \item \ref{drivernormative:Device Types / Block Device / Device Operation} > \end{itemize} > > @@ -224,6 +225,7 @@ A network device MUST conform to the following normative statements: > A block device MUST conform to the following normative statements: > > \begin{itemize} > +\item \ref{devicenormative:Device Types / Block Device / Device Initialization} > \item \ref{devicenormative:Device Types / Block Device / Device Operation} > \end{itemize} > > diff --git a/content.tex b/content.tex > index 3672588..93386d5 100644 > --- a/content.tex > +++ b/content.tex > @@ -4041,8 +4041,13 @@ device except where noted. > > \item[VIRTIO_BLK_F_BLK_SIZE (6)] Block size of disk is in \field{blk_size}. > > +\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. > + > \item[VIRTIO_BLK_F_TOPOLOGY (10)] Device exports information on optimal I/O > alignment. > + > +\item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback > + and writethrough modes. > \end{description} > > \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits} > @@ -4051,16 +4056,12 @@ device except where noted. > \item[VIRTIO_BLK_F_BARRIER (0)] Device supports request barriers. > > \item[VIRTIO_BLK_F_SCSI (7)] Device supports scsi packet commands. > - > -\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. > - > -\item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback > - and writethrough modes. > \end{description} > > -VIRTIO_BLK_F_FLUSH was also called VIRTIO_BLK_F_WCE: Legacy drivers > -MUST only negotiate this feature if they are capable of sending > -VIRTIO_BLK_T_FLUSH commands. > +\begin{note} > + In the legacy interface, VIRTIO_BLK_F_FLUSH was also > + called VIRTIO_BLK_F_WCE. > +\end{note} > > \subsection{Device configuration layout}\label{sec:Device Types / Block Device / Device configuration layout} > > @@ -4089,7 +4090,7 @@ struct virtio_blk_config { > // optimal (suggested maximum) I/O size in blocks > le32 opt_io_size; > } topology; > - u8 reserved; > + u8 writeback; > }; > \end{lstlisting} > > @@ -4119,21 +4120,56 @@ according to the native endian of the guest rather than > \field{topology} struct can be read to determine the physical block size and optimal > I/O lengths for the driver to use. This also does not affect the units > in the protocol, only performance. > + > +\item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache > + mode can be read or set through the \field{writeback} field. 0 corresponds > + to a writethrough cache, 1 to a writeback cache\footnote{Consistent with > + \ref{devicenormative:Device Types / Block Device / Device Operation}, > + a writethrough cache can be defined broadly as a cache that commits > + writes to persistent device backend storage before reporting their > + completion. For example, a battery-backed writeback cache actually > + counts as writethrough according to this definition.}. The cache mode > + after reset is undefined. Is it really undefined, or can it be either writeback or writethrough, and can be determined by reading the \field{writeback} field reset value? > \end{enumerate} > > +\drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization} > + > +Drivers SHOULD NOT negotiate VIRTIO_BLK_F_FLUSH if they are incapable of > +sending VIRTIO_BLK_T_FLUSH commands. > + > +If the VIRTIO_BLK_F_CONFIG_WCE feature is not negotiated, but > +VIRTIO_BLK_F_FLUSH was offered by the device, the driver MAY deduce > +the presence of a writethrough cache. Otherwise, the driver SHOULD > +assume presence of a writeback cache. > + > +\devicenormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization} > + > +Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it > +if they offer VIRTIO_BLK_F_CONFIG_WCE. > + > +If VIRTIO_BLK_F_CONFIG_WCE is negotiated but the VIRTIO_BLK_F_FLUSH feature > +is not, the device MUST initialize \field{writeback} to 0. > + > \subsubsection{Legacy Interface: Device Initialization}\label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization} > > -The \field{reserved} field used to be called \field{writeback}. If the > -VIRTIO_BLK_F_CONFIG_WCE feature is offered, the cache mode can be > -read from \field{writeback}; the > -driver can also write to the field in order to toggle the cache > -between writethrough (0) and writeback (1) mode. If the feature is > -not available, the driver can instead look at the result of > -negotiating VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode > -after reset if and only if VIRTIO_BLK_F_FLUSH is negotiated. > +Because legacy devices do not have FEATURES_OK, transitional devices > +MUST implement slightly different behavior around feature negotiation > +when used through the legacy interface. In particular: > + > +\begin{itemize} > +\item they MUST NOT modify \field{writeback} as a result > + of a driver's write to the status register. > + Some legacy drivers wrote to \field{writeback} between > + setting the DRIVER status bit and setting the DRIVER_OK status bit. > + Transitional devices MUST NOT overwrite \field{writeback} when > + the DRIVER_OK status bit is set. > + > +\item they MUST NOT modify \field{writeback} as a result of > + a driver's write to the guest features registers, for example if the > + driver sets the VIRTIO_BLK_F_CONFIG_WCE guest feature bit but does not > + set the VIRTIO_BLK_F_FLUSH bit. > +\end{itemize} > > -Some older legacy devices did not operate in writethrough mode even > -after a driver announced lack of support for VIRTIO_BLK_F_FLUSH. > > \subsection{Device Operation}\label{sec:Device Types / Block Device / Device Operation} > > @@ -4183,14 +4219,66 @@ A driver SHOULD accept the VIRTIO_BLK_F_RO feature if offered. > A driver MUST set \field{sector} to 0 for a VIRTIO_BLK_T_FLUSH request. > A driver SHOULD NOT include any data in a VIRTIO_BLK_T_FLUSH request. > > +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY > +switch to writethrough or writeback mode by writing respectively 0 and > +1 to the \field{writeback} field. After writing a 0 to \field{writeback}, > +the driver MUST NOT assume that any volatile writes have been committed > +to persistent device backend storage. > + > \devicenormative{\subsubsection}{Device Operation}{Device Types / Block Device / Device Operation} > > A device MUST set the \field{status} byte to VIRTIO_BLK_S_IOERR > for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT > write any data. > > -Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure > -that any writes which were completed are committed to non-volatile storage. > +A write is considered volatile when it is submitted; the contents of > +sectors covered by a volatile write are undefined in persistent device > +backend storage until the write becomes stable. A write becomes stable > +once it is completed and one or more of the following conditions is true: > + > +\begin{enumerate} > +\item\label{item:flush1} neither VIRTIO_BLK_F_CONFIG_WCE nor > + VIRTIO_BLK_F_FLUSH feature were negotiated, but VIRTIO_BLK_F_FLUSH was > + offered by the device; > + > +\item\label{item:flush2} the VIRTIO_BLK_F_CONFIG_WCE feature was negotiated and the > + \field{writeback} field in configuration space was 0 \textbf{all the time between > + the submission of the write and its completion}; > + > +\item\label{item:flush3} a VIRTIO_BLK_T_FLUSH request is sent \textbf{after the write is > + completed} and is completed itself. > +\end{enumerate} > + > +If the device is backed by persistent storage, the device MUST ensure that > +stable writes are committed to it, before reporting completion of the write > +(cases~\ref{item:flush1} and~\ref{item:flush2}) or the flush > +(case~\ref{item:flush3}). Failure to do so can cause data loss > +in case of a crash. > + > +If the driver changes \field{writeback} between the submission of the write > +and its completion, the write could be either volatile or stable when > +its completion is reported; in other words, the exact behavior is undefined. > + > +% According to the device requirements for device initialization: > +% Offer(CONFIG_WCE) => Offer(FLUSH). > +% > +% After reversing the implication: > +% not Offer(FLUSH) => not Offer(CONFIG_WCE). > + > +If VIRTIO_BLK_F_FLUSH was not offered by the > + device\footnote{Note that in this case, according to > + \ref{devicenormative:Device Types / Block Device / Device Initialization}, > + the device will not have offered VIRTIO_BLK_F_CONFIG_WCE either.}, the > +device MAY also commit writes to persistent device backend storage before > +reporting their completion. Unlike case~\ref{item:flush1}, however, this > +is not an absolute requirement of the specification. > + > +\begin{note} > + An implementation that does not offer VIRTIO_BLK_F_FLUSH and does not commit > + completed writes will not be resilient to data loss in case of crashes. > + Not offering VIRTIO_BLK_F_FLUSH is an absolute requirement > + for implementations that do not wish to be safe against such data losses. > +\end{note} > > \subsubsection{Legacy Interface: Device Operation}\label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation} > When using the legacy interface, transitional devices and drivers > @@ -4233,6 +4321,12 @@ serve as data consistency guarantee. Only a VIRTIO_BLK_T_FLUSH request > does that. > \end{note} > > +Some older legacy devices did not commit completed writes to persistent > +device backend storage when VIRTIO_BLK_F_FLUSH was offered but not > +negotiated. In order to work around this, the driver MAY set the > +\field{writeback} to 0 (if available) or it MAY send an explicit flush > +request after every completed write. > + > If the device has VIRTIO_BLK_F_SCSI feature, it can also support > scsi packet command requests, each of these requests is of form: > > -- > 2.4.3 > > > --------------------------------------------------------------------- > To unsubscribe from this mail list, you must leave the OASIS TC that > generates this mail. Follow this link to all your TCs in OASIS at: > https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]