OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: [PATCH] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE


VIRTIO_BLK_F_CONFIG_WCE is important in order to achieve good performance
(up to 2x, though more realistically +30-40%) in latency-bound workloads.
However, it was removed by mistake together with VIRTIO_BLK_F_FLUSH.

In addition, even removing VIRTIO_BLK_F_FLUSH was probably not a great
idea, because it simplifies simple drivers (e.g. firmware) that are okay
with a writethrough cache but still need data to persist after power loss.
What really should have been removed is just the possibility that devices
not propose VIRTIO_BLK_F_FLUSH, but even that only deserves a "SHOULD" in
the new world of conformance statements.

Restore these, with the following changes:

* clarify and use conformance statements in order to define writeback
and writethrough caching according to what is commonly done by high-end
storage.

* clarify (with conformance statements) the influence of the
VIRTIO_BLK_F_FLUSH feature on caching and how to proceed if only one of
VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE is negotiated.

* strengthen the requirement for persisting writes to MUST after
a VIRTIO_BLK_T_FLUSH request (and in other cases too involving the
new features).

The suggested behavior upon feature negotiation is okay for the Linux
implementation of virtio1, even after the implementation is modified to
support the two new features.

This fixes VIRTIO-144.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 conformance.tex |   2 +
 content.tex     | 130 +++++++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 111 insertions(+), 21 deletions(-)

diff --git a/conformance.tex b/conformance.tex
index 29c6ba8..8753cd6 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -102,6 +102,7 @@ A network driver MUST conform to the following normative statements:
 A block driver MUST conform to the following normative statements:
 
 \begin{itemize}
+\item \ref{drivernormative:Device Types / Block Device / Device Initialization}
 \item \ref{drivernormative:Device Types / Block Device / Device Operation}
 \end{itemize}
 
@@ -205,6 +206,7 @@ A network device MUST conform to the following normative statements:
 A block device MUST conform to the following normative statements:
 
 \begin{itemize}
+\item \ref{devicenormative:Device Types / Block Device / Device Initialization}
 \item \ref{devicenormative:Device Types / Block Device / Device Operation}
 \end{itemize}
 
diff --git a/content.tex b/content.tex
index 3b12263..e58a0bd 100644
--- a/content.tex
+++ b/content.tex
@@ -3723,8 +3723,13 @@ device except where noted.
 
 \item[VIRTIO_BLK_F_BLK_SIZE (6)] Block size of disk is in \field{blk_size}.
 
+\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support.
+
 \item[VIRTIO_BLK_F_TOPOLOGY (10)] Device exports information on optimal I/O
     alignment.
+
+\item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback
+    and writethrough modes.
 \end{description}
 
 \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits}
@@ -3733,16 +3738,12 @@ device except where noted.
 \item[VIRTIO_BLK_F_BARRIER (0)] Device supports request barriers.
 
 \item[VIRTIO_BLK_F_SCSI (7)] Device supports scsi packet commands.
-
-\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support.
-
-\item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback
-    and writethrough modes.
 \end{description}
 
-VIRTIO_BLK_F_FLUSH was also called VIRTIO_BLK_F_WCE: Legacy drivers
-MUST only negotiate this feature if they are capable of sending
-VIRTIO_BLK_T_FLUSH commands.
+\begin{note}
+  In the legacy interface, VIRTIO_BLK_F_FLUSH was also
+  called VIRTIO_BLK_F_WCE.
+\end{note}
 
 \subsection{Device configuration layout}\label{sec:Device Types / Block Device / Device configuration layout}
 
@@ -3771,7 +3772,7 @@ struct virtio_blk_config {
                 // optimal (suggested maximum) I/O size in blocks
                 le32 opt_io_size;
         } topology;
-        u8 reserved;
+        u8 writeback;
 };
 \end{lstlisting}
 
@@ -3801,21 +3802,52 @@ according to the native endian of the guest rather than
   \field{topology} struct can be read to determine the physical block size and optimal
   I/O lengths for the driver to use. This also does not affect the units
   in the protocol, only performance.
+
+\item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache
+  mode can be read or set through the \field{writeback} field.  0 corresponds
+  to a writethrough cache, 1 to a writeback cache\footnote{Consistent with
+    \ref{devicenormative:Device Types / Block Device / Device Operation},
+    a writethrough cache can be defined broadly as a cache that commits
+    writes to non-volatile storage before reporting their completion.
+    For example, a battery-backed writeback cache actually counts as
+    writethrough according to this definition.}.  The cache mode after
+  reset is undefined.
 \end{enumerate}
 
+\drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
+
+Drivers SHOULD NOT negotiate VIRTIO_BLK_F_FLUSH if they are incapable of
+sending VIRTIO_BLK_T_FLUSH commands.
+
+If the VIRTIO_BLK_F_CONFIG_WCE feature is not negotiated, but
+VIRTIO_BLK_F_FLUSH was proposed by the device, the driver MAY deduce
+the presence of a writethrough cache.  Otherwise, the driver SHOULD
+assume presence of a writeback cache.
+
+\devicenormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
+
+Devices SHOULD always propose VIRTIO_BLK_F_FLUSH, and MUST propose it
+if they propose VIRTIO_BLK_F_CONFIG_WCE.
+
+If VIRTIO_BLK_F_CONFIG_WCE is negotiated but the VIRTIO_BLK_F_FLUSH feature
+is not, the device MUST set \field{writeback} to 0 as soon as the driver
+sets the FEATURES_OK status bit.
+
 \subsubsection{Legacy Interface: Device Initialization}\label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization}
 
-The \field{reserved} field used to be called \field{writeback}.  If the
-VIRTIO_BLK_F_CONFIG_WCE feature is offered, the cache mode can be
-read from \field{writeback}; the
-driver can also write to the field in order to toggle the cache
-between writethrough (0) and writeback (1) mode.  If the feature is
-not available, the driver can instead look at the result of
-negotiating VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode
-after reset if and only if VIRTIO_BLK_F_FLUSH is negotiated.
+Because legacy devices do not have FEATURES_OK, legacy device behavior
+differs around feature negotiation; legacy devices never modify
+\field{writeback} as a result of a driver's write to the status register.
+In particular, some legacy drivers wrote to \field{writeback} between
+setting the DRIVER status bit and setting the DRIVER_OK status bit.
+It would be a bug for the device to overwrite \field{writeback} when
+the DRIVER_OK status bit is set.
+
+Legacy devices must also never modify \field{writeback} as a result of
+a driver's write to the guest features registers, for example if the
+driver sets the VIRTIO_BLK_F_CONFIG_WCE guest feature bit but does not
+set the VIRTIO_BLK_F_FLUSH bit.
 
-Some older legacy devices did not operate in writethrough mode even
-after a driver announced lack of support for VIRTIO_BLK_F_FLUSH.
 
 \subsection{Device Operation}\label{sec:Device Types / Block Device / Device Operation}
 
@@ -3865,14 +3897,67 @@ A driver SHOULD accept the VIRTIO_BLK_F_RO feature if offered.
 A driver MUST set \field{sector} to 0 for a VIRTIO_BLK_T_FLUSH request.
 A driver SHOULD NOT include any data in a VIRTIO_BLK_T_FLUSH request.
 
+If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY
+switch to writethrough or writeback mode by writing respectively 0 and
+1 to the \field{writeback} field.  After writing a 0 to \field{writeback},
+the driver MUST NOT assume that any volatile writes have been committed
+to non-volatile storage.
+
 \devicenormative{\subsubsection}{Device Operation}{Device Types / Block Device / Device Operation}
 
 A device MUST set the \field{status} byte to VIRTIO_BLK_S_IOERR
 for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT
 write any data.
 
-Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure
-that any writes which were completed are committed to non-volatile storage.
+A write is considered volatile when it is submitted; the contents of
+sectors covered by a volatile are undefined until the write becomes stable.
+A write becomes stable once it is completed and one or more of the
+following conditions is true:
+
+\begin{enumerate}
+\item\label{item:flush1} neither VIRTIO_BLK_F_CONFIG_WCE nor
+  VIRTIO_BLK_F_FLUSH feature were negotiated, but VIRTIO_BLK_F_FLUSH was
+  proposed by the device;
+
+\item\label{item:flush2} the VIRTIO_BLK_F_CONFIG_WCE feature was negotiated and the
+  \field{writeback} field in configuration space was 0 \textbf{all the time between
+  the submission of the write and its completion};
+
+\item\label{item:flush3} a VIRTIO_BLK_T_FLUSH request is sent \textbf{after the write is
+  completed} and is completed itself.
+\end{enumerate}
+
+The device MUST ensure that stable writes are committed to
+non-volatile storage, before reporting completion of the write
+(cases~\ref{item:flush1} and~\ref{item:flush2}) or the flush
+(case~\ref{item:flush3}).  Failure to do so can cause data loss
+in case of a crash.
+
+% ** Not using MAY/MAY NOT intentionally, this is not optional behavior **
+If the driver changes \field{writeback} between the submission of the write
+and its completion, the write can be either volatile or stable when
+its completion is reported; the exact behavior is undefined.
+
+% According to the device requirements for device initialization:
+%   Propose(CONFIG_WCE) => Propose(FLUSH).
+%
+% After reversing the implication:
+%   not Propose(FLUSH) => not Propose(CONFIG_WCE).
+
+If VIRTIO_BLK_F_FLUSH was not proposed by the
+  device\footnote{Note that in this case, according to
+  \ref{devicenormative:Device Types / Block Device / Device Initialization},
+  the device will not have proposed VIRTIO_BLK_F_CONFIG_WCE either.}, the
+device MAY also commit writes to non-volatile storage before reporting
+their completion.  Unlike case~\ref{item:flush1}, however, this is not
+an absolute requirement of the specification.
+
+\begin{note}
+  An implementation that does not propose VIRTIO_BLK_F_FLUSH and does not commit
+  completed writes will not be resilient to data loss in case of crashes.
+  Not proposing VIRTIO_BLK_F_FLUSH is an absolute requirement
+  for implementations that do not wish to be safe against such data losses.
+\end{note}
 
 \subsubsection{Legacy Interface: Device Operation}\label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}
 When using the legacy interface, transitional devices and drivers
@@ -3907,6 +3992,9 @@ serve as data consistency guarantee.  Only a VIRTIO_BLK_T_FLUSH request
 does that.
 \end{note}
 
+Some older legacy devices did not commit completed writes to non-volatile
+storage, even if VIRTIO_BLK_F_FLUSH was proposed but not negotiated.
+
 If the device has VIRTIO_BLK_F_SCSI feature, it can also support
 scsi packet command requests, each of these requests is of form:
 
-- 
2.4.3



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]