[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [virtio] [PATCH v2] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE
On 02/07/2015 11:27, Michael S. Tsirkin wrote: > On Thu, Jul 02, 2015 at 09:57:07AM +0200, Paolo Bonzini wrote: >> VIRTIO_BLK_F_CONFIG_WCE is important in order to achieve good performance >> (up to 2x, though more realistically +30-40%) in latency-bound workloads. >> However, it was removed by mistake together with VIRTIO_BLK_F_FLUSH. >> >> In addition, even removing VIRTIO_BLK_F_FLUSH was probably not a great >> idea, because it simplifies simple drivers (e.g. firmware) that are okay >> with a writethrough cache but still need data to persist after power loss. >> What really should have been removed is just the possibility that devices >> not propose VIRTIO_BLK_F_FLUSH, but even that only deserves a "SHOULD" in >> the new world of conformance statements. >> >> Restore these, with the following changes: >> >> * clarify and use conformance statements in order to define writeback >> and writethrough caching according to what is commonly done by high-end >> storage. >> >> * clarify (with conformance statements) the influence of the >> VIRTIO_BLK_F_FLUSH feature on caching and how to proceed if only one of >> VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE is negotiated. >> >> * strengthen the requirement for persisting writes after a >> VIRTIO_BLK_T_FLUSH request. This is now a MUST. >> >> This fixes VIRTIO-144. >> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> >> --- >> content.tex | 76 ++++++++++++++++++++++++++++++++++++++++++------------------- >> 1 file changed, 52 insertions(+), 24 deletions(-) >> >> diff --git a/content.tex b/content.tex >> index 8de905f..4e03844 100644 >> --- a/content.tex >> +++ b/content.tex >> @@ -3723,26 +3723,31 @@ device except where noted. >> >> \item[VIRTIO_BLK_F_BLK_SIZE (6)] Block size of disk is in \field{blk_size}. >> >> +\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. >> + >> \item[VIRTIO_BLK_F_TOPOLOGY (10)] Device exports information on optimal I/O >> alignment. >> + >> +\item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback >> + and writethrough modes. >> \end{description} >> >> +Devices SHOULD always propose VIRTIO_BLK_F_FLUSH, and MUST propose it >> +if they propose VIRTIO_BLK_F_CONFIG_WCE. Drivers SHOULD only negotiate >> +these features if they are capable of sending VIRTIO_BLK_T_FLUSH commands. >> + > > Sorry this is a non-normative section, so you can't say MUST and SHOULD > here. You'll need to add a normative one, and link from conformance > sections. > It's ok to have a bit of duplication between normative > and non-normative ones if this helps readability. Ok, will move this to a normative subsubsection. >> \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits} >> >> \begin{description} >> \item[VIRTIO_BLK_F_BARRIER (0)] Device supports request barriers. >> >> \item[VIRTIO_BLK_F_SCSI (7)] Device supports scsi packet commands. >> - >> -\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support. >> - > > Did you mean to remove this one? Yes, it's not legacy anymore. >> \subsection{Device configuration layout}\label{sec:Device Types / Block Device / Device configuration layout} >> >> @@ -3771,7 +3776,7 @@ struct virtio_blk_config { >> // optimal (suggested maximum) I/O size in blocks >> le32 opt_io_size; >> } topology; >> - u8 reserved; >> + u8 writeback; >> }; >> \end{lstlisting} >> >> @@ -3801,21 +3806,27 @@ according to the native endian of the guest rather than >> \field{topology} struct can be read to determine the physical block size and optimal >> I/O lengths for the driver to use. This also does not affect the units >> in the protocol, only performance. >> -\end{enumerate} >> - >> -\subsubsection{Legacy Interface: Device Initialization}\label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization} >> >> -The \field{reserved} field used to be called \field{writeback}. If the >> -VIRTIO_BLK_F_CONFIG_WCE feature is offered, the cache mode can be >> -read from \field{writeback}; the >> -driver can also write to the field in order to toggle the cache >> -between writethrough (0) and writeback (1) mode. If the feature is >> -not available, the driver can instead look at the result of >> -negotiating VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode >> -after reset if and only if VIRTIO_BLK_F_FLUSH is negotiated. >> +\item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache >> + mode can be read from \field{writeback}. 0 corresponds to a writethrough >> + cache, 1 to a writeback cache\footnote{Consistent with >> + \ref{sec:Device Types / Block Device / Device Operation}, >> + a writethrough cache can be defined broadly as a cache that commits >> + writes to non-volatile storage before reporting their completion. >> + For example, a battery-backed writeback cache actually counts as >> + writethrough according to this definition.}. > > This seems a bit too much text for a footnote, I'd still prefer it > as a simple text note. I can't do this inside an {enumerate}. It is long-ish, but it's just documenting what existing hardware does. >> + >> +\item If the VIRTIO_BLK_F_CONFIG_WCE feature is not available, the driver >> + can deduct the cache mode by looking at the result of negotiating >> + VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode if and only if >> + VIRTIO_BLK_F_FLUSH is negotiated. >> +\end{enumerate} >> >> -Some older legacy devices did not operate in writethrough mode even >> -after a driver announced lack of support for VIRTIO_BLK_F_FLUSH. >> +If the VIRTIO_BLK_F_FLUSH feature is not negotiated while >> +VIRTIO_BLK_F_CONFIG_WCE is, \field{writeback} MUST read 0 immediately >> +after feature negotiation. > > This is confusing. > Previously you simply said writeback is only valid if > VIRTIO_BLK_F_CONFIG_WCE is negotiated. This paragraph says "If the VIRTIO_BLK_F_FLUSH feature is not negotiated *while VIRTIO_BLK_F_CONFIG_WCE is*". > Also, what's the reset value if it is negotiated? It is not specified. The device can make it persistent or reset it to a user-specified value. > Also, should we prohibit driver from writing the field > if VIRTIO_BLK_F_FLUSH is off? It doesn't really matter I think. >> Alternatively, a device MAY simply fail >> +feature negotiation if the driver proposes VIRTIO_BLK_F_CONFIG_WCE but >> +does not propose VIRTIO_BLK_F_FLUSH. > > Alternatively to what? I'll change the previous sentence to "\field{writeback} MUST read 0 immediately after successful feature negotiation" (adding "successful"). Then, failing feature negotiation can be an alternative because you don't reach successful negotiation. > We can't have a mandatory behaviour but also > make it optional. And driver does not propose features, it negotiates > them. Will fix. > Or maybe device must fail negotiation if it meets some criteria? Yes, but just "MAY". >> \subsection{Device Operation}\label{sec:Device Types / Block Device / Device Operation} >> >> @@ -3871,8 +3882,25 @@ A device MUST set the \field{status} byte to VIRTIO_BLK_S_IOERR >> for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT >> write any data. >> >> -Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure >> -that any writes which were completed are committed to non-volatile storage. >> +If the \field{VIRTIO_BLK_F_FLUSH} feature is not negotiated, the >> +device SHOULD ensure that all writes are committed to non-volatile >> +storage before reporting completion. It MUST do so if it proposed >> +\field{VIRTIO_BLK_F_FLUSH}. Failure to do so can cause data loss. > > The 1st one is a SHOULD because not everyone cares about data loss. > But why is the second one a MUST? Same logic seems to apply. Because then the driver can _know_ that it is safe from data losses, by looking at proposed features. > Same below. > >> + >> +If the \field{VIRTIO_BLK_F_FLUSH} feature is negotiated, upon receipt >> +of a VIRTIO_BLK_T_FLUSH request, the device MUST ensure that any writes >> +which were completed are committed to non-volatile storage. > > Aren't there any driver requirements? > E.g. with one mode driver may assume data is safe > when complete, in another is should issue flush requests > to commit it to storage. I don't follow this. Can you rephrase? >> + >> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated and the >> +\field{writeback} field in configuration space is 0, the device MUST >> +ensure that all writes are committed to non-volatile storage before >> +reporting completion. >> + >> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY >> +switch to writethrough or writeback mode by writing respectively 0 and >> +1 to the \field{writeback} field. After writing a 0 to \field{writeback}, >> +the driver MUST NOT assume that writes submitted while the field was 1 > > since the last VIRTIO_BLK_F_FLUSH completed, correct? More precisely "submitted while the field was 1 and completed after the last VIRTIO_BLK_F_FLUSH was submitted". Does it have to be specified explicitly? Paolo >> have +been committed to non-volatile storage. > > > >> >> \subsubsection{Legacy Interface: Device Operation}\label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation} >> When using the legacy interface, transitional devices and drivers >> -- >> 2.4.3 >> >> >> --------------------------------------------------------------------- >> To unsubscribe from this mail list, you must leave the OASIS TC that >> generates this mail. Follow this link to all your TCs in OASIS at: >> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]