OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio] [PATCH v2] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE



On 02/07/2015 11:27, Michael S. Tsirkin wrote:
> On Thu, Jul 02, 2015 at 09:57:07AM +0200, Paolo Bonzini wrote:
>> VIRTIO_BLK_F_CONFIG_WCE is important in order to achieve good performance
>> (up to 2x, though more realistically +30-40%) in latency-bound workloads.
>> However, it was removed by mistake together with VIRTIO_BLK_F_FLUSH.
>>
>> In addition, even removing VIRTIO_BLK_F_FLUSH was probably not a great
>> idea, because it simplifies simple drivers (e.g. firmware) that are okay
>> with a writethrough cache but still need data to persist after power loss.
>> What really should have been removed is just the possibility that devices
>> not propose VIRTIO_BLK_F_FLUSH, but even that only deserves a "SHOULD" in
>> the new world of conformance statements.
>>
>> Restore these, with the following changes:
>>
>> * clarify and use conformance statements in order to define writeback
>> and writethrough caching according to what is commonly done by high-end
>> storage.
>>
>> * clarify (with conformance statements) the influence of the
>> VIRTIO_BLK_F_FLUSH feature on caching and how to proceed if only one of
>> VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE is negotiated.
>>
>> * strengthen the requirement for persisting writes after a
>> VIRTIO_BLK_T_FLUSH request.  This is now a MUST.
>>
>> This fixes VIRTIO-144.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  content.tex | 76 ++++++++++++++++++++++++++++++++++++++++++-------------------
>>  1 file changed, 52 insertions(+), 24 deletions(-)
>>
>> diff --git a/content.tex b/content.tex
>> index 8de905f..4e03844 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -3723,26 +3723,31 @@ device except where noted.
>>  
>>  \item[VIRTIO_BLK_F_BLK_SIZE (6)] Block size of disk is in \field{blk_size}.
>>  
>> +\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support.
>> +
>>  \item[VIRTIO_BLK_F_TOPOLOGY (10)] Device exports information on optimal I/O
>>      alignment.
>> +
>> +\item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback
>> +    and writethrough modes.
>>  \end{description}
>>  
>> +Devices SHOULD always propose VIRTIO_BLK_F_FLUSH, and MUST propose it
>> +if they propose VIRTIO_BLK_F_CONFIG_WCE.  Drivers SHOULD only negotiate
>> +these features if they are capable of sending VIRTIO_BLK_T_FLUSH commands.
>> +
> 
> Sorry this is a non-normative section, so you can't say MUST and SHOULD
> here. You'll need to add a normative one, and link from conformance
> sections.
> It's ok to have a bit of duplication between normative
> and non-normative ones if this helps readability.

Ok, will move this to a normative subsubsection.

>>  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits}
>>  
>>  \begin{description}
>>  \item[VIRTIO_BLK_F_BARRIER (0)] Device supports request barriers.
>>  
>>  \item[VIRTIO_BLK_F_SCSI (7)] Device supports scsi packet commands.
>> -
>> -\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support.
>> -
> 
> Did you mean to remove this one?

Yes, it's not legacy anymore.

>>  \subsection{Device configuration layout}\label{sec:Device Types / Block Device / Device configuration layout}
>>  
>> @@ -3771,7 +3776,7 @@ struct virtio_blk_config {
>>                  // optimal (suggested maximum) I/O size in blocks
>>                  le32 opt_io_size;
>>          } topology;
>> -        u8 reserved;
>> +        u8 writeback;
>>  };
>>  \end{lstlisting}
>>  
>> @@ -3801,21 +3806,27 @@ according to the native endian of the guest rather than
>>    \field{topology} struct can be read to determine the physical block size and optimal
>>    I/O lengths for the driver to use. This also does not affect the units
>>    in the protocol, only performance.
>> -\end{enumerate}
>> -
>> -\subsubsection{Legacy Interface: Device Initialization}\label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization}
>>  
>> -The \field{reserved} field used to be called \field{writeback}.  If the
>> -VIRTIO_BLK_F_CONFIG_WCE feature is offered, the cache mode can be
>> -read from \field{writeback}; the
>> -driver can also write to the field in order to toggle the cache
>> -between writethrough (0) and writeback (1) mode.  If the feature is
>> -not available, the driver can instead look at the result of
>> -negotiating VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode
>> -after reset if and only if VIRTIO_BLK_F_FLUSH is negotiated.
>> +\item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache
>> +  mode can be read from \field{writeback}.  0 corresponds to a writethrough
>> +  cache, 1 to a writeback cache\footnote{Consistent with
>> +    \ref{sec:Device Types / Block Device / Device Operation},
>> +    a writethrough cache can be defined broadly as a cache that commits
>> +    writes to non-volatile storage before reporting their completion.
>> +    For example, a battery-backed writeback cache actually counts as
>> +    writethrough according to this definition.}.
> 
> This seems a bit too much text for a footnote, I'd still prefer it
> as a simple text note.

I can't do this inside an {enumerate}.  It is long-ish, but it's just
documenting what existing hardware does.

>> +
>> +\item If the VIRTIO_BLK_F_CONFIG_WCE feature is not available, the driver
>> +  can deduct the cache mode by looking at the result of negotiating
>> +  VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode if and only if
>> +  VIRTIO_BLK_F_FLUSH is negotiated.
>> +\end{enumerate}
>>  
>> -Some older legacy devices did not operate in writethrough mode even
>> -after a driver announced lack of support for VIRTIO_BLK_F_FLUSH.
>> +If the VIRTIO_BLK_F_FLUSH feature is not negotiated while
>> +VIRTIO_BLK_F_CONFIG_WCE is, \field{writeback} MUST read 0 immediately
>> +after feature negotiation.
> 
> This is confusing.
> Previously you simply said writeback is only valid if
> VIRTIO_BLK_F_CONFIG_WCE is negotiated.

This paragraph says "If the VIRTIO_BLK_F_FLUSH feature is not negotiated
*while VIRTIO_BLK_F_CONFIG_WCE is*".

> Also, what's the reset value if it is negotiated?

It is not specified.  The device can make it persistent or reset it to a
user-specified value.

> Also, should we prohibit driver from writing the field
> if VIRTIO_BLK_F_FLUSH is off?

It doesn't really matter I think.

>>  Alternatively, a device MAY simply fail
>> +feature negotiation if the driver proposes VIRTIO_BLK_F_CONFIG_WCE but
>> +does not propose VIRTIO_BLK_F_FLUSH.
> 
> Alternatively to what?

I'll change the previous sentence to "\field{writeback} MUST read 0
immediately after successful feature negotiation" (adding "successful").
 Then, failing feature negotiation can be an alternative because you
don't reach successful negotiation.

> We can't have a mandatory behaviour but also
> make it optional. And driver does not propose features, it negotiates
> them.

Will fix.

> Or maybe device must fail negotiation if it meets some criteria?

Yes, but just "MAY".

>>  \subsection{Device Operation}\label{sec:Device Types / Block Device / Device Operation}
>>  
>> @@ -3871,8 +3882,25 @@ A device MUST set the \field{status} byte to VIRTIO_BLK_S_IOERR
>>  for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT
>>  write any data.
>>  
>> -Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure
>> -that any writes which were completed are committed to non-volatile storage.
>> +If the \field{VIRTIO_BLK_F_FLUSH} feature is not negotiated, the
>> +device SHOULD ensure that all writes are committed to non-volatile
>> +storage before reporting completion.  It MUST do so if it proposed
>> +\field{VIRTIO_BLK_F_FLUSH}.  Failure to do so can cause data loss.
> 
> The 1st one is a SHOULD because not everyone cares about data loss.
> But why is the second one a MUST? Same logic seems to apply.

Because then the driver can _know_ that it is safe from data losses, by
looking at proposed features.

> Same below.
> 
>> +
>> +If the \field{VIRTIO_BLK_F_FLUSH} feature is negotiated, upon receipt
>> +of a VIRTIO_BLK_T_FLUSH request, the device MUST ensure that any writes
>> +which were completed are committed to non-volatile storage.
> 
> Aren't there any driver requirements?
> E.g. with one mode driver may assume data is safe
> when complete, in another is should issue flush requests
> to commit it to storage.

I don't follow this.  Can you rephrase?

>> +
>> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated and the
>> +\field{writeback} field in configuration space is 0, the device MUST
>> +ensure that all writes are committed to non-volatile storage before
>> +reporting completion.
>> +
>> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY
>> +switch to writethrough or writeback mode by writing respectively 0 and
>> +1 to the \field{writeback} field.  After writing a 0 to \field{writeback},
>> +the driver MUST NOT assume that writes submitted while the field was 1
> 
> since the last VIRTIO_BLK_F_FLUSH completed, correct?

More precisely "submitted while the field was 1 and completed after the
last VIRTIO_BLK_F_FLUSH was submitted".  Does it have to be specified
explicitly?

Paolo

>> have +been committed to non-volatile storage.
> 
> 
> 
>>  
>>  \subsubsection{Legacy Interface: Device Operation}\label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}
>>  When using the legacy interface, transitional devices and drivers
>> -- 
>> 2.4.3
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe from this mail list, you must leave the OASIS TC that 
>> generates this mail.  Follow this link to all your TCs in OASIS at:
>> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php 


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]