OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

# virtio message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]

Subject: Re: [virtio] [PATCH] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE

• From: Paolo Bonzini <pbonzini@redhat.com>
• To: "Michael S. Tsirkin" <mst@redhat.com>
• Date: Wed, 8 Jul 2015 12:36:21 +0200


On 08/07/2015 12:24, Michael S. Tsirkin wrote:
>>>> +If VIRTIO_BLK_F_CONFIG_WCE is negotiated but the VIRTIO_BLK_F_FLUSH feature
>>>> +is not, the device MUST set \field{writeback} to 0 as soon as the driver
>>>> +sets the FEATURES_OK status bit.
>>>
>>> So this field is dual use, sometimes it's written by driver, sometimes -
>>> by a device. A bit messy I think, will cause races or at least
>>> confusion.
>>
>> It doesn't cause races, because the device can set it before returning
>> from the instruction that wrote FEATURES_OK.
>
> SMP guests exist :)

See below.

>>> What is this in aid of?
>>>
>>> If VIRTIO_BLK_F_FLUSH is not negotiated devices already commit writes.
>>>
>>> Why also set writeback to 0?
>>
>> So that it's consistent with the status of the device.  The driver may
>> set it to 1 to prevent the device from committing the writes.
>
> Fields should have a single owner. RO fields can change, RW ones can't.
> This one is writeable, let's ask driver to set it consistently.

Until FEATURES_OK is set and the guest has checked that the FEATURES_OK
bit was not reset by the device, the driver does not even _know that the
field exists_.  So it is a non problem.

Until FEATURES_OK is set, all fields are owned by the device.  Doing as
you suggest would make things a lot more complicated for legacy devices.

>>>> +Legacy devices must also never modify \field{writeback} as a result of
>>>
>>> We can not require anything from legacy devices. They are out there.
>>
>> They didn't as far as I know, and if you want to implement both modern
>> and legacy this is a deviation from the requirements of modern devices.
>
> We don't tell people how to implement legacy devices. We support transitional devices
> with legacy interfaces. Is this what you are trying to say?
> If yes must is MUST, etc.

Okay, will fix.

>>>> -Some older legacy devices did not operate in writethrough mode even
>>>> -after a driver announced lack of support for VIRTIO_BLK_F_FLUSH.
>>>>
>>>>  \subsection{Device Operation}\label{sec:Device Types / Block Device / Device Operation}
>>>>
>>>> @@ -3865,14 +3897,67 @@ A driver SHOULD accept the VIRTIO_BLK_F_RO feature if offered.
>>>>  A driver MUST set \field{sector} to 0 for a VIRTIO_BLK_T_FLUSH request.
>>>>  A driver SHOULD NOT include any data in a VIRTIO_BLK_T_FLUSH request.
>>>>
>>>> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY
>>>> +switch to writethrough or writeback mode by writing respectively 0 and
>>>> +1 to the \field{writeback} field.  After writing a 0 to \field{writeback},
>>>> +the driver MUST NOT assume that any volatile writes have been committed
>>>> +to non-volatile storage.
>>>
>>> I think volatile here clashes with its use for buffers.
>>> How about we say e.g.  "permanent device backend storage" here?
>>> Same below.
>>
>> Okay, or even just s/non-volatile/permanent/.
>
> I added "device backend" to stress this depends on device.
> It's not absoutely permanent - as permanent as device can make it.

Okay.

>>>> +
>>>>  \devicenormative{\subsubsection}{Device Operation}{Device Types / Block Device / Device Operation}
>>>>
>>>>  A device MUST set the \field{status} byte to VIRTIO_BLK_S_IOERR
>>>>  for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT
>>>>  write any data.
>>>>
>>>> -Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure
>>>> -that any writes which were completed are committed to non-volatile storage.
>>>> +A write is considered volatile when it is submitted; the contents of
>>>> +sectors covered by a volatile are undefined until the write becomes stable.
>>>> +A write becomes stable once it is completed and one or more of the
>>>> +following conditions is true:
>>>> +
>>>> +\begin{enumerate}
>>>> +\item\label{item:flush1} neither VIRTIO_BLK_F_CONFIG_WCE nor
>>>> +  VIRTIO_BLK_F_FLUSH feature were negotiated, but VIRTIO_BLK_F_FLUSH was
>>>> +  proposed by the device;
>>>> +
>>>> +\item\label{item:flush2} the VIRTIO_BLK_F_CONFIG_WCE feature was negotiated and the
>>>> +  \field{writeback} field in configuration space was 0 \textbf{all the time between
>>>> +  the submission of the write and its completion};
>>>> +
>>>> +\item\label{item:flush3} a VIRTIO_BLK_T_FLUSH request is sent \textbf{after the write is
>>>> +  completed} and is completed itself.
>>>> +\end{enumerate}
>>>> +
>>>> +The device MUST ensure that stable writes are committed to
>>>> +non-volatile storage,
>>>
>>> So in particular at the moment QEMU violates this requirement.
>>
>> It doesn't.  Why do you say it does?
>
> Below 2 paragraphs tell you why :)
> Because it does not always *have* non-volatile storage.

So are we still discussing what MUST means?  If the guest cannot see the
difference, QEMU is not violating anything.

>>>> +% ** Not using MAY/MAY NOT intentionally, this is not optional behavior **
>>>> +If the driver changes \field{writeback} between the submission of the write
>>>> +and its completion, the write can be either volatile or stable when
>>>> +its completion is reported; the exact behavior is undefined.
>>>
>>> So MUST then?
>>
>> No, definitely not MUST.  "The write MUST be either X or not X" is a
>> tautology.  "The write MAY be either X or not X" also does not make
>> sense, because there's no way for an implementation to do neither X nor
>> not X.  Tertium non datur.
>
> Missed this fact. As would other readers :)
>
> I would s/can be/is/. And drop the exact behavior is undefined -
> this is just clarifying the definition, not saying anything
> about the behaviour. Maybe prefix with "in other words".

That's certainly okay.

>> No, it's definitely a MAY.  "One vendor may choose to include the item
>> because a particular marketplace requires it or because the vendor feels
>> that it enhances the product while another vendor may omit the same
>> item" is a perfect description of this.  Also, the driver can recognize
>> the situation (and though it cannot do anything about it, it can log
>> something).
>
> But SHOULD is an even better match:
> 	SHOULD   This word, or the adjective "RECOMMENDED", mean that there
> 	   may exist valid reasons in particular circumstances to ignore a
> 	   particular item, but the full implications must be understood and
> 	   carefully weighed before choosing a different course.
>
> implications such as data loss should definitely be well understood.
>
> MAY just means "ok not to do".

Ah I see what you mean.  I think what you want is already covered above:
"Devices SHOULD always propose VIRTIO_BLK_F_FLUSH".

If they don't, they've already done something not recommended, and a MAY
is just fine in this paragraph.  This paragraph only applies to
implementation that have already disregarded that other "SHOULD".  It
makes little sense to do yet another recommendation that they would
disregard.

This "MAY" is here to explicitly allow the behavior.

>>> It seems to imply VIRTIO_BLK_F_FLUSH somehow means "I will not lose
>>> your data on crash". That requirement is just too strong, devices often have
>>> no way to know, and guests often don't care.
>>
>> I don't think it's too strong.  Of course it is more like "I will not
>> lose your data on crash, unless the user screwed up".
>
> As I explained above, this blocks using tmpfs for storage -
> that loses data on a crash.

tmpfs for storage loses way more data than just unflushed things.  The
whole backend _does not exist anymore_ after a crash.  Since today I'm
in the mood of using Latin, "ex falso quodlibet". :)

>>> Why is it tied to VIRTIO_BLK_F_FLUSH at all? Is this an optimization?
>>>
>>>
>>>> +\end{note}
>>>>
>>>>  \subsubsection{Legacy Interface: Device Operation}\label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}
>>>>  When using the legacy interface, transitional devices and drivers
>>>> @@ -3907,6 +3992,9 @@ serve as data consistency guarantee.  Only a VIRTIO_BLK_T_FLUSH request
>>>>  does that.
>>>>  \end{note}
>>>>
>>>> +Some older legacy devices did not commit completed writes to non-volatile
>>>> +storage, even if VIRTIO_BLK_F_FLUSH was proposed but not negotiated.
>>>
>>> Anything transitional drivers can do it work around that?
>>
>> Always accept VIRTIO_BLK_F_FLUSH.
>
> I thought they also did this on flush. So simply "when", not "even".
> Maybe suggest what transitional drivers should do.

Okay, I'll take a look.

>>> If no, this doesn't seem like a useful bit of info to me.
>>
>> I can remove it, but it was in the existing document and there is a
>> workaround.
>
> That one talked about using writethrough mode.
> So it was clear isn't unrelated to flush.

VIRTIO_BLK_F_FLUSH proposed but not negotiated _is_ writethrough mode.

Paolo


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]