[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [virtio] [PATCH v2] virtio-blk: restore VIRTIO_BLK_F_FLUSH and VIRTIO_BLK_F_CONFIG_WCE
On 02/07/2015 12:51, Michael S. Tsirkin wrote: >>>> +\item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache >>>> + mode can be read from \field{writeback}. 0 corresponds to a writethrough >>>> + cache, 1 to a writeback cache\footnote{Consistent with >>>> + \ref{sec:Device Types / Block Device / Device Operation}, >>>> + a writethrough cache can be defined broadly as a cache that commits >>>> + writes to non-volatile storage before reporting their completion. >>>> + For example, a battery-backed writeback cache actually counts as >>>> + writethrough according to this definition.}. >>> >>> This seems a bit too much text for a footnote, I'd still prefer it >>> as a simple text note. >> >> I can't do this inside an {enumerate}. It is long-ish, but it's just >> documenting what existing hardware does. > > Why not? I just tried this: note nests fine within enumerate. I get an error from LaTeX. >>>> + >>>> +\item If the VIRTIO_BLK_F_CONFIG_WCE feature is not available, the driver >>>> + can deduct the cache mode by looking at the result of negotiating >>>> + VIRTIO_BLK_F_FLUSH: the cache will be in writeback mode if and only if >>>> + VIRTIO_BLK_F_FLUSH is negotiated. >>>> +\end{enumerate} >>>> >>>> -Some older legacy devices did not operate in writethrough mode even >>>> -after a driver announced lack of support for VIRTIO_BLK_F_FLUSH. >>>> +If the VIRTIO_BLK_F_FLUSH feature is not negotiated while >>>> +VIRTIO_BLK_F_CONFIG_WCE is, \field{writeback} MUST read 0 immediately >>>> +after feature negotiation. >>> >>> This is confusing. >>> Previously you simply said writeback is only valid if >>> VIRTIO_BLK_F_CONFIG_WCE is negotiated. >> >> This paragraph says "If the VIRTIO_BLK_F_FLUSH feature is not negotiated >> *while VIRTIO_BLK_F_CONFIG_WCE is*". > > Would be clearer to revert the order then. Sure, I can do that. > Is this logic something that legacy devices implemented? > I don't see anything like this in QEMU. It wasn't possible because of the lack of a FEATURES_OK state. I'll have to document legacy behavior better. >>> Also, what's the reset value if it is negotiated? >> >> It is not specified. The device can make it persistent or reset it to a >> user-specified value. > > Does driver have to read it on init then? Read it or set it. Linux does the latter. > I think this text could benefit from separating > between two concepts: > - device writenback mode > - writeback field > > operation should refer to current mode, separately > field might or might not make it discoverable. > > I see this in QEMU: > > /* A guest that supports VIRTIO_BLK_F_CONFIG_WCE must be able to send > * cache flushes. Thus, the "auto writethrough" behavior is never > * necessary for guests that support the VIRTIO_BLK_F_CONFIG_WCE feature. > * Leaving it enabled would break the following sequence: > * > * Guest started with "-drive cache=writethrough" > * Guest sets status to 0 > * Guest sets DRIVER bit in status field > * Guest reads host features (WCE=0, CONFIG_WCE=1) > * Guest writes guest features (WCE=0, CONFIG_WCE=1) > * Guest writes 1 to the WCE configuration field (writeback mode) > * Guest sets DRIVER_OK bit in status field > * > * s->blk would erroneously be placed in writethrough mode. > */ > if (!virtio_has_feature(vdev, VIRTIO_BLK_F_CONFIG_WCE)) { > aio_context_acquire(blk_get_aio_context(s->blk)); > blk_set_enable_write_cache(s->blk, > virtio_has_feature(vdev, VIRTIO_BLK_F_WCE)); > aio_context_release(blk_get_aio_context(s->blk)); > } > > > is this what this text is trying to say? This is part of the behavior that I want to clarify. > > > > >>> We can't have a mandatory behaviour but also >>> make it optional. And driver does not propose features, it negotiates >>> them. >> >> Will fix. >> >>> Or maybe device must fail negotiation if it meets some criteria? >> >> Yes, but just "MAY". > > Devices may fail negotiation if any dependencies are not met. > Is this the case? > See Feature bit requirements that we have for the net device, > I think that's formulated in a clearer way. Good. But I don't want to make it mandatory. I'll see what I can come up with in v3. > >>>> \subsection{Device Operation}\label{sec:Device Types / Block Device / Device Operation} >>>> >>>> @@ -3871,8 +3882,25 @@ A device MUST set the \field{status} byte to VIRTIO_BLK_S_IOERR >>>> for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT >>>> write any data. >>>> >>>> -Upon receipt of a VIRTIO_BLK_T_FLUSH request, the device SHOULD ensure >>>> -that any writes which were completed are committed to non-volatile storage. >>>> +If the \field{VIRTIO_BLK_F_FLUSH} feature is not negotiated, the >>>> +device SHOULD ensure that all writes are committed to non-volatile >>>> +storage before reporting completion. It MUST do so if it proposed >>>> +\field{VIRTIO_BLK_F_FLUSH}. Failure to do so can cause data loss. >>> >>> The 1st one is a SHOULD because not everyone cares about data loss. >>> But why is the second one a MUST? Same logic seems to apply. >> >> Because then the driver can _know_ that it is safe from data losses, by >> looking at proposed features. > > I still don't see a difference. the point of the 1st > SHOULD is that maybe it's used with e.g. -snapshot > so data will be discarded on qemu exit anyway. No, that's not the point of the 1st SHOULD. From the point of view of the guest, if you use "-snapshot" it is totally irrelevant whether the host commits to non-volatile storage. As you wrote, data will be discarded so there is no separation between volatile and non-volatile storage. The point of the first SHOULD is simply to document reasonable behavior for the case of simple drivers that do not care about data loss. Devices that do not support VIRTIO_BLK_F_FLUSH probably would not obey that SHOULD. But devices that do support VIRTIO_BLK_F_FLUSH need to do better, hence the MUST. Do you think the MUST prevents the trick with "-snapshot"? I think it doesn't, because the guest cannot see the difference. (And I think even "MUST"s can be overridden by a user's "I know what I am doing" option such as QEMU's cache.no-flush=on). > Maybe we should specify that data loss is in case of a crash. Yes. > This seems to apply to all cases that might lead to data loss. > > IOW driver doesn't really know it's safe from losses since > backend can still lose data. All driver cares about is > what is required from driver to avoid losses. That's the point: do not propose VIRTIO_BLK_T_FLUSH if you can lose unflushed data from a crash. >>>> +If the \field{VIRTIO_BLK_F_FLUSH} feature is negotiated, upon receipt >>>> +of a VIRTIO_BLK_T_FLUSH request, the device MUST ensure that any writes >>>> +which were completed are committed to non-volatile storage. > > If writeback is enabled, device can cache requests, > and driver must issue explicit flushes, correct? > If writeback is disabled, driver may assume data > is committed after request completes. > > Above should be conformance statements for the driver. I'll add something, but I don't think there should be conformance statement for the drivers about this. Every "MUST" for the driver can be rephrased to a "MAY assume" on the driver and vice versa. > >>>> + >>>> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated and the >>>> +\field{writeback} field in configuration space is 0, the device MUST >>>> +ensure that all writes are committed to non-volatile storage before >>>> +reporting completion. >>>> + >>>> +If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY >>>> +switch to writethrough or writeback mode by writing respectively 0 and >>>> +1 to the \field{writeback} field. After writing a 0 to \field{writeback}, >>>> +the driver MUST NOT assume that writes submitted while the field was 1 >>> >>> since the last VIRTIO_BLK_F_FLUSH completed, correct? >> >> More precisely "submitted while the field was 1 and completed after the >> last VIRTIO_BLK_F_FLUSH was submitted". Does it have to be specified >> explicitly? > > If we want it understood correctly, I guess yes. Ok. I have reworked the whole description of when a write becomes stable. It is longer but I think it leaves no space for interpretation. Paolo
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]