Subject: Re: [virtio-dev] Re: [virtio-comment] Problems with VIRTIO-4 and writeback only disks
Rusty Russell <email@example.com> writes:
> Stefan Hajnoczi <firstname.lastname@example.org> writes:
>> On Tue, Sep 17, 2013 at 03:03:47PM +0930, Rusty Russell wrote:
>>> 3) If device does not offer VIRTIO_BLK_F_WCE, or driver doesn't
>>> negotiate it:
>>> - Completed writes should be persistent if guest crashes.
>>> - No flush commands are supported.
>>> - No guarantee about writes hitting permanent storage.
>>>
>>> This pretty neatly divides it into complex and simple cases. If you
>>> want more fine-grained, you know where to find virtio-scsi...
>>
>> #3 is worse than what we had with VIRTIO_BLK_F_WCE semantics. In order
>> to keep things simple you weakened the guarantees to the point where you
>> have to look at your hypervisor implementation instead of the virtio
>> standard.
>
> Let's be absolutely clear here, the spec can *never* say:
>
>         Writes MUST be committed to persistent storage.
>
> Because there are real use cases which violate that: consider qemu
> -snapshot. So you will *always* have to consider the hypervisor.
>
>> We're trying to define standard so guests and hypervisors can
>> work together - undefined behavior doesn't further that goal, it
>> actually prevents virtio implementations from working universally.
>
> It's a quality of implementation issue, not a core compatibility issue.
> And I think it's perfectly reasonable not to flush to permanent storage.
> Bryan, have there been any complaints about bhyve not doing it?
>
>> When VIRTIO_BLK_F_WCE is not offered by the device or negotiated by the
>> guest it makes sense to guarantee that every write hits permanent
>> storage.
>
> Perhaps conflating the two (WCE <=> permanence) is a mistake. But I
> think we need a way for fast, simple implementations to exist: so far,
> that's the norm. And I'm reluctant to weaken SHOULD to MAY.

Thinking about this some more: why not make WCE the only option?

(1) It's simple.
(2) We say flush SHOULD hit the disk.
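As a rough illustration of the WCE-only proposal above, here is a toy Python model of a writeback-only device. It is a sketch under stated assumptions, not virtio-blk code: the class `WritebackDisk` and its methods are hypothetical, one dict stands in for permanent storage and another for the device's volatile write cache, and `lose_volatile_state()` models a failure (e.g. host power loss) that discards the cache. A write completes once it is cached; only a flush commits earlier completed writes.

```python
class WritebackDisk:
    """Hypothetical toy model of a device that always behaves as if WCE
    were negotiated. 'disk' stands in for permanent storage, 'cache' for
    the device's volatile write cache."""

    def __init__(self):
        self.disk = {}   # sector -> data that has reached permanent storage
        self.cache = {}  # sector -> data completed but not yet flushed

    def write(self, sector, data):
        # Completes as soon as the data is cached; nothing is promised
        # about permanent storage yet.
        self.cache[sector] = data
        return "completed"

    def read(self, sector):
        # Reads see the newest data, whether still cached or on disk.
        return self.cache.get(sector, self.disk.get(sector))

    def flush(self):
        # Commits every previously completed write to permanent storage.
        self.disk.update(self.cache)
        self.cache.clear()
        return "completed"

    def lose_volatile_state(self):
        # Assumed failure model: the write cache is discarded.
        self.cache.clear()


d = WritebackDisk()
d.write(0, b"a")
d.flush()
d.write(1, b"b")
d.lose_volatile_state()
print(d.read(0), d.read(1))  # b'a' None
```

Only the flushed write survives the simulated failure; the completed but unflushed write to sector 1 is gone, which is exactly the guarantee a guest gets from "flush SHOULD hit the disk" and nothing more.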
Now, I *think* we should remove "Completed (unflushed) writes should be persistent if guest crashes." This is a nice property for simple guests, but it disallows aggressive internal caching. E.g., consider a device implementation which compresses into blocks. It might want to cache writes aggressively internally in the hope of aggregating them before compressing and writing back. OTOH, it could do so by deferring completion until that has been done...

Thoughts?
Rusty.
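The "defer completion" alternative for a compressing device can be sketched as below. This is a hypothetical toy, not any real device implementation: `AggregatingDevice`, its `batch_size` parameter, and the on-storage record layout are assumptions, and `zlib` stands in for whatever compressor a real device would use. Writes are queued, and a request is only reported complete after the batch containing it has been compressed and written back, so every completed write is already durable.

```python
import zlib


class AggregatingDevice:
    """Hypothetical toy device that compresses batches of writes and
    defers completion until each batch has been written back."""

    def __init__(self, batch_size=4):
        self.batch_size = batch_size
        self.pending = []    # (request_id, sector, data) not yet completed
        self.backing = []    # compressed batches written to storage
        self.completed = []  # request ids reported complete to the driver

    def submit(self, request_id, sector, data):
        # Queue the write; do not complete it yet.
        self.pending.append((request_id, sector, data))
        if len(self.pending) >= self.batch_size:
            self._write_back()

    def flush(self):
        # A flush forces out a partial batch.
        if self.pending:
            self._write_back()

    def _write_back(self):
        # Assumed record layout: 8-byte sector, 4-byte length, then data.
        payload = b"".join(
            sector.to_bytes(8, "little") + len(data).to_bytes(4, "little") + data
            for _, sector, data in self.pending
        )
        self.backing.append(zlib.compress(payload))
        # Only now are the writes in this batch reported complete.
        self.completed.extend(rid for rid, _, _ in self.pending)
        self.pending.clear()


dev = AggregatingDevice(batch_size=2)
dev.submit(1, 0, b"x")
print(dev.completed)  # []
dev.submit(2, 1, b"y")
print(dev.completed)  # [1, 2]
dev.submit(3, 2, b"z")
dev.flush()
print(dev.completed, len(dev.backing))  # [1, 2, 3] 2
```

The cost of this choice is latency: request 1 waits for request 2 (or a flush) before it completes, whereas completing writes early would let the device aggregate without stalling the guest, at the price of dropping the "completed writes are persistent" property.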