OASIS Mailing List Archives

virtio-comment message



Subject: Re: [virtio] Re: [virtio-comment] Problems with VIRTIO-4 and writeback only disks


Paolo Bonzini <pbonzini@redhat.com> writes:
> On 16/09/2013 03:34, Rusty Russell wrote:
>>> > The default WCE=0 semantics should be that the host ensures every write
>>> > reaches stable storage.
>> Here's the problem: I don't think anyone will really implement this.
>> 
>> lguest certainly doesn't flush every write, nor does bhyve.  Xen famously
>> didn't.  I can't see where qemu does it either, but it could be buried
>> in the aio stuff?
>> 
>
> It's here in block.c's bdrv_co_do_writev:
>
>     if (ret < 0) {
>         /* Do nothing, write notifier decided to fail this request */
>     } else if (flags & BDRV_REQ_ZERO_WRITE) {
>         ret = bdrv_co_do_write_zeroes(bs, sector_num, nb_sectors);
>     } else {
>         ret = drv->bdrv_co_writev(bs, sector_num, nb_sectors, qiov);
>     }
>
>     if (ret == 0 && !bs->enable_write_cache) {
>         ret = bdrv_co_flush(bs);
>     }

The truth is more complex than that.

bdrv_co_flush() calls:

        bs->drv->bdrv_co_flush_to_disk

or if that's NULL:

        bs->drv->bdrv_aio_flush

or if that's NULL, does nothing.
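
To make that fallback order concrete, here is a minimal standalone sketch.
It is not QEMU code: struct sketch_driver, sketch_flush(),
sketch_count_flush() and the enum are hypothetical names invented for
illustration, and the real bdrv_co_flush() does more than this.  It models
only the three cases listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a block driver's callback table.
 * The field names echo the callbacks named above; this is not QEMU's struct. */
struct sketch_driver {
    int (*co_flush_to_disk)(void *opaque);
    int (*aio_flush)(void *opaque);
};

/* Records which of the three cases was taken. */
enum sketch_flush_path {
    SKETCH_FLUSH_CO,
    SKETCH_FLUSH_AIO,
    SKETCH_FLUSH_NONE
};

/* Example callback: counts how often it ran, via an int behind opaque. */
int sketch_count_flush(void *opaque)
{
    ++*(int *)opaque;
    return 0;
}

/* Try the first callback, fall back to the second, otherwise do nothing.
 * Note that the "do nothing" case still returns 0 (success). */
int sketch_flush(const struct sketch_driver *drv, void *opaque,
                 enum sketch_flush_path *path)
{
    assert(drv != NULL && path != NULL);

    if (drv->co_flush_to_disk) {
        *path = SKETCH_FLUSH_CO;
        return drv->co_flush_to_disk(opaque);
    }
    if (drv->aio_flush) {
        *path = SKETCH_FLUSH_AIO;
        return drv->aio_flush(opaque);
    }
    *path = SKETCH_FLUSH_NONE;
    return 0;
}
```

In this model, a driver with both pointers NULL reports success without
anything having run, so the caller cannot tell a completed flush from one
that never happened.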

Now, qcow2 doesn't set bdrv_co_flush_to_disk or bdrv_aio_flush.  I tried
to follow the others, e.g. vmdk, but it's a completely runtime-determined
maze of function pointers, so I can't tell if anyone *actually* flushes
to disk.

It seems a raw file will get an fdatasync, via bdrv_aio_flush() ->
raw_aio_flush(), but strace shows no fsync/fdatasync calls, even when I
just use a raw file and type "sync" in the guest:

strace -e trace=file,desc -o /tmp/trace qemu-system-i386 -machine pc,accel=kvm -m 512 -drive file=/home/rusty/qemu-images/ubuntu-copy,index=0,media=disk,if=virtio -kernel arch/x86/boot/bzImage -append "ro root=/dev/vda1 single"

So perhaps if you use the right back ends and set your non-default
options just right you can have your data safe on disk?  But you're
certainly not convincing me anyone wants it...

Cheers,
Rusty.


