virtio-dev message

Subject: Re: [virtio-comment] [PATCH V2 2/2] virtio: introduce STOP status bit

From: Jason Wang <jasowang@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Date: Tue, 3 Aug 2021 14:33:20 +0800


On 2021/7/26 at 11:07, Stefan Hajnoczi wrote:

On Thu, Jul 22, 2021 at 09:08:58PM +0800, Jason Wang wrote:

On 2021/7/22 at 6:24, Stefan Hajnoczi wrote:

On Thu, Jul 22, 2021 at 03:33:10PM +0800, Jason Wang wrote:

On 2021/7/21 at 6:20, Stefan Hajnoczi wrote:

On Wed, Jul 21, 2021 at 10:29:17AM +0800, Jason Wang wrote:

On 2021/7/20 at 4:50, Stefan Hajnoczi wrote:

I recognize that opaque device state poses a risk to migration
compatibility, because device implementors may arbitrarily use opaque
state when a standard is available.

However, the way to avoid this scenario is by:

1. Making the standard migration approach the easiest to implement
      because everything has been taken care of. It will save implementors
      the headache of defining and coding their own device state
      representations and versioning.

2. Educating users about migration compatibility so they can identify
      when implementors are locking in their users.

For vendor specific device, this may work. But for standard devices like
virtio, we should go further.

The device states should be defined in the spec clearly. We should re-visit
the design if those states contain anything that is implementation
specific.

Can you describe how migrating virtiofs devices should work?


I need to learn more about virtio-fs before answering this question.

Actually, it would be faster if I can see a prototype of the migration
support for virtio-fs and start from there (as I've suggested this in
another thread).

   I think
that might be quicker than if I reply to each of your points because our
views are still quite far apart.


Yes, it would be quicker if we can start from a prototype.

I have CCed Max Reitz to check whether a prototype of virtiofs migration
might be available soon?

But I can describe the key state that needs to be migrated:

- FUSE nodeid -> host inode mappings. The driver uses nodeid numbers in
   the FUSE protocol and the device maps them to actual inodes on the
   passthrough file system.
- FUSE fh -> open fd mappings. The driver uses fh numbers in the FUSE
   protocol and the device maps them to actual file descriptors on the
   host.
- FUSE fh -> open dir fd mappings. The driver uses fh numbers in the
   FUSE protocol and the device maps them to actual O_DIRECTORY file
   descriptors on the host.
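
To make the three tables above concrete, here is a minimal Python sketch of the
state a passthrough device might hold. The class, its method names, and the
counters are hypothetical and are not taken from any virtiofs implementation;
the only assumption borrowed from FUSE is that nodeid 1 is the root.

```python
import os


class PassthroughState:
    """Hypothetical per-device tables; all names are illustrative only."""

    def __init__(self):
        self.nodeid_to_inode = {}  # FUSE nodeid -> (st_dev, st_ino) on the host
        self.fh_to_fd = {}         # FUSE fh -> open host file descriptor
        self.fh_to_dirfd = {}      # FUSE fh -> open O_DIRECTORY descriptor
        self.next_nodeid = 2       # nodeid 1 is the FUSE root
        self.next_fh = 1

    def lookup(self, path):
        """Assign a nodeid to the host inode behind `path`."""
        st = os.stat(path)
        nodeid = self.next_nodeid
        self.next_nodeid += 1
        self.nodeid_to_inode[nodeid] = (st.st_dev, st.st_ino)
        return nodeid

    def open(self, path, flags=os.O_RDONLY):
        """Open a regular file and hand back an fh for it."""
        fh = self.next_fh
        self.next_fh += 1
        self.fh_to_fd[fh] = os.open(path, flags)
        return fh

    def opendir(self, path):
        """Open a directory and hand back an fh for it."""
        fh = self.next_fh
        self.next_fh += 1
        self.fh_to_dirfd[fh] = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
        return fh

    def release(self, fh):
        """Close whichever descriptor is stored under `fh`."""
        table = self.fh_to_fd if fh in self.fh_to_fd else self.fh_to_dirfd
        os.close(table.pop(fh))
```

Everything in these dictionaries lives only in the source device's memory, which
is why it has to travel with the migration stream in some form.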

The driver expects to be able to continue using nodeid and fh numbers
across migration. Let's look at just the open fds for a moment:

The OPEN command opens the file for a given nodeid and returns its fh.
Due to POSIX file system semantics there is no reliable way to reopen
the same file from just the filename. The problem is that a file can be
renamed or deleted (but still accessible until the last fd is closed).
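
The delete case described above can be reproduced on a POSIX host with a few
lines of Python. This is only a sketch of the semantics, not of any device
code; the function name is made up for illustration.

```python
import os
import tempfile


def reopen_by_name_after_unlink():
    """Return (data read via the old fd, whether reopening by name worked)."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "file")
    with open(path, "w") as f:
        f.write("hello")

    fd = os.open(path, os.O_RDONLY)  # what a device would hold behind an fh
    os.unlink(path)                  # the file is deleted while still open

    data = os.read(fd, 5)            # the open fd still reaches the inode
    try:
        os.close(os.open(path, os.O_RDONLY))
        reopened = True
    except FileNotFoundError:
        reopened = False             # the filename no longer leads anywhere

    os.close(fd)
    os.rmdir(tmpdir)
    return data, reopened
```

A destination that only received the filename would be in the position of the
second `os.open()` call: the name is gone even though the source still has a
working descriptor.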

Linux file handles (open_by_handle_at(2) and name_to_handle_at(2)) make
it possible to reopen the exact same file using a struct file_handle
instead of a filename. So the virtiofs device could transfer the Linux
file handles to the destination where the fh -> open fd mappings can be
restored.

The problem is that Linux file handles are an implementation-specific
solution to this problem.

Yes, according to the manpage, it is not a part of the uABI, so it's not guaranteed to work on the destination, if I understand it correctly.

On non-Linux hosts there may be other
solutions that userspace file systems use to solve this problem. Or a
virtiofs device may not implement a passthrough host file system and
have a completely different concept of what an inode is.

The situation is somewhat similar to device pass-through, which makes it very hard to have a general way to migrate.


This means only a subset of virtiofs implementations can use Linux file
handles as part of their device state. There is no way for the driver or
device to recreate or restore the necessary information without
implementation-specific device state like Linux file handles, though.

So my understanding is that even the Linux file handle is not a general solution:


- not a part of uABI (not guaranteed to work on the destination)
- depends on the kernel version and a specific Kconfig (CONFIG_FHANDLE)


I guess this is just a summary of what we've already discussed and not
new information. I think an implementation today would use DBus VMState
to transfer implementation-specific device state (an opaque blob).

Instead of trying to migrate that opaque state, which is kind of tricky, I wonder if we can avoid it by recording the mapping in the shared filesystem itself.


Thanks


Stefan
