Subject: Re: [virtio-comment] [PATCH V2 2/2] virtio: introduce STOP status bit
On 2021/7/26 at 11:07, Stefan Hajnoczi wrote:
On Thu, Jul 22, 2021 at 09:08:58PM +0800, Jason Wang wrote:
On 2021/7/22 at 6:24, Stefan Hajnoczi wrote:
On Thu, Jul 22, 2021 at 03:33:10PM +0800, Jason Wang wrote:
On 2021/7/21 at 6:20, Stefan Hajnoczi wrote:
On Wed, Jul 21, 2021 at 10:29:17AM +0800, Jason Wang wrote:
On 2021/7/20 at 4:50, Stefan Hajnoczi wrote:

I recognize that opaque device state poses a risk to migration compatibility, because device implementors may arbitrarily use opaque state when a standard is available. However, the way to avoid this scenario is by:
1. Making the standard migration approach the easiest to implement because everything has been taken care of. It will save implementors the headache of defining and coding their own device state representations and versioning.
2. Educating users about migration compatibility so they can identify when implementors are locking in their users.

For vendor-specific devices, this may work. But for standard devices like virtio, we should go further. The device states should be defined clearly in the spec. We should revisit the design if those states contain anything that is implementation-specific.

Can you describe how migrating virtiofs devices should work?

I need to learn more about virtio-fs before answering this question. Actually, it would be faster if I could see a prototype of the migration support for virtio-fs and start from there (as I've suggested in another thread).

I think that might be quicker than if I reply to each of your points because our views are still quite far apart.

Yes, it would be quicker if we can start from a prototype.

I have CCed Max Reitz to check whether a prototype of virtiofs migration might be available soon. But I can describe the key state that needs to be migrated:
- FUSE nodeid -> host inode mappings. The driver uses nodeid numbers in the FUSE protocol and the device maps them to actual inodes on the passthrough file system.
- FUSE fh -> open fd mappings. The driver uses fh numbers in the FUSE protocol and the device maps them to actual file descriptors on the host.
- FUSE fh -> open dir fd mappings. The driver uses fh numbers in the FUSE protocol and the device maps them to actual O_DIRECTORY file descriptors on the host.

The driver expects to be able to continue using nodeid and fh numbers across migration.

Let's look at just the open fds for a moment: The OPEN command opens the file for a given nodeid and returns its fh. Due to POSIX file system semantics there is no reliable way to reopen the same file from just the filename. The problem is that a file can be renamed or deleted (but still accessible until the last fd is closed).

Linux file handles (open_by_handle_at(2) and name_to_handle_at(2)) make it possible to reopen the exact same file using a struct file_handle instead of a filename. So the virtiofs device could transfer the Linux file handles to the destination where the fh -> open fd mappings can be restored.

The problem is that Linux file handles are an implementation-specific solution to this problem.
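To make the rename/delete problem concrete, here is a minimal sketch, assuming a POSIX host. The `fh_to_fd` table and the file names are hypothetical stand-ins for the device's fh -> open fd mapping, not taken from any virtiofs implementation. It shows that an open fd keeps working after the file is renamed and unlinked, while no filename is left to reopen it from:

```python
import os
import tempfile

# Hypothetical device-side table: FUSE fh number -> host file descriptor.
fh_to_fd = {}

with tempfile.TemporaryDirectory() as shared_dir:
    path = os.path.join(shared_dir, "data.txt")
    with open(path, "w") as f:
        f.write("hello")

    # OPEN: the device opens the host file and hands an fh number to the driver.
    fh = 1
    fh_to_fd[fh] = os.open(path, os.O_RDONLY)

    # The file is renamed and then deleted while it is still open.
    renamed = os.path.join(shared_dir, "renamed.txt")
    os.rename(path, renamed)
    os.unlink(renamed)

    # The open fd still reads the original data ...
    data_via_fd = os.pread(fh_to_fd[fh], 5, 0)
    # ... but neither filename exists any more, so a destination host
    # could not reopen the file from a recorded path.
    reopen_possible = os.path.exists(path) or os.path.exists(renamed)

    os.close(fh_to_fd.pop(fh))
```

This is the gap that a struct file_handle is meant to fill: it identifies the file itself rather than one of its names.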
Yes. According to the man page, it is not part of the uABI, so if I understand correctly it is not guaranteed to work on the destination.
On non-Linux hosts there may be other solutions that userspace file systems use to solve this problem. Or a virtiofs device may not implement a passthrough host file system and have a completely different concept of what an inode is.
The situation is somewhat similar to device pass-through, where it is very hard to have a general way to migrate.
This means only a subset of virtiofs implementations can use Linux file handles as part of their device state. There is no way for the driver or device to recreate or restore the necessary information without implementation-specific device state like Linux file handles, though.
So my understanding is that even the Linux file handle is not a general solution:
- not a part of uABI (not guaranteed to work on the destination)
- depends on the kernel version and a specific Kconfig (CONFIG_FHANDLE)
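As a rough illustration of the second point, a device could probe at startup whether file handles are usable at all on a given host and file system. The sketch below is only an assumption about how such a probe might look: `probe_file_handle_support` is a hypothetical helper, it calls name_to_handle_at(2) through ctypes with a zero-sized handle buffer, and it assumes a kernel built without CONFIG_FHANDLE reports ENOSYS.

```python
import ctypes
import errno
import os

AT_FDCWD = -100  # Linux value of AT_FDCWD from <fcntl.h>


class FileHandleHeader(ctypes.Structure):
    # Fixed-size prefix of struct file_handle; the opaque f_handle[] bytes follow it.
    _fields_ = [("handle_bytes", ctypes.c_uint), ("handle_type", ctypes.c_int)]


def probe_file_handle_support(path):
    """Hypothetical helper: classify name_to_handle_at(2) support for path."""
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        func = libc.name_to_handle_at
    except (OSError, AttributeError, TypeError):
        return "no-wrapper"  # libc does not export the call (e.g. non-Linux host)
    func.argtypes = [ctypes.c_int, ctypes.c_char_p,
                     ctypes.POINTER(FileHandleHeader),
                     ctypes.POINTER(ctypes.c_int), ctypes.c_int]
    func.restype = ctypes.c_int
    header = FileHandleHeader(0, 0)  # handle_bytes=0 only asks for the needed size
    mount_id = ctypes.c_int(0)
    ctypes.set_errno(0)
    rc = func(AT_FDCWD, os.fsencode(path), ctypes.byref(header),
              ctypes.byref(mount_id), 0)
    err = ctypes.get_errno()
    if rc == 0 or err == errno.EOVERFLOW:
        return "supported"  # kernel reported the handle size it would need
    if err == errno.ENOSYS:
        return "no-syscall"  # assumed result when CONFIG_FHANDLE is off
    if err == errno.EOPNOTSUPP:
        return "fs-unsupported"  # this file system cannot produce handles
    return "error:" + errno.errorcode.get(err, str(err))


result = probe_file_handle_support(".")
```

Even when the probe says "supported" on the source, the destination would need the same answer for the same file system, which is exactly why this cannot be assumed in general.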
I guess this is just a summary of what we've already discussed and not new information. I think an implementation today would use DBus VMState to transfer implementation-specific device state (an opaque blob).
Instead of trying to migrate that opaque state, which is kind of tricky, I wonder if we can avoid it by recording the mappings in the shared file system itself.
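One possible reading of this idea, purely as a sketch: the source device writes its fh -> path table into a file inside the shared directory, and the destination device reads it back and reopens the files. The file name `.migration-state.json`, the JSON layout, and both helper functions are hypothetical, and this simple form does not cover files that were deleted while still open.

```python
import json
import os
import tempfile

STATE_FILE = ".migration-state.json"  # hypothetical name, not from the thread


def save_mappings(shared_dir, fh_to_path):
    """Source side: persist fh -> relative path inside the shared directory."""
    tmp = os.path.join(shared_dir, STATE_FILE + ".tmp")
    with open(tmp, "w") as f:
        json.dump({str(fh): rel for fh, rel in fh_to_path.items()}, f)
    # Rename into place so the destination never sees a half-written file.
    os.replace(tmp, os.path.join(shared_dir, STATE_FILE))


def restore_mappings(shared_dir):
    """Destination side: reopen each recorded path and rebuild fh -> fd."""
    with open(os.path.join(shared_dir, STATE_FILE)) as f:
        recorded = json.load(f)
    return {int(fh): os.open(os.path.join(shared_dir, rel), os.O_RDONLY)
            for fh, rel in recorded.items()}


with tempfile.TemporaryDirectory() as shared_dir:
    with open(os.path.join(shared_dir, "a.txt"), "w") as f:
        f.write("abc")
    save_mappings(shared_dir, {7: "a.txt"})
    restored = restore_mappings(shared_dir)
    restored_fhs = sorted(restored)
    restored_data = os.read(restored[7], 3)
    for fd in restored.values():
        os.close(fd)
```

The driver would keep using fh 7 after migration, and no opaque blob has to travel through the migration stream in this variant.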