OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [PATCH v1 4/8] admin: Add device migration admin commands


On Wed, Oct 18, 2023 at 08:24:56AM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Wednesday, October 18, 2023 12:17 PM
> > 
> > On Sun, Oct 08, 2023 at 02:25:51PM +0300, Parav Pandit wrote:
> > 
> > ...
> > 
> > > +\paragraph{Device Context Size Get Command} \label{par:Basic
> > > +Facilities of a Virtio Device / Device groups / Group administration
> > > +commands / Device Migration / Device Context Size Get Command}
> > > +
> > > +This command returns the remaining estimated device context size. The
> > > +driver can query the remaining estimated device context size for the
> > > +current mode or for the \field{Freeze} mode. While reading the device
> > > +context using VIRTIO_ADMIN_CMD_DEV_CTX_READ command, the actual
> > > +device context size may differ than what is being returned by this
> > > +command. After reading the device context using command
> > > +VIRTIO_ADMIN_CMD_DEV_CTX_READ, the remaining estimated context
> > size
> > > +usually reduces by amount of device context read by the driver using
> > > +VIRTIO_ADMIN_CMD_DEV_CTX_READ command. If the device context is
> > > +updated rapidly the remaining estimated context size may also
> > > +increase even after reading the device context using
> > VIRTIO_ADMIN_CMD_DEV_CTX_READ command.
> > > +
> > > +For the command VIRTIO_ADMIN_CMD_DEV_CTX_SIZE_GET, \field{opcode}
> > is set to 0x9.
> > > +The \field{group_member_id} refers to the member device to be accessed.
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_admin_cmd_dev_ctx_size_get_data {
> > > +        u8 freeze_mode;
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +The \field{command_specific_data} is in the format \field{struct
> > > +virtio_admin_cmd_dev_ctx_size_get_data}.
> > > +When \field{freeze_mode} is set to 1, the device returns the
> > > +estimated device context size when the device will be in \field{Freeze} mode.
> > > +As the device context is read from the device, the remaining
> > > +estimated context size may decrease. For example, member device mode
> > > +is \field{Stop}, the device has estimated total device context size
> > > +as 12KB; the device would return 12KB for the first
> > > +VIRTIO_ADMIN_CMD_DEV_CTX_SIZE_GET command, once the driver has
> > > +already read 8KB of device context data using
> > > +VIRTIO_ADMIN_CMD_DEV_CTX_READ command, and the remaining data is
> > 4KB,
> > > +hence the device returns 4KB in the subsequent
> > > +VIRTIO_ADMIN_CMD_DEV_CTX_SIZE_GET command.
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_admin_cmd_dev_ctx_size_get_result {
> > > +        le64 size;
> > > +};
> > > +\end{lstlisting}
> > 
> > So we have a 64 bit size? How are we going to return so much?
> > 
> I agree it is a lot.
> But this is the case because one has defined struct virtio_pci_cap64.

Good point.

That's there so we can define 64 bit resources.

But if you have them and you want to migrate then you need change
tracking and incremental state save, you can not do it in
one shot with device context.


> > 
> > > +
> > > +When the command completes successfully,
> > > +\field{command_specific_result} is in the format \field{struct
> > virtio_admin_cmd_dev_ctx_size_get_result}.
> > > +
> > > +Once the device context is fully read, this command returns zero for
> > > +\field{size} until the new device context is generated.
> > > +
> > > +\paragraph{Device Context Read Command} \label{par:Basic Facilities
> > > +of a Virtio Device / Device groups / Group administration commands /
> > > +Device Migration / Device Context Read Command}
> > > +
> > > +This command reads the current device context.
> > > +For the command VIRTIO_ADMIN_CMD_DEV_CTX_READ, \field{opcode} is
> > set to 0xa.
> > > +The \field{group_member_id} refers to the member device to be accessed.
> > > +
> > > +This command has no command specific data.
> > 
> > So I am not sure this is wise. Multi-year experience with QEMU taught us that
> > we are likely to make mistakes when defining migration format - forget some
> > fields, validate them incorrectly, and so on.
> > Making a somewhat safe assumption that we'll make mistakes in the spec, too,
> > I'd like to see some kind of idea of how we'll support compatibility and/or
> > graceful failure if/when we do.
> > 
> Very valid point. In v2 I added the compatibility command at [1] as "Device Context Supported Fields Query Command"
> 
> [1] https://lists.oasis-open.org/archives/virtio-comment/202310/msg00195.html
> > 
> > > +\begin{lstlisting}
> > > +struct virtio_admin_cmd_dev_ctx_rd_len {
> > > +        le32 context_len;
> > > +};
> > > +
> > > +struct virtio_admin_cmd_dev_ctx_rd_result {
> > > +        u8 data[];
> > > +};
> > > +\end{lstlisting}
> > 
> > so callers needs to pin whatever device tells it to?
> > 
> How much memory to pin is driver choice, it can pin 4K and keep reading 1MB memory by 1MB/4K size calls.
> Or it can pin 1MB and read full.
> 
> > admin commands support truncation intentionally.
> > 
> > it is not clear, to me, that it's ok to have device just save as much state as it
> > wants to.
> Why would device store some unreasonable amount of state?

Because I know how hardware vendor minds work: oh we can request
a bunch of memory, let's do that just in case ;)

> > 
> > > +
> > > +When the command completes successfully,
> > > +\field{command_specific_result} is in the format \field{struct
> > > +virtio_admin_cmd_dev_ctx_rd_result}
> > > +returned by the device containing the device context data and
> > > +\field{command_specific_output} is in format of \field{struct
> > > +virtio_admin_cmd_dev_ctx_rd_len} containing length of context data
> > > +returned by the device in the command response. When the length
> > > +returned is zero or when the returned context data is less the data
> > > +requested by the driver, the device do not have any device context
> > > +data left that the device can report, at this point the device context stream
> > ends.
> > > +
> > > +The driver can read the whole device context data using one or
> > > +multiple commands. When the device context does not fit in the
> > > +\field{command_specific_result}, driver reads the subsequent
> > > +remaining bytes using one or more subsequent commands.
> > 
> > how?
> For example, driver requested 100 bytes.
> a. Device returned 40 bytes. So the whole context fit, nothing more to be done.
> Or
> b. Device returned 100 bytes, so driver does not know if there is more or not.
> Hence, driver request again for additional 100 bytes in 2nd call.
> If there is nothing, device returns success with zero bytes.
> If the device has 30 bytes more it returns 30 bytes. The returned bytes < requested bytes hence the read stream ends.

do you then expect device to maintain all data in some internal buffer,
then keep an offset internally and each following command returns a
different chunk?  all this sounds rather fragile, not to mention
wasteful wrt using memory for that internal buffer.

-- 
MST



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]