OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: RE: [PATCH v1 4/8] admin: Add device migration admin commands



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, October 18, 2023 3:56 PM

> On Wed, Oct 18, 2023 at 08:24:56AM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Wednesday, October 18, 2023 12:17 PM
> > >
> > > On Sun, Oct 08, 2023 at 02:25:51PM +0300, Parav Pandit wrote:
> > >
> > > ...
> > >
> > > > +\paragraph{Device Context Size Get Command} \label{par:Basic
> > > > +Facilities of a Virtio Device / Device groups / Group
> > > > +administration commands / Device Migration / Device Context Size
> > > > +Get Command}
> > > > +
> > > > +This command returns the remaining estimated device context size.
> > > > +The driver can query the remaining estimated device context size
> > > > +for the current mode or for the \field{Freeze} mode. While
> > > > +reading the device context using VIRTIO_ADMIN_CMD_DEV_CTX_READ
> > > > +command, the actual device context size may differ than what is
> > > > +being returned by this command. After reading the device context
> > > > +using command VIRTIO_ADMIN_CMD_DEV_CTX_READ, the remaining
> > > > +estimated context
> > > size
> > > > +usually reduces by amount of device context read by the driver
> > > > +using VIRTIO_ADMIN_CMD_DEV_CTX_READ command. If the device
> > > > +context is updated rapidly the remaining estimated context size
> > > > +may also increase even after reading the device context using
> > > VIRTIO_ADMIN_CMD_DEV_CTX_READ command.
> > > > +
> > > > +For the command VIRTIO_ADMIN_CMD_DEV_CTX_SIZE_GET,
> \field{opcode}
> > > is set to 0x9.
> > > > +The \field{group_member_id} refers to the member device to be
> accessed.
> > > > +
> > > > +\begin{lstlisting}
> > > > +struct virtio_admin_cmd_dev_ctx_size_get_data {
> > > > +        u8 freeze_mode;
> > > > +};
> > > > +\end{lstlisting}
> > > > +
> > > > +The \field{command_specific_data} is in the format \field{struct
> > > > +virtio_admin_cmd_dev_ctx_size_get_data}.
> > > > +When \field{freeze_mode} is set to 1, the device returns the
> > > > +estimated device context size when the device will be in \field{Freeze}
> mode.
> > > > +As the device context is read from the device, the remaining
> > > > +estimated context size may decrease. For example, member device
> > > > +mode is \field{Stop}, the device has estimated total device
> > > > +context size as 12KB; the device would return 12KB for the first
> > > > +VIRTIO_ADMIN_CMD_DEV_CTX_SIZE_GET command, once the driver
> has
> > > > +already read 8KB of device context data using
> > > > +VIRTIO_ADMIN_CMD_DEV_CTX_READ command, and the remaining
> data is
> > > 4KB,
> > > > +hence the device returns 4KB in the subsequent
> > > > +VIRTIO_ADMIN_CMD_DEV_CTX_SIZE_GET command.
> > > > +
> > > > +\begin{lstlisting}
> > > > +struct virtio_admin_cmd_dev_ctx_size_get_result {
> > > > +        le64 size;
> > > > +};
> > > > +\end{lstlisting}
> > >
> > > So we have a 64 bit size? How are we going to return so much?
> > >
> > I agree it is a lot.
> > But this is the case because one has defined struct virtio_pci_cap64.
> 
> Good point.
> 
> That's there so we can define 64 bit resources.
> 
> But if you have them and you want to migrate then you need change tracking
> and incremental state save, you can not do it in one shot with device context.
>
Each read of the device context would return different incremental state.
It would be good possibly to have explicit signal on the device context read to indicate end of stream, instead of relying on the size.
 
> 
> > >
> > > > +
> > > > +When the command completes successfully,
> > > > +\field{command_specific_result} is in the format \field{struct
> > > virtio_admin_cmd_dev_ctx_size_get_result}.
> > > > +
> > > > +Once the device context is fully read, this command returns zero
> > > > +for \field{size} until the new device context is generated.
> > > > +
> > > > +\paragraph{Device Context Read Command} \label{par:Basic
> > > > +Facilities of a Virtio Device / Device groups / Group
> > > > +administration commands / Device Migration / Device Context Read
> > > > +Command}
> > > > +
> > > > +This command reads the current device context.
> > > > +For the command VIRTIO_ADMIN_CMD_DEV_CTX_READ, \field{opcode}
> is
> > > set to 0xa.
> > > > +The \field{group_member_id} refers to the member device to be
> accessed.
> > > > +
> > > > +This command has no command specific data.
> > >
> > > So I am not sure this is wise. Multi-year experience with QEMU
> > > taught us that we are likely to make mistakes when defining
> > > migration format - forget some fields, validate them incorrectly, and so on.
> > > Making a somewhat safe assumption that we'll make mistakes in the
> > > spec, too, I'd like to see some kind of idea of how we'll support
> > > compatibility and/or graceful failure if/when we do.
> > >
> > Very valid point. In v2 I added the compatibility command at [1] as "Device
> Context Supported Fields Query Command"
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202310/msg00195.h
> > tml
> > >
> > > > +\begin{lstlisting}
> > > > +struct virtio_admin_cmd_dev_ctx_rd_len {
> > > > +        le32 context_len;
> > > > +};
> > > > +
> > > > +struct virtio_admin_cmd_dev_ctx_rd_result {
> > > > +        u8 data[];
> > > > +};
> > > > +\end{lstlisting}
> > >
> > > so callers needs to pin whatever device tells it to?
> > >
> > How much memory to pin is driver choice, it can pin 4K and keep reading 1MB
> memory by 1MB/4K size calls.
> > Or it can pin 1MB and read full.
> >
> > > admin commands support truncation intentionally.
> > >
> > > it is not clear, to me, that it's ok to have device just save as
> > > much state as it wants to.
> > Why would device store some unreasonable amount of state?
> 
> Because I know how hardware vendor minds work: oh we can request a bunch
> of memory, let's do that just in case ;)

For a device that has large memory, it is unlikely it can store it in second copy.

> 
> > >
> > > > +
> > > > +When the command completes successfully,
> > > > +\field{command_specific_result} is in the format \field{struct
> > > > +virtio_admin_cmd_dev_ctx_rd_result}
> > > > +returned by the device containing the device context data and
> > > > +\field{command_specific_output} is in format of \field{struct
> > > > +virtio_admin_cmd_dev_ctx_rd_len} containing length of context
> > > > +data returned by the device in the command response. When the
> > > > +length returned is zero or when the returned context data is less
> > > > +the data requested by the driver, the device do not have any
> > > > +device context data left that the device can report, at this
> > > > +point the device context stream
> > > ends.
> > > > +
> > > > +The driver can read the whole device context data using one or
> > > > +multiple commands. When the device context does not fit in the
> > > > +\field{command_specific_result}, driver reads the subsequent
> > > > +remaining bytes using one or more subsequent commands.
> > >
> > > how?
> > For example, driver requested 100 bytes.
> > a. Device returned 40 bytes. So the whole context fit, nothing more to be
> done.
> > Or
> > b. Device returned 100 bytes, so driver does not know if there is more or not.
> > Hence, driver request again for additional 100 bytes in 2nd call.
> > If there is nothing, device returns success with zero bytes.
> > If the device has 30 bytes more it returns 30 bytes. The returned bytes <
> requested bytes hence the read stream ends.
> 
> do you then expect device to maintain all data in some internal buffer, then
> keep an offset internally and each following command returns a different
> chunk?  all this sounds rather fragile, not to mention wasteful wrt using
> memory for that internal buffer.

For small device in range of few hundred KB that changes the context often should be fine this way as this is only transient memory.
For device with large memory, an internal buffer may not work (depends on device implementation), it can fetch from the running hw context and return without another copy.


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]