OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: RE: [PATCH v1 7/8] admin: Add write recording commands



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Sunday, October 8, 2023 5:23 PM
> 
> On Sun, Oct 08, 2023 at 02:25:54PM +0300, Parav Pandit wrote:
> > When migrating a virtual machine with passthrough virtio devices, the
> > virtio device may write into the guest memory. Some systems may not be
> > able to keep track of these pages efficiently.
> >
> > To facilitate such a system, a device provides the record of pages
> > which are written by the device. In one use case, this commands
> > connect to the vfio framework at [1].
> >
> > The owner driver configures the member device for list of address
> > ranges for which it expects write recording and reporting by the device.
> >
> > The owner driver periodically queries the written pages address record
> > which gets cleared from the device upon reading it.
> >
> > When the write records reduces over the time, at one point write
> > recording is stopped after the device mode is set to FREEZE.
> >
> > [1]
> > https://elixir.bootlin.com/linux/v6.4-rc1/source/include/uapi/linux/vf
> > io.h#L1207
> >
> > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/176
> > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > Signed-off-by: Satananda Burla <sburla@marvell.com>
> > ---
> >  admin-cmds-device-migration.tex | 146
> ++++++++++++++++++++++++++++++--
> >  admin.tex                       |  10 ++-
> >  2 files changed, 146 insertions(+), 10 deletions(-)
> >
> > diff --git a/admin-cmds-device-migration.tex
> > b/admin-cmds-device-migration.tex index e98d552..49835eb 100644
> > --- a/admin-cmds-device-migration.tex
> > +++ b/admin-cmds-device-migration.tex
> > @@ -97,15 +97,16 @@ \subsubsection{Device Migration}\label{sec:Basic
> > Facilities of a Virtio Device /  During the device migration flow, a
> > passthrough device may write data to the  guest virtual machine
> > memory, a source hypervisor needs to keep track of these  written memory to
> migrate such memory to destination hypervisor.
> > -Some systems may not be able to keep track of such memory write
> > addresses at -hypervisor level. In such a scenario, a device records
> > and reports these -written memory addresses to the owner device. Such
> > an address is named as -IO virtual address (IOVA). The owner driver
> > enables write recording for one or -more IOVA ranges per device during
> > device migration flow. The owner driver -periodically queries these
> > written IOVA records from the device. As the driver -reads the written IOVA
> records, the device clears those records from the device.
> > -Once the device reports zero or small number of written IOVA records,
> > the device -mode is set to \field{Stop} or \field{Freeze}. Once the
> > device is set to \field{Stop}
> > +Some systems may not be able to keep track of such memory writes at
> > +addresses at hypervisor level. In such a scenario, a device records
> > +and reports these written memory addresses to the owner device.
> 
> 
> what does it mean to record them?
>
May be we can use a different verb than record. May be tracking?
Record means, device accumulates these written pages addresses, and when driver queries, it returns these addresses and clears it internally within the device.
 
> > Such an
> > +address is named as IO virtual address (IOVA).
> 
> I don't know what does this have to do with IOVA. For that matter everything
> would have to be "IOVA". Spec calls these physical address and let's stick to
> that.
> 
Make sense.
At device level it does not have knowledge of IOVA.
I will rename it.

> 
> > The owner driver enables write
> > +recording for one or more IOVA ranges per device during device
> > +migration flow. The owner driver periodically queries these written
> > +IOVA records from the device.
> 
> periodical reads without any indication are the only option then?
>
At least for now, this is starting point. Software stack such as QEMU does it periodically.
When new use case arise, may be it can be extended.
 
> > As the driver reads the written IOVA records,
> > +the device clears those records from the device. Once the device
> > +reports zero or small number of written IOVA records, the device is
> > +set to \field{Stop} or \field{Freeze} mode. Once the device is set to
> > +\field{Stop}
> >  or \field{Freeze} mode, and once all the IOVA records are read, the
> > driver stops  the write recording in the device.
> 
> 
> it is not great that you are rewriting text you just wrote in patch 1 here. pls find
> a way not to make reviewers read everything twice.
> 
There is small duplication of one line explaining mode change, rest is contextual to the write recording.
Merging with text of patch_1 was slightly complicated to read, so one sentence is duplicated.

I will again check if patch_1 text extension is easier to read.

> > @@ -118,6 +119,10 @@ \subsubsection{Device Migration}\label{sec:Basic
> > Facilities of a Virtio Device /  \item Device Context Read Command
> > \item Device Context Write Command  \item Device Context Discard
> > Command
> > +\item Device Write Record Capabilities Query Command \item Device
> > +Write Records Start Command \item Device Write Records Stop Command
> > +\item Device Write Records Read Command
> >  \end{enumerate}
> >
> >  These commands are currently only defined for the SR-IOV group type.
> > @@ -307,6 +312,129 @@ \subsubsection{Device Migration}\label{sec:Basic
> > Facilities of a Virtio Device /  discarded, subsequent
> > VIRTIO_ADMIN_CMD_DEV_CTX_WRITE command writes a new device
> context.
> >
> > +\paragraph{Device Write Record Capabilities Query Command}
> > +\label{par:Basic Facilities of a Virtio Device / Device groups /
> > +Group administration commands / Device Migration / Device Write
> > +Record Capabilities Query Command}
> > +
> > +This command reads the device write record capabilities.
> > +For the command VIRTIO_ADMIN_CMD_DEV_WRITE_RECORD_CAP_QUERY,
> > +\field{opcode} is set to 0xd.
> > +The \field{group_member_id} refers to the member device to be accessed.
> > +
> > +\begin{lstlisting}
> > +struct virtio_admin_cmd_dev_write_record_cap_result {
> > +        le32 supported_iova_page_size_bitmap;
> > +        le32 supported_iova_ranges;
> > +};
> > +\end{lstlisting}
> > +
> > +When the command completes successfully,
> > +\field{command_specific_result} is in the format \field{struct
> > +virtio_admin_cmd_dev_write_record_cap_result}
> > +returned by the device. The \field{supported_iova_page_size_bitmap}
> > +indicates the granularity at which the device can record IOVA ranges.
> > +the minimum granularity can be 4KB. Bit 0 corresponds to 4KB, bit 1
> > +corresponds to 8KB, bit 31 corresponds to 4TB. The device supports at least
> one page granularity.
> > +The device support one or more IOVA page granularity; for each IOVA
> > +page granularity, the device sets corresponding bit in the
> > +\field{supported_iova_page_size_bitmap}. The
> > +\field{supported_iova_ranges} indicates how many unique (non
> > +overlapping) IOVA ranges can be recorded by the device.
> 
> what role does this granularity play? i see no mention of it down the road.
> 
The page_size in struct virtio_admin_cmd_write_record_start_data must match to the granularity supplied above.
I missed it. Will add in v2.
This is very useful comment.

> 
> > +
> > +\paragraph{Device Write Records Start Command} \label{par:Basic
> > +Facilities of a Virtio Device / Device groups / Group administration
> > +commands / Device Migration / Device Write Records Start Command}
> > +
> > +This command starts the write recording in the device for the
> > +specified IOVA ranges.
> > +
> > +For the command VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START,
> > +\field{opcode} is set to 0xe.
> > +The \field{group_member_id} refers to the member device to be accessed.
> > +
> > +The \field{command_specific_data} is in the format \field{struct
> > +virtio_admin_cmd_write_record_start_data}.
> > +
> > +\begin{lstlisting}
> > +struct virtio_admin_cmd_write_record_start_entry {
> > +        le64 iova;
> > +        le64 page_count;
> > +};
> > +
> > +struct virtio_admin_cmd_write_record_start_data {
> > +        le64 page_size;
> > +        le32 count;
> > +        u8 reserved[4];
> > +        struct virtio_admin_cmd_write_record_start_entry entries[];
> > +};
> > +
> > +\end{lstlisting}
> > +
> > +The \field{count} is set to indicate number of valid \field{entries}.
> > +The \field{iova} indicates the start IOVA address. The
> > +\field{page_count} indicates number of pages of size
> > +\field{page_size} starting from \field{iova} to record for write
> > +reporting. VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START
> > +command contains unique i.e. non overlapping IOVA range entries.
> > +Whenever a memory write occurs by the device in the supplied IOVA
> > +range, the device records the actual IOVA and number of bytes written to the
> IOVA.
> > +These write records can be read by the the driver using
> > +VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_READ command.
> > +
> > +This command has no command specific result.
> > +
> > +\paragraph{Device Write Record Stop Command} \label{par:Basic
> > +Facilities of a Virtio Device / Device groups / Group administration
> > +commands / Device Migration / Device Write Record Stop Command}
> > +
> > +This command stops the write recording in the device for IOVA ranges
> > +which were previously started using
> > +VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START
> > +command.
> > +
> > +For the command VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_STOP,
> > +\field{opcode} is set to 0xf.
> > +The \field{group_member_id} refers to the member device to be accessed.
> > +
> > +This command does not have any command specific data.
> > +This command has no command specific result.
> > +
> > +\paragraph{Device Write Records Read Command} \label{par:Basic
> > +Facilities of a Virtio Device / Device groups / Group administration
> > +commands / Device Migration / Device Write Records Read Command}
> > +
> > +This command reads the device write records for which the write
> > +recording is previously started using
> VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START command.
> > +
> > +For the command VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_READ,
> > +\field{opcode} is set to 0x10.
> > +The \field{group_member_id} refers to the member device to be accessed.
> > +
> > +\begin{lstlisting}
> > +struct virtio_admin_cmd_write_records_read_data {
> > +        le64 iova;
> > +        le64 length;
> > +};
> > +
> > +struct virtio_admin_cmd_dev_write_records_cnt {
> > +        le32 count;
> > +};
> > +
> > +struct virtio_admin_cmd_dev_write_records_result {
> > +        le64 iova_entries[];
> > +};
> > +\end{lstlisting}
> > +
> > +The \field{command_specific_data} is in the format \field{struct
> > +virtio_admin_cmd_write_records_read_data}. The driver sets the \field
> > +{iova} indicating the start IOVA address for up to the \field{length}
> > +number of bytes. The supplied IOVA range same or smaller than the
> > +range supplied when write recording is started by the driver in
> > +VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START command.
> 
> Seems pretty sparse. Lots of hypervisors chose to implement a bit per page
> strategy.
This command result addresses will feed into such bit.

> 
> > +
> > +When the command completes successfully,
> > +\field{command_specific_result} is in the format \field{struct
> > +virtio_admin_cmd_dev_write_records_result}
> > +and \field{command_specific_result} is in format of \field{struct
> > +virtio_admin_cmd_dev_write_records_cnt} containing number of write
> > +records returned by the device.
> 
> what are these records though?
> 
It is struct virtio_admin_cmd_dev_write_records_result.
I will rephrase it to link to struct virtio_admin_cmd_dev_write_records_result.

> 
> > When the command completes
> > +successfully, the write records which are returned in the result are
> > +cleared from the device and same records cannot be read again. When
> > +new writes occur at same IOVA range or at different once, those
> > +records can be read as new write records.
> 
> 
> this last sentence just confuses.
> 
How about just keeping below text rewrite?
When the command completes successfully, the write records returned in the result are cleared from the device.


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]