OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [PATCH v1 7/8] admin: Add write recording commands


On Mon, Oct 09, 2023 at 04:14:22AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Sunday, October 8, 2023 5:23 PM
> > 
> > On Sun, Oct 08, 2023 at 02:25:54PM +0300, Parav Pandit wrote:
> > > When migrating a virtual machine with passthrough virtio devices, the
> > > virtio device may write into the guest memory. Some systems may not be
> > > able to keep track of these pages efficiently.
> > >
> > > To facilitate such a system, a device provides the record of pages
> > > which are written by the device. In one use case, this commands
> > > connect to the vfio framework at [1].
> > >
> > > The owner driver configures the member device for list of address
> > > ranges for which it expects write recording and reporting by the device.
> > >
> > > The owner driver periodically queries the written pages address record
> > > which gets cleared from the device upon reading it.
> > >
> > > When the write records reduces over the time, at one point write
> > > recording is stopped after the device mode is set to FREEZE.
> > >
> > > [1]
> > > https://elixir.bootlin.com/linux/v6.4-rc1/source/include/uapi/linux/vf
> > > io.h#L1207
> > >
> > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/176
> > > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > > Signed-off-by: Satananda Burla <sburla@marvell.com>
> > > ---
> > >  admin-cmds-device-migration.tex | 146
> > ++++++++++++++++++++++++++++++--
> > >  admin.tex                       |  10 ++-
> > >  2 files changed, 146 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/admin-cmds-device-migration.tex
> > > b/admin-cmds-device-migration.tex index e98d552..49835eb 100644
> > > --- a/admin-cmds-device-migration.tex
> > > +++ b/admin-cmds-device-migration.tex
> > > @@ -97,15 +97,16 @@ \subsubsection{Device Migration}\label{sec:Basic
> > > Facilities of a Virtio Device /  During the device migration flow, a
> > > passthrough device may write data to the  guest virtual machine
> > > memory, a source hypervisor needs to keep track of these  written memory to
> > migrate such memory to destination hypervisor.
> > > -Some systems may not be able to keep track of such memory write
> > > addresses at -hypervisor level. In such a scenario, a device records
> > > and reports these -written memory addresses to the owner device. Such
> > > an address is named as -IO virtual address (IOVA). The owner driver
> > > enables write recording for one or -more IOVA ranges per device during
> > > device migration flow. The owner driver -periodically queries these
> > > written IOVA records from the device. As the driver -reads the written IOVA
> > records, the device clears those records from the device.
> > > -Once the device reports zero or small number of written IOVA records,
> > > the device -mode is set to \field{Stop} or \field{Freeze}. Once the
> > > device is set to \field{Stop}
> > > +Some systems may not be able to keep track of such memory writes at
> > > +addresses at hypervisor level. In such a scenario, a device records
> > > +and reports these written memory addresses to the owner device.
> > 
> > 
> > what does it mean to record them?
> >
> May be we can use a different verb than record. May be tracking?
> Record means, device accumulates these written pages addresses, and when driver queries, it returns these addresses and clears it internally within the device.

For example, what about two writes into same address. Do you get one
record or two? What if the length is different?


> > > Such an
> > > +address is named as IO virtual address (IOVA).
> > 
> > I don't know what does this have to do with IOVA. For that matter everything
> > would have to be "IOVA". Spec calls these physical address and let's stick to
> > that.
> > 
> Make sense.
> At device level it does not have knowledge of IOVA.
> I will rename it.
> 
> > 
> > > The owner driver enables write
> > > +recording for one or more IOVA ranges per device during device
> > > +migration flow. The owner driver periodically queries these written
> > > +IOVA records from the device.
> > 
> > periodical reads without any indication are the only option then?
> >
> At least for now, this is starting point. Software stack such as QEMU does it periodically.

So for CPU kvm switched to PML and this seems to work better -
it guarantees there's convergence.

> When new use case arise, may be it can be extended.
>  
> > > As the driver reads the written IOVA records,
> > > +the device clears those records from the device. Once the device
> > > +reports zero or small number of written IOVA records, the device is
> > > +set to \field{Stop} or \field{Freeze} mode. Once the device is set to
> > > +\field{Stop}
> > >  or \field{Freeze} mode, and once all the IOVA records are read, the
> > > driver stops  the write recording in the device.
> > 
> > 
> > it is not great that you are rewriting text you just wrote in patch 1 here. pls find
> > a way not to make reviewers read everything twice.
> > 
> There is small duplication of one line explaining mode change, rest is contextual to the write recording.
> Merging with text of patch_1 was slightly complicated to read, so one sentence is duplicated.
> 
> I will again check if patch_1 text extension is easier to read.

this is latex, space is ignored. if you are only changing one word just
don't move the rest and diff will look sane

> > > @@ -118,6 +119,10 @@ \subsubsection{Device Migration}\label{sec:Basic
> > > Facilities of a Virtio Device /  \item Device Context Read Command
> > > \item Device Context Write Command  \item Device Context Discard
> > > Command
> > > +\item Device Write Record Capabilities Query Command \item Device
> > > +Write Records Start Command \item Device Write Records Stop Command
> > > +\item Device Write Records Read Command
> > >  \end{enumerate}
> > >
> > >  These commands are currently only defined for the SR-IOV group type.
> > > @@ -307,6 +312,129 @@ \subsubsection{Device Migration}\label{sec:Basic
> > > Facilities of a Virtio Device /  discarded, subsequent
> > > VIRTIO_ADMIN_CMD_DEV_CTX_WRITE command writes a new device
> > context.
> > >
> > > +\paragraph{Device Write Record Capabilities Query Command}
> > > +\label{par:Basic Facilities of a Virtio Device / Device groups /
> > > +Group administration commands / Device Migration / Device Write
> > > +Record Capabilities Query Command}
> > > +
> > > +This command reads the device write record capabilities.
> > > +For the command VIRTIO_ADMIN_CMD_DEV_WRITE_RECORD_CAP_QUERY,
> > > +\field{opcode} is set to 0xd.
> > > +The \field{group_member_id} refers to the member device to be accessed.
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_admin_cmd_dev_write_record_cap_result {
> > > +        le32 supported_iova_page_size_bitmap;
> > > +        le32 supported_iova_ranges;
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +When the command completes successfully,
> > > +\field{command_specific_result} is in the format \field{struct
> > > +virtio_admin_cmd_dev_write_record_cap_result}
> > > +returned by the device. The \field{supported_iova_page_size_bitmap}
> > > +indicates the granularity at which the device can record IOVA ranges.
> > > +the minimum granularity can be 4KB. Bit 0 corresponds to 4KB, bit 1
> > > +corresponds to 8KB, bit 31 corresponds to 4TB. The device supports at least
> > one page granularity.
> > > +The device support one or more IOVA page granularity; for each IOVA
> > > +page granularity, the device sets corresponding bit in the
> > > +\field{supported_iova_page_size_bitmap}. The
> > > +\field{supported_iova_ranges} indicates how many unique (non
> > > +overlapping) IOVA ranges can be recorded by the device.
> > 
> > what role does this granularity play? i see no mention of it down the road.
> > 
> The page_size in struct virtio_admin_cmd_write_record_start_data must match to the granularity supplied above.
> I missed it. Will add in v2.
> This is very useful comment.

Not that it's very clear what does page_size do.

> > 
> > > +
> > > +\paragraph{Device Write Records Start Command} \label{par:Basic
> > > +Facilities of a Virtio Device / Device groups / Group administration
> > > +commands / Device Migration / Device Write Records Start Command}
> > > +
> > > +This command starts the write recording in the device for the
> > > +specified IOVA ranges.
> > > +
> > > +For the command VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START,
> > > +\field{opcode} is set to 0xe.
> > > +The \field{group_member_id} refers to the member device to be accessed.
> > > +
> > > +The \field{command_specific_data} is in the format \field{struct
> > > +virtio_admin_cmd_write_record_start_data}.
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_admin_cmd_write_record_start_entry {
> > > +        le64 iova;
> > > +        le64 page_count;
> > > +};
> > > +
> > > +struct virtio_admin_cmd_write_record_start_data {
> > > +        le64 page_size;
> > > +        le32 count;
> > > +        u8 reserved[4];
> > > +        struct virtio_admin_cmd_write_record_start_entry entries[];
> > > +};
> > > +
> > > +\end{lstlisting}
> > > +
> > > +The \field{count} is set to indicate number of valid \field{entries}.
> > > +The \field{iova} indicates the start IOVA address. The
> > > +\field{page_count} indicates number of pages of size
> > > +\field{page_size} starting from \field{iova} to record for write
> > > +reporting. VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START
> > > +command contains unique i.e. non overlapping IOVA range entries.
> > > +Whenever a memory write occurs by the device in the supplied IOVA
> > > +range, the device records the actual IOVA and number of bytes written to the
> > IOVA.
> > > +These write records can be read by the the driver using
> > > +VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_READ command.
> > > +
> > > +This command has no command specific result.
> > > +
> > > +\paragraph{Device Write Record Stop Command} \label{par:Basic
> > > +Facilities of a Virtio Device / Device groups / Group administration
> > > +commands / Device Migration / Device Write Record Stop Command}
> > > +
> > > +This command stops the write recording in the device for IOVA ranges
> > > +which were previously started using
> > > +VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START
> > > +command.
> > > +
> > > +For the command VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_STOP,
> > > +\field{opcode} is set to 0xf.
> > > +The \field{group_member_id} refers to the member device to be accessed.
> > > +
> > > +This command does not have any command specific data.
> > > +This command has no command specific result.
> > > +
> > > +\paragraph{Device Write Records Read Command} \label{par:Basic
> > > +Facilities of a Virtio Device / Device groups / Group administration
> > > +commands / Device Migration / Device Write Records Read Command}
> > > +
> > > +This command reads the device write records for which the write
> > > +recording is previously started using
> > VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START command.
> > > +
> > > +For the command VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_READ,
> > > +\field{opcode} is set to 0x10.
> > > +The \field{group_member_id} refers to the member device to be accessed.
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_admin_cmd_write_records_read_data {
> > > +        le64 iova;
> > > +        le64 length;
> > > +};
> > > +
> > > +struct virtio_admin_cmd_dev_write_records_cnt {
> > > +        le32 count;
> > > +};
> > > +
> > > +struct virtio_admin_cmd_dev_write_records_result {
> > > +        le64 iova_entries[];
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +The \field{command_specific_data} is in the format \field{struct
> > > +virtio_admin_cmd_write_records_read_data}. The driver sets the \field
> > > +{iova} indicating the start IOVA address for up to the \field{length}
> > > +number of bytes. The supplied IOVA range same or smaller than the
> > > +range supplied when write recording is started by the driver in
> > > +VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START command.
> > 
> > Seems pretty sparse. Lots of hypervisors chose to implement a bit per page
> > strategy.
> This command result addresses will feed into such bit.

Do we want to return it in this format then?

> > 
> > > +
> > > +When the command completes successfully,
> > > +\field{command_specific_result} is in the format \field{struct
> > > +virtio_admin_cmd_dev_write_records_result}
> > > +and \field{command_specific_result} is in format of \field{struct
> > > +virtio_admin_cmd_dev_write_records_cnt} containing number of write
> > > +records returned by the device.
> > 
> > what are these records though?
> > 
> It is struct virtio_admin_cmd_dev_write_records_result.
> I will rephrase it to link to struct virtio_admin_cmd_dev_write_records_result.
> 
> > 
> > > When the command completes
> > > +successfully, the write records which are returned in the result are
> > > +cleared from the device and same records cannot be read again. When
> > > +new writes occur at same IOVA range or at different once, those
> > > +records can be read as new write records.
> > 
> > 
> > this last sentence just confuses.
> > 
> How about just keeping below text rewrite?
> When the command completes successfully, the write records returned in the result are cleared from the device.

I think we need to explain in more detail what exactly is expected
to be recorded and when.

-- 
MST



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]