[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [PATCH v1 7/8] admin: Add write recording commands
On Sun, Oct 08, 2023 at 02:25:54PM +0300, Parav Pandit wrote: > When migrating a virtual machine with passthrough > virtio devices, the virtio device may write into the guest > memory. Some systems may not be able to keep track of these > pages efficiently. > > To facilitate such a system, a device provides the record > of pages which are written by the device. In one use case, this > commands connect to the vfio framework at [1]. > > The owner driver configures the member device for list of address > ranges for which it expects write recording and reporting by the device. > > The owner driver periodically queries the written pages address record > which gets cleared from the device upon reading it. > > When the write records reduces over the time, at one point write recording > is stopped after the device mode is set to FREEZE. > > [1] https://elixir.bootlin.com/linux/v6.4-rc1/source/include/uapi/linux/vfio.h#L1207 > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/176 > Signed-off-by: Parav Pandit <parav@nvidia.com> > Signed-off-by: Satananda Burla <sburla@marvell.com> > --- > admin-cmds-device-migration.tex | 146 ++++++++++++++++++++++++++++++-- > admin.tex | 10 ++- > 2 files changed, 146 insertions(+), 10 deletions(-) > > diff --git a/admin-cmds-device-migration.tex b/admin-cmds-device-migration.tex > index e98d552..49835eb 100644 > --- a/admin-cmds-device-migration.tex > +++ b/admin-cmds-device-migration.tex > @@ -97,15 +97,16 @@ \subsubsection{Device Migration}\label{sec:Basic Facilities of a Virtio Device / > During the device migration flow, a passthrough device may write data to the > guest virtual machine memory, a source hypervisor needs to keep track of these > written memory to migrate such memory to destination hypervisor. > -Some systems may not be able to keep track of such memory write addresses at > -hypervisor level. In such a scenario, a device records and reports these > -written memory addresses to the owner device. Such an address is named as > -IO virtual address (IOVA). The owner driver enables write recording for one or > -more IOVA ranges per device during device migration flow. The owner driver > -periodically queries these written IOVA records from the device. As the driver > -reads the written IOVA records, the device clears those records from the device. > -Once the device reports zero or small number of written IOVA records, the device > -mode is set to \field{Stop} or \field{Freeze}. Once the device is set to \field{Stop} > +Some systems may not be able to keep track of such > +memory writes at addresses at hypervisor level. In such a scenario, a device > +records and reports these written memory addresses to the owner device. what does it mean to record them? > Such an > +address is named as IO virtual address (IOVA). I don't know what does this have to do with IOVA. For that matter everything would have to be "IOVA". Spec calls these physical address and let's stick to that. > The owner driver enables write > +recording for one or more IOVA ranges per device during device migration > +flow. The owner driver periodically queries these written IOVA records from > +the device. periodical reads without any indication are the only option then? > As the driver reads the written IOVA records, > +the device clears those records from the device. Once the device reports > +zero or small number of written IOVA records, the device is set to > +\field{Stop} or \field{Freeze} mode. Once the device is set to \field{Stop} > or \field{Freeze} mode, and once all the IOVA records are read, the driver stops > the write recording in the device. it is not great that you are rewriting text you just wrote in patch 1 here. pls find a way not to make reviewers read everything twice. > @@ -118,6 +119,10 @@ \subsubsection{Device Migration}\label{sec:Basic Facilities of a Virtio Device / > \item Device Context Read Command > \item Device Context Write Command > \item Device Context Discard Command > +\item Device Write Record Capabilities Query Command > +\item Device Write Records Start Command > +\item Device Write Records Stop Command > +\item Device Write Records Read Command > \end{enumerate} > > These commands are currently only defined for the SR-IOV group type. > @@ -307,6 +312,129 @@ \subsubsection{Device Migration}\label{sec:Basic Facilities of a Virtio Device / > discarded, subsequent VIRTIO_ADMIN_CMD_DEV_CTX_WRITE command writes a new device > context. > > +\paragraph{Device Write Record Capabilities Query Command} > +\label{par:Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration / Device Write Record Capabilities Query Command} > + > +This command reads the device write record capabilities. > +For the command VIRTIO_ADMIN_CMD_DEV_WRITE_RECORD_CAP_QUERY, \field{opcode} > +is set to 0xd. > +The \field{group_member_id} refers to the member device to be accessed. > + > +\begin{lstlisting} > +struct virtio_admin_cmd_dev_write_record_cap_result { > + le32 supported_iova_page_size_bitmap; > + le32 supported_iova_ranges; > +}; > +\end{lstlisting} > + > +When the command completes successfully, \field{command_specific_result} > +is in the format \field{struct virtio_admin_cmd_dev_write_record_cap_result} > +returned by the device. The \field{supported_iova_page_size_bitmap} indicates > +the granularity at which the device can record IOVA ranges. the minimum > +granularity can be 4KB. Bit 0 corresponds to 4KB, bit 1 corresponds to 8KB, bit 31 > +corresponds to 4TB. The device supports at least one page granularity. > +The device support one or more IOVA page granularity; for each IOVA page > +granularity, the device sets corresponding bit in the > +\field{supported_iova_page_size_bitmap}. The \field{supported_iova_ranges} > +indicates how many unique (non overlapping) IOVA ranges can be recorded by > +the device. what role does this granularity play? i see no mention of it down the road. > + > +\paragraph{Device Write Records Start Command} > +\label{par:Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration / Device Write Records Start Command} > + > +This command starts the write recording in the device for the specified IOVA > +ranges. > + > +For the command VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START, \field{opcode} > +is set to 0xe. > +The \field{group_member_id} refers to the member device to be accessed. > + > +The \field{command_specific_data} is in the format > +\field{struct virtio_admin_cmd_write_record_start_data}. > + > +\begin{lstlisting} > +struct virtio_admin_cmd_write_record_start_entry { > + le64 iova; > + le64 page_count; > +}; > + > +struct virtio_admin_cmd_write_record_start_data { > + le64 page_size; > + le32 count; > + u8 reserved[4]; > + struct virtio_admin_cmd_write_record_start_entry entries[]; > +}; > + > +\end{lstlisting} > + > +The \field{count} is set to indicate number of valid \field{entries}. > +The \field{iova} indicates the start IOVA address. The \field{page_count} > +indicates number of pages of size \field{page_size} starting from \field{iova} > +to record for write reporting. VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START > +command contains unique i.e. non overlapping IOVA range entries. > +Whenever a memory write occurs by the device in the supplied IOVA range, the > +device records the actual IOVA and number of bytes written to the IOVA. > +These write records can be read by the > +the driver using VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_READ command. > + > +This command has no command specific result. > + > +\paragraph{Device Write Record Stop Command} > +\label{par:Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration / Device Write Record Stop Command} > + > +This command stops the write recording in the device for IOVA ranges > +which were previously started using VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START > +command. > + > +For the command VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_STOP, \field{opcode} > +is set to 0xf. > +The \field{group_member_id} refers to the member device to be accessed. > + > +This command does not have any command specific data. > +This command has no command specific result. > + > +\paragraph{Device Write Records Read Command} > +\label{par:Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration / Device Write Records Read Command} > + > +This command reads the device write records for which the write recording is > +previously started using VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START command. > + > +For the command VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_READ, \field{opcode} > +is set to 0x10. > +The \field{group_member_id} refers to the member device to be accessed. > + > +\begin{lstlisting} > +struct virtio_admin_cmd_write_records_read_data { > + le64 iova; > + le64 length; > +}; > + > +struct virtio_admin_cmd_dev_write_records_cnt { > + le32 count; > +}; > + > +struct virtio_admin_cmd_dev_write_records_result { > + le64 iova_entries[]; > +}; > +\end{lstlisting} > + > +The \field{command_specific_data} is in the format > +\field{struct virtio_admin_cmd_write_records_read_data}. The driver > +sets the \field {iova} indicating the start IOVA address for up to the > +\field{length} number of bytes. The supplied IOVA range same or smaller > +than the range supplied when write recording is started by the driver > +in VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START command. Seems pretty sparse. Lots of hypervisors chose to implement a bit per page strategy. > + > +When the command completes successfully, \field{command_specific_result} > +is in the format \field{struct virtio_admin_cmd_dev_write_records_result} > +and \field{command_specific_result} is in format of > +\field{struct virtio_admin_cmd_dev_write_records_cnt} containing number > +of write records returned by the device. what are these records though? > When the command completes > +successfully, the write records which are returned in the result are > +cleared from the device and same records cannot be read again. When new > +writes occur at same IOVA range or at different once, those records can be read > +as new write records. this last sentence just confuses. > + > \devicenormative{\paragraph}{Device Migration}{Basic Facilities of a Virtio Device / Device groups / Group administration commands / Device Migration} > > A device MUST either support all of, or none of > diff --git a/admin.tex b/admin.tex > index 3429c4e..cffd85e 100644 > --- a/admin.tex > +++ b/admin.tex > @@ -138,7 +138,15 @@ \subsection{Group administration commands}\label{sec:Basic Facilities of a Virti > \hline > 0x000c & VIRTIO_ADMIN_CMD_DEV_CTX_DISCARD & Clear the device context data \\ > \hline > -0x000d - 0x7FFF & - & Commands using \field{struct virtio_admin_cmd} \\ > +0x000d & VIRTIO_ADMIN_CMD_DEV_WRITE_RECORD_CAP_QUERY & Query Write recording capabilities \\ > +\hline > +0x000e & VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_START & Start Write recording in the device \\ > +\hline > +0x000f & VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_STOP & Stop all write recording in the device \\ > +\hline > +0x0010 & VIRTIO_ADMIN_CMD_DEV_WRITE_RECORDS_READ & Read and clear write records from the device \\ > +\hline > +0x0011 - 0x7FFF & - & Commands using \field{struct virtio_admin_cmd} \\ > \hline > 0x8000 - 0xFFFF & - & Reserved for future commands (possibly using a different structure) \\ > \hline > -- > 2.34.1
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]