OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-comment] Re: [PATCH RFC v7 2/8] Introduce group administration commands


On Sat, Aug 20, 2022 at 7:41 AM Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
>
>
> On 8/19/2022 7:37 AM, Jason Wang wrote:
> > On Fri, Aug 19, 2022 at 11:28 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >> On Fri, Aug 19, 2022 at 08:26:50AM +0800, Jason Wang wrote:
> >>> On Thu, Aug 18, 2022 at 4:51 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>> On Thu, Aug 18, 2022 at 04:46:45PM +0800, Jason Wang wrote:
> >>>>> On Sat, Aug 13, 2022 at 1:19 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>>>> From: Max Gurtovoy <mgurtovoy@nvidia.com>
> >>>>>>
> >>>>>> These commands are used for essential administrative and management
> >>>>>> operations.
> >>>>>>
> >>>>>> Following patches will introduce an interface for submitting these
> >>>>>> commands to the device.
> >>>>>>
> >>>>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> >>>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >>>>>> ---
> >>>>>>   admin.tex | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
> >>>>>>   1 file changed, 94 insertions(+), 5 deletions(-)
> >>>>>>
> >>>>>> diff --git a/admin.tex b/admin.tex
> >>>>>> index 6e9434a..4840dd4 100644
> >>>>>> --- a/admin.tex
> >>>>>> +++ b/admin.tex
> >>>>>> @@ -9,8 +9,10 @@ \section{Device groups}\label{sec:Basic Facilities of a Virtio Device / Device g
> >>>>>>   \item[Parent device]
> >>>>>>           or parent, the device controlling the group.
> >>>>>>   \item[Member device]
> >>>>>> -        a device within a group. Parent device itself may, or may
> >>>>>> -       not be a member of the group.
> >>>>>> +        a device within a group. Parent device itself is not
> >>>>>> +       a member of the group. In the future it is envisoned that
> >>>>>> +       new group types may be introduced where the parent
> >>>>>> +       device is a member of the group.
> >>>>> This part should go with patch 1?
> >>>>>
> >>>>>>   \item[Member identifier]
> >>>>>>           each member has this indentifier, unique within the group
> >>>>>>          and used to address it through the parent device.
> >>>>>> @@ -20,9 +22,10 @@ \section{Device groups}\label{sec:Basic Facilities of a Virtio Device / Device g
> >>>>>>          and what kind of control does the parent have.
> >>>>>>   \item[Group identifier]
> >>>>>>          each group has this identifier, unique within the parent device.
> >>>>>> -       this specifies the group type and possibly selects the
> >>>>>> -       group if multiple groups of the same type can be controlled by the same
> >>>>>> -       parent device.
> >>>>>> +       This specifies the group type. In the future it is
> >>>>>> +       envisoned that new group types will be introduced where the group
> >>>>>> +       identifier also selects the group if multiple groups of the same
> >>>>>> +       type can be controlled by the same parent device.
> >>>>>>   \end{description}
> >>>>>>
> >>>>>>   A single group type is currently specified:
> >>>>>> @@ -46,4 +49,90 @@ \section{Device groups}\label{sec:Basic Facilities of a Virtio Device / Device g
> >>>>>>   PCI transport (see \ref{sec:Virtio Transport Options / Virtio Over PCI Bus}).
> >>>>>>   \end{description}
> >>>>>>
> >>>>>> +\subsection{Group administration commands}\label{sec:Basic Facilities of a Virtio Device / Group administration commands}
> >>>>>> +
> >>>>>> +Group administration commands can be issued through a parent
> >>>>>> +device to control member devices of a group.  This mechanism can
> >>>>>> +be used, for example, to configure a member device before it is
> >>>>>> +initialized by its driver.
> >>>>>> +
> >>>>>> +All the group administration commands are of the following form:
> >>>>>> +
> >>>>>> +\begin{lstlisting}
> >>>>>> +struct virtio_admin_cmd {
> >>>>>> +        /* Device-readable part */
> >>>>>> +        le16 opcode;
> >>>>>> +        /*
> >>>>>> +         * 1 - SR-IOV
> >>>>>> +         * 2 - 65535 reserved
> >>>>>> +         */
> >>>>>> +        le16 group_id;
> >>>>> I wonder about the implication of this field. Does it mean:
> >>>>>
> >>>>> 1) a single admin virtqueue that manage two group of VFs
> >>>>>
> >>>>> or
> >>>>>
> >>>>> 2) a single admin virtqueue that manages both VFs and SFs (then
> >>>>> 'group_type' should be better?)
> >>>>>
> >>>>> If it's 1, the group id seems not sufficient. If it's 2, I wonder if
> >>>>> it scales (using a single admin virtqueue to manage both VF and SF).
> >>>>>
> >>>>> Thanks
> >>>> 2 at the moment.  Not sure I understand why wouldn't it scale.
> >>> Scaling as the number of member devices increases (e.g 1K or 10K). Or
> >>> it's too early to care about those?
> >> member devices have vqs, so up to 2^15 of these and
> >> ring size happens to be up to 32K too. Seems to match!
> >>
> >>> Do we gain better latency if the implementation limits the number of
> >>> member devices per admin virtqueue?
> >> Oh if we see that it's a problem we could have multiple admin vqs just
> >> for performance, and allow guest to send commands on any of these.
> >> Hmm I am not sure whether we should worry, but I think
> >> it might become an actual issue when we have LM since that
> >> might want to send faults from device to host.
> >>
> >> So depends on how painful it's going to make things for us. Let me ask
> >> you, do you have an idea about how we'll expose the multiple admin vqs
> >> without too much pain?
> > Haven't thought about this. Just want to see if it's an actual issue.
>
> In LM you usually migrate 1 VM at a time.

This only works if we can find a way to mandate it in the spec (which
I'm not sure it's possible).

> You don't migrate 1000 VMs at
> once.

Probably, but we may have other transactions in the meantime. They
need to compete with a single queue? Or maybe migration should go with
a dedicated virtqueue.

>
> AQ can have a depth of 32-128 and I don't see any issue with having 10k
> devices.

It depends on the API, e.g what does the dirty page tracking looks
like? Michael proposes a device page fault with partial order, if it's
the API can we end up with more than one device #PF?

Thanks

>
> It's a queue for admin/mgmt/control path and having multiple of those
> will just complicate the solution.
>
> >
> >> Should we have a #admin vqs register? And then after regular vqs
> >> immediately come the admin vqs?
> > Probably, but then it looks better to have a dedicated capability.
> >
> >>   And a follow up question would be
> >> how it will work later, if we later want another type of vq
> >> (and I think LM might want one, for faults)?
> > Another capability?
> >
> > Thanks
> >
> >>>> I don't
> >>>> see a fundamental difference between 1 or 2 VQs. We are not going to be
> >>>> adding 1000s of them,
> >>> Yes, but not sure if it will cause delay for e.g live migration
> >>> downtime in the future.
> >>>
> >>> Thanks
> >> yea, maybe we'll add several of these guys.
> >>
> >>>>>> +        /* unused, reserved for future extensions */
> >>>>>> +        u8 reserved1[12];
> >>>>>> +        le64 group_member_id;
> >>>>>> +        u8 command_specific_data[];
> >>>>>> +
> >>>>>> +        /* Device-writable part */
> >>>>>> +        u8 status;
> >>>>>> +        u8 command_specific_error;
> >>>>>> +        /* unused, reserved for future extensions */
> >>>>>> +        u8 reserved2[6];
> >>>>>> +        u8 command_specific_result[];
> >>>>>> +};
> >>>>>> +\end{lstlisting}
> >>>>>> +
> >>>>>> +For all commands, \field{opcode}, \field{group_id} and if
> >>>>>> +necessary \field{group_member_id} and \field{command_specific_data} are
> >>>>>> +set by the driver, and the parent device sets \field{status} and if
> >>>>>> +needed \field{command_specific_error} and
> >>>>>> +\field{command_specific_result}.
> >>>>>> +
> >>>>>> +As a rule, any unused device-readable fields are set to zero by the driver
> >>>>>> +and ignored by the device.  Any unused device-writeable fields are set to zero
> >>>>>> +by the device and ignored by the driver.
> >>>>>> +
> >>>>>> +\field{opcode} specifies the command. The valid
> >>>>>> +values for \field{opcode} can be found in the following table:
> >>>>>> +
> >>>>>> +\begin{tabular}{|l|l|}
> >>>>>> +\hline
> >>>>>> +opcode & Command Description \\
> >>>>>> +\hline \hline
> >>>>>> +0000h - 7FFFh   & Group administration commands    \\
> >>>>>> +\hline
> >>>>>> +8000h - FFFFh   & Reserved    \\
> >>>>>> +\hline
> >>>>>> +\end{tabular}
> >>>>>> +
> >>>>>> +The \field{group_id} specifies the device group.
> >>>>>> +The \field{group_member_id} specifies the member device within the group.
> >>>>>> +See section \ref{sec:Introduction / Terminology / Device group}
> >>>>>> +for the definition of the group identifier and group member
> >>>>>> +identifier.
> >>>>>> +
> >>>>>> +The following table describes possible \field{status} values:
> >>>>>> +
> >>>>>> +\begin{tabular}{|l|l|l|}
> >>>>>> +\hline
> >>>>>> +Status & Name & Description \\
> >>>>>> +\hline \hline
> >>>>>> +00h   & VIRTIO_ADMIN_STATUS_OK    & successful completion  \\
> >>>>>> +\hline
> >>>>>> +01h   & VIRTIO_ADMIN_STATUS_CS_ERR    & command specific error  \\
> >>>>>> +\hline
> >>>>>> +02h   & VIRTIO_ADMIN_STATUS_OPCODE_UNSUPPORTED    & unsupported or invalid \field{opcode}  \\
> >>>>>> +\hline
> >>>>>> +03h   & VIRTIO_ADMIN_STATUS_INVALID_FIELD    & invalid field within command specific data  \\
> >>>>>> +\hline
> >>>>>> +04h   & VIRTIO_ADMIN_STATUS_INVALID_GROUP    & invalid group identifier was set  \\
> >>>>>> +\hline
> >>>>>> +05h   & VIRTIO_ADMIN_STATUS_INVALID_MEM    & invalid group member identifier was set  \\
> >>>>>> +\hline
> >>>>>> +\end{tabular}
> >>>>>> +
> >>>>>> +The \field{command_specific_error} should be inspected by the driver only if \field{status} is set to
> >>>>>> +VIRTIO_ADMIN_STATUS_CS_ERR by the device. In this case, the
> >>>>>> +contents of \field{command_specific_error}
> >>>>>> +holds the command specific error. If \field{status} is not set to VIRTIO_ADMIN_STATUS_CS_ERR, the
> >>>>>> +\field{command_specific_error} value is undefined and should be ignored by the driver.
> >>>>>>
> >>>>>> --
> >>>>>> MST
> >>>>>>
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
>



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]