virtio-comment message



Subject: Re: [PATCH v1 1/2] transport-pci: Introduce legacy registers access commands


On Sat, May 6, 2023 at 10:24 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Fri, May 5, 2023 at 8:49 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Thursday, May 4, 2023 11:27 PM
> > >
> > > So the "single stack" is kind of misleading, you need a dedicated virtio
> > > mediation layer which has different code path than a simpler vfio-pci which is
> > > completely duplicated with vDPA subsystem.
> > Huh. No, it is not duplicated.
> > Vfio-pci provides a framework for extensions, rather than just doing plain vfio-pci.
>
> I'm not sure how to define "simple" here; do you mean mdev?
>
> > I am not debating here vdpa vs non vdpa yet again.
> >
> > > And you lose all the advantages of
> > > vDPA in this way. The device should not be designed for a single type of
> > > software stack; it needs to leave the decision to the hypervisor/cloud vendors.
> > >
> > It is left to the hypervisor/cloud user to decide whether to use vdpa, vfio, or something else.
> >
> > >
> > > >     virtio device type (net/blk) and be future compatible with a
> > > >     single vfio stack using SR-IOV or other scalable device
> > > >     virtualization technology to map PCI devices to the guest VM.
> > > >     (as transitional or otherwise)
> > > >
> > > > Motivation/Background:
> > > > ----------------------
> > > > The existing virtio transitional PCI device is missing support for PCI
> > > > SR-IOV based devices. Currently it does not work beyond the PCI PF,
> > > > other than as a software-emulated device. It has the system-level
> > > > limitations cited below:
> > > >
> > > > [a] PCIe spec citation:
> > > > VFs do not support I/O Space and thus VF BARs shall not indicate I/O Space.
> > > >
> > > > [b] cpu arch citation:
> > > > Intel 64 and IA-32 Architectures Software Developer's Manual:
> > > > The processor's I/O address space is separate and distinct from the
> > > > physical-memory address space. The I/O address space consists of 64K
> > > > individually addressable 8-bit I/O ports, numbered 0 through FFFFH.
> > > >
> > > > [c] PCIe spec citation:
> > > > If a bridge implements an I/O address range,...I/O address range will
> > > > be aligned to a 4 KB boundary.
> > > >
> > > > The above use-case requirements can be solved by the PCI PF group owner
> > > > enabling access to its group member PCI VFs' legacy registers using
> > > > an admin virtqueue of the group owner PCI PF.
> > > >
> > > > Software usage example:
> > > > -----------------------
> > > > The most common way to use it and map it to the guest VM is by using the
> > > > vfio driver framework in the Linux kernel.
> > > >
> > > >                  +----------------------+
> > > >                  |pci_dev_id = 0x100X   |
> > > > +---------------|pci_rev_id = 0x0      |-----+
> > > > |vfio device    |BAR0 = I/O region     |     |
> > > > |               |Other attributes      |     |
> > > > |               +----------------------+     |
> > > > |                                            |
> > > > +   +--------------+     +-----------------+ |
> > > > |   |I/O BAR to AQ |     | Other vfio      | |
> > > > |   |rd/wr mapper  |     | functionalities | |
> > > > |   +--------------+     +-----------------+ |
> > > > |                                            |
> > > > +------+-------------------------+-----------+
> > > >         |                         |
> > >
> > >
> > > So the mapper here is actually the control path mediation layer which
> > > duplicates with vDPA.
> > >
> > Yet again, no. It implements a PCI-level abstraction.
> > It does not touch the whole QEMU layer and is not at all involved in the virtio device flow of understanding device reset, device config space, the cvq, feature bits and more.
>
> I think you miss the fact that QEMU can choose not to understand any of
> what you mentioned here, with the help of the general vdpa device.
> Vhost-vDPA provides a much simpler device abstraction than vfio-pci.
> If a cloud vendor wants a tiny/thin hypervisor layer, it can be done
> through vDPA for sure.
>
> > All of these were discussed in v0, let's not repeat.
> >
> > >
> > > >    +----+------------+       +----+------------+
> > > >    | +-----+         |       | PCI VF device A |
> > > >    | | AQ  |-------------+---->+-------------+ |
> > > >    | +-----+         |   |   | | legacy regs | |
> > > >    | PCI PF device   |   |   | +-------------+ |
> > > >    +-----------------+   |   +-----------------+
> > > >                          |
> > > >                          |   +----+------------+
> > > >                          |   | PCI VF device N |
> > > >                          +---->+-------------+ |
> > > >                              | | legacy regs | |
> > > >                              | +-------------+ |
> > > >                              +-----------------+
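
For reference, here is a rough sketch of what the "I/O BAR to AQ rd/wr
mapper" box above could look like on the hypervisor side. It is
illustrative only; lreg_dev, pf_admin_vq, admin_vq_lreg_read() and
admin_vq_lreg_write() are made-up names, not an existing API and not part
of this proposal:

    #include <stdbool.h>
    #include <stdint.h>

    struct pf_admin_vq;  /* hypothetical handle to the owner PF's admin vq */

    /* Hypothetical helpers that wrap the access in the proposed legacy
     * register read/write admin commands for the given group member VF.
     */
    int admin_vq_lreg_write(struct pf_admin_vq *pf, uint16_t vf_id,
                            uint8_t off, const void *buf, uint8_t len);
    int admin_vq_lreg_read(struct pf_admin_vq *pf, uint16_t vf_id,
                           uint8_t off, void *buf, uint8_t len);

    struct lreg_dev {
            struct pf_admin_vq *owner_pf;  /* admin vq of the group owner PF */
            uint16_t vf_id;                /* group member id of this VF */
    };

    /* Trap handler for a guest access to the emulated legacy I/O BAR0:
     * forward the access over the owner PF's admin virtqueue.
     */
    static int io_bar0_access(struct lreg_dev *dev, uint8_t off,
                              void *buf, uint8_t len, bool is_write)
    {
            if (is_write)
                    return admin_vq_lreg_write(dev->owner_pf, dev->vf_id,
                                               off, buf, len);
            return admin_vq_lreg_read(dev->owner_pf, dev->vf_id,
                                      off, buf, len);
    }

As the figure shows, only the I/O BAR accesses take this detour through the
owner PF; the other vfio functionality is unchanged.
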
> > > >
> > > > 2. Virtio pci driver to bind to the listed device id and
> > > >     use it as a native device in the host.
> > >
> > >
> > > How this can be done now?
> > >
> > Currently a PCI VF binds to the virtio driver and, without any vdpa layering, virtio net/blk etc. devices are created on top of the virtio PCI VF device.
> > Not sure I understood your question.
>
> I meant the current virtio-pci driver can use what you propose here.

Actually, I meant "can't".

Thanks

>
> >
> > > > +\begin{lstlisting}
> > > > +struct virtio_admin_cmd_lreg_wr_data {
> > > > +   u8 offset; /* Starting byte offset of the register(s) to write */
> > > > +   u8 size; /* Number of bytes to write into the register. */
> > > > +   u8 register[];
> > > > +};
> > > > +\end{lstlisting}
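
As a concrete illustration (hypothetical, not from the patch), this is how a
hypervisor-side mapper might build that payload when forwarding a trapped
4-byte guest write. The struct is restated locally with the flexible member
renamed to regs[], since "register" is a C keyword:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Mirrors the proposed command data; regs[] stands in for the spec's
     * register[] field.
     */
    struct virtio_admin_cmd_lreg_wr_data {
            uint8_t offset;   /* starting byte offset of the register(s) */
            uint8_t size;     /* number of bytes to write */
            uint8_t regs[];   /* register value(s), little endian */
    };

    /* Build the command-specific data for a legacy register write.  The
     * caller is assumed to queue it on the owner PF's admin virtqueue,
     * addressed to the target VF's group member id.
     */
    static void *encode_lreg_wr(uint8_t offset, const void *val,
                                uint8_t size, size_t *out_len)
    {
            struct virtio_admin_cmd_lreg_wr_data *d;

            *out_len = sizeof(*d) + size;
            d = calloc(1, *out_len);
            if (!d)
                    return NULL;
            d->offset = offset;
            d->size = size;
            memcpy(d->regs, val, size);
            return d;
    }

Usage: forwarding a guest write of the 32-bit queue PFN at legacy offset 8
would be encode_lreg_wr(8, &pfn, 4, &len).
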
> > >
> > >
> > > So this actually implements a transport; I wonder if it would be better
> > > (and simpler) to do it on top of the transport vq proposal:
> > >
> > > https://lists.oasis-open.org/archives/virtio-comment/202208/msg00003.html
> > >
> > I also wonder why TVQ cannot use AQ.
>
> It can for sure, but whether to use a single virtqueue type for both
> administration and transport is still questionable.
>
> >
> > > Then it aligns with SIOV natively.
> > >
> > SIOV is not a well-defined spec; whenever it is defined, it can use AQ or TVQ.
> >
> > We also discussed that hypervisor mediation of the control path is not desired in some use cases, hence I will leave that discussion to the future when SIOV arrives.
>
> We need to plan ahead. We don't want to end up with a redundant
> design. For example, this proposal is actually a partial transport
> implementation. A transport virtqueue can do much better in this case.
>
> Thanks


