virtio-comment message



Subject: Re: [PATCH v1 1/2] transport-pci: Introduce legacy registers access commands


On Fri, May 5, 2023 at 8:49 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Thursday, May 4, 2023 11:27 PM
> >
> > So the "single stack" claim is kind of misleading: you need a dedicated virtio
> > mediation layer with a different code path than plain vfio-pci, and that
> > layer completely duplicates the vDPA subsystem.
> Huh. No, it is not duplicated.
> vfio-pci provides the framework for extensions, rather than being limited to simple vfio-pci.

I'm not sure how to define "simple" here; do you mean mdev?

> I am not debating vdpa vs. non-vdpa here yet again.
>
> > And you lose all the advantages of
> > vDPA in this way. The device should not be designed for a single type of
> > software stack; it needs to leave the decision to the hypervisor/cloud vendors.
> >
> It is left to the hypervisor/cloud user to decide whether to use vdpa, vfio, or something else.
>
> >
> > >     virtio device type (net/blk) and be future compatible with a
> > >     single vfio stack using SR-IOV or other scalable device
> > >     virtualization technology to map PCI devices to the guest VM.
> > >     (as transitional or otherwise)
> > >
> > > Motivation/Background:
> > > ----------------------
> > > The existing virtio transitional PCI device is missing support for PCI
> > > SR-IOV based devices. In practice it does not work beyond a PCI PF, or
> > > as a software-emulated device. It has the system-level limitations
> > > cited below:
> > >
> > > [a] PCIe spec citation:
> > > VFs do not support I/O Space and thus VF BARs shall not indicate I/O Space.
> > >
> > > [b] CPU arch citation:
> > > Intel 64 and IA-32 Architectures Software Developer's Manual:
> > > The processor's I/O address space is separate and distinct from the
> > > physical-memory address space. The I/O address space consists of 64K
> > > individually addressable 8-bit I/O ports, numbered 0 through FFFFH.
> > >
> > > [c] PCIe spec citation:
> > > If a bridge implements an I/O address range,...I/O address range will
> > > be aligned to a 4 KB boundary.
> > >
> > > The above use-case requirements can be solved by the PCI PF group owner
> > > enabling access to the legacy registers of its group member PCI VFs
> > > using an admin virtqueue of the group owner PCI PF.
> > >
> > > Software usage example:
> > > -----------------------
> > > The most common way to use the device and map it to a guest VM is by
> > > using the vfio driver framework in the Linux kernel.
> > >
> > >                  +----------------------+
> > >                  |pci_dev_id = 0x100X   |
> > > +---------------|pci_rev_id = 0x0      |-----+
> > > |vfio device    |BAR0 = I/O region     |     |
> > > |               |Other attributes      |     |
> > > |               +----------------------+     |
> > > |                                            |
> > > +   +--------------+     +-----------------+ |
> > > |   |I/O BAR to AQ |     | Other vfio      | |
> > > |   |rd/wr mapper  |     | functionalities | |
> > > |   +--------------+     +-----------------+ |
> > > |                                            |
> > > +------+-------------------------+-----------+
> > >         |                         |
> >
> >
> > So the mapper here is actually a control path mediation layer, which
> > duplicates vDPA.
> >
> Yet again, no. It implements a PCI-level abstraction.
> It does not touch the whole QEMU layer and does not get involved in the virtio device flow at all: device reset, device config space, CVQ, feature bits and more.

I think you're missing the fact that QEMU can choose not to
understand any of what you mention here, with the help of the general
vdpa device. Vhost-vDPA provides a much simpler device abstraction
than vfio-pci. If a cloud vendor wants a tiny/thin hypervisor layer,
it can certainly be done through vDPA.

> All of these were discussed in v0; let's not repeat that here.
>
> >
> > >    +----+------------+       +----+------------+
> > >    | +-----+         |       | PCI VF device A |
> > >    | | AQ  |-------------+---->+-------------+ |
> > >    | +-----+         |   |   | | legacy regs | |
> > >    | PCI PF device   |   |   | +-------------+ |
> > >    +-----------------+   |   +-----------------+
> > >                          |
> > >                          |   +----+------------+
> > >                          |   | PCI VF device N |
> > >                          +---->+-------------+ |
> > >                              | | legacy regs | |
> > >                              | +-------------+ |
> > >                              +-----------------+
> > >
> > > 2. Virtio pci driver to bind to the listed device id and
> > >     use it as a native device in the host.
> >
> >
> > How can this be done now?
> >
> Currently a PCI VF binds to the virtio driver, and without any vdpa layering, virtio net/blk etc. devices are created on top of the virtio PCI VF device.
> Not sure I understood your question.

I meant the current virtio-pci driver can use what you propose here.

>
> > > +\begin{lstlisting}
> > > +struct virtio_admin_cmd_lreg_wr_data {
> > > +   u8 offset; /* Starting byte offset of the register(s) to write */
> > > +   u8 size; /* Number of bytes to write into the register. */
> > > +   u8 registers[];
> > > +};
> > > +\end{lstlisting}
> >
> >
> > So this actually implements a transport; I wonder if it would be better
> > (and simpler) to do it on top of the transport vq proposal:
> >
> > https://lists.oasis-open.org/archives/virtio-comment/202208/msg00003.html
> >
> I also wonder why TVQ cannot use AQ.

It can, for sure, but whether to use a single virtqueue type for both
administration and transport is still questionable.
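
As an aside, to make the proposed write data layout concrete, here is
a rough sketch (purely illustrative, not part of the patch) of how a
driver might fill that structure for a one-byte write; the helper name
and the device-reset example are hypothetical, and it assumes the
classic legacy BAR0 layout where the device status register sits at
offset 18:

#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;

/* Write data layout quoted above. */
struct virtio_admin_cmd_lreg_wr_data {
    u8 offset;      /* Starting byte offset of the register(s) to write */
    u8 size;        /* Number of bytes to write */
    u8 registers[]; /* Register bytes to write */
};

/* Fill 'buf' with the command-specific data for a one-byte write that
 * sets the legacy device status register (offset 18) to 0, i.e. a
 * device reset in the legacy interface. Submitting the buffer on the
 * owner PF's admin virtqueue is left out of this sketch. */
static size_t build_legacy_status_reset(u8 *buf)
{
    struct virtio_admin_cmd_lreg_wr_data *wr = (void *)buf;

    wr->offset = 18;      /* device status in the legacy layout */
    wr->size = 1;         /* one byte */
    wr->registers[0] = 0; /* writing 0 resets the device */
    return sizeof(*wr) + 1;
}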

>
> > Then it aligns with SIOV natively.
> >
> SIOV is not a well-defined spec yet; whenever it is defined, it can use the AQ or TVQ.
>
> We also discussed that hypervisor mediation of the control path is not desired in some use cases, so I will leave that discussion for the future, when SIOV arrives.

We need to plan ahead. We don't want to end up with a redundant
design. For example, this proposal is actually a partial transport
implementation; the transport virtqueue can do much better in this
case.

Thanks


