virtio-comment message



Subject: Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ




On 5/7/2023 10:23 PM, Jason Wang wrote:
On Sun, May 7, 2023 at 9:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:

On Sat, May 06, 2023 at 10:31:30AM +0800, Jason Wang wrote:
On Sat, May 6, 2023 at 8:02 AM Parav Pandit <parav@nvidia.com> wrote:

This short series introduces legacy registers access commands for the owner
group member PCI PF to access the legacy registers of the member VFs.

If future SIOV devices need to support legacy registers, they
can easily be supported by the same commands, using the group
member identifiers of those SIOV devices.

The overview, motivation, and use case are described in more
detail below.

Patch summary:
--------------
patch-1 adds administrative virtqueue commands
patch-2 adds its conformance section

This short series is on top of the latest work [1] from Michael.
It uses the newly introduced administrative virtqueue facility with 3 new
commands which use the existing struct virtio_admin_cmd.

[1] https://lists.oasis-open.org/archives/virtio-comment/202305/msg00112.html

Usecase:
--------
1. A hypervisor/system needs to provide transitional
    virtio devices to the guest VM at a scale of thousands,
    typically one to eight devices per VM.

2. A hypervisor/system needs to provide such devices using a
    vendor agnostic driver in the hypervisor system.

3. A hypervisor system prefers to have a single stack regardless of
    virtio device type (net/blk) and to be future compatible with a
    single vfio stack using SR-IOV or another scalable device
    virtualization technology to map PCI devices to the guest VM
    (as transitional or otherwise).

Motivation/Background:
----------------------
The existing virtio transitional PCI device is missing support for
PCI SR-IOV based devices. Currently it does not work beyond the
PCI PF, other than as a software-emulated device. It has the
system-level limitations cited below:

[a] PCIe spec citation:
VFs do not support I/O Space and thus VF BARs shall not indicate I/O Space.

[b] CPU arch citation:
Intel 64 and IA-32 Architectures Software Developer's Manual:
The processor's I/O address space is separate and distinct from
the physical-memory address space. The I/O address space consists
of 64K individually addressable 8-bit I/O ports, numbered 0 through FFFFH.

[c] PCIe spec citation:
If a bridge implements an I/O address range,...I/O address range will be
aligned to a 4 KB boundary.

Overview:
---------
The above use-case requirements can be met by the group owner PCI PF
accessing the legacy registers of its group member PCI VFs using an
admin virtqueue of the group owner PCI PF.

Two new admin virtqueue commands are added which read/write PCI VF
registers.

The third command suggested by Jason queries the VF device's driver
notification region.

Software usage example:
-----------------------
One way to use and map to the guest VM is by using vfio driver
framework in Linux kernel.

                 +----------------------+
                 |pci_dev_id = 0x100X   |
+---------------|pci_rev_id = 0x0      |-----+
|vfio device    |BAR0 = I/O region     |     |
|               |Other attributes      |     |
|               +----------------------+     |
|                                            |
+   +--------------+     +-----------------+ |
|   |I/O BAR to AQ |     | Other vfio      | |
|   |rd/wr mapper  |     | functionalities | |
|   +--------------+     +-----------------+ |
|                                            |
+------+-------------------------+-----------+
        |                         |
   +----+------------+       +----+------------+
   | +-----+         |       | PCI VF device A |
   | | AQ  |-------------+---->+-------------+ |
   | +-----+         |   |   | | legacy regs | |
   | PCI PF device   |   |   | +-------------+ |
   +-----------------+   |   +-----------------+
                         |
                         |   +----+------------+
                         |   | PCI VF device N |
                         +---->+-------------+ |
                             | | legacy regs | |
                             | +-------------+ |
                             +-----------------+

2. A virtio PCI driver binds to the listed device id and
    uses it as a native device in the host.

3. Use it in a lightweight hypervisor to run a bare-metal OS.

Please review.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/167
Signed-off-by: Parav Pandit <parav@nvidia.com>

---
changelog:
v1->v2:
- addressed comments from Michael
- added theory of operation
- grammar corrections
- removed group fields description from individual commands as
   it is already present in generic section
- added endianness normative for legacy device registers region
- renamed the file to drop vf and add legacy prefix
- added overview in commit log
- renamed subsection to reflect command

So as replied in V1, I think it's not a good idea to invent commands
for a partial transport just for legacy devices. It's better either:

1) rebase or collaborate this work on top of the transport virtqueue

or

2) have a PCI-over-admin-virtqueue transport; since this proposal
already has BAR access, we can add config space access and then it is
self-contained, so we don't need to go through every corner case like
inventing dedicated commands to access functionality that is
duplicated with capabilities. It will become yet another transport, and
legacy support is just a good byproduct.

Thanks


I thought so too originally. Unfortunately I now think that no, legacy is not
going to be a byproduct of transport virtqueue for modern -
it is different enough that it needs dedicated commands.

If you mean the transport virtqueue, I think some dedicated commands
for legacy are needed. Then it would be a transport that supports
transitional devices. It would be much better than having commands for
a partial transport like this patch did.

Consider the simplest case: multibyte fields. Legacy needs multibyte writes;
modern does not even need multibyte reads.

I'm not sure I get the point here: since we can't expose the admin vq
to guests, we need some software mediation anyway. So if we just
implement what PCI allows us, then everything would be fine (even if
some methods are not used).

Thanks

The fundamental reason for not accessing the 1.x VF and SIOV device registers, config space, and feature bits through the PF is that doing so requires PF device mediation. VFs and SIOV devices are first-class citizens in the PCIe spec and deserve direct interaction with the device.

Hence, the transport we build should keep this in mind for the future. If each VF has its own configq or cmdq as a bootstrap interface to transport the existing config space interface, that makes total sense to me.
The problem is that it is not backward compatible;
hence a device has no way of knowing whether it must support both interfaces or only the new configq.

So ever-growing these fields, with optional placement on the configq, doesn't really help a device builder to build the device efficiently; there is not much predictability.

Instead, if we say that what exists today in config space stays in config space and anything additional goes on the new queue, then the device has deterministic behavior for sizing up the scale.

For example, a PCI device that wants to support 100 VFs can easily size its memory to 30 bytes * 100 reserved for supporting config space.
The new 40 bytes * 100 of fields do not have to be in resident memory.

If instead the configq/cmdq is only an optional transport for these fields, the same 30 * 100 = 3000 reserved bytes must cover both old and new fields per VF, which supports only 3000/(30+40) = 42 VFs.

More VFs can be deployed only if some VFs use the configq; it is hard to build for scale this way. Therefore the suggestion is to place new attributes on the new config/cmd/transport queue and let the old ones stay as-is.

The legacy infrastructure is, unfortunately, for the exception path due to history; hence it uses dedicated commands, as Michael suggests.
