virtio-comment message



Subject: Re: [PATCH 00/11] Introduce transitional mmr pci device




On 3/30/2023 6:58 PM, Parav Pandit wrote:
Overview:
---------
The Transitional MMR device is a variant of the transitional PCI device.
It has its own small Device ID range. It does not have an I/O region BAR;
instead it exposes the legacy configuration and device-specific registers
at an offset in the memory region BAR.

Such transitional MMR devices will be used at the scale of
thousands of devices using PCI SR-IOV and/or future scalable
virtualization technology, to provide backward
compatibility (for legacy devices) as well as forward
compatibility with new features.

Usecase:
--------
1. A hypervisor/system needs to provide transitional
    virtio devices to the guest VM at a scale of thousands,
    typically one to eight devices per VM.

2. A hypervisor/system needs to provide such devices using a
    vendor-agnostic driver in the hypervisor system.

3. A hypervisor system prefers to have a single stack regardless of
    virtio device type (net/blk) and to be future compatible with a
    single vfio stack, using SR-IOV or other scalable device
    virtualization technology to map PCI devices to the guest VM
    (as transitional or otherwise).

Motivation/Background:
----------------------
The existing transitional PCI device is missing support for
PCI SR-IOV based devices. Currently it does not work beyond the
PCI PF or, in reality, a software-emulated device. It has the
system-level limitations cited below:

[a] PCIe spec citation:
VFs do not support I/O Space and thus VF BARs shall not
indicate I/O Space.

[b] CPU arch citation:
Intel 64 and IA-32 Architectures Software Developer's Manual:
The processor's I/O address space is separate and distinct from
the physical-memory address space. The I/O address space consists
of 64K individually addressable 8-bit I/O ports, numbered 0 through FFFFH.

[c] PCIe spec citation:
If a bridge implements an I/O address range,...I/O address range
will be aligned to a 4 KB boundary.

[d] I/O region accesses at the PCI system level are slow, as they are non-posted
operations in the PCIe fabric.

After our last several discussions and the feedback from Michael and Jason,
to support the above use case requirements I would like to update v1 with the proposal below.

1. Use the existing non-transitional device to extend legacy register access.

2. The AQ (admin queue) of the parent PF is the optimal choice for accessing VF legacy registers (as opposed to an MMR of the VF).
This is because:
a. it avoids a complex reset flow at scale for the VFs;

b. it allows reusing the existing driver notification mechanism, which is already present in the notification section of 1.x and transitional devices.

3. New AQ command opcode for legacy register access (read/write).
Input fields:
a. opcode 0x8000
b. group and VF member identifiers
c. register offset
d. register size (1 to 64 B)
e. register content (on write)

Output fields:
a. command status
b. register content (on read)
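
To make the layout concrete, here is a rough C sketch of how such a command
descriptor and its result might look. The struct and field names, widths and
ordering are my assumptions for illustration only; just the opcode and the
listed fields come from the proposal above.

#include <stdint.h>

/* Opcode taken from the proposal; all struct/field names below are
 * illustrative assumptions, not spec-defined layouts. */
#define VIRTIO_ADMIN_CMD_LEGACY_REG_ACCESS  0x8000

struct virtio_admin_cmd_legacy_reg_access {
        /* Input fields */
        uint16_t opcode;        /* VIRTIO_ADMIN_CMD_LEGACY_REG_ACCESS */
        uint16_t group_type;    /* group identifier (e.g. SR-IOV VF group) */
        uint64_t member_id;     /* VF member identifier within the group */
        uint16_t reg_offset;    /* offset within the legacy register space */
        uint8_t  reg_size;      /* access size, 1 to 64 bytes */
        uint8_t  is_write;      /* 1 = write, 0 = read */
        uint8_t  reg_data[64];  /* register content, used on write */
};

struct virtio_admin_cmd_legacy_reg_access_result {
        /* Output fields */
        uint16_t status;        /* command status */
        uint8_t  reg_data[64];  /* register content, returned on read */
};

In practice the opcode and group/member identifiers would likely live in a
generic admin command header, with only the register offset, size and data in
the command-specific part; the split above is just for readability.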

4. New AQ command to return the queue notify address for legacy access.
Input fields:
a. opcode 0x8001
b. group and VF member identifier (or can this just be a constant for all VFs?)

Output fields:
a. BAR index
b. byte offset within the BAR
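
And a matching sketch for the notify-address query, under the same caveat that
the struct and field names and widths are hypothetical:

#define VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_QUERY  0x8001

struct virtio_admin_cmd_legacy_notify_query {
        /* Input fields */
        uint16_t opcode;        /* VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_QUERY */
        uint16_t group_type;    /* group identifier */
        uint64_t member_id;     /* VF member identifier (or a fixed value if
                                   one answer applies to all VFs) */
};

struct virtio_admin_cmd_legacy_notify_query_result {
        /* Output fields */
        uint16_t status;        /* command status */
        uint8_t  bar_index;     /* BAR containing the notification region */
        uint8_t  reserved[5];
        uint64_t bar_offset;    /* byte offset within that BAR */
};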

5. PCI Express extended capabilities for all the existing capabilities located in the legacy configuration section.
Why?
a. This allows a new driver (such as vfio) to always rely on the new capabilities.
b. The legacy PCI region is close to its full capacity.
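
For illustration, a mirrored capability in extended configuration space could
look roughly like the existing struct virtio_pci_cap with a PCIe extended
capability header in front; the exact capability ID and layout would of course
be whatever the spec ends up defining.

struct virtio_pci_ext_cap {
        uint32_t cap_hdr;       /* PCIe extended capability header:
                                   16-bit ID, 4-bit version, 12-bit next offset */
        uint8_t  cfg_type;      /* same meaning as virtio_pci_cap.cfg_type */
        uint8_t  bar;           /* BAR holding the referenced structure */
        uint8_t  padding[2];
        uint32_t offset;        /* offset of the structure within the BAR */
        uint32_t length;        /* length of the structure */
};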

A few open questions:
1. Should the queue notification query command be per VF, or should there be one for all group members (VFs)?

Any further comments to address in v1?

