
virtio-comment message



Subject: Re: [virtio-comment] Re: [PATCH 00/11] Introduce transitional mmr pci device


In line with our recent discussion about agreeing requirements for
specification changes, I wanted to say that we have a significant
existing estate of VMs using the legacy interface, where the customers
are disinclined to update their software to a newer version (one which
might consume the 1.x interface).

Given that, some mechanism for supporting a (mostly) hardware-offloaded
legacy interface would definitely be useful to us, and the proposal here
seems like a sensible approach.

We are aware of vDPA, but that comes with its own challenges.

Parav Pandit <parav@nvidia.com> writes:

> On 3/30/2023 6:58 PM, Parav Pandit wrote:
>> Overview:
>> ---------
>> The Transitional MMR device is a variant of the transitional PCI device.
>> It has its own small Device ID range. It does not have an I/O
>> region BAR; instead, it exposes the legacy configuration and device
>> specific registers at an offset in the memory region BAR.
>> 
>> Such transitional MMR devices will be used at the scale of
>> thousands of devices using PCI SR-IOV and/or future scalable
>> virtualization technology to provide backward
>> compatibility (for legacy devices) and also future
>> compatibility with new features.
>> 
>> Use case:
>> ---------
>> 1. A hypervisor/system needs to provide transitional
>>     virtio devices to the guest VM at a scale of thousands,
>>     typically one to eight devices per VM.
>> 
>> 2. A hypervisor/system needs to provide such devices using a
>>     vendor agnostic driver in the hypervisor system.
>> 
>> 3. A hypervisor system prefers to have a single stack regardless of
>>     the virtio device type (net/blk) and to be future compatible with a
>>     single vfio stack, using SR-IOV or another scalable device
>>     virtualization technology to map PCI devices to the guest VM
>>     (as transitional or otherwise).
>> 
>> Motivation/Background:
>> ----------------------
>> The existing transitional PCI device lacks support for
>> PCI SR-IOV based devices. Currently it does not work beyond
>> the PCI PF, other than as a software-emulated device. It has
>> the system-level limitations cited below:
>> 
>> [a] PCIe spec citation:
>> VFs do not support I/O Space and thus VF BARs shall not
>> indicate I/O Space.
>> 
>> [b] CPU arch citation:
>> Intel 64 and IA-32 Architectures Software Developer's Manual:
>> The processor's I/O address space is separate and distinct from
>> the physical-memory address space. The I/O address space consists
>> of 64K individually addressable 8-bit I/O ports, numbered 0 through FFFFH.
>> 
>> [c] PCIe spec citation:
>> If a bridge implements an I/O address range,...I/O address range
>> will be aligned to a 4 KB boundary.
>> 
>> [d] I/O region accesses at the PCI system level are slow, as they are
>> non-posted operations in the PCIe fabric.
>> 
> After our last several discussions and the feedback from Michel and
> Jason, to support the above use case requirements, I would like to
> update v1 with the proposal below.
>
> 1. Use the existing non-transitional device and extend it with legacy
> register access.
>
> 2. The AQ of the parent PF is the optimal choice for accessing VF
> legacy registers (as opposed to an MMR of the VF).
> This is because:
> a. it avoids a complex reset flow at scale for the VFs.
>
> b. it reuses the existing driver notification mechanism, which is
> already present in the notification section of 1.x and transitional
> devices.
>
> 3. New AQ command opcode for legacy register access (read/write).
> Input fields:
> a. opcode 0x8000
> b. group and VF member identifiers
> c. register offset
> d. register size (1 to 64 B)
> e. register content (on write)
>
> Output fields:
> a. command status
> b. register content (on read)
>
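To make sure I am reading the command layout correctly, here is a rough
C sketch of how the descriptor and its result could look, using the
spec's u8/le16/le64 field notation. All structure and field names,
widths, and orderings below are mine, for illustration only; they are
not proposed spec text.

/* Hypothetical layout for the legacy register access command (0x8000).
 * Names and field widths are illustrative guesses, not spec text. */
struct virtio_admin_cmd_legacy_reg_access {
        le16 opcode;       /* 0x8000: legacy register read/write */
        le16 group_id;     /* group identifier */
        le64 member_id;    /* VF member identifier within the group */
        le16 reg_offset;   /* byte offset into the legacy register space */
        u8   reg_size;     /* access size in bytes, 1 to 64 */
        u8   reserved;
        u8   reg_data[64]; /* register content; valid on write */
};

struct virtio_admin_cmd_legacy_reg_access_result {
        le16 status;       /* command completion status */
        u8   reg_data[64]; /* register content; valid on read */
};

I assumed a single opcode with the direction implied by which fields
are valid; a separate read/write flag field would work equally well.
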
> 4. New AQ command to return the queue notify address for legacy access.
> Input fields:
> a. opcode 0x8001
> b. group and VF member identifier (or can this be just a constant for all VFs?)
>
> Output fields:
> a. BAR index
> b. byte offset within the BAR
>
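Similarly for the notify address query, a sketch of what the command
and result might carry (again purely illustrative; the offset may
instead need to be 64-bit for large BARs):

/* Hypothetical layout for the notify address query command (0x8001). */
struct virtio_admin_cmd_legacy_notify_query {
        le16 opcode;    /* 0x8001: query legacy queue notify address */
        le16 group_id;  /* group identifier */
        le64 member_id; /* VF member identifier, if the command is per VF */
};

struct virtio_admin_cmd_legacy_notify_query_result {
        u8   bar;          /* BAR index that holds the notify address */
        u8   reserved[3];
        le32 offset;       /* byte offset within that BAR */
};
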
> 5. PCI Extended capabilities mirroring all the existing capabilities
> located in the legacy section.
> Why?
> a. This lets a new driver (such as vfio) always rely on the new
> capabilities.
> b. The legacy PCI configuration space is close to its full capacity.
>
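On point 5, I read the intent as: a new driver (vfio or otherwise)
should be able to locate the virtio capabilities by walking PCI Express
extended config space alone. A self-contained sketch of such a walk is
below; read_config32() is a placeholder for whatever config accessor
the driver environment provides, and the capability ID to match is
whatever eventually gets assigned.

#include <stdint.h>

/* Placeholder: the driver's 32-bit PCI config space read accessor. */
extern uint32_t read_config32(uint16_t offset);

#define PCI_EXT_CAP_START 0x100 /* extended capabilities begin here */

/* Walk the PCIe extended capability list looking for cap_id.
 * Per PCIe, each extended capability header is one dword:
 * bits 15:0 = capability ID, bits 19:16 = version,
 * bits 31:20 = offset of the next capability (0 ends the list). */
static uint16_t find_ext_cap(uint16_t cap_id)
{
        uint16_t offset = PCI_EXT_CAP_START;

        while (offset) {
                uint32_t hdr = read_config32(offset);

                if ((hdr & 0xffff) == cap_id)
                        return offset;
                offset = (hdr >> 20) & 0xfff;
        }
        return 0; /* not found */
}
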
> A few open questions:
> 1. Should the queue notification query command be per VF, or should
> there be one for all group members (VFs)?
>
> Any further comments to address in v1?
>
-- 
And you're standing here beside me, I love the passing of time.

