virtio-comment message

Subject: Re: [virtio-comment] Re: [PATCH 09/11] transport-pci: Describe PCI MMR dev config registers

From: Parav Pandit <parav@nvidia.com>
To: Jason Wang <jasowang@redhat.com>
Date: Thu, 13 Apr 2023 13:24:24 -0400



On 4/13/2023 1:14 AM, Jason Wang wrote:

For LM, BAR length and number should be the same between two PCI VFs. But it's
orthogonal to this point. Such checks will be done anyway.


Quoted the wrong sections, I think it should be:

"
length MAY include padding, or fields unused by the driver, or future
extensions. Note: For example, a future device might present a large
structure size of several MBytes. As current devices never utilize
structures larger than 4KBytes in size, driver MAY limit the mapped
structure size to e.g. 4KBytes (thus ignoring parts of structure after
the first 4KBytes) to allow forward compatibility with such devices
without loss of functionality and without wasting resources.
"

yes. This is the one.
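
As an illustration of the quoted note, here is a minimal sketch of how a driver might clamp the mapped size of a capability structure. The 4 KiB limit comes from the example in the note; the names `MAX_MAPPED_LEN` and `virtio_map_len` are hypothetical and are not taken from any driver.

```c
#include <stdint.h>

/* Hypothetical upper bound on the mapped structure size (4 KiB, as in the note). */
#define MAX_MAPPED_LEN 4096u

/*
 * Return how many bytes to map for a capability whose `length` field is
 * cap_len. Anything past MAX_MAPPED_LEN (padding, unused fields, future
 * extensions) is simply left unmapped.
 */
static uint32_t virtio_map_len(uint32_t cap_len)
{
    return cap_len < MAX_MAPPED_LEN ? cap_len : MAX_MAPPED_LEN;
}
```

With this helper, a structure advertising several MBytes would be mapped as 4096 bytes, while a 0x78-byte structure would be mapped in full.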

If it's a transitional device but not placed at BAR0, it might have
side effects for Linux drivers which assume BAR0 for legacy.

True. Transitional can be at BAR0.

I don't see how easy it could be a non transitional device:

"
Devices or drivers with no legacy compatibility are referred to as
non-transitional devices and drivers, respectively.
"

Michael has suggested rewording the text.
It is new text anyway, so let's park it aside for now.
It is mostly tweaking the text.

device can expose this optional capability and its attached MMIO region.

Spec changes are similar to #2.

- non-trivial spec changes which end up with tricky cases that
try to work around legacy to fit a hardware implementation
- works only for the case of virtualization with the help of
mediation, can't work for bare metal

For bare-metal PFs, usually thin hypervisors are used that do very
minimal setup. But I agree that bare-metal is relatively less important.


This is not what I understand. I know several vendors that are using
virtio devices for bare metal.

I was saying the case for legacy bare metal is less of a problem because PCIe does not limit functionality; perf is still limited due to IOBAR.

- only work for some specific archs without SVQ

That is the legacy limitation that we don't worry about.

2) allow BAR0 to be MMIO for transitional device

Pros:
- very minor change for the spec

Spec changes wise they are similar to #1.


This is different since the changes for this are trivial.

- works for virtualization (and it works even without dedicated
mediation for some setups)

I am not aware of where it can work without mediation. Do you know any
specific kernel version where it actually works?


E.g. the current Linux driver does:

rc = pci_request_region(pci_dev, 0, "virtio-pci-legacy");

It doesn't distinguish I/O from memory. It means if you had a
"transitional" device with legacy MMIO BAR0, it just works.


Thanks to the abstract PCI API in Linux.
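
For context, here is a small standalone sketch of the distinction that the abstract API hides. It decodes the low bits of a raw BAR register value as laid out by the PCI specification (bit 0 selects I/O space; for memory BARs, bits 2:1 give the type and bit 3 the prefetchable attribute). The struct and function names are hypothetical, and this is not how the Linux driver is written; in the kernel the same information would come from the resource flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the low dword of a PCI BAR register. */
struct bar_info {
    bool is_io;         /* bit 0 set: I/O space BAR */
    bool is_64bit;      /* memory BAR, type field (bits 2:1) == 0b10 */
    bool prefetchable;  /* memory BAR, bit 3 set */
    uint32_t base_lo;   /* low 32 bits of the base address */
};

static struct bar_info decode_bar(uint32_t raw)
{
    struct bar_info info = {0};

    if (raw & 0x1u) {
        /* I/O BAR: the low two bits are not part of the address. */
        info.is_io = true;
        info.base_lo = raw & ~0x3u;
    } else {
        /* Memory BAR: the low four bits carry type and prefetch flags. */
        info.is_64bit = ((raw >> 1) & 0x3u) == 0x2u;
        info.prefetchable = (raw & 0x8u) != 0;
        info.base_lo = raw & ~0xFu;
    }
    return info;
}
```

A legacy I/O BAR0 and a legacy MMIO BAR0 differ only in these flag bits, which is why code that requests the region without checking the type behaves the same for both.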

- work for bare metal for some setups (without mediation)
Cons:
- only work for some specific archs without SVQ
- BAR0 is required

Neither is a limitation, as they mainly come from the legacy side
of things.

3) modern device mediation for legacy

Pros:
- no changes in the spec
Cons:
- require mediation layer in order to work in bare metal
- require datapath mediation like SVQ to work for virtualization

Spec change is still required for net and blk because modern devices do
not understand legacy, even with a mediation layer.


That's fine and easy since we work on top of modern devices.

FEATURE_1, RW cap via CVQ which is not really owned by the hypervisor.


Hypervisors can trap if they wish.

Trapping non-legacy accesses for 1.x doesn't make sense.

A guest may be legacy or non legacy, so mediation shouldn't be always done.


Yes, so mediation can work only if we find it's a legacy driver.

Mediation will be done only for legacy accesses without cvq; the rest will go as-is without any cvq or other mediation.

Compared to method 2), the only advantage of method 1) is the
flexibility of BAR0, but it has too many disadvantages. If we only care
about virtualization, modern devices are sufficient. Then why bother
with that?


So that a single stack which doesn't always have the knowledge of which
driver version is running in the guest can utilize it. Otherwise 1.x also
ends up doing mediation when guest driver = 1.x and device = transitional
PCI VF.


I don't see how this can be solved in your proposal.

This proposal only traps the legacy accesses and doesn't require any other giant framework.

I think we can make BAR0 work for transitional with a spec change and with an optional notification region.

I am evaluating further.


So (1) and (2) are equivalent; one is more flexible. If you know
more valid cases where BAR0 as MMIO can work as-is, such an option is open.


As said in previous threads, this has been used by several vendors for years.

E.g I have a handy transitional hardware virtio device that has:

         Region 0: Memory at f5ff0000 (64-bit, prefetchable) [size=8K]
         Region 2: Memory at f5fe0000 (64-bit, prefetchable) [size=4K]
         Region 4: Memory at f5800000 (64-bit, prefetchable) [size=4M]

And:

         Capabilities: [64] Vendor Specific Information: VirtIO: CommonCfg
                 BAR=0 offset=00000888 size=00000078
         Capabilities: [74] Vendor Specific Information: VirtIO: Notify
                 BAR=0 offset=00001800 size=00000020 multiplier=00000000
         Capabilities: [88] Vendor Specific Information: VirtIO: ISR
                 BAR=0 offset=00000820 size=00000020
         Capabilities: [98] Vendor Specific Information: VirtIO: DeviceCfg
                 BAR=0 offset=00000840 size=00000020
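
To make the layout above easier to check, here is a minimal sketch that verifies each capability window lies within its BAR. The `bar`, `offset` and `length` fields mirror the values printed in the listing; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of one virtio PCI capability window. */
struct cap_region {
    uint8_t bar;      /* BAR index the window lives in */
    uint32_t offset;  /* byte offset within that BAR */
    uint32_t length;  /* window size in bytes */
};

/*
 * True if [offset, offset + length) fits in a BAR of bar_size bytes.
 * The sum is computed in 64 bits so a large offset cannot wrap around.
 */
static bool cap_fits_in_bar(const struct cap_region *cap, uint64_t bar_size)
{
    return (uint64_t)cap->offset + cap->length <= bar_size;
}
```

Applied to the listing, all four windows (CommonCfg at 0x888, Notify at 0x1800, ISR at 0x820, DeviceCfg at 0x840) fit within the 8K Region 0, since the highest end offset is 0x1820.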


We can draft the spec so that the MMIO BAR SHOULD be exposed in BAR0.

Yes, above one.
