Subject: Re: [virtio-comment] Re: [PATCH 09/11] transport-pci: Describe PCI MMR dev config registers
On 4/12/2023 9:48 PM, Jason Wang wrote:
The defined spec for a PCI device does not work today for a transitional device in virtualization. It only works in the limited PF case.

> On Wed, Apr 12, 2023 at 10:23 PM Parav Pandit <parav@nvidia.com> wrote:
>> From: Jason Wang <jasowang@redhat.com>
>> Sent: Wednesday, April 12, 2023 2:15 AM
>>> On Wed, Apr 12, 2023 at 1:55 PM Parav Pandit <parav@nvidia.com> wrote:
>>>> From: Jason Wang <jasowang@redhat.com>
>>>> Sent: Wednesday, April 12, 2023 1:38 AM
>>>>>> A modern device says FEATURE_1 must be offered and must be negotiated by the driver.
>>>>>> Legacy has MAC as an RW area (the hypervisor can do it). The reset flow is different between legacy and modern.
>>>>> Just to make sure we're on the same page: we're talking in the context of mediation. Without mediation, your proposal can't work.
>>>> Right.
>>> So in this case, the guest driver is not talking with the device directly. Qemu needs to trap whatever it wants to achieve the mediation:
>> I prefer to avoid picking a specific sw component here, but yes. QEMU can trap.
>>> 1) It's perfectly fine that Qemu negotiated VERSION_1 but presented a mediated legacy device to guests.
>> Right, but if VERSION_1 is negotiated, the device will work as V_1 with the 12B virtio_net_hdr.
> Shadow virtqueue could be used here. And we have many more issues without shadow virtqueue, more below.
>>> 2) For MAC and Reset, Qemu can trap and do anything it wants.
>> The idea is not to poke in the fields even though such sw can. MAC is RW in legacy; MAC is RO in 1.x. So QEMU cannot turn an RO register into an RW one.
> It can be done by using the control vq. Trap the MAC write and forward it via the control virtqueue.

This proposal is not about implementing a vDPA mediator, which requires a far higher level of understanding in the hypervisor.

> It's not related to vDPA; it's about a common technology that is used in virtualization. You do a trap and emulate the status, why can't you do that for others?

Such mediation works fine for vdpa, and it is up to the vdpa layer to do it.
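As an aside, the trap-and-forward path described above (catching a legacy MAC write and reissuing it over the control virtqueue) could be sketched roughly as below. The helper name is hypothetical; the command layout and the VIRTIO_NET_CTRL_MAC / VIRTIO_NET_CTRL_MAC_ADDR_SET values follow the virtio-net spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Class/command values as defined by the virtio-net spec. */
#define VIRTIO_NET_CTRL_MAC          1
#define VIRTIO_NET_CTRL_MAC_ADDR_SET 1

/* Control virtqueue command header (spec: struct virtio_net_ctrl_hdr). */
struct virtio_net_ctrl_hdr {
    uint8_t cls; /* named "class" in the spec */
    uint8_t cmd;
};

/* Hypothetical helper: a hypervisor that trapped a 6-byte MAC write to
 * the legacy device config area rebuilds it as a MAC_ADDR_SET command
 * to be placed on the control virtqueue. Returns the command length. */
static size_t build_mac_set_cmd(uint8_t *buf, const uint8_t mac[6])
{
    struct virtio_net_ctrl_hdr hdr = {
        .cls = VIRTIO_NET_CTRL_MAC,
        .cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET,
    };
    memcpy(buf, &hdr, sizeof(hdr));    /* 2-byte header */
    memcpy(buf + sizeof(hdr), mac, 6); /* followed by the 6-byte MAC */
    return sizeof(hdr) + 6;
}
```

The device then completes the command with an ack status byte, so the guest-visible RW semantics of the legacy MAC field can be preserved even though the 1.x field is RO.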
Not relevant here.

>>>> The proposed solution in this series enables it and avoids per-field sw interpretation and mediation in parsing values etc.
>>> I don't think it's possible. See the discussion about ORDER_PLATFORM and ACCESS_PLATFORM in previous threads.
>> I have read the previous thread. The hypervisor will be limited to those platforms where ORDER_PLATFORM is not needed.
> So you introduce a bunch of new facilities that only work on some specific archs. This breaks the architecture independence of virtio since 1.0.
Hence this update. More below.
> The root cause is that legacy is not fit for a hardware implementation; any kind of hardware that tries to offer the legacy function will eventually run into those corner cases which require extra interfaces, and may finally end up with a (partial) duplication of the modern interface.
I agree with you. We cannot change the legacy. What is being added here is to enable the legacy transport via MMIO or AQ, using the notification region.
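For reference, the legacy register window being re-exposed here has the layout below (a sketch of the legacy virtio-pci offsets without MSI-X; the constant names follow the ones used in the Linux legacy driver headers):

```c
#include <assert.h>
#include <stdint.h>

/* Legacy virtio-pci common register layout (offsets within the legacy
 * window, without MSI-X), as defined for the legacy I/O BAR. The idea
 * discussed in this thread is to re-expose the same layout through an
 * MMIO region instead of I/O BAR0. */
enum {
    VIRTIO_PCI_HOST_FEATURES  = 0x00, /* le32, RO */
    VIRTIO_PCI_GUEST_FEATURES = 0x04, /* le32, RW */
    VIRTIO_PCI_QUEUE_PFN      = 0x08, /* le32, RW: queue address as a PFN */
    VIRTIO_PCI_QUEUE_NUM      = 0x0c, /* le16, RO */
    VIRTIO_PCI_QUEUE_SEL      = 0x0e, /* le16, RW */
    VIRTIO_PCI_QUEUE_NOTIFY   = 0x10, /* le16, WO */
    VIRTIO_PCI_STATUS         = 0x12, /* u8,  RW */
    VIRTIO_PCI_ISR            = 0x13, /* u8,  RO, read clears */
    VIRTIO_PCI_CONFIG_OFF     = 0x14  /* device-specific config (e.g. MAC for net) */
};
```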
Will comment where you listed 3 options.
>> The device uses the configured PCI transport-level addresses, because it is a PCI device. And this is a PCI transitional device that uses the standard platform DMA anyway, so ACCESS_PLATFORM is not related.
> So which type of transactions does this device use when it is used via the legacy MMIO BAR? Translated requests or not?
>> For example, a device may have implemented, say, only BAR2, and a small portion of BAR2 points to the legacy MMIO config registers.
> We're discussing spec changes, not a specific implementation, here. Why can't the device use BAR0? Do you see any restriction in the spec?
No restriction. Forcing it to use BAR0 is the restrictive method.
>> cap_len talks about the length of the PCI capability structure as defined by the PCI spec. The BAR length is located in the le32 length field. A mediator hypervisor sw will be able to read/write to it when BAR0 is exposed towards the guest VM as I/O BAR 0.
> So I don't think it can work:
>
> 1) This is very dangerous unless the spec mandates the size (which is also tricky since page size varies among arches) for any BAR/capability, which is not what virtio wants; the spec leaves that flexibility to the implementation. E.g.:
>
> """
> The driver MUST accept a cap_len value which is larger than specified here.
> """
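For context, the generic capability the cap_len discussion refers to looks like this in the 1.0 spec (struct virtio_pci_cap). A new cfg_type value pointing at a legacy register window is the assumption under discussion here, not an existing spec value; the bar/offset/length fields are what would let the window live at an arbitrary place, e.g. a small portion of BAR2:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic virtio PCI vendor capability, as in the virtio 1.0 spec. */
struct virtio_pci_cap {
    uint8_t  cap_vndr;   /* PCI_CAP_ID_VNDR */
    uint8_t  cap_next;   /* offset of the next capability */
    uint8_t  cap_len;    /* length of this capability structure */
    uint8_t  cfg_type;   /* identifies the structure this points to;
                          * a "legacy MMIO" type is hypothetical here */
    uint8_t  bar;        /* which BAR holds the region (0..5) */
    uint8_t  padding[3];
    uint32_t offset;     /* le32: offset of the region within the BAR */
    uint32_t length;     /* le32: length of the region */
};
```

Note that cap_len bounds only this 16-byte structure in config space; the size of the region it points to is carried in the le32 length field, which is the distinction being made above.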
So the new MMIO region can be of any size and anywhere in the BAR. For LM, the BAR length and number should be the same between the two PCI VFs, but that is orthogonal to this point. Such checks will be done anyway.
> 2) A blocker for live migration (and compatibility): the hypervisor should not assume the size of any capability, so in any case it should have a fallback for when the BAR can't be assigned.
I agree that the hypervisor should not assume. For LM, such compatibility checks will be done anyway. So it is not a blocker; that they match on the two sides is all that is needed.
> Let me summarize; we have three ways currently:
>
> 1) legacy MMIO BAR via capability
>
> Pros:
> - allows some flexibility to place the MMIO BAR other than at 0
> Cons:
> - new device ID

Not needed, as Michael suggested. An existing transitional or non-transitional device can expose this optional capability and its attached MMIO region.
Spec changes are similar to #2.
> - non-trivial spec changes, which end up in the tricky cases that try to work around legacy to fit a hardware implementation
> - works only for the case of virtualization with the help of mediation; can't work for bare metal

For bare-metal PFs, usually thin hypervisors are used that do very minimal setup. But I agree that bare metal is relatively less important.
> - only works for some specific archs without SVQ
That is the legacy limitation that we don't worry about.
> 2) allow BAR0 to be MMIO for a transitional device
>
> Pros:
> - very minor change for the spec
Spec-changes-wise, they are similar to #1.
> - works for virtualization (and it works even without dedicated mediation for some setups)

I am not aware where it can work without mediation. Do you know any specific kernel version where it actually works?
> - works for bare metal for some setups (without mediation)
>
> Cons:
> - only works for some specific archs without SVQ
> - BAR0 is required

Both are not limitations, as they mainly come from the legacy side of things.
> 3) modern device mediation for legacy
>
> Pros:
> - no changes in the spec
> Cons:
> - requires a mediation layer in order to work on bare metal
> - requires datapath mediation like SVQ to work for virtualization

A spec change is still required for net and blk, because a modern device does not understand legacy even with a mediation layer.
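The header mismatch behind "a modern device does not understand legacy" for net: the legacy virtio-net header is 10 bytes, while with VIRTIO_F_VERSION_1 (or VIRTIO_NET_F_MRG_RXBUF) the header also carries num_buffers and becomes the 12B layout mentioned earlier in the thread. A sketch with illustrative struct names (the spec defines a single struct virtio_net_hdr with num_buffers conditionally present):

```c
#include <assert.h>
#include <stdint.h>

/* Legacy virtio-net header: 10 bytes when neither VIRTIO_NET_F_MRG_RXBUF
 * nor VIRTIO_F_VERSION_1 has been negotiated. */
struct virtio_net_hdr_legacy {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

/* With VERSION_1 (or MRG_RXBUF) the header also carries num_buffers,
 * giving 12 bytes; a device operating in V_1 mode therefore cannot
 * consume legacy-format buffers as-is without datapath mediation
 * such as SVQ. */
struct virtio_net_hdr_modern {
    struct virtio_net_hdr_legacy hdr;
    uint16_t num_buffers;
};
```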
FEATURE_1, and the RW cap via CVQ, which is not really owned by the hypervisor. A guest may be legacy or non-legacy, so mediation shouldn't always be done.
> Compared to method 2), the only advantage of method 1) is the flexibility of BAR0, but it has too many disadvantages. If we only care about virtualization, modern devices are sufficient. Then why bother with that?
So that a single stack, which doesn't always have knowledge of which driver version is running in the guest, can utilize it. Otherwise 1.x also ends up doing mediation when the guest driver = 1.x and the device = a transitional PCI VF.
So (1) and (2) are both equivalent; one is more flexible. If you know more valid cases where BAR0 as MMIO can work as-is, such an option is open.
We can draft the spec such that the MMIO BAR SHOULD be exposed in BAR0.