virtio-comment message

Subject: Re: [virtio-comment] Re: [PATCH 09/11] transport-pci: Describe PCI MMR dev config registers


On Wed, Apr 12, 2023 at 10:23 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, April 12, 2023 2:15 AM
> >
> > On Wed, Apr 12, 2023 at 1:55 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Wednesday, April 12, 2023 1:38 AM
> > >
> > > > > > Modern device says FEATURE_1 must be offered and must be
> > > > > > negotiated by the driver.
> > > > > > Legacy has MAC as RW area. (hypervisor can do it).
> > > > > > Reset flow is different between the legacy and modern.
> > > >
> > > > > Just to make sure we're on the same page. We're talking in the
> > > > > context of mediation. Without mediation, your proposal can't work.
> > > >
> > > Right.
> > >
> > > > > So in this case, the guest driver is not talking with the device
> > > > > directly. QEMU needs to trap whatever it wants to achieve the
> > > > > mediation:
> > > >
> > > I prefer to avoid picking specific sw component here, but yes. QEMU can trap.
> > >
> > > > > 1) It's perfectly fine that QEMU negotiated VERSION_1 but presented
> > > > > a mediated legacy device to guests.
> > > > Right but if VERSION_1 is negotiated, device will work as V_1 with 12B
> > > > virtio_net_hdr.
> >
> > Shadow virtqueue could be used here. And we have much more issues without
> > shadow virtqueue, more below.
> >
> > >
> > > > > 2) For MAC and Reset, QEMU can trap and do anything it wants.
> > > >
> > > The idea is not to poke in the fields even though such sw can.
> > > MAC is RW in legacy.
> > > > MAC is RO in 1.x.
> > >
> > > So QEMU cannot make RO register into RW.
> >
> > It can be done by using the control vq. Trap the MAC write and forward it via
> > the control virtqueue.
> >
> This proposal is not about implementing a vDPA mediator, which requires far more understanding in the hypervisor.

It's not related to vDPA, it's about a common technology that is used
in virtualization. You do a trap and emulate the status, why can't you
do that for others?
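
To make the trap-and-forward idea for the MAC concrete, here is a minimal
sketch in Python. It is only an illustration under stated assumptions: the
legacy device-specific config is taken to start at offset 20 of the legacy
register block (MSI-X disabled), the device is assumed to have negotiated
VIRTIO_NET_F_CTRL_MAC_ADDR, and `LegacyMacMediator` and `send_ctrl` are
hypothetical names, not part of any real hypervisor. As a simplification it
forwards the full address after every trapped byte write.

```python
from typing import Callable

VIRTIO_NET_CTRL_MAC = 1           # control class from the virtio-net spec
VIRTIO_NET_CTRL_MAC_ADDR_SET = 1  # command within that class
LEGACY_NET_CFG_OFF = 20           # assumption: legacy header size, MSI-X disabled
MAC_LEN = 6


class LegacyMacMediator:
    """Hypothetical mediator: traps legacy MAC writes, forwards them via the control vq."""

    def __init__(self, mac: bytes, send_ctrl: Callable[[bytes], None]):
        self.mac = bytearray(mac)
        self.send_ctrl = send_ctrl  # hypothetical hook that queues a control-vq command

    def io_write8(self, offset: int, value: int) -> bool:
        """Handle a trapped 1-byte write; return True if it hit the MAC field."""
        idx = offset - LEGACY_NET_CFG_OFF
        if not 0 <= idx < MAC_LEN:
            return False  # not the MAC field; some other handler deals with it
        self.mac[idx] = value & 0xFF
        # Command = class byte + command byte + the 6-byte MAC address.
        cmd = bytes([VIRTIO_NET_CTRL_MAC, VIRTIO_NET_CTRL_MAC_ADDR_SET]) + bytes(self.mac)
        self.send_ctrl(cmd)
        return True
```

A real mediator would also need to read back the ack byte of the control
command and decide when a byte-by-byte MAC update is complete.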

> Such mediation works fine for vdpa and it is up to the vdpa layer to do. Not relevant here.
>
> > >
> > > > The proposed solution in this series enables it and avoids per-field sw
> > > > interpretation and mediation in parsing values etc.
> >
> > I don't think it's possible. See the discussion about ORDER_PLATFORM and
> > ACCESS_PLATFORM in previous threads.
> >
> I have read the previous thread.
> Hypervisor will be limited to those platforms where ORDER_PLATFORM is not needed.

So you introduce a bunch of new facilities that only work on some
specific archs. This breaks the architecture independence that virtio
has had since 1.0. The root cause is that legacy is not fit for
hardware implementation: any kind of hardware that tries to offer a
legacy function will eventually run into corner cases that require
extra interfaces, which may end up as a (partial) duplication of the
modern interface.

> And this is a pci transitional device that uses the standard platform dma anyway so ACCESS_PLATFORM is not related.

So which type of transactions does this device use when it is accessed
via the legacy MMIO BAR? Translated requests or not?

>
> > >
> > > What is proposed here, that
> > > a. legacy registers are emulated as MMIO in a BAR.
> > > b. This can either be BAR0 or some other BAR
> > >
> > > Your question was why this flexibility?
> >
> > Yes.
> >
> > >
> > > The reason is:
> > > a. if device prefers to implement only two BARs, it can do so and have a window
> > > for these 60+ config registers in an existing BAR.
> > > b. if device prefers to implement a new BAR dedicated for legacy registers
> > > emulation, it is fine too.
> > >
> > > A mediating sw will be able to forward them regardless.
> >
> > I'm not sure I fully understand this. The only difference is that for b, it can only
> > use BAR0.
> Why do you say it can use only BAR 0?

Because:

1) It's the way current transitional devices work
2) It's simple: a small extension to the transitional device instead
of a bunch of facilities that can do much less than this
3) It works for legacy drivers on some OSes such as Linux and DPDK,
which means it works for bare metal, something that can't be achieved
by your proposal here

>
> For example, a device may have implemented, say, only BAR2, and a small portion of BAR2 is pointing to legacy MMIO config registers.

We're discussing spec changes, not a specific implementation here. Why
can't the device use BAR0? Do you see any restriction in the spec?

> A mediator hypervisor sw will be able to read/write to it when BAR0 is exposed towards the guest VM as IOBAR 0.

So I don't think it can work:

1) This is very dangerous unless the spec mandates the size (which is
also tricky since page size varies among arches) for any
BAR/capability, which is not what virtio wants. The spec leaves that
flexibility to the implementation:

E.g.

"""
The driver MUST accept a cap_len value which is larger than specified here.
"""

2) It is a blocker for live migration (and compatibility): the
hypervisor should not assume the size of any capability, so in any
case it needs a fallback for when the BAR can't be assigned.
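
As an illustration of not assuming a fixed capability size, below is a
minimal Python sketch that parses a `struct virtio_pci_cap` from raw config
space bytes and tolerates a `cap_len` larger than the 16-byte base structure.
The function name `parse_virtio_pci_cap` and the returned dictionary are
hypothetical choices for this example; the field order follows the layout in
the spec (the `id` byte is padding in older revisions).

```python
import struct

# struct virtio_pci_cap (little-endian), 16 bytes in total:
# cap_vndr, cap_next, cap_len, cfg_type, bar, id, 2 padding bytes, offset, length
VIRTIO_PCI_CAP_FMT = "<BBBBBB2xII"
VIRTIO_PCI_CAP_SIZE = struct.calcsize(VIRTIO_PCI_CAP_FMT)
PCI_CAP_ID_VNDR = 0x09  # PCI vendor-specific capability ID


def parse_virtio_pci_cap(raw: bytes) -> dict:
    """Parse one virtio PCI capability; accept cap_len larger than the base struct."""
    if len(raw) < VIRTIO_PCI_CAP_SIZE:
        raise ValueError("buffer shorter than virtio_pci_cap")
    vndr, _next, cap_len, cfg_type, bar, cap_id, offset, length = struct.unpack_from(
        VIRTIO_PCI_CAP_FMT, raw)
    if vndr != PCI_CAP_ID_VNDR:
        raise ValueError("not a vendor-specific capability")
    if cap_len < VIRTIO_PCI_CAP_SIZE:
        raise ValueError("cap_len smaller than virtio_pci_cap")
    # A larger cap_len is accepted; the extra bytes are simply not interpreted here.
    return {"cfg_type": cfg_type, "bar": bar, "id": cap_id,
            "offset": offset, "length": length, "cap_len": cap_len}
```

The BAR index, offset and length all come from the device, so a mediator
built this way has nothing hard-coded about where or how large a region is.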

>
> > Unless there's a new feature that mandates
> > BAR0 (which I think is impossible since all the features are advertised via
> > capabilities now). We're fine.
> >
> No new feature. Legacy BAR emulation is exposed via the extended capability we discussed providing the location.
>
> > >
> > > > > > Right, it doesn't. But spec shouldn't write BAR0 is only for
> > > > > > legacy MMIO emulation, that would prevent BAR0 usage.
> > > >
> > > > How can it be prevented? Can you give me an example?
> > >
> > > I mean to say, that say if we write a spec like below,
> > >
> > > A device exposes BAR 0 of size X bytes for supporting legacy configuration
> > > and device specific registers as memory mapped region.
> > >
> >
> > Ok, it looks like just a matter of how the spec is written. The problematic part is that
> > it tries to enforce a size which is suboptimal.
> >
> > What has been done is:
> >
> > "
> > Transitional devices MUST expose the Legacy Interface in I/O space in BAR0.
> > "
> >
> > Without mentioning the size.
>
> The new legacy MMIO registers can be implemented as BAR0 with the same size. But better not to place a restriction like the above wording.

Let me summarize. We currently have three options:

1) legacy MMIO BAR via capability:

Pros:
- allows some flexibility to place the MMIO BAR somewhere other than BAR0
Cons:
- new device ID
- non-trivial spec changes that end up in the tricky cases of working
around legacy to make it fit a hardware implementation
- works only for virtualization with the help of mediation, can't
work for bare metal
- only works for some specific archs without SVQ

2) allow BAR0 to be MMIO for transitional device

Pros:
- very minor change to the spec
- works for virtualization (and it works even without dedicated
mediation for some setups)
- works for bare metal for some setups (without mediation)
Cons:
- only works for some specific archs without SVQ
- BAR0 is required

3) modern device mediation for legacy

Pros:
- no changes to the spec
Cons:
- requires a mediation layer in order to work on bare metal
- requires datapath mediation like SVQ to work for virtualization

Compared to method 2), the only advantage of method 1) is the
flexibility in BAR placement, but it has too many disadvantages. If we
only care about virtualization, modern devices are sufficient. Then
why bother with that?

Thanks


