OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE


> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 14, 2023 8:39 AM
> 
> On Wed, Sep 13, 2023 at 2:39âPM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Wednesday, September 13, 2023 10:15 AM
> >
> > [..]
> > > > > > [1]
> > > > > > https://lists.oasis-open.org/archives/virtio-comment/202309/ms
> > > > > > g000
> > > > > > 61.h
> > > > > > tml
> > > > >
> > > > > The series works for stateless devices. Before we introduce
> > > > > device states in the spec, we can't migrate stateful devices. So
> > > > > the device context doesn't make much sense right now.
> > > > The series works for stateful devices too. The device context covers it.
> > >
> > > How? Can it be used for migrating any existing stateful devices?
> > > Don't we need to define what context means for a specific stateful
> > > device before you can introduce things like device context? Please
> > > go through the archives for the relevant discussions (e.g
> > > virtio-FS), it's not as simple as introducing a device context API.
> > >
> > A device will have its own context for example RSS definition, or flow filters
> tomorrow.
> 
> If you know there are things that are missing when posting the patches, please
> use the RFC tag.
> 
It is not missing. They are optional, which is why it is not needed in this series.

> > The device context will be extended post the first series.
> >
> > > And what's more, how can it handle the migration compatibility?
> > It will be taken care in follow on as we all know that this to be checked.
> 
> You don't even mention it anywhere in your series.
> 
Migration compatibility is topic in itself regardless of device migration series.
It is part of the feature provisioning phase needed regardless.
Like how you and Lingshan wanted to keep the suspend bit series small and logical, device migration series is also logically split for the functionality.
I donât see a need to mention the long known missing functionality and common to both approaches.

> > I will include the notes of future follow up work items in v1, which will be
> taken care post this series.
> >
> > > > > Dirty page tracking in virtio is not a must for live migration
> > > > > to work. It can be done via platform facilities or even
> > > > > software. And to make it more efficient, it needs to utilize
> > > > > transport facilities instead of a
> > > general one.
> > > > >
> > > > It is also optional in the spec proposal.
> > > > Most platforms claimed are not able to do efficiently either,
> > >
> > > Most platforms are working towards an efficient way. But we are
> > > talking about different things, hardware based dirty page logging is
> > > not a must, that is what I'm saying. For example, KVM doesn't use hardware
> to log dirty pages.
> > >
> > I also said same, that hw based dirty page logging is not must. :) One
> > day hw mmu will be able to track everything efficiently. I have not seen it
> happening yet.
> 
> How do you define efficiency? KVM uses page fault and most modern IOMMU
> support PRI now.
>
One cannot define PRI as mandatory feature. In our research and experiments we see that PRI is significantly slower to handle page faults.
Yet different topic...

Efficiency is defined by the downtime of the multiple devices in a VM.
And leading OS allowed device advancements by allowing device to report dirty pages in cpu and platform agnostic way...

One can use post-copy approach as well, current device migration is around established pre-copy approach.

> >
> > > > hence the vfio subsystem added the support for it.
> > >
> > > As an open standard, if it is designed for a specific software
> > > subsystem on a specific OS, it's a failure.
> > >
> > It is not.
> > One need accept that, in certain areas virtio is following the trails of
> advancement already done in sw stack.
> > So that virtio spec advancement fits in to supply such use cases.
> > And blocking such advancement of virtio spec to promote only_mediation
> approach is not good either.
> >
> > BTW: One can say the mediation approach is also designed for specific
> software subsystem and hence failure.
> > I will stay away from quoting it, as I donât see it this way.
> 
> The proposal is based on well known technology since the birth of virtualization.
Sure, but that does not change the fact that such series is also targeted for a specific software subsystem..
And hence failure.

I didnât say that, I said the opposite that yes, since the virtio is in catch up mode, it is defining the interface so that it can fit into these OS platforms.
Mostly multiple of them, who all support passthrough devices.

> I never knew a mainstream hypervisor that doesn't do trap and emulate, did
> you?
> 
It does trap and emulation for PCI config space, not for virtio interfaces like queues, config space and more for passthrough devices.

> >
> > > >
> > > > > The FLR, P2P demonstrates the fragility of a simple passthrough
> > > > > method and how it conflicts with live migration and complicates
> > > > > the device
> > > implementation.
> > > > Huh, it shows the opposite.
> > > > It shows that both will seamlessly work.
> > >
> > > Have you even tried your proposal with a prototype device?
> > Of course, it is delivered to user for 1.5 years ago before bringing it to the
> spec with virtio-net and virtio-blk devices.
> 
> I hope this is your serious answer, but it looks like it is not. Your proposal misses
> a lot of states as I pointed out in another thread, how can it work in fact?
> 
Which states?
What is posted in series [1] is minimal and base required items, optional one is omitted as it can be done incrementally.
Lingshan had hard time digesting the basics of P2P and dirty page tracking work in this short series.
So there is no point in pushing large part of the device context and making the series blurry.
It will be done incrementally subsequently.

> > > >
> > > > > And it means you need to audit all PCI features and do
> > > > > workaround if there're any possible issues (or using a whitelist).
> > > > No need for any of this.
> > >
> > > You need to prove this otherwise it's fragile. It's the duty of the
> > > author to justify not the reviewer.
> > >
> > One cannot post patches and nor review giant series in one go.
> > Hence the work to be split on a logical boundary.
> > Features provisioning, pci layout etc is secondary tasks to take care of.
> 
> Again, if you know something is missing, you need to explain it in the series
> instead of waiting for some reviewers to point it out and say it's well-known
> afterwards.
> 
The patch set cannot be a laundry list of items missing in virtio spec.
It is short and focused on the device migration.

> >
> > > For example FLR is required to be done in 100ms. How could you
> > > achieve this during the live migration? How does it affect the downtime and
> FRS?
> > >
> > Good technical question to discuss instead of passthrough vs
> > mediation. :)
> >
> > Device administration work is separate from the device operational part.
> > The device context records what is the current device context, when the FLR
> occurs, the device stops all the operations.
> > And on next read of the device context the FLRed context is returned.
> 
> Firstly, you didn't explain how it affects the live migration, for example, what
> happens if we try to migrate while FLR is ongoing.
> Secondly, you ignore the other two questions.
> 
> Let's save the time of both.
> 
There is nothing to explain about device reset and live migration, because there is absolutely there is no touch points.
device_status is just another registers like rest of them.
One does not need to poke around registers when doing passthrough.

> >
> > > >
> > > > > This is tricky and we are migrating virtio not virtio-pci. If we
> > > > > don't use simple passthrough we don't need to care about this.
> > > > >
> > > > Exactly, we are migrating virtio device for the PCI transport.
> > >
> > > No, the migration facility is a general requirement for all transport.
> > It is for all transport. One can extend when do for MMIO.
> 
> By using admin commands? It can not perform well for registered.
> 
Yes, admin commands using AQ on MMIO based owner device will also be just fine.

> >
> > > Starting from a PCI specific (actually your proposal does not even
> > > cover all even for PCI) solution which may easily end up with issues in other
> transports.
> > >
> > Like?
> 
> The admin command/virtqueue itself may not work well for other transport.
> That's the drawback of your proposal while this proposal doesn't do any
> coupling.
> 
There is no coupling in the spec of admin command with virtqueue as Michael consistently insisted.
And in my proposal also there is no such coupling.

> >
> > > Even if you want to migrate virtio for PCI,  please at least read
> > > Qemu migration codes for virtio and PCI, then you will soon realize
> > > that a lot of things are missing in your proposal.
> > >
> > Device context is something that will be extended.
> > VIRTIO_PCI_CAP_PCI_CFG will also be added as optional item for PCI
> transport.
> 
> This is just one mini stuff, how about PCI config space and others?
> 
No need to migrate the PCI config space, because migration is of the virtio device, and not the underlying transport.
Therefore, one can migrate from virtio member device to a fully software based device as well and vis versa.

> Again, please read Qemu codes, a lot of things are missing in your proposal
> now. If everything is fine to do passthrough based live migration, I'm pretty sure
> you need more than what Qemu has since it can only do a small fraction of the
> whole PCI.
> 
I will read.
Many of the pieces may be implemented by the device over time following the charter.

> >
> > > > As usual, if you have to keep arguing about not doing
> > > > passhthrough, we are
> > > surely past that point.
> > >
> > > Who is "we"?
> > >
> > We = You and me.
> > From 2021, you keep objecting that passthrough must not be done.
> 
> This is a big misunderstanding, you need to justify it or at least address the
> concerns from any reviewer.
> 
They are getting addressed, if you have comments, please post those comments in the actual series.
I wouldnât diverge to discuss in different series here.

> > And blocking the work done by other technical committee members to
> improve the virtio spec to make that happen is simply wrong.
> 
> It's unrealistic to think that one will be 100% correct. Justify your proposal or
> why I was wrong instead of ignoring my questions and complaining. That is why
> we need a community. If it doesn't work, virtio provides another process for
> convergence.
> 
I am not expecting you to be correct at all. I totally agree that you may miss something, I may miss something.
And this is why I repeatedly, humbly ask to converge and jointly address the passthrough mode without trap+emulation method.
The way I understood from your comment is, passthrough for hw based device must not be done and multiple of hw vendors disagree to it.

> >
> > > Is something like what you said here passed the vote and written to
> > > the spec?
> > Not only me.
> > The virtio technical committee has agreed for nested and hardware-based
> implementation _both_.
> >
> > " hardware-based implementations" is part of the virtio specification charter
> with ballot of [1].
> >
> > [1] https://lists.oasis-open.org/archives/virtio/202104/msg00038.html
> 
> Let's don't do conceptual shifts, I was asking the passthrough but you give me
> the hardware implementation.
> 
Passthrough devices implemented by hw which does dirty tracking and following the spec.

> >
> > And passthrough hardware-based device is in the charter that we strive to
> support.
> >
> > > We all know the current virtio spec is not built upon passthrough.
> >
> > This efforts improve the passthrough hw based implementation that should
> not be blocked.
> 
> Your proposal was posted only for several days and you think I would block that
> just because I asked several questions and some of them are not answered?
> 
If I misunderstood, then I am sorry.
Lets progress and improve the passthrough use case without trap+emulation.
Trap+emulation=mediation is also a valid solution for nested case.
And I frankly see a need for both as both are solving a different problem.
Trap+emulation cannot achieve passthrough mode, hence my request was not to step on each other.

When both can use the common infra, it is good to do that, when they cannot, due to the technical challenges of underlying transport, they should evolve differently.

> >
> > > > Virtio does not need to stay in the weird umbrella to always mediate etc.
> > >
> > > It's not the mediation, we're not doing vDPA, the device model we
> > > had in hardware and we present to guests are all virtio devices.
> > > It's the trap and emulation which is fundamental in the world of
> > > virtualization for the past decades. It's the model we used to
> > > virtualize standard devices. If you want to debate this methodology, virtio
> community is clearly the wrong forum.
> > >
> > I am not debating it at all. You keep bringing up the point of mediation.
> >
> > The proposal of [1] is clear that wants to do hardware based passthrough
> devices with least amount of virtio level mediation.
> >
> > So somewhere mode of virtualizing has been used, thatâs fine, it can
> > continue with full virtualization, mediation,
> >
> > And also hardware based passthrough device.
> >
> > > >
> > > > Series [1] will be enhanced further to support virtio passthrough
> > > > device for
> > > device context and more.
> > > > Even further we like to extend the support.
> > > >
> > > > > Since the functionality proposed in this series focus on the
> > > > > minimal set of the functionality for migration, it is virtio
> > > > > specific and self contained so nothing special is required to work in the
> nest.
> > > >
> > > > Maybe it is.
> > > >
> > > > Again, I repeat and like to converge the admin commands between
> > > passthrough and non-passthrough cases.
> > >
> > > You need to prove at least that your proposal can work for the
> > > passthrough before we can try to converge.
> > >
> > What do you mean by "prove"? virtio specification development is not proof
> based method.
> 
> For example, several of my questions were ignored.
> 
I didnât ignore, but if I miss, I will answer.

> >
> > If you want to participate, please review the patches and help community to
> improve.
> 
> See above.
> 
> >
> > > > If we can converge it is good.
> > > > If not both modes can expand.
> > > > It is not either or as use cases are different.
> > >
> > > Admin commands are not the cure for all, I've stated drawbacks in
> > > other threads. Not repeating it again here.
> > He he, sure, I am not attempting to cure all.
> > One solution does not fit all cases.
> 
> Then why do you want to couple migration with admin commands?
> 
Because of following.
1. A device migration needs to bulk data transfer, this is something cannot be done with tiny registers.
Cannot be done through registers, because
a. registers are slow for bidirectional communication
b. do not scale well with scale of VFs

> > Admin commands are used to solve the specific problem for which the AQ is
> designed for.
> >
> > One can make argument saying take pci fabric to 10 km distance, donât bring
> new virtio tcp transport...
> >
> > Drawing boundaries around virtio spec in certain way only makes it further
> inferior. So please do not block advancements bring in [1].
> 
> As a reviewer, I ask questions but some of them are ignored, do you expect the
> reviewer to figure out by themselves?  
Sure, please review.

Many of them were not questions, but assertion and conclusions that it does not fit nested.. and sub-optional etc.

> 
> > We really would like to make it more robust with your rich experience and
> inputs, if you care to participate.
> 
> We can collaborate for sure: as I pointed out in another threads, from what I
> can see from the both proposals of the current version:
> 
> I see a good opportunity to build your admin commands proposal on top of this
> proposal. Or it means, we can focus on what needs to be migrated first:
> 
> 1) queue state
This is just one small part of the device context
So once a device context is read/written, it covers q.

> 2) inflight descriptors
Same a q state, it is part of the device context.

> 3) dirty pages (optional)
> 4) device state(context) (optional)
>
It is same as #1 and #2.
Splitting them from #1 and #2 is not needed.

We can extend the device context to be selectively queried for nested case..
 
> I'd leave 3 or 4 since they are very complicated features. Then we can invent an
> interface to access those facilities? This is how this series is structured.
> 
> And what's more, admin commands or transport specific interfaces. And when
> we invent admin commands, you may realize you are inventing a new transport
> which is the idea of transport via admin commands.

Not really. it is not a new transport at all.
I explained you before when you quote is as transport, it must carry the driver notifications as well..
Otherwise it is just set of commands..

The new commands are self contained anyway of [1].

[1] 20230909142911.524407-1-parav@nvidia.com/T/#t">https://lore.kernel.org/virtio-comment/20230909142911.524407-1-parav@nvidia.com/T/#t


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]