OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 19, 2023 10:06 AM

> > Current spec is not the steering point to define new methods.
> > So we will build the spec infra to support passthrough.
> >
> 
> Passthrough migration actually, passthrough is already supported now.
Yes, device migration basic facility for passthrough devices.
My series adds basic facility section extension.

> 
> > Mediation/trap-emulation where hypervisor is involved is also second use
> case that you are addressing.
> >
> > And hence, both are not mutually exclusive.
> > Hence we should not debate that anymore.
> >
> > >
> > > > And for sure virtio do not need to live in the dark shadow of mediation
> always.
> > >
> > > 99% of virtio devices are implemented in this way (which is what you
> > > call dark and shadow) now.
> > >
> > What I am saying is one should not say mediation/trap-emulation is the only
> way for virtio.
> 
> Then using things like "dark shadow" is not fair.
I apologize. Lets work towards supporting device migration for passthrough as well.
The comments I hear from you hints that virtio must live its life through mediation.

> 
> > So let passthrough device migration to progress.
> 
> Then you need to answer or address the concerns.
> 
Sure. Will do once I receive the comments in the patches of [1].

[1] 20230909142911.524407-4-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead">https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead

> >
> > > > For nesting use case sure one can do mediation related mode.
> > > >
> > > > So only mediation is not the direction.
> > >
> > > CPU and MMU virtualization were all built in this way.
> > >
> > Not anymore. Both of them have vcpus and viommu where may things are not
> trapped.
> 
> We are talking about different things. I'm saying trap is a must but you say not
> all are trapped.
> 
To be clear, all I am saying is virtio interface level trap is not must to achieve device migration.

> > So as I said both has pros and cons and users will pick what fits their need and
> use case.
> >
> > > >
> > > > > > So for such N and M being > 1, one can use software base
> > > > > > emulation
> > > anyway.
> > > > >
> > > > > No, only the control path is trapped, the datapath is still passthrough.
> > > > >
> > > > Again, it depends on the use case.
> > >
> > > No matter what use case, the definition and methodology of
> > > virtualization stands still.
> > >
> > I will stop debating this because the core technical question is not answered.
> > I donât see a technology available that virtio can utilize to it.
> > That is interface that can work without messing with device status and flr
> while device migration is ongoing.
> 
> Again, you need to justify it. For example, why does it mess up device status?
With FLR and device reset in place, many things like queues, device registers etc do not work.
Because they are reset while migration is going on, and inflight descriptors info is lost.

> Why is rest ok but not suspending?
> 
Rest is not ok either to me.
I am not suggesting trapping CVQ or AQ or other virtio registers either.

> At least so far, I don't see good answers for thoses.
> 
> > Hence, methodology for passthrough and mediation/trap-emulation is
> fundamentally different.
> > And that is just fine.
> >
> > > >
> > > > > >
> > > > > > >
> > > > > > > And exposing the whole device to the guest drivers will have
> > > > > > > security implications, your proposal has demonstrated that
> > > > > > > you need a workaround for
> > > > > > There is no security implications in passthrough.
> > > > >
> > > > > How can you prove this or is it even possible for you to prove this?
> > > > Huh, when you claim that it is not secure, please point out
> > > > exactly what is not
> > > secure.
> > > > Please take with PCI SIG and file CVE to PCI sig.
> > >
> > > I am saying it has security implications. That is why you need to
> > > explain why you think it doesn't. What's more, the implications are
> > > obviously nothing related to PCI SIG but a vendor virtio hardware
> implementation.
> > >
> > PCI passthough for virtio member devices and non virtio devices with P2P, and
> their interaction is already there in the VM.
> > Device migration is not adding/removing anything, nor touching any security
> aspect of it.
> > Because it does not need to it either.
> > Device migration is making sure that it continue to exists.
> 
> Since we are discussing in the virtio community, what we care about is the
> chance that guest(driver) can explore device security vulnerabilities. In this
> context, exposing more means the increasing of the attacking surfaces since we
> (cloud vendor) can't control guests but the hypervisor.
> 
Guest (driver) can explore device security vulnerabilities in many areas not just device status and cvq.
So it is not good answer to me.

> >
> > > >
> > > > > You expose all device details to guests (especially the
> > > > > transport specific details), the attack surface is increased in this way.
> > > > One can say it is the opposite.
> > > > Attack surface is increased in hypervisor due to mediation poking
> > > > at
> > > everything controlled by the guest.
> > > >
> > >
> > > We all know such a stack has been widely used for decades. But you
> > > want to say your new stack is much more secure than this?
> > >
> > It can be yes, because it exposes all necessary things defined in the virtio spec
> boundary today.
> > And not involving hypervisor in core device operation.
> 
> That's perfectly fine if we can do this. But you need to justify this.
> 
We are not inventing any new things here. As you acknowledged passthtrough devices are already there...
> >
> > > >
> > > > >
> > > > > What's more, a simple passthrough may lose the chance to
> > > > > workaround hardware erratas and you will finally get back to the trap
> and emulation.
> > > > Hardware errata's is not the starting point to build the software
> > > > stack and
> > > spec.
> > >
> > > It's not the starting point. But it's definitely something that
> > > needs to be considered, go and see kernel codes (especially the KVM
> > > part) and you will get the answer.
> > >
> > There are kernels which cannot be updated in field today in Nvidia cloud
> shipped by Redhat's OS variant.
> >
> > So it is invalid assumption that somehow data path does not have bug, but
> large part of the control plane has bug, hence it should be done in software...
> 
> Well, for sure there are cases that can't be worked around. But for the case that
> it can, trap and emulation gives much more flexibility.
> 
It can. So it is not your or mine decision.
The user will pick what they want to use.
So it is invalid assumption and hence invalid point to discuss.

> >
> > > > What you imply is, one must never use vfio stack, one must not use
> > > > vcpu
> > > acceleration and everything must be emulated.
> > >
> > > Do I say so? Trap and emulation is the common methodology used in
> > > KVM and VFIO. And if you want to replace it with a complete
> > > passthrough, you need to prove your method can work.
> > >
> > Please review patches. I do not plan to _replace_ is either.
> 
> You define all the migration stuffs in the admin commands section, isn't this an
> implicit coupling?
> 
RSS is done in receive packet section. Is this coupling RSS with receive q? Yes, because it is meant for it.

You are questioning, 
How one can receive packets and post descriptors without virtqueues? Oh, descriptors are implicitly tied to virtqueues.. too bad..

If above mechanism looks coupling, than it is one connection.
May be there is another use case and more efficicent way without admin commands, I didnât hear it so far.

I donât see a point of writing some non-practical abstract spec.

> > Those users who want to use passthrough, can use passthrough with major
> traps+emulation on FLR, device_status, cvq, avq and without implementing AQ
> on every single member device.
> > And those users who prefer trap+emualation can use that.
> >
> > > >
> > > > Same argument of hardware errata applied to data path too.
> > >
> > > Anything makes datapath different? Xen used to fallback to shadow
> > > page tables to workaround hardware TDP errata in the past.
> > >
> > > > One should not implement in hw...
> > > >
> > > > I disagree with such argument.
> > >
> > > It's not my argument.
> > >
> > You claimed that to overcome hw errata, one should use trap_emulation,
> somehow only for portion of the functionality.
> > And rest portion of the functionality does not have hw errata, hence
> > hw should be use (for example for data path). :)
> 
> I've explained before, we all know there're errata that can't be a workaround in
> any way.
> 
No point in discussing hw errata. As it is not the goal or anti-goal for trap+emulation or passthrough either.

> >
> > > >
> > > > You can say nesting is requirement for some use cases, so spec
> > > > should support
> > > it without blocking the passthrough mode.
> > > > Then it is fair discussion.
> > > >
> > > > I will not debate further on passthrough vs control path mediation
> > > > as
> > > either_or approach.
> > > >
> > > > >
> > > > > >
> > > > > > > FLR at least.
> > > > > > It is actually the opposite.
> > > > > > FLR is supported with the proposal without any workarounds and
> > > mediation.
> > > > >
> > > > > It's an obvious drawback but not an advantage. And it's not a
> > > > > must for live migration to work. You need to prove the FLR
> > > > > doesn't conflict with the live migration, and it's not only FLR
> > > > > but also all the other
> > > PCI facilities.
> > > > I donât know what you mean by prove. It is already clear from the
> > > > proposal
> > > FLR is not messing with rest of the device migration infrastructure.
> > > > You should read [1].
> > >
> > > I don't think you answered my question in that thread.
> > >
> > Please ask the question in that series if any, because there is no FLR, device
> reset interaction in passthrough between owner and member device.
> >
> > > >
> > > > > one other
> > > > > example is P2P and what's the next? As more features were added
> > > > > to the PCI spec, you will have endless work in auditing the
> > > > > possible conflict with the passthrough based live migration.
> > > > >
> > > > This drawback equally applies to mediation route where one need to
> > > > do more
> > > than audit where the mediation layer to be extended.
> > >
> > > No, for trap and emulation we don't need to do that. We only do
> > > datapath assignments.
> > >
> > It is required, because also such paths to be audited and extended as without
> it the feature does not visible to the guest.
> 
> You need first answer the following questions:
> 
> 1) Why FLR is a must for the guest
Because passthrough device guest does it.

> 2) What's wrong with the current Qemu emulation of FLR for virtio-pci device
> 
I am not sure QEMU discussion is relevant here.

But in general, when passthrough device is given to an example software like QEMU, it will not bisect the FLR differently from other VF.

> >
> > > > So each method has its pros and cons. One suits one use case,
> > > > other suits
> > > other use case.
> > > > Therefore, again attempting to claim that only mediation approach
> > > > is the only
> > > way to progress is incorrect.
> > >
> > > I never say things like this, it is your proposal that mandates
> > > migration with admin commands. Could you please read what is
> > > proposed in this series carefully?
> > >
> > Admin commands are split from the AQ so one can use the admin commands
> inband as well.
> 
> How can it? It couples a lot of concepts like group, owner and members. All of
> these have only existed in SR-IOV so far.
> 
It is drafted nicely by Michael that admin commands has one transport as AQ.
It can be extended for MMIO when one needs it, I will let it to the creativity of MMIO supporter to have admin queue on MMIO device like other virtqueues.

So far no one seems interested in to extend MMIO other than theoretical questions.

> I don't know how to define those for MMIO where the design wants to be as
> simple as possible.
> 
What prevents MMIO to have AQ?

> > Though, I donât see how it can functionality work without mediation.
> > This is the key technical difference between two approaches.
> >
> > > On top of this series, you can build your amd commands easily. But
> > > there's nothing that can be done on top of your proposal.
> > >
> > I donât see what more to be done on top of our proposal.
> 
> Actually it really has one, that is moving the description/definition of those
> states to the basc facility part. But if we do this, why not do it from the start?
> This is exactly what Lingshan's proposal did.
> 
Device context is defined in basic facility section. It is under admin commands because reading something large without command doesn't seem possible.
Lingshan is not showing how a giant RSS context, many other fields of the device can be read/written during device migration flow.
So its incomplete work, which is covered in [1] using admin commands.

[1] 20230909142911.524407-4-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead">https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead.

> > If you hint nesting, than it can be done through a peer admin device to delete
> such admin role.
> >
> > > >
> > > > In fact audit is still better than mediation because most audits
> > > > are read only
> > > work as opposed to endlessly extending trapping and adding support
> > > in core stack.
> > >
> > > One reality that you constantly ignore is that such trapping and
> > > device models have been widely used by a lot of cloud vendors for more
> than a decade.
> > >
> > It may be but, it is not the only option.
> 
> I don't say it's the only option. If most of the devices were built in this way, we
> should first allow any new function to be available to those devices and then
> consider other cases. Inventing a mechanism that can't work for most of the
> existing devices is sub-optimal.
> 
I donât agree to "first", "cant work" and "sub-optimal".

It seems to work for more than one vendor.
Proposal [1] should work for passthrough devices that users are using.
[1] 20230909142911.524407-4-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead">https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead

> >
> > > > Again, it is a choice that user make with the tradeoff.
> > > >
> > > > > >
> > > > > > >
> > > > > > > For non standard device we don't have choices other than
> > > > > > > passthrough, but for standard devices we have other choices.
> > > > > >
> > > > > > Passthrough is basic requirement that we will be fulfilling.
> > > > >
> > > > > It has several drawbacks that I would not like to repeat. We all
> > > > > know even for VFIO, it requires a trap instead of a complete
> passthrough.
> > > > >
> > > > Sure. Both has pros and cons.
> > > > And both can co-exist.
> > >
> > > I don't see how it can co-exist with your proposal. I can see how
> > > admin commands can co-exist on top of this series.
> > >
> > The reason to me both has difficulty is because both are solving different
> problem.
> > And they can co-exist as two different methods to two different problems.
> 
> It's not hard to demonstrate how admin commands can be built on top.
> 
I donât see a reason why it should be on top.

Look, if you have _real_ interest in both use cases to utilize lets work toward such definition.
If you donât have interest, and I donât see a point of objecting and pointing figure at using trap+emulation.

> >
> > > >
> > > > > > If one wants to do special nesting, may be, there.
> > > > >
> > > > > Nesting is not special. Go and see how it is supported by major
> > > > > cloud vendors and you will get the answer. Introducing an
> > > > > interface in virtio that is hard to be virtualized is even worse
> > > > > than writing a compiler that can not do bootstrap compilation.
> > > > We checked with more than two major cloud vendors and passthrough
> > > > suffice
> > > their use cases and they are not doing nesting.
> > > > And other virtio vendor would also like to support native devices.
> > > > So again,
> > > please do not portray that nesting is the only thing and passthrough
> > > must not be done.
> > >
> > > Where do I say passthrough must not be done? I'm saying you need to
> > > justify your proposal instead of simply saying "hey, you are wrong".
> > >
> > I never said you are wrong. I replied to Lingshan that resuming/suspending
> queues after the device is suspended, is wrong, and it should not be done.
> >
> > > Again, nesting is not the only issue, the key point is that it's
> > > partial and not self contained.
> >
> > Admin commands are self-contained to the owner device.
> > They are not self contained in the member device, because it cannot be.
> 
> There're cases that self contained is not required for example the provisioning.
> Admin commands/queues fit perfectly there.
And it is not limited to it.

> 
> > Self containment cannot work with device reset, flr, dma flow.
> 
> How do you define self containment? We all know that virtio can't fly without
> transporting specific things ...
> 
Self containment is only member device drives following without need of owner or peer device.
1. device reset
2. FLR
3. dirty page tracking
4. device context read + write

> For the context of "self contain" I mean the basic virtio facility needs to be self
> contained.
> 
> > Self containment requires mediation or renamed trap+emulation; which is the
> anti-goal of passtrough.
> > And I am very interested if you can show how admin commands can work
> with device reset, flr flow WITHOUT mediation approach.
> 
> Why is it the job for me? This proposal doesn't use admin commands at all.
Because few days back Lingshan claimed that he wants to see both needs must be addressed somehow
And you claimed that basic facility that you build in this patch can work for _any_ software, not some specific software (otherwise you quoted it as failure).


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]