virtio-comment message



Subject: RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE


> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 14, 2023 8:41 AM
> 
> > On Wed, Sep 13, 2023 at 2:06 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Wednesday, September 13, 2023 10:14 AM
> > > To: Parav Pandit <parav@nvidia.com>
> >
> > > > One can build infinite level of nesting to not do passthrough, at
> > > > the end user
> > > applications remains slow.
> > >
> > > We are talking about nested virtualization, not nested emulation. I
> > > won't repeat the definition of virtualization, but no matter how many
> > > levels of nesting, the hypervisor will try hard to let the
> > > application run natively most of the time, otherwise it's not nested
> virtualization at all.
> > >
> > > Nested virtualization has been supported by all major cloud vendors,
> > > please read the relevant documentation for the performance
> > > implications. Virtio community is not the correct place to debate
> > > whether a nest is useful. We need to make sure the datapath could be
> > > assigned to any nest layers without losing any fundamental facilities like
> migration.
> > >
> > I am not debating. You or Lingshan claim or imply that mediation is the only
> way to progress.
> 
> Let me correct your terminology again. It's "trap and emulation". It means the
> workload runs mostly natively but is sometimes trapped by the hypervisor.
>
 
> And it's not the only way. It's the start point since all current virtio spec is built
> upon this methodology.
The current spec is not the starting point for defining new methods.
So we will build the spec infrastructure to support passthrough.

Mediation/trap-and-emulation, where the hypervisor is involved, is the second use case, which you are addressing.

Hence, the two are not mutually exclusive.
So we should not debate that anymore.

> 
> > And for sure virtio does not need to always live in the dark shadow of mediation.
> 
> 99% of virtio devices are implemented in this way (which is what you call dark
> and shadow) now.
> 
What I am saying is that one should not claim mediation/trap-and-emulation is the only way for virtio.
So let passthrough device migration progress.

> > For nesting use case sure one can do mediation related mode.
> >
> > So only mediation is not the direction.
> 
> CPU and MMU virtualization were all built in this way.
> 
Not anymore. Both of them have vCPU and vIOMMU support where many things are not trapped.
So, as I said, both have pros and cons, and users will pick what fits their needs and use case.

> >
> > > > So for such N and M being > 1, one can use software base emulation
> anyway.
> > >
> > > No, only the control path is trapped, the datapath is still passthrough.
> > >
> > Again, it depends on the use case.
> 
> No matter what use case, the definition and methodology of virtualization
> stands still.
> 
I will stop debating this because the core technical question is not answered.
I don't see a technology available that virtio can utilize here:
an interface that can work without interfering with device_status and FLR while device migration is ongoing.
Hence, the methodologies for passthrough and mediation/trap-and-emulation are fundamentally different.
And that is just fine.

> >
> > > >
> > > > >
> > > > > And exposing the whole device to the guest drivers will have
> > > > > security implications, your proposal has demonstrated that you
> > > > > need a workaround for
> > > > There is no security implications in passthrough.
> > >
> > > How can you prove this or is it even possible for you to prove this?
> > Huh, when you claim that it is not secure, please point out exactly what is not
> secure.
> > Please take it up with PCI-SIG and file a CVE with PCI-SIG.
> 
> I am saying it has security implications. That is why you need to explain why you
> think it doesn't. What's more, the implications are obviously nothing related to
> PCI SIG but a vendor virtio hardware implementation.
> 
PCI passthrough for virtio member devices and non-virtio devices with P2P, and their interaction, already exists in the VM.
Device migration is not adding or removing anything, nor touching any security aspect of it,
because it does not need to.
Device migration is making sure that all of it continues to exist.

> >
> > > You expose all device details to guests (especially the transport
> > > specific details), the attack surface is increased in this way.
> > One can say it is the opposite.
> > Attack surface is increased in hypervisor due to mediation poking at
> everything controlled by the guest.
> >
> 
> We all know such a stack has been widely used for decades. But you want to say
> your new stack is much more secure than this?
> 
It can be, yes, because it exposes only the necessary things defined within the virtio spec boundary today,
and it does not involve the hypervisor in core device operation.

> >
> > >
> > > What's more, a simple passthrough may lose the chance to workaround
> > > hardware erratas and you will finally get back to the trap and emulation.
> > Hardware errata are not the starting point to build the software stack and
> spec.
> 
> It's not the starting point. But it's definitely something that needs to be
> considered, go and see kernel codes (especially the KVM part) and you will get
> the answer.
> 
There are kernels in the field today, in the NVIDIA cloud shipped with Red Hat's OS variant, which cannot be updated.

So it is an invalid assumption that the data path somehow has no bugs while a large part of the control plane does, and hence should be done in software...

> > What you imply is, one must never use vfio stack, one must not use vcpu
> acceleration and everything must be emulated.
> 
> Do I say so? Trap and emulation is the common methodology used in KVM and
> VFIO. And if you want to replace it with a complete passthrough, you need to
> prove your method can work.
> 
Please review the patches. I do not plan to _replace_ it either.
Those users who want passthrough can use passthrough without major traps+emulation on FLR, device_status, cvq, avq, and without implementing an AQ on every single member device.
And those users who prefer trap+emulation can use that.
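For readers following the thread, the split being debated can be sketched in a few lines. This is a hypothetical illustration, not code from either proposal: a VMM-side handler that traps only the virtio-pci common-config device_status byte (offset 20 in the virtio 1.x common configuration layout) and leaves everything else untouched. All function and variable names here are invented for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset of device_status within the virtio 1.x PCI common config. */
#define VIRTIO_PCI_COMMON_STATUS 20u

/* Hypervisor-owned shadow of the guest-written status, kept so this
 * piece of device state can later go into a migration stream. */
static uint8_t shadow_status;

/* Returns true when the write was trapped and emulated; false means
 * the access goes to hardware untouched (passthrough). */
bool common_cfg_write(uint32_t offset, uint8_t value)
{
    if (offset == VIRTIO_PCI_COMMON_STATUS) {
        shadow_status = value;  /* record before forwarding to the device */
        return true;            /* control path: trapped */
    }
    return false;               /* everything else, incl. datapath: passthrough */
}
```

In the passthrough model argued for above, even this single trap would be absent; device state would instead be read and restored through admin commands on the owner device.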

> >
> > Same argument of hardware errata applied to data path too.
> 
> Anything makes datapath different? Xen used to fallback to shadow page tables
> to workaround hardware TDP errata in the past.
> 
> > One should not implement in hw...
> >
> > I disagree with such argument.
> 
> It's not my argument.
> 
You claimed that to overcome hw errata, one should use trap+emulation, but somehow only for a portion of the functionality.
And the rest of the functionality supposedly has no hw errata, hence hw should be used (for example, for the data path). :)

> >
> > You can say nesting is requirement for some use cases, so spec should support
> it without blocking the passthrough mode.
> > Then it is fair discussion.
> >
> > I will not debate further on passthrough vs control path mediation as
> either_or approach.
> >
> > >
> > > >
> > > > > FLR at least.
> > > > It is actually the opposite.
> > > > FLR is supported with the proposal without any workarounds and
> mediation.
> > >
> > > It's an obvious drawback but not an advantage. And it's not a must
> > > for live migration to work. You need to prove the FLR doesn't
> > > conflict with the live migration, and it's not only FLR but also all the other
> PCI facilities.
> > I don't know what you mean by prove. It is already clear from the proposal that
> FLR is not messing with the rest of the device migration infrastructure.
> > You should read [1].
> 
> I don't think you answered my question in that thread.
> 
Please ask the question in that series if there is any, because there is no FLR or device reset interaction in passthrough between the owner and member device.

> >
> > > one other
> > > example is P2P and what's the next? As more features were added to
> > > the PCI spec, you will have endless work in auditing the possible
> > > conflict with the passthrough based live migration.
> > >
> > This drawback equally applies to the mediation route, where one needs to do more
> than audit: the mediation layer itself has to be extended.
> 
> No, for trap and emulation we don't need to do that. We only do datapath
> assignments.
> 
It is required, because such paths also need to be audited and extended; without that, the feature is not visible to the guest.

> > So each method has its pros and cons. One suits one use case, other suits
> other use case.
> > Therefore, again attempting to claim that only mediation approach is the only
> way to progress is incorrect.
> 
> I never say things like this, it is your proposal that mandates migration with
> admin commands. Could you please read what is proposed in this series
> carefully?
> 
Admin commands are split from the AQ, so one can use the admin commands in-band as well.
Though, I don't see how it can functionally work without mediation.
This is the key technical difference between the two approaches.
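To make the "split from the AQ" point concrete, here is a rough sketch of an admin command descriptor, loosely following the virtio 1.3 admin command framing (field names abbreviated; treat the exact layout as an assumption, not the normative struct). Because the framing is independent of the queue that carries it, the same structure can be submitted on the owner's admin queue or, in principle, over an in-band transport:

```c
#include <stdint.h>

/* Loosely modeled on the virtio 1.3 admin command layout; offsets and
 * names here are illustrative, not normative. */
struct admin_cmd {
    uint16_t opcode;          /* e.g. a device-migration opcode */
    uint16_t group_type;      /* e.g. the SR-IOV group type */
    uint8_t  reserved[12];
    uint64_t group_member_id; /* which member (VF) the command targets */
    /* Command-specific request/result buffers follow in the descriptor
     * chain; they are transport-agnostic, which is what allows the
     * same command to be carried in-band. */
};
```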

> On top of this series, you can build your admin commands easily. But there's
> nothing that can be done on top of your proposal.
> 
I don't see what more needs to be done on top of our proposal.
If you are hinting at nesting, then it can be done through a peer admin device to delegate such an admin role.

> >
> > In fact audit is still better than mediation because most audits are read only
> work as opposed to endlessly extending trapping and adding support in core
> stack.
> 
> One reality that you constantly ignore is that such trapping and device models
> have been widely used by a lot of cloud vendors for more than a decade.
> 
It may be, but it is not the only option.

> > Again, it is a choice that user make with the tradeoff.
> >
> > > >
> > > > >
> > > > > For non standard device we don't have choices other than
> > > > > passthrough, but for standard devices we have other choices.
> > > >
> > > > Passthrough is basic requirement that we will be fulfilling.
> > >
> > > It has several drawbacks that I would not like to repeat. We all
> > > know even for VFIO, it requires a trap instead of a complete passthrough.
> > >
> > Sure. Both has pros and cons.
> > And both can co-exist.
> 
> I don't see how it can co-exist with your proposal. I can see how admin
> commands can co-exist on top of this series.
> 
The reason both have difficulty co-existing, to me, is that they are solving different problems.
And they can co-exist as two different methods for two different problems.

> >
> > > > If one wants to do special nesting, may be, there.
> > >
> > > Nesting is not special. Go and see how it is supported by major
> > > cloud vendors and you will get the answer. Introducing an interface
> > > in virtio that is hard to be virtualized is even worse than writing
> > > a compiler that can not do bootstrap compilation.
> > We checked with more than two major cloud vendors; passthrough suffices for
> their use cases and they are not doing nesting.
> > And other virtio vendor would also like to support native devices. So again,
> please do not portray that nesting is the only thing and passthrough must not be
> done.
> 
> Where do I say passthrough must not be done? I'm saying you need to justify
> your proposal instead of simply saying "hey, you are wrong".
> 
I never said you are wrong. I replied to Lingshan that resuming/suspending queues after the device is suspended is wrong and should not be done.

> Again, nesting is not the only issue, the key point is that it's partial and not self
> contained.

Admin commands are self-contained within the owner device.
They are not self-contained in the member device, because they cannot be. Self-containment cannot work with the device reset, FLR, and DMA flows.
Self-containment requires mediation (renamed trap+emulation), which is the anti-goal of passthrough.
And I am very interested if you can show how admin commands can work with the device reset and FLR flows WITHOUT a mediation approach.
Lingshan has not answered this so far.

