OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-comment] RE: [PATCH V2 3/6] virtio: dont reset vqs when SUSPEND


On Fri, Nov 24, 2023 at 5:05âPM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Nov 24, 2023 at 11:35:59AM +0800, Jason Wang wrote:
> > > > > > > I donât see a need for hypervisor to reset the device in passthrough
> > > > mode.
> > > > > > Can you explain why is it needed?
> > > > > >
> > > > > > Qemu has a command "system_reset".
> > > > > >
> > > > > I mean, what does this translate to reset the device in passthrough mode?
> > > >
> > > > It needs to reset the virtio device.
> > > >
> > > > > If this is FLR, it is there.
> > > >
> > > > Please explain how it works. (It's not only a FLR, it also need virtio level reset)
> > > >
> > > FLR obviously covers the virtio level reset as FLR covers the PCI + virtio reset.
> >
> > Which part of the spec says this? And at least Qemu is not implemented
> > in this way.
>
> PCI spec says this.
> And yes I believe qemu will fully reset the function
> on FLR.

Ok, I have another glance at the spec and code. It works like this.

>
>
> > And there's another conflict, you said FLR is under the control of guests ...
> >
> > >
> > > > >
> > > > > > > Do you mean, it is needed in vdpa mode? If yes, the registers are
> > > > > > > emulated
> > > > > > anyway, so why the member device's native channel cannot be used in
> > > > > > vdpa mode?
> > > > > > >
> > > > > > > > >
> > > > > > > > > > > For large device I could imagine it could be short.
> > > > > > > > > > >
> > > > > > > > > > > In that case if there is suspend the device available, it
> > > > > > > > > > > will be used by the guest
> > > > > > > > > > driver itself, hypervisor wouldnât know about it when those
> > > > > > > > > > registers are not trapped.
> > > > > > > > > > > So we need two ways to suspend.
> > > > > > > > > > > One is guest visible, and guest controlled.
> > > > > > > > > > > Second is hypervisor control to fulfill the device migration needs.
> > > > > > > > > >
> > > > > > > > > > Can you explain why suspend is special but not reset or why
> > > > > > > > > > reset can work but not suspend? If reset can work, so does
> > > > > > > > > > suspend. If reset can't, neither does suspend.
> > > > > > > > > >
> > > > > > > > > As long as reset and suspend both are under guest control, I am fine.
> > > > > > > >
> > > > > > > > Well, you seem to ignore my question below. Hypervisor needs to
> > > > > > > > reset the device as well.
> > > > > > > >
> > > > > > > Why is it needed in passthrough mode?
> > > > > > >
> > > > > > > > >
> > > > > > > > > > For example, can you explain how a system_reset in Qemu can
> > > > > > > > > > work with your proposal?
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > So if you can please take a look if the proposed admin
> > > > > > > > > > > command to
> > > > > > > > > > freeze/stop mode can be used in the emulated register case or not.
> > > > > > > > > >
> > > > > > > > > > Again, if you design those for PCI, it's a layer violation.
> > > > > > > > > > You have answered
> > > > > > > > > They are used by the PCI layer, just like your suspend bit.
> > > > > > > > > Andy other transport can also use it.
> > > > > > > > >
> > > > > > > > > > yourself that PM is the right way to go.
> > > > > > > > > >
> > > > > > > > > > > It helps to have the suspend bit in guest control as well
> > > > > > > > > > > with/without
> > > > > > > > > > emulation mode.
> > > > > > > > > >
> > > > > > > > > > I won't repeat it again. You will find you need a full
> > > > > > > > > > transport to satisfy all the requirements.
> > > > > > > > > I disagree for full transport.
> > > > > > > >
> > > > > > > > See above and the discussion in another thread.
> > > > > > > >
> > > > > > > > > If you want to get discuss transport for sure it is some other
> > > > > > > > > thread and I want to see "driver notifications via such
> > > > > > > > > transport VQ" to fully qualify it
> > > > > > > > as transport, And that would be just sub-optimal for actual working.
> > > > > > > >
> > > > > > > > Sub-optimal since the function is duplicated with a transport
> > > > > > > > but it doesn't claim or design as a transport.
> > > > > > > >
> > > > > > > It is not sub-optimal because of duplication. It is because you
> > > > > > > want to
> > > > > > transport notifications via virtqueue.
> > > > > >
> > > > > > Have you ever read the series of tvq? You won't get this conclusion
> > > > > > if you do that.
> > > > > >
> > > > > I have read those 4 patches and I have seen that transportvq do not want to
> > > > transport notifications.
> > > > > Hence it does not qualify as transport vq.
> > > >
> > > > It exposes the platform MMIO area for driver notification. This is sufficient.
> > > > Any issue you see?
> > > Yes, the issue is, it is not transporting the driver notifications.
> > > Hence, it is not a transport virtqueue.
> >
> > Please read how driver/device notification is done in the spec for
> > existing transports.
>
> Yea Parav I don't really see what you are driving at here.
> But the bigger problem with tvq is that it was supposed to
> add a new group type and be reworked on top of admin command
> infrastructure and it never was.
>
> > >
> > > >
> > > > >
> > > > > Frankly, transport vq seems a way to formalize mediation forever in virtio.
> > > >
> > > > Nope, it can be accessed by a guest driver directly.
> > > >
> > > > > It is very weird way to build new SIOV device.
> > > > > For most things it should be the direct channel that virtio has already from
> > > > driver to the device.
> > > >
> > > > See above. SIOV might require a new transport or not.
> > > >
> > > It depends on the performance tests that Lingshan will show at scale.
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > > > And hence, I wouldnât call it a transport anymore.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > This can also be used for debugging I think.
> > > > > > > > > > >
> > > > > > > > > > > As Michael listed, a dedicated debug interface is usually
> > > > > > > > > > > more useful instead
> > > > > > > > > > of in-band.
> > > > > > > > > >
> > > > > > > > > > Well, I've shown you the in-band facilities like debugging
> > > > > > > > > > via ethtool and kernel has a lot of other ones. If you have
> > > > > > > > > > ever tried to debug in a real production environment, you
> > > > > > > > > > will find how useful such handy information is where out-of-
> > > > > > > > > > band facilities are often dangerous
> > > > > > > > and usually prohibited or even unsupported.
> > > > > > > > > Guest driver can always read and write the device status
> > > > > > > > > without adding a
> > > > > > > > suspend bit.
> > > > > > > >
> > > > > > > > I don't get here. Suspend make sure the device state is frozen
> > > > > > > > which helps for debugging for sure.
> > > > > > > You wanted to debug some vq live, you suspend the device, the vq
> > > > > > > state got
> > > > > > changed.
> > > > > > >
> > > > > > > I just donât see that suspend is a debug tool.
> > > > > >
> > > > > > It's not a tool, it's a function that can be used as a debug tool.
> > > > > >
> > > > > > > Every feature is a debug feature literally.
> > > > > > > Classic heisenbug effect.
> > > > > > >
> > > > > > > Once can change driver notification frequency to see if interrupt
> > > > > > > rate
> > > > > > changed for debugging.
> > > > > > > One can disabled few RQs and see RSS...
> > > > > > > Blk can change blk_size to higher value to perf debug..
> > > > > > > The list continues..
> > > > > >
> > > > > > Let's not shift concepts.
> > > > > >
> > > > > Your comment to attribute device migration as debug feature is actually
> > > > shifting the concept.
> > > >
> > > > It's not.
> > > >
> > > > Ling Shan put it in the basic facilities as part of device status. You wonder why,
> > > > we explained it can be used beyond migration. You asked where, we told you
> > > > for example things like debugging. We never claim it can only be used in debug.
> > > > Then you shift the concept to say debug could be achieved by a lot of other
> > > > facilities. For sure this is correct, but it doesn't have any relationship with the
> > > > discussion here.
>
> I frankly don't see how a bit which is completely non-orthogonal with
> device and driver state can be useful for debug.

The bit is to make sure the state of a device doesn't change. It may
help or not just like if you want to pause a cpu/process during the
debug.

> For debug you want something that just always works.
> Not an interface that has so many requirements it will break if
> you look at it sidewise.
>
>
> > > >
> > > I donât see wasting time here.
> > > If its debug, its debug.
> > > If its migration, it is migration.
> > > If its pm, its pm.
> >
> > Obviously not. Migration should leverage existing facilities as much
> > as possible instead of duplicating them.
>
> I don't think there's anything obvious here.  A lot of device state
> can't be easily accessed with existing facilities.

Then we can invent new things.

> The logical
> continuation of your reasoning would be to add state introspection
> commands e.g. to cvq in virtio net and then use tricks like shadow vq to
> issue these.

For the device state, yes. Because it is device logic and it is not
what platform or transport can know.

> Yea, possible, but can we not go there please?  Nothing is
> wrong with just building commands that do exactly what we want them to
> do instead of trying to build a ship in a bottle.

But it's not the case for others.

E.g in Parav's proposal, it tries to rule P2P behaviour via virtio
admin commands when there is a duplication and a layer violation.

Thanks


>
> --
> MST
>



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]