OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-dev message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state


On Tue, Sep 26, 2023 at 11:40âAM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Tuesday, September 26, 2023 8:16 AM
> >
> > On Mon, Sep 25, 2023 at 6:41âPM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Friday, September 22, 2023 8:38 AM
> > >
> > > > > Device context has no overlap.
> > > >
> > > > I can give you one example, e.g debugging.
> > > >
> > > Almost every feature needs debugging. :) So I am omitting it for time
> > > being.
> >
> > Well, I don't think so. We have a lot of handy tools for that (ethtool -d?).
> Sure add something specific for debug and explicitly mention that it is for debug like -d.
> Every feature and functionality needs debug, not specifically device context.
> So add infra for debug.

Why do you think it's an infra? All you need to do is a simple decoupling.

> Device migration series is not the vehicle to piggy back on.
>
> > > > 1) define virtqueue state, inflight descriptors in the section of
> > > > basic facility but not under the admin commands
> > > It will be part of the device context such a way that so that one can only read
> > the vq state only instead of full device context.
> > > This will work.
> >
> > I'm not sure what it looks like, but I think they are well decoupled in this series.
> > E.g driver can choose to just read e.g last_avail_idx and report to ethtool or
> > watchdog.
> >
> Once its done it will be visible how it looks like.
> The key is it needs to cover BOTH use cases.
>
> > As I replied in other thread, I see several problems:
> >
> > 1) layer violation, PCI specific state were mixed into the basic facilities
>
> After agreeing to see merged donctext, now you are hinting that you donât agree to merge two.

It's not a merging, it's about decoupling. I'm fine if you don't do
coupling and layer violation.

> I disagree if you are leaning towards that direction.
> I hope my deduction from your above comment is incorrect.

I agree to seek a way to unify but it doesn't mean everything in your
current proposal is correct. Basic facility part should be transport
independent.

>
> There is no violation. PCI specific device context will be captured in PCI specific section.

Is that what you've done in your series now?

> Device type contexts will be captured in those device type sections.
>
> TLVs will cover many of the device context information.
>
> > 2) I don't see a good definition on "device context"
> > 3) TLV works fine for queue but not registers
> >
> Please definition of device context in [1].
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00064.html
>
> > What needs to be done first is to describe what device context means and what
> > it contains. Not the actual data structure since it may vary.
> >
> Sure, it is already defined in device migration theory of operation section in [2].

It's too vague, for example it's not easy to infer if the following
belong to device context:

1) dirty pages
2) virtqueue adresses

etc.

> I will try to take it out and put in device management section, so that device migration can refer to it and
> Some other basic facility also can refer to it (which must need to explain a use case beyond silly debug point).

It's silly to ship a product without any debugging facilities.

And I've given you other examples like hibernation but you ignore them
again. And what's more important, you ignores my question, let me ask
you again here:

1) Try to prove your facility can only work for one specific case
2) Try to prove your facility can work for more than one cases

Which one is easier and more beneficial to virtio?

The decoupling is just a matter of relocating the text, does it block
any of your proposals? Why do you choose to refuse such a simple
change with such a huge advantage?

>
> [2] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00062.html
>
> > >
> > > > 3) define transport specific interfaces or admin commands to access
> > > > them
> > > >
> > > As you also envisioned, it is done using admin commands to access it.
> >
> > That's fine, but it should allow other ways.
> >
> For passthrough admin commands fits the design.
> Sure, you need to draft it how to do other ways..

So, it looks to me you never examine this series carefully.

>  > >
> > > > Does this work? It seems you refused such steps in the past.
> > > >
> > > I didnât.
> > > There must be some confusion in many emails we both exchanged, because
> > already posted v0 has it split such as device context.
> > >
> > > For dirty page tracking I couldnât find a solid use case without device
> > migration, so I asked which you already replied above.
> >
> > First, you never explain why such coupling gives us any benefit.
> It is not coupled. Device migration uses this facility. So it is matter of text organization in the spec.
> Not the design.
>
> > Second, I've given you sufficient examples but you tend to ignore them. Why
> > not go through Qemu codes then you will see the answer.
> You gave example of debug, and profiling. You didnât explain the use case of how to actually connect and how to profile etc.

How much evidence do you need me to provide? Do I need to give you the
manpages of ethtool -d? For profiling, is it too hard to think, for
example, to give non retire hints to IOTLB via a specific IOPTE if we
found a page is dirty?

Compare the evidence or proof you've provided, how much you have
provided so far? Or could you please prove the device context can only
work for migration?

>
> > > > Actually, I would like to leave 2) as it's very complicated which
> > > > might not converge easily.
> > > >
> > > I will split current "device migration" section to two.
> > > 1. device management
> > > 2. device migration
> > >
> > > Device management covers device mode, device context and dirty page
> > tracking.
> >
> > I don't see any connection between "management" and "device context".
> >
> You have better name than device management?
> May be device operation.
> May be it is just better to have the device context just the way it is in [1] under basic facility.

I don't understand this.

>
> > > Device migration refers to device management section.
> > >
> > > We can omit the dirty page tracking commands for a moment and first close
> > on the device mode and device context as first step.
> > > Since it is already part of v0 and it is needed, I will keep it in subsequent v1 but
> > moved to device management section.
> >
> > Could you please answer what's wrong with the first 4 patches in this series?
> >

I'd leave Ling Shan to comment on this.

> 1. cover letter is missing the problem statement and use case
> 2. why queue suspend and both device suspend are introduced, only one should be there. The design description is missing.
> 3. Even though it claims under some random basic facility, cover letter clearly states the main use case is "live migration".
> 4. Patch 4 is not needed at all. When device is suspended, it is _suspended_. It does not do any bifurcation.
> 5. only suspend bit of patch2 is not enough to cover P2P. One needs suspend and freeze both covered in series [1].
> 6. Finally the whole description of 1 to 4 need to be split in the device operation, so that both passthrough and medication can utilize it using admin cmd and otherwise.
> Since Zhu, told that dirty tracking and inflight descriptors will be done, I presume he will propose to do over admin q or command interface.
> And since all can run over the admin commands, the plumbing done in 1 to 4 can be made using admin commands.
>
> Until now we could not establish creating yet another DMA interface that is better than q interface.
> So ...
> To me both the methods will start looking more converged to me over admin command and queues.

This is self-contradictory with what you've said before.

We're in an endless circle now. The main reason is that you keep
ignoring comments.

Another question you keep ignoring is: what prevents this proposal
from being used in the passthrough setups? I'm pretty sure if you can
get the right answer, you will understand the big picture.

Thanks




>
> Passthrough will use them over owner device.
> Mediation somehow need to do over member device.
> Mediation will not use any device suspend command because it needs to keep bisecting everything.



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]