OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state




On 9/26/2023 11:40 AM, Parav Pandit wrote:

From: Jason Wang <jasowang@redhat.com>
Sent: Tuesday, September 26, 2023 8:16 AM

On Mon, Sep 25, 2023 at 6:41âPM Parav Pandit <parav@nvidia.com> wrote:


From: Jason Wang <jasowang@redhat.com>
Sent: Friday, September 22, 2023 8:38 AM
Device context has no overlap.
I can give you one example, e.g debugging.

Almost every feature needs debugging. :) So I am omitting it for time
being.
Well, I don't think so. We have a lot of handy tools for that (ethtool -d?).
Sure add something specific for debug and explicitly mention that it is for debug like -d.
Every feature and functionality needs debug, not specifically device context.
So add infra for debug. Device migration series is not the vehicle to piggy back on.

1) define virtqueue state, inflight descriptors in the section of
basic facility but not under the admin commands
It will be part of the device context such a way that so that one can only read
the vq state only instead of full device context.
This will work.
I'm not sure what it looks like, but I think they are well decoupled in this series.
E.g driver can choose to just read e.g last_avail_idx and report to ethtool or
watchdog.

Once its done it will be visible how it looks like.
The key is it needs to cover BOTH use cases.

As I replied in other thread, I see several problems:

1) layer violation, PCI specific state were mixed into the basic facilities
After agreeing to see merged donctext, now you are hinting that you donât agree to merge two.
I disagree if you are leaning towards that direction.
I hope my deduction from your above comment is incorrect.

There is no violation. PCI specific device context will be captured in PCI specific section.
Device type contexts will be captured in those device type sections.

TLVs will cover many of the device context information.

2) I don't see a good definition on "device context"
3) TLV works fine for queue but not registers

Please definition of device context in [1].

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00064.html

What needs to be done first is to describe what device context means and what
it contains. Not the actual data structure since it may vary.

Sure, it is already defined in device migration theory of operation section in [2].
I will try to take it out and put in device management section, so that device migration can refer to it and
Some other basic facility also can refer to it (which must need to explain a use case beyond silly debug point).

[2] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00062.html

3) define transport specific interfaces or admin commands to access
them

As you also envisioned, it is done using admin commands to access it.
That's fine, but it should allow other ways.

For passthrough admin commands fits the design.
Sure, you need to draft it how to do other ways..
  > >
Does this work? It seems you refused such steps in the past.

I didnât.
There must be some confusion in many emails we both exchanged, because
already posted v0 has it split such as device context.
For dirty page tracking I couldnât find a solid use case without device
migration, so I asked which you already replied above.

First, you never explain why such coupling gives us any benefit.
It is not coupled. Device migration uses this facility. So it is matter of text organization in the spec.
Not the design.

Second, I've given you sufficient examples but you tend to ignore them. Why
not go through Qemu codes then you will see the answer.
You gave example of debug, and profiling. You didnât explain the use case of how to actually connect and how to profile etc.

Actually, I would like to leave 2) as it's very complicated which
might not converge easily.

I will split current "device migration" section to two.
1. device management
2. device migration

Device management covers device mode, device context and dirty page
tracking.

I don't see any connection between "management" and "device context".

You have better name than device management?
May be device operation.
May be it is just better to have the device context just the way it is in [1] under basic facility.
Device migration refers to device management section.

We can omit the dirty page tracking commands for a moment and first close
on the device mode and device context as first step.
Since it is already part of v0 and it is needed, I will keep it in subsequent v1 but
moved to device management section.

Could you please answer what's wrong with the first 4 patches in this series?

1. cover letter is missing the problem statement and use case
I only reply to this section of comments, this does not mean I agree with you on your other statements, Instead I agree with
Jason on his replies to you.

In my cover letter:
"The main usecase of these new facilities is Live Migration."

Did you miss it?
2. why queue suspend and both device suspend are introduced, only one should be there. The design description is missing.
there are no queue suspend, they are device suspend and vq state accessors. please read the patch if you want to comment.
3. Even though it claims under some random basic facility, cover letter clearly states the main use case is "live migration".
it is not random, they are precisely defined virtio basic facilities.
4. Patch 4 is not needed at all. When device is suspended, it is _suspended_. It does not do any bifurcation.
The device should not accept vq reset and the driver should reset vqs, please read previous discussions with MST
and please don't ignore the conclusions
5. only suspend bit of patch2 is not enough to cover P2P. One needs suspend and freeze both covered in series [1].
we have discussed this for many times, P2P is out of virtio spec, do you want to mediate every PCI state/functionality?
6. Finally the whole description of 1 to 4 need to be split in the device operation, so that both passthrough and medication can utilize it using admin cmd and otherwise.
Do you see any reasons this solution can not be used for passthrough and mediation?
Or does features_OK work for passthrough or mediation? Any difference?
Since Zhu, told that dirty tracking and inflight descriptors will be done, I presume he will propose to do over admin q or command interface.
And since all can run over the admin commands, the plumbing done in 1 to 4 can be made using admin commands.
No

Until now we could not establish creating yet another DMA interface that is better than q interface.
So ...
To me both the methods will start looking more converged to me over admin command and queues.
I don't think so, again, we are introducing basic facilities and these facilities don't
depend on or rely on admin vq.

Passthrough will use them over owner device.
Mediation somehow need to do over member device.
Mediation will not use any device suspend command because it needs to keep bisecting everything.
please read QEMU vhost live migration solution



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]