Subject: RE: [virtio-comment] Re: [PATCH v1 3/8] device-context: Define the device context fields for device migration


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, November 1, 2023 1:59 PM
> 
> On Wed, Nov 01, 2023 at 06:47:57AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Wednesday, November 1, 2023 12:07 PM
> > >
> > > On Wed, Nov 01, 2023 at 05:42:56AM +0000, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Wednesday, November 1, 2023 11:01 AM
> > > > >
> > > > > On Wed, Nov 01, 2023 at 02:54:47AM +0000, Parav Pandit wrote:
> > > > > >
> > > > > >
> > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > Sent: Tuesday, October 31, 2023 3:44 PM
> > > > > > >
> > > > > > > On Tue, Oct 31, 2023 at 05:42:29PM +0800, Zhu, Lingshan wrote:
> > > > > > > > > Your answer is not relevant to this discussion at all.
> > > > > > > > > Why?
> > > > > > > > > Because we were discussing the schemes where registers
> > > > > > > > > are not
> > > used.
> > > > > > > > > One example of that was IMS. It does not matter MSI or MSIX.
> > > > > > > > > As explained in Intel's commit message, the key to focus
> > > > > > > > > for IMS is "queue
> > > > > > > memory" not some hw register like MSI or MSI-X.
> > > > > > > > you know the device always need to know a address and the
> > > > > > > > data to send a MSI, right?
> > > > > > >
> > > > > > > So if virtio is to use IMS then we'll need to add interfaces
> > > > > > > to program IMS, I think. As part of that patch - it's
> > > > > > > reasonable to assume - we will also need to add a way to
> > > > > > > retrieve IMS so it can be
> > > > > migrated.
> > > > > > >
> > > > > > > However, what this example demonstrates is that the approach
> > > > > > > taken by this proposal to migrate control path structures -
> > > > > > > namely, by defining a structure used just for migration -
> > > > > > > means that we will need to come up with a migration interface each
> time.
> > > > > > > And that is unfortunate.
> > > > > > >
> > > > > > When the device supports a new feature it has supported new
> > > functionality.
> > > > > > Hence the live migration side also got updated.
> > > > > > However, the live migration driver does not have to understand
> > > > > > what is inside
> > > > > the control path structures.
> > > > > > It is just byte stream.
> > > > > > Only if the hypervisor live migration drive involved in
> > > > > > emulating, it will parse
> > > > > and that is fine as like other control structures.
> > > > >
> > > > > The point is that any new field needs to be added in two places
> > > > > now and that is not great at all.
> > > > >
> > > > Most control structs are well defined. So only its type field is
> > > > added to
> > > migrating driver side.
> > > > This is very low overhead field and handled in generic way for all
> > > > device types
> > > and for all common types.
> > >
> > > Weird, not what I see.  E.g. you seem to have a structure
> > > duplicating queue fields. Each new field will have to be added there in
> addition to the transport.
> > >
> > Didn't follow.
> > Which structure is duplicated?
> > PCI does not even have a structure for q configuration.
> 
> So this:
> 
> +struct virtio_dev_ctx_pci_vq_cfg {
> +        le16 vq_index;
> +        le16 queue_size;
> +        le16 queue_msix_vector;
> +        le64 queue_desc;
> +        le64 queue_driver;
> +        le64 queue_device;
> +};
> 
> 
> duplicates a bunch of fields from this:
> 
Not really. The structure above captures the current VQ configuration, which is not directly visible in the config space.
The structure below is already captured as part of VIRTIO_DEV_CTX_PCI_COMMON_CFG.

> struct virtio_pci_common_cfg {
>         /* About the whole device. */
>         __le32 device_feature_select;   /* read-write */
>         __le32 device_feature;          /* read-only */
>         __le32 guest_feature_select;    /* read-write */
>         __le32 guest_feature;           /* read-write */
>         __le16 msix_config;             /* read-write */
>         __le16 num_queues;              /* read-only */
>         __u8 device_status;             /* read-write */
>         __u8 config_generation;         /* read-only */
> 
>         /* About a specific virtqueue. */
>         __le16 queue_select;            /* read-write */
>         __le16 queue_size;              /* read-write, power of 2. */
>         __le16 queue_msix_vector;       /* read-write */
>         __le16 queue_enable;            /* read-write */
>         __le16 queue_notify_off;        /* read-only */
>         __le32 queue_desc_lo;           /* read-write */
>         __le32 queue_desc_hi;           /* read-write */
>         __le32 queue_avail_lo;          /* read-write */
>         __le32 queue_avail_hi;          /* read-write */
>         __le32 queue_used_lo;           /* read-write */
>         __le32 queue_used_hi;           /* read-write */
> };
> 
> 
> Except it's incomplete and I suspect that's actually a bug.
> 
It is not a bug.
There is some duplicated information: struct virtio_pci_common_cfg above is a snapshot of the registers being updated,
while struct virtio_dev_ctx_pci_vq_cfg captures what is not visible in that config snapshot,
i.e. the queues which are already configured.
At that rare instant a few fields are duplicated, but this is the exception and nothing to worry much about.
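
To illustrate the "byte stream" point from earlier in the thread: the live migration driver only needs the type (and length) of each device context field to forward it; it never has to parse the body. Below is a minimal sketch of that in C. The header layout and the names (virtio_dev_ctx_field_hdr, save_blob) are my assumptions for illustration only, not definitions from this series.

/*
 * Hypothetical sketch: walk a device context byte stream and hand each
 * field to the migration flow without interpreting its contents.
 * The header layout and names below are assumptions, not spec text.
 */
#include <stdint.h>
#include <stddef.h>

struct virtio_dev_ctx_field_hdr {
        uint16_t type;          /* e.g. a VIRTIO_DEV_CTX_PCI_* field type */
        uint16_t reserved;
        uint32_t length;        /* length of the field body in bytes */
};

static int migrate_device_context(const uint8_t *buf, size_t len,
                                  void (*save_blob)(uint16_t type,
                                                    const void *body,
                                                    uint32_t body_len))
{
        size_t off = 0;

        while (off + sizeof(struct virtio_dev_ctx_field_hdr) <= len) {
                const struct virtio_dev_ctx_field_hdr *hdr =
                        (const void *)(buf + off);

                if (off + sizeof(*hdr) + hdr->length > len)
                        return -1;      /* malformed device context */

                /* The body stays an opaque byte stream to this driver. */
                save_blob(hdr->type, buf + off + sizeof(*hdr), hdr->length);
                off += sizeof(*hdr) + hdr->length;
        }
        return 0;
}

A new control structure then only adds a new type value on the migrating driver side, which is the "low overhead" point made above.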

> 
> Here's an idea: have a record per field. Use transport offsets as tags.
> 
There are just too many of them.
They are logically grouped in their native structures, which are already defined.
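
For comparison, a record-per-field encoding with the transport offset as the tag could look roughly like the sketch below; the name and layout are my assumption, just to illustrate the overhead. Every field of struct virtio_pci_common_cfg above would become its own record, and the per-queue fields would repeat for every queue.

#include <stdint.h>

struct virtio_dev_ctx_field_record {
        uint16_t cfg_offset;    /* e.g. offsetof(struct virtio_pci_common_cfg, queue_size) */
        uint16_t size;          /* field size in bytes: 1, 2, 4 or 8 */
        uint8_t  value[];       /* field value, little endian */
};

Grouping the fields in their existing structures keeps one record per structure instead.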

> 
> 
> 
> > > > > We need a stronger compatiblity story here I think.
> > > > >
> > > > > One way to show how it's designed to work would be to split the
> > > > > patches. For example, add queue notify data and queue reset separately.
> > > > I didn't follow the suggestion. Can you explain splitting patches
> > > > and its relation
> > > to the structure?
> > Did I miss your response? Or it is in below msix?
> 
> exactly.
> 
> > > >
> > > > >
> > > > > Another is to add MSIX table migration option for when MSIX
> > > > > table is passed through to guest.
> > > > Yes, this will be added in future when there is actual hypervisor for it.
> > >
> > > You are tying the architecture to an extremely implementation specific
> detail.
> > >
> > Which part?
> >
> > Not really. can you please which software will use MSI-X table migration?
> 
> Like, all and any? All hypervisors migrate the msi-x table.
> 
> > > Hypervisors *already* have migrate the MSIX table. Just in a
> > > hypervisor specific way.
> > MSI-X table is in the PCI BAR memory.
> 
> Exactly. And that means hypervisor should not read it from the device directly -
> e.g. with an encrypted device it won't be able to.
> 
Yep.

> > > queue vector is an index into this table. So the index is migrated
> > > through the device but the table itself has to be trapped and emulated by
> hypervisor?
> > Do you have a hypervisor and a platform that has not done MSI-X table
> emulation for which you are asking to add?
> > I don't know any.
> > I am asking to add it when there is a _real_ user of it. Why it cannot be added
> when the _real_ user arrive?
> 
> Real meaning actual hardware and software implementing it?  By this definition
> there's no real user for migration in the spec at all - all of it can be done by
> device specific means. 
This is also an option if vendor-specific commands are allowed.

> What's your point? By now we have a reasonably good
> idea what hypervisors need to make migration portable. Let's either put all of it
> in the spec or not bother at all.
> 
Maybe I was not clear.
I am saying let's add the MSI-X table incrementally, when a user will find it useful.
For example, say we put it in the spec in version 1.4 in Nov '23; will devices implement it? Mostly yes.
Will existing software use it in 2024-25? Mostly no, because most platforms have complexity in this area, as CPUs have hard-coded certain values.

So we would be asking devices to implement something that is not going to be used in any foreseeable future.

Hence the request: when the hypervisor/CPU vendor asks for it, it will be possible to add it to the device.

> In other words, we need a bright line. I suggest a simple one: memory is
> migrated by device, config space by hypervisor.  If not, suggest another one -
> but it needs a reasonable rule based on a hardware not whatever software
> found expedient to use.
> 
I really like this bright line.
The MSI-X table, like the rest of the memory area of the common config and device config, will become part of this memory area.

I wish we would also draw a good bright line between near term and long term, to be practical.

Frankly, adding the MSI-X table to the spec is not hard, now or in the future.

