Subject: RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 11, 2023 3:03 PM

> So implement AQ on the "admin" VF? This requires the HW to reserve
> dedicated resources for every VF?
> So expensive, overkill?
> 
> And a VF may be managed by the PF and its admin "vf"?
Yes.

> > If UDP packets are dropped, even applications that do not retry can fail.
> UDP is not reliable, and performance overhead does not mean failure.
It largely depends on the application.
I have seen iperf over UDP fail on packet drop and never recover.
A retransmission over UDP can fail.

> >
> >> But too few AQs to serve too high a volume of VMs may be a problem.
> > It is left for the device to implement the needed scale requirement.
> Yes, so how many HW resources should the HW implementation reserve to
> serve the worst case? Half of the board resources?
The board designer can decide how to manage the resources.
Administration commands are explicit instructions to the device.
It knows for how many member devices dirty tracking is ongoing, and which device context is being read/written.

An admin command can even fail with an EAGAIN error code when the device is out of resources, and software can retry the command.
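
For illustration, a minimal driver-side retry sketch in C. All names here (virtio_admin_cmd_exec, the structs, MAX_RETRIES) are placeholders invented for this discussion, not identifiers from the spec or any driver:

#include <errno.h>
#include <unistd.h>

struct virtio_dev;
struct admin_cmd;

/* Placeholder: submit one admin command; returns 0 on success,
 * -EAGAIN when the device is temporarily out of resources. */
int virtio_admin_cmd_exec(struct virtio_dev *dev, struct admin_cmd *cmd);

#define MAX_RETRIES 8

int submit_admin_cmd_with_retry(struct virtio_dev *dev,
                                struct admin_cmd *cmd)
{
        int i, status;

        for (i = 0; i < MAX_RETRIES; i++) {
                status = virtio_admin_cmd_exec(dev, cmd);
                if (status != -EAGAIN)
                        return status;     /* success or hard failure */
                usleep(1000U << i);        /* back off, then retry */
        }
        return -EBUSY;                     /* device stayed busy */
}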

The key part is that all of this happens outside of the VM's downtime.
The majority of the work in proposal [1] is done while the VM is _live_.
Hence, the resource consumption or reservation is significantly lower.
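
To make the ordering concrete, a rough sketch of the flow argued for here; every helper name below is invented for illustration, not a spec or driver API:

struct virtio_dev;
void start_dirty_page_tracking(struct virtio_dev *owner, int member_id);
void read_device_context(struct virtio_dev *owner, int member_id);
void stop_member_device(struct virtio_dev *owner, int member_id);

/* Rough sketch: the bulk of the migration work runs while the VM
 * is live; only the final delta falls into the downtime window. */
void migrate_member_device(struct virtio_dev *owner, int member_id)
{
        /* --- VM is live: no downtime cost --- */
        start_dirty_page_tracking(owner, member_id);
        read_device_context(owner, member_id);   /* initial, large */

        /* --- brief downtime window --- */
        stop_member_device(owner, member_id);
        read_device_context(owner, member_id);   /* small final delta */
        /* ...transfer the delta, then resume on the destination... */
}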


> >> Naming a number or an algorithm for the ratio of devices / num_of_AQs
> >> is beyond this topic, but I made my point clear.
> > Sure. It is beyond.
> > And it is not a concern either.
> It is; the user expects the LM process to succeed rather than fail.
I still fail to understand why the LM process would fail.
The migration process is slow, but in [1] the downtime is not.

> >> can depth = 1K introduce significant latency?
> > AQ command execution is not done serially. There is enough text in the AQ
> chapter, as I recall.
> Then it requires more HW resources; I don't see the difference.
Difference compared to what, multiple AQs?
If so, sure.
A device that prefers to do only one AQ command at a time can, of course, work with fewer resources and do exactly that.
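
As a sketch of why a single deep AQ does not force serial execution: the driver can keep many commands outstanding on one queue, and the device is free to complete them in any order, or one at a time if it is short on resources. Again, the function names are placeholders for this discussion, not a real API:

struct virtio_dev;
struct admin_cmd;
void virtio_aq_post(struct virtio_dev *dev, struct admin_cmd *cmd);
void virtio_aq_wait_any(struct virtio_dev *dev);

/* Keep n commands outstanding on one admin VQ; completion order
 * is up to the device. Illustrative placeholder API only. */
void run_admin_cmds(struct virtio_dev *dev, struct admin_cmd *cmds, int n)
{
        int i;

        for (i = 0; i < n; i++)
                virtio_aq_post(dev, &cmds[i]);  /* enqueue, don't wait */

        for (i = 0; i < n; i++)
                virtio_aq_wait_any(dev);        /* reap completions */
}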

> >
> >> And a 1K depth is
> >> almost identical to 2 x 500 queue depths, so it is still the same problem:
> >> how many resources does the HW need to reserve to serve the worst case?
> >>
> > You didn't describe the problem.
> > A virtqueue is generic infrastructure to execute commands, be it an admin
> command, control command, flow filter command, or SCSI command.
> > How many to execute in parallel, and how many queues to have, are device
> implementation specific.
> So the question is: how many to serve the worst case? Does the HW vendor need
> to reserve half of the board resources?
No. It does not need to.

> >
> >> Let's forget the numbers, the point is clear.
> > Ok. I agree with you.
> > The number of AQs and their depth matter for this discussion, and their
> performance characterization is outside the spec.
> > Design-wise, the key thing is to have a queuing interface between driver and
> device for device migration commands.
> > This enables both entities to execute things in parallel.
> >
> > This is fully covered in [1].
> > So let's improve [1].
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
> I am not sure why [1] is a must. There are certain issues discussed in this
> thread for [1] that stay unsolved.
> 
> By the way, do you see anything we need to improve in this series?
In [1], the device context needs to become richer as we progress through the v1/v2 versions.

[..]

> > A nested guest VM is not aware and should not be.
> > The VM hosting the nested VM is aware of how to execute administrative
> commands using the owner device.
> The VM does not talk to the admin vq either; the admin vq is a host facility, the
> host owns it.
The admin VQ is owned by whichever device has it.
As I explained before, it is on the owner device.
If needed, one can do it on more than the owner device.
For a VM_A which is hosting another VM_B, VM_A can have a peer VF with an AQ be the admin device or migration manager device.
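
A sketch of the delegation described above; the struct fields and selection logic are purely hypothetical, chosen only to illustrate that the migration driver binds to whichever device exposes the admin path:

#include <stddef.h>

/* Hypothetical view of a member device and its possible admin paths. */
struct virtio_dev {
        struct virtio_dev *owner_pf;       /* owning PF (non-nested case) */
        struct virtio_dev *peer_admin_vf;  /* delegated peer VF (nested case) */
};

struct virtio_dev *pick_admin_device(struct virtio_dev *member)
{
        if (member->owner_pf)              /* host level: owner PF runs the AQ */
                return member->owner_pf;
        if (member->peer_admin_vf)         /* nested: peer VF delegated the role */
                return member->peer_admin_vf;
        return NULL;                       /* no admin path available */
}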

> >
> > At present, for the PCI transport, the owner device is the PF.
> >
> > In future, for nesting, maybe another peer VF can be delegated such a task and
> it can perform administration commands.
> Then it may run into the problems explained above.
> >
> > For bare metal, maybe some other admin device like a DPU can take that role.
> So [1] is not ready
> >
> >>>> Why this series can not support nested?
> >>> I don't see all the aspects that I covered in series [1], ranging
> >>> from FLR, device
> >> context migration, virtio level reset, dirty page tracking, p2p support, etc.,
> >> covered in some device vq suspend/resume piece.
> >>> [1]
> >>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
> >> We have discussed many other issues in this thread.
> >>>>>> And it does not serve bare-metal live migration either.
> >>>>> A bare-metal migration seems a distant theory, as one needs a side
> >>>>> cpu and
> >>>> memory accessor apart from the device accessor.
> >>>>> But somehow, if that exists, there will be a similar admin device to
> >>>>> migrate it;
> >>>> maybe TDDISP will own this whole piece one day.
> >>>> Bare metal live migration requires other components like firmware, OS
> >>>> and partitioning; that's why the device live migration should not
> >>>> be a
> >> blocker.
> >>> Device migration is not a blocker.
> >>> In fact, it facilitates this future, in case it happens that a
> >>> side cpu like a
> >> DPU or similar sideband virtio admin device can migrate over its admin vq.
> >>> Long ago, when admin commands were discussed, this was discussed too,
> >> where an admin device may not be an owner device.
> >> The admin vq cannot migrate itself, therefore bare metal cannot be
> >> migrated by the admin vq.
> > Maybe I was not clear. The admin commands are executed by some other
> device than the PF.
>  From the SW perspective, it should be the admin vq and the device it resides on.
> > Above I call it the admin device, which can be a DPU, maybe some other
> dedicated admin device, or something else.
> > A large part of the non-virtio infrastructure at the platform, BIOS, cpu, and memory level
> needs to evolve before virtio can utilize it.
> A virtio device should be self-contained, not dependent on other components.
> >
> > We don't need to cook it all now; as long as we have administration commands
> it's good.
> > The real credit owner for detaching the administration commands from
> > the admin vq is Michael. :) We'd like to utilize this in future for the DPU case where
> the admin device is not the PCI PF.
> > Eswitch, PF migration, etc., may utilize it in future when needed.
> Again, the design should not rely on other host components.
It does not. It relies on the administration commands.

> 
> And it is not about the credit; this is a reliable work outcome.
I didn't follow the comment.

