virtio-comment message



Subject: RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES



> From: Chen, Jiqian <Jiqian.Chen@amd.com>
> Sent: Tuesday, October 24, 2023 5:44 PM
> 
> On 2023/10/24 18:51, Parav Pandit wrote:
> >
> >> From: Chen, Jiqian <Jiqian.Chen@amd.com>
> >> Sent: Tuesday, October 24, 2023 4:06 PM
> >>
> >> On 2023/10/23 21:35, Parav Pandit wrote:
> >>>
> >>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
> >>>> Sent: Monday, October 23, 2023 4:09 PM
> >>>
> >>>>> I think this can be done without introducing the new register.
> >>>>> Can you please check if the PM register itself can serve the
> >>>>> purpose instead
> >>>> of new virtio level register?
> >>>> Do you mean the system PM register?
> >>> No, the device's PM register at transport level.
> >> I tried to find this register (PCI level, virtio-pci level, or virtio
> >> driver level), but I didn't find it in the Linux kernel or QEMU code.
> >> May I know which register you are referring to specifically? Or which
> >> PM state bit did you mention below?
> >>
> > PCI spec's PCI Power Management Capability Structure in section 7.5.2.
> Yes, what you point to is the PM capability for PCIe devices.
> But the problem remains that the QEMU code checks the condition
> (pci_bus_is_express or pci_is_express) for all virtio-pci devices in
> virtio_pci_realize(); if a virtio device is not a PCIe device, QEMU will
> not add the PM capability for it.
The PCI PM capability is mandatory for PCIe devices, so QEMU may have added it only under the is_pcie check.
But it can be added outside of that check as well, because this capability has existed on conventional PCI for a long time and is backward compatible.
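For reference, the register being discussed can be pictured as below; this is only an illustrative C rendering of the PCI Power Management Capability Structure from PCI spec section 7.5.2 (offsets and bit meanings follow the standard PCI_PM_* definitions in pci_regs.h; the struct name itself is made up for this sketch):

#include <stdint.h>

/* Illustrative layout of the PCI Power Management Capability Structure
 * (PCI spec section 7.5.2). Offsets match the standard PCI_PM_* defines;
 * the struct name is made up for this sketch. */
struct pci_pm_capability {
        uint8_t  cap_id;     /* 0x01 = PCI_CAP_ID_PM                        */
        uint8_t  cap_next;   /* offset of next capability, 0 if last        */
        uint16_t pmc;        /* Power Management Capabilities (PCI_PM_PMC)  */
        uint16_t pmcsr;      /* PM Control/Status (PCI_PM_CTRL):            */
                             /*   bits [1:0] PowerState, 0 = D0, 3 = D3hot  */
        uint8_t  pmcsr_bse;  /* bridge support extensions                   */
        uint8_t  data;       /* optional Data register                      */
};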

> And another problem: what about MMIO transport devices? Preserving
> resources is needed for devices on all transport types.
> 
MMIO lacks such rich PM definitions. If MMIO wants to support this in the future, it can be extended to match the other transports, such as PCI.

> >>>
> >>>> I think it is unreasonable to let a virtio device listen to the PM
> >>>> state of the guest system.
> >>> The guest driver performs any such work from the guest system's PM
> >>> callback events, in the virtio driver.
> >> I didn't find any PM state callback in the virtio driver.
> >>
> > There are virtio_suspend and virtio_resume in case of Linux.
> I think the virtio_suspend/resume you mentioned are the freeze/restore
> callbacks from "struct virtio_driver", or the suspend/resume callbacks from
> "static const struct dev_pm_ops virtio_pci_pm_ops".
> And yes, I agree: if virtio devices have the PM capability, maybe we can set
> the PM state in those callback functions.
> 
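A minimal sketch of the hook points being discussed, assuming the preserve/restore request would be issued from the existing freeze/restore callbacks of struct virtio_driver; the my_vdev_* names are hypothetical, and only the callback wiring is existing kernel API:

#include <linux/virtio.h>

/* Hypothetical driver: struct virtio_driver already provides freeze/restore
 * callbacks (under CONFIG_PM_SLEEP) that the virtio PM path invokes on
 * suspend and resume; a preserve-resources request would naturally be
 * issued from here. */
#ifdef CONFIG_PM_SLEEP
static int my_vdev_freeze(struct virtio_device *vdev)
{
        /* quiesce virtqueues, then ask the device to preserve resources */
        return 0;
}

static int my_vdev_restore(struct virtio_device *vdev)
{
        /* ask the device to restore resources, then re-init virtqueues */
        return 0;
}
#endif

static struct virtio_driver my_vdev_driver = {
        /* .driver.name, .id_table, .probe, .remove omitted for brevity */
#ifdef CONFIG_PM_SLEEP
        .freeze  = my_vdev_freeze,
        .restore = my_vdev_restore,
#endif
};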
> >
> >>>
> >>>> It's more suitable for each device to get a notification from the
> >>>> driver and then do the resource-preserving operation.
> >>> I agree that each device gets the notification from the driver.
> >>> The question is: should it be the virtio driver, or is the existing PCI
> >>> driver, which transitions the state from d0->d3 and d3->d0, enough?
> >> It seems there isn't an existing PCI driver that transitions the d0 or d3
> >> state. Could you please tell me which one it is specifically? I am very
> >> willing to give it a try.
> >>
> > The Linux virtio-pci modern driver should be able to.
> Yes, I know, it is VIRTIO_PCI_FLAG_INIT_PM_BIT. But the two problems I
> mentioned above still remain.
>
Both can be resolved without switching to PCIe.
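A minimal sketch of the D-state transition the virtio-pci modern driver would drive through that capability, assuming the device exposes it; the example_* wrappers are illustrative, while pci_set_power_state()/pci_save_state()/pci_restore_state() are the standard kernel helpers:

#include <linux/pci.h>

/* Illustrative only: enter D3hot on suspend and return to D0 on resume via
 * the PM capability's PMCSR PowerState field. A device that preserves its
 * resources would key its save/restore work off these transitions. */
static int example_suspend(struct pci_dev *pdev)
{
        pci_save_state(pdev);                   /* save config space */
        return pci_set_power_state(pdev, PCI_D3hot);
}

static int example_resume(struct pci_dev *pdev)
{
        int ret = pci_set_power_state(pdev, PCI_D0);

        pci_restore_state(pdev);                /* restore config space */
        return ret;
}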
 
> >
> >>> Can you please check that?
> >>>
> >>>>>> --- a/transport-pci.tex
> >>>>>> +++ b/transport-pci.tex
> >>>>>> @@ -325,6 +325,7 @@ \subsubsection{Common configuration
> structure
> >>>>>> layout}\label{sec:Virtio Transport
> >>>>>>          /* About the administration virtqueue. */
> >>>>>>          le16 admin_queue_index;         /* read-only for driver */
> >>>>>>          le16 admin_queue_num;         /* read-only for driver */
> >>>>>> +        le16 preserve_resources;        /* read-write */
> >>>>> Preserving these resources in the device implementation takes a
> >>>>> finite amount of time, possibly more than 40 nsec (the time of a PCIe
> >>>>> write TLP).
> >>>>> Hence this register must be a polling register that indicates
> >>>>> preservation_done. This will tell the guest when the preservation is
> >>>>> done and when the restoration is done, so that it can resume the
> >>>>> upper layers.
> >>>>>
> >>>>> Please refer to the queue_reset definition to learn more about such a
> >>>>> register definition.
> >>>> Thanks, I will refer to "queue_reset". So I need three values: the
> >>>> driver writes 1 to make the device preserve resources, the driver
> >>>> writes 2 to make the device restore resources, and the device writes 0
> >>>> to tell the driver that preserving or restoring is done. Am I right?
> >>>>
> >>> Right.
> >>>
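To make the agreed write-1/write-2/poll-for-0 scheme concrete, a minimal driver-side sketch follows. The preserve_resources field is the addition proposed by this patch (it is not part of the upstream common configuration structure), and the constant and helper names below are made up:

#include <linux/io.h>
#include <linux/delay.h>

#define VIRTIO_PRESERVE_REQ   1   /* driver: preserve resources       */
#define VIRTIO_RESTORE_REQ    2   /* driver: restore resources        */
#define VIRTIO_PRESERVE_DONE  0   /* device: operation has completed  */

/* Hypothetical helper: write the request, then poll until the device
 * clears the register, mirroring how queue_reset is polled. 'reg' would
 * point at the proposed preserve_resources field in the common
 * configuration structure. */
static void preserve_resources_request(void __iomem *reg, u16 request)
{
        iowrite16(request, reg);
        while (ioread16(reg) != VIRTIO_PRESERVE_DONE)
                usleep_range(100, 200);  /* let the device finish */
}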
> >>> And if the existing PCIe PM state bits can do it, we can leverage that.
> >>> If they cannot be used, let's add that reasoning to the commit log to
> >>> describe this register.
> >>>
> >>>>>
> >>>>> Let's please make sure whether the PCIe PM level registers are
> >>>>> sufficient or not before deciding on the addition of this register.
> >>>> But if the device is not a PCIe device, it doesn't have the PM
> >>>> capability, and then this will not work. Actually, in my local
> >>>> environment pci_is_express() returns false in QEMU; they are not PCIe
> >>>> devices.
> >>> It is reasonable in 2023 to ask to plug in as a PCIe device to get new
> >>> functionality, especially for a GPU device as you mentioned, which does
> >>> not have a very long history of backward compatibility.
> >> Do you suggest that I add the PM capability for virtio-gpu, or change
> >> virtio-gpu to a PCIe device?
> >>
> > The PCI Power Management Capability Structure does not seem to be limited
> > to PCIe.
> I am not sure, but in the current QEMU code I can see the "pci_is_express"
> check for all virtio-pci devices. If we want to add the PM capability for
> virtio-pci devices, I think we need to change them to PCIe devices.
> 
That is one option.
A second option is to expose the PCI PM capability on non-PCIe (conventional PCI) devices as well, since the capability is supported there too.
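Either way, the capability can be probed identically on conventional PCI and PCIe functions; a minimal sketch using the standard Linux helper (the wrapper name is illustrative):

#include <linux/pci.h>

/* The PCI PM capability (PCI_CAP_ID_PM, 0x01) sits in the ordinary
 * capability list, so it is discoverable on conventional PCI as well as
 * PCIe functions; a driver can probe for it instead of requiring PCIe. */
static bool example_has_pm_cap(struct pci_dev *pdev)
{
        return pci_find_capability(pdev, PCI_CAP_ID_PM) != 0;
}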

> >
> 
> --
> Best regards,
> Jiqian Chen.

