virtio-comment message

Subject: Re: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES

From: "Chen, Jiqian" <Jiqian.Chen@amd.com>
To: Parav Pandit <parav@nvidia.com>
Date: Thu, 26 Oct 2023 10:24:26 +0000

On 2023/10/25 11:51, Parav Pandit wrote:
> 
> 
>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>> Sent: Tuesday, October 24, 2023 5:44 PM
>>
>> On 2023/10/24 18:51, Parav Pandit wrote:
>>>
>>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>>>> Sent: Tuesday, October 24, 2023 4:06 PM
>>>>
>>>> On 2023/10/23 21:35, Parav Pandit wrote:
>>>>>
>>>>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>>>>>> Sent: Monday, October 23, 2023 4:09 PM
>>>>>
>>>>>>> I think this can be done without introducing the new register.
>>>>>>> Can you please check if the PM register itself can serve the
>>>>>>> purpose instead
>>>>>> of new virtio level register?
>>>>>> Do you mean the system PM register?
>>>>> No, the device's PM register at transport level.
>>>> I tried to find this register(pci level or virtio pci level or virtio
>>>> driver level), but I didn't find it in Linux kernel or Qemu codes.
>>>> May I know which register you are referring to specifically? Or which
>>>> PM state bit you mentioned below?
>>>>
>>> PCI spec's PCI Power Management Capability Structure in section 7.5.2.
>> Yes, what you point to is PM capability for PCIe device.
>> But the problem is still that in Qemu code, it will check the
>> condition(pci_bus_is_express or pci_is_express) of all virtio-pci devices in
>> function virtio_pci_realize(), if the virtio devices aren't a PCIe device, it will not
>> add PM capability for them.
> PCI PM capability is must for PCIe devices. So may be QEMU code has put it only under is_pcie check.
> But it can be done outside of that check as well because this capability exists on PCI too for long time and it is backward compatible.
Do you suggest me to implement PM capability for virtio devices in Qemu firstly, and then to try if the PM capability can work for this scenario?
If so, we will complicate a simple problem. Because there are no other needs to add PM capability for virtio devices for now, if we add it just for preserving resources, it seems unnecessary and unreasonable. And we are not sure if there are other scenarios that are not during the process of PM state changing also need to preserve resources, if have, then the PM register can't cover, but preserve_resources register can.
Can I add some notes like "If PM capability is implemented for virtio devices, it may cover this scenario, and if there are no other scenarios that PM can't cover, then we can remove this register " in commit message or spec description and let us continue to add preserve_resources register?

> 
>> And another problem is how about the MMIO transport devices? Since
>> preserve resources is need for all transport type devices.
>>
> MMIO lacks such rich PM definitions. If in future MMIO wants to support, it will be extended to match to other transports like PCI.
> 
>>>>>
>>>>>> I think it is unreasonable to let virtio- device listen the PM
>>>>>> state of Guest system.
>>>>> Guest driver performs any work on the guest systems PM callback
>>>>> events in
>>>> the virtio driver.
>>>> I didn't find any PM state callback in the virtio driver.
>>>>
>>> There are virtio_suspend and virtio_resume in case of Linux.
>> I think what you said virtio_suspend/resume is freeze/restore callback from
>> "struct virtio_driver" or suspend/resume callback from "static const struct
>> dev_pm_ops virtio_pci_pm_ops".
>> And yes, I agree, if virtio devices have PM capability, maybe we can set PM state
>> in those callback functions.
>>
>>>
>>>>>
>>>>>> It's more suitable that each device gets notifications from driver,
>>>>>> and then do preserving resources operation.
>>>>> I agree that each device gets the notification from driver.
>>>>> The question is, should it be virtio driver, or existing pci driver
>>>>> which
>>>> transitions the state from d0->d3 and d3->d0 is enough.
>>>> It seems there isn't existing pci driver to transitions d0 or d3
>>>> state. Could you please tell me which one it is specifically? I am very willing to
>> give a try.
>>>>
>>> Virtio-pci modern driver of Linux should be able to.
>> Yes, I know, it is the VIRTIO_PCI_FLAG_INIT_PM_BIT. But still the two problems I
>> said above.
>>
> Both can be resolved without switching to pcie.
>  
>>>
>>>>> Can you please check that?
>>>>>
>>>>>>>> --- a/transport-pci.tex
>>>>>>>> +++ b/transport-pci.tex
>>>>>>>> @@ -325,6 +325,7 @@ \subsubsection{Common configuration
>> structure
>>>>>>>> layout}\label{sec:Virtio Transport
>>>>>>>>          /* About the administration virtqueue. */
>>>>>>>>          le16 admin_queue_index;         /* read-only for driver */
>>>>>>>>          le16 admin_queue_num;         /* read-only for driver */
>>>>>>>> +        le16 preserve_resources;        /* read-write */
>>>>>>> Preserving these resources in the device implementation takes
>>>>>>> finite amount
>>>>>> of time.
>>>>>>> Possibly more than 40nsec (time of PCIe write TLP).
>>>>>>> Hence this register must be a polling register to indicate that
>>>>>> preservation_done.
>>>>>>> This will tell the guest when the preservation is done, and when
>>>>>>> restoration is
>>>>>> done, so that it can resume upper layers.
>>>>>>>
>>>>>>> Please refer to queue_reset definition to learn more about such
>>>>>>> register
>>>>>> definition.
>>>>>> Thanks, I will refer to "queue_reset". So, I need three values,
>>>>>> driver write 1 to let device do preserving resources, driver write
>>>>>> 2 to let device do restoring resources, device write 0 to tell
>>>>>> driver that preserving or restoring done, am I right?
>>>>>>
>>>>> Right.
>>>>>
>>>>> And if the existing pcie pm state bits can do, we can leverage that.
>>>>> If it cannot be used, lets add that reasoning in the commit log to
>>>>> describe this
>>>> register.
>>>>>
>>>>>>>
>>>>>>> Lets please make sure that PCIe PM level registers are
>>>>>>> sufficient/not-sufficient
>>>>>> to decide the addition of this register.
>>>>>> But if the device is not a PCIe device, it doesn't have PM
>>>>>> capability, then this will not work. Actually in my local
>>>>>> environment, pci_is_express() return false in Qemu, they are not
>>>>>> PCIe
>>>> device.
>>>>> It is reasonable to ask to plug in as PCIe device in 2023 to get new
>>>>> functionality that too you mentioned a gpu device. ð
>>>>> Which does not have very long history of any backward compatibility.
>>>> Do you suggest me to add PM capability for virtio-gpu or change
>>>> virtio-gpu to a PCIe device?
>>>>
>>> PCI Power Management Capability Structure does not seem to be limited to
>> PCIe.
>> I am not sure, but in current Qemu code, I can see the check "pci_is_express"
>> for all virtio-pci devices. If we want to add PM capability for virtio-pci devices,
>> we need to change them to PCIe device I think.
>>
> That is one option.
> Second option to extend PCI PM cap for non pci device because it is supported.
> 
>>>
>>
>> --
>> Best regards,
>> Jiqian Chen.

-- 
Best regards,
Jiqian Chen.

Follow-Ups:
- Re: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: "Michael S. Tsirkin" <mst@redhat.com>

References:
- [PATCH v6 0/1] Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Jiqian Chen <Jiqian.Chen@amd.com>
- [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Jiqian Chen <Jiqian.Chen@amd.com>
- RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>
- Re: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: "Chen, Jiqian" <Jiqian.Chen@amd.com>
- RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>
- Re: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: "Chen, Jiqian" <Jiqian.Chen@amd.com>
- RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>
- Re: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: "Chen, Jiqian" <Jiqian.Chen@amd.com>
- RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>