Subject: Re: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
On 2024/1/12 16:02, Parav Pandit wrote:
> Hi Jiqian,
>
>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>> Sent: Friday, January 12, 2024 1:11 PM
>>
>> Hi all,
>> Sorry to reply late.
>> I don't know if you still remember this problem, so let me briefly
>> describe it. I am working to implement the virtgpu S3 function on Xen.
>> Currently on Xen, if we start a guest through qemu with virtgpu enabled
>> and then suspend and resume the guest, we find that the guest kernel
>> comes back but the display does not; it just shows a black screen.
>> That is because during suspend the guest calls into qemu, and qemu
>> destroys all display resources and resets the renderer, so the display
>> is gone after the guest resumes.
>> So I added a new command to virtio-gpu: when the guest is suspending it
>> notifies qemu, which sets a parameter (preserver_resource) to 1 and
>> preserves the resources; when resuming, the guest notifies qemu to set
>> the parameter back to 0 and qemu returns to its normal behaviour. That
>> lets the guest's display come back.
>> When I upstreamed the above implementation, Parav and MST suggested I
>> use the PM capability to fix this problem instead of adding a new
>> command or state bit.
>> Now I have tried the PM capability of virtio-gpu, and it cannot be used
>> to solve this problem. The reason is: during guest suspend, the guest
>> writes the D3 state through the PM cap, and at that point I can save
>> the virtio-gpu resources on the Qemu side (set preserver_resource to 1).
>> But during resume, the state of the PM cap is cleared by qemu's reset
>> (qemu_system_wakeup -> qemu_devices_reset -> virtio_vga_base_reset ->
>> virtio_gpu_gl_reset), so when the guest reads the state from the PM cap
>> it finds the virtio-gpu is already in the D0 state,
>
> This behavior needs to be fixed. As the spec listed out, "This 2-bit
> field is used both to determine the current power state of a Function"

Do you mean it is wrong for the current qemu code to reset the PM cap when
virtio-gpu is reset? Why? Shouldn't all device state, including the PM
registers, be reset during a virtio-gpu reset?

> So device needs to return D3 in the PowerState field. This is must.

But the current behavior is a kind of soft reset, I think
(!PCI_PM_CTRL_NO_SOFT_RESET), so resetting the PM cap in qemu seems
reasonable.

> In addition to it, an additional busy_poll register is helpful that
> indicates the device is ready to use.
> This is because 10msec is the timeout set by the PCI spec.
> This can be hard for the devices to implement if the large GPU state is
> being read from files or slow media, or for other hw devices in large
> quantities.
>
> This limit comes from the PCI spec normative below.
>
> After transitioning a VF from D3Hot to D0, at least one of the following
> is true:
> - At least 10 ms has passed since the request to enter D0 was issued.
>
> So the Readiness Time Reporting capability is not so useful.
>
> Hence, after the PM D3->D0 transition is successful, a virtio-level PCI
> register is useful to ensure that the device has resumed before the
> driver touches the rest of the registers.

>> so the guest will never write D0 through the PM cap, and I cannot know
>> when to restore the resources (set preserver_resource to 0).
>> Do you have any other suggestions?
>> Or can I just fall back to the version that adds a new command
>> (VIRTIO_GPU_CMD_PRESERVE_RESOURCE) to virtio-gpu? I think that way is
>> more reasonable and feasible for virtio-gpu to protect display
>> resources during S3. As for other devices, if necessary, they can also
>> refer to the virtio-gpu implementation and add new commands to prevent
>> resource loss during S3.
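
For reference, the PowerState check I am describing looks roughly like this
on the guest side (just a sketch using the generic Linux PCI config helpers,
not code from the patch; pm_cap stands for the PM capability offset found
earlier with pci_find_capability()):

    #include <linux/pci.h>

    /* Sketch: does the device already report D0 in the PM Control/Status
     * register?  pm_cap is the offset of the PM capability (PCI spec 7.5.2). */
    static bool virtio_pci_reports_d0(struct pci_dev *pdev, u8 pm_cap)
    {
            u16 pmcsr;

            pci_read_config_word(pdev, pm_cap + PCI_PM_CTRL, &pmcsr);
            /* After qemu_devices_reset() this already reads D0, so the guest
             * never issues the D3hot->D0 write that the device would need to
             * see in order to restore the preserved display resources. */
            return (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0;
    }
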
>>
>> On 2023/10/27 11:03, Chen, Jiqian wrote:
>>> Hi Michael S. Tsirkin and Parav Pandit,
>>> Thank you for your detailed explanation. I will try to use PM cap to
>>> fix this issue.
>>>
>>> On 2023/10/26 18:30, Michael S. Tsirkin wrote:
>>>> On Thu, Oct 26, 2023 at 10:24:26AM +0000, Chen, Jiqian wrote:
>>>>>
>>>>> On 2023/10/25 11:51, Parav Pandit wrote:
>>>>>>
>>>>>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>>>>>>> Sent: Tuesday, October 24, 2023 5:44 PM
>>>>>>>
>>>>>>> On 2023/10/24 18:51, Parav Pandit wrote:
>>>>>>>>
>>>>>>>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>>>>>>>>> Sent: Tuesday, October 24, 2023 4:06 PM
>>>>>>>>>
>>>>>>>>> On 2023/10/23 21:35, Parav Pandit wrote:
>>>>>>>>>>
>>>>>>>>>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>>>>>>>>>>> Sent: Monday, October 23, 2023 4:09 PM
>>>>>>>>>>
>>>>>>>>>>>> I think this can be done without introducing the new register.
>>>>>>>>>>>> Can you please check if the PM register itself can serve the
>>>>>>>>>>>> purpose instead of new virtio level register?
>>>>>>>>>>> Do you mean the system PM register?
>>>>>>>>>> No, the device's PM register at transport level.
>>>>>>>>> I tried to find this register (pci level or virtio pci level or
>>>>>>>>> virtio driver level), but I didn't find it in Linux kernel or
>>>>>>>>> Qemu codes.
>>>>>>>>> May I know which register you are referring to specifically? Or
>>>>>>>>> which PM state bit you mentioned below?
>>>>>>>>>
>>>>>>>> PCI spec's PCI Power Management Capability Structure in section 7.5.2.
>>>>>>> Yes, what you point to is PM capability for PCIe device.
>>>>>>> But the problem is still that in Qemu code, it will check the
>>>>>>> condition (pci_bus_is_express or pci_is_express) of all virtio-pci
>>>>>>> devices in function virtio_pci_realize(), if the virtio devices
>>>>>>> aren't a PCIe device, it will not add PM capability for them.
>>>>>> PCI PM capability is must for PCIe devices. So may be QEMU code has
>>>>>> put it only under is_pcie check.
>>>>>> But it can be done outside of that check as well because this
>>>>>> capability exists on PCI too for long time and it is backward
>>>>>> compatible.
>>>>> Do you suggest me to implement PM capability for virtio devices in
>>>>> Qemu firstly, and then to try if the PM capability can work for this
>>>>> scenario?
>>>>
>>>> virtio devices in qemu already have a PM capability.
>>>>
>>>>> If so, we will complicate a simple problem. Because there are no
>>>>> other needs to add PM capability for virtio devices for now, if we
>>>>> add it just for preserving resources, it seems unnecessary and
>>>>> unreasonable. And we are not sure if there are other scenarios that
>>>>> are not during the process of PM state changing also need to preserve
>>>>> resources, if have, then the PM register can't cover, but
>>>>> preserve_resources register can.
>>>>
>>>> One of the selling points of virtio is precisely reusing existing
>>>> platform capabilities as opposed to coming up with our own.
>>>> See abstract.tex
>>>>
>>>>> Can I add some notes like "If PM capability is implemented for virtio
>>>>> devices, it may cover this scenario, and if there are no other
>>>>> scenarios that PM can't cover, then we can remove this register" in
>>>>> commit message or spec description and let us continue to add
>>>>> preserve_resources register?
>>>>
>>>> We can't remove registers.
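
(Side note on the is_pcie point above: from the guest's perspective the PM
capability is an ordinary capability in PCI configuration space and is
discoverable on conventional PCI as well as PCIe; it is only current QEMU
that adds it under the express check in virtio_pci_realize(). A minimal
sketch, not taken from any of the patches:)

    #include <linux/pci.h>

    /* PCI_CAP_ID_PM is defined for conventional PCI too, so finding the
     * capability does not require the device to be express. */
    static bool virtio_pci_has_pm_cap(struct pci_dev *pdev)
    {
            return pci_find_capability(pdev, PCI_CAP_ID_PM) != 0;
    }
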
>>>>>>> And another problem is how about the MMIO transport devices? Since
>>>>>>> preserve resources is need for all transport type devices.
>>>>>>>
>>>>>> MMIO lacks such rich PM definitions. If in future MMIO wants to
>>>>>> support, it will be extended to match to other transports like PCI.
>>>>>>
>>>>>>>>>>> I think it is unreasonable to let virtio device listen the PM
>>>>>>>>>>> state of Guest system.
>>>>>>>>>> Guest driver performs any work on the guest systems PM callback
>>>>>>>>>> events in the virtio driver.
>>>>>>>>> I didn't find any PM state callback in the virtio driver.
>>>>>>>>>
>>>>>>>> There are virtio_suspend and virtio_resume in case of Linux.
>>>>>>> I think what you said virtio_suspend/resume is freeze/restore
>>>>>>> callback from "struct virtio_driver" or suspend/resume callback
>>>>>>> from "static const struct dev_pm_ops virtio_pci_pm_ops".
>>>>>>> And yes, I agree, if virtio devices have PM capability, maybe we
>>>>>>> can set PM state in those callback functions.
>>>>>>>
>>>>>>>>>>> It's more suitable that each device gets notifications from
>>>>>>>>>>> driver, and then do preserving resources operation.
>>>>>>>>>> I agree that each device gets the notification from driver.
>>>>>>>>>> The question is, should it be virtio driver, or existing pci
>>>>>>>>>> driver which transitions the state from d0->d3 and d3->d0 is
>>>>>>>>>> enough.
>>>>>>>>> It seems there isn't existing pci driver to transitions d0 or d3
>>>>>>>>> state. Could you please tell me which one it is specifically? I
>>>>>>>>> am very willing to give a try.
>>>>>>>>>
>>>>>>>> Virtio-pci modern driver of Linux should be able to.
>>>>>>> Yes, I know, it is the VIRTIO_PCI_FLAG_INIT_PM_BIT. But still the
>>>>>>> two problems I said above.
>>>>>>>
>>>>>> Both can be resolved without switching to pcie.
>>>>>>
>>>>>>>>>> Can you please check that?
>>>>>>>>>>
>>>>>>>>>>>>> --- a/transport-pci.tex
>>>>>>>>>>>>> +++ b/transport-pci.tex
>>>>>>>>>>>>> @@ -325,6 +325,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>>>>>>>>>>>>         /* About the administration virtqueue. */
>>>>>>>>>>>>>         le16 admin_queue_index;  /* read-only for driver */
>>>>>>>>>>>>>         le16 admin_queue_num;    /* read-only for driver */
>>>>>>>>>>>>> +       le16 preserve_resources; /* read-write */
>>>>>>>>>>>> Preserving these resources in the device implementation takes
>>>>>>>>>>>> finite amount of time.
>>>>>>>>>>>> Possibly more than 40nsec (time of PCIe write TLP).
>>>>>>>>>>>> Hence this register must be a polling register to indicate
>>>>>>>>>>>> that preservation_done.
>>>>>>>>>>>> This will tell the guest when the preservation is done, and
>>>>>>>>>>>> when restoration is done, so that it can resume upper layers.
>>>>>>>>>>>>
>>>>>>>>>>>> Please refer to queue_reset definition to learn more about
>>>>>>>>>>>> such register definition.
>>>>>>>>>>> Thanks, I will refer to "queue_reset". So, I need three values:
>>>>>>>>>>> driver write 1 to let device do preserving resources, driver
>>>>>>>>>>> write 2 to let device do restoring resources, device write 0
>>>>>>>>>>> to tell driver that preserving or restoring done, am I right?
>>>>>>>>>>>
>>>>>>>>>> Right.
>>>>>>>>>>
>>>>>>>>>> And if the existing pcie pm state bits can do, we can leverage
>>>>>>>>>> that. If it cannot be used, lets add that reasoning in the
>>>>>>>>>> commit log to describe this register.
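
To make the polling semantics above concrete, the driver-side flow could
look something like the sketch below, modelled on how queue_reset is polled.
Everything here is hypothetical until the new field actually lands in
struct virtio_pci_common_cfg; the value names and the 1 ms sleep are
placeholders, not part of the proposal:

    #include <linux/delay.h>
    #include <linux/virtio_pci_modern.h>

    /* Proposed values for the preserve_resources field (not in the spec yet). */
    #define VIRTIO_PRESERVE_DONE   0   /* written by device when it has finished */
    #define VIRTIO_PRESERVE_START  1   /* written by driver before suspend       */
    #define VIRTIO_RESTORE_START   2   /* written by driver after resume         */

    /* Sketch: ask the device to preserve (or restore) its resources and wait
     * for completion.  Assumes a preserve_resources member has been added to
     * struct virtio_pci_common_cfg, which is what this patch proposes. */
    static void vp_preserve_resources(struct virtio_pci_modern_device *mdev, u16 cmd)
    {
            vp_iowrite16(cmd, &mdev->common->preserve_resources);

            while (vp_ioread16(&mdev->common->preserve_resources) != VIRTIO_PRESERVE_DONE)
                    msleep(1);
    }

The point of the device flipping the field back to 0 only after its state
has really been written out (or read back in) is exactly the earlier
observation that preservation can take much longer than a single register
write.
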
>>>>>>>>>>>> Lets please make sure that PCIe PM level registers are
>>>>>>>>>>>> sufficient/not-sufficient to decide the addition of this
>>>>>>>>>>>> register.
>>>>>>>>>>> But if the device is not a PCIe device, it doesn't have PM
>>>>>>>>>>> capability, then this will not work. Actually in my local
>>>>>>>>>>> environment, pci_is_express() returns false in Qemu, they are
>>>>>>>>>>> not PCIe devices.
>>>>>>>>>> It is reasonable to ask to plug in as PCIe device in 2023 to
>>>>>>>>>> get new functionality, that too you mentioned a gpu device.
>>>>>>>>>> Which does not have very long history of any backward
>>>>>>>>>> compatibility.
>>>>>>>>> Do you suggest me to add PM capability for virtio-gpu or change
>>>>>>>>> virtio-gpu to a PCIe device?
>>>>>>>>>
>>>>>>>> PCI Power Management Capability Structure does not seem to be
>>>>>>>> limited to PCIe.
>>>>>>> I am not sure, but in current Qemu code, I can see the check
>>>>>>> "pci_is_express" for all virtio-pci devices. If we want to add PM
>>>>>>> capability for virtio-pci devices, we need to change them to PCIe
>>>>>>> devices, I think.
>>>>>> That is one option.
>>>>>> Second option to extend PCI PM cap for non pci device because it is
>>>>>> supported.
>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> Jiqian Chen.
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Jiqian Chen.
>>>>
>>>
>>
>> --
>> Best regards,
>> Jiqian Chen.

--
Best regards,
Jiqian Chen.