virtio-comment message

Subject: Re: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES

From: "Chen, Jiqian" <Jiqian.Chen@amd.com>
To: Parav Pandit <parav@nvidia.com>
Date: Mon, 15 Jan 2024 09:40:12 +0000

On 2024/1/15 17:16, Parav Pandit wrote:
> 
>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>> Sent: Monday, January 15, 2024 2:40 PM
>>
>> On 2024/1/15 16:52, Parav Pandit wrote:
>>>
>>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>>>> Sent: Monday, January 15, 2024 1:51 PM
>>>>
>>>> On 2024/1/15 15:55, Parav Pandit wrote:
>>>>>
>>>>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>>>>>> Sent: Monday, January 15, 2024 1:18 PM
>>>>>>
>>>>>> On 2024/1/15 15:37, Parav Pandit wrote:
>>>>>>>
>>>>>>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>>>>>>>> Sent: Monday, January 15, 2024 1:03 PM
>>>>>>>>
>>>>>>>> On 2024/1/12 17:44, Parav Pandit wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>>>>>>>>>> Sent: Friday, January 12, 2024 2:55 PM
>>>>>>>>>>
>>>>>>>>>> On 2024/1/12 16:47, Parav Pandit wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>>>>>>>>>>>> Sent: Friday, January 12, 2024 1:55 PM
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2024/1/12 16:02, Parav Pandit wrote:
>>>>>>>>>>>>> Hi Jiqian,
>>>>>>>>>>>>>
>>>>>>>>>>>>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>>>>>>>>>>>>>> Sent: Friday, January 12, 2024 1:11 PM
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>> Sorry to reply late.
>>>>>>>>>>>>>> I don't know if you still remember this problem, let me
>>>>>>>>>>>>>> briefly descript
>>>>>>>>>> it.
>>>>>>>>>>>>>> I am working to implement virtgpu S3 function on Xen.
>>>>>>>>>>>>>> Currently on Xen, if we start a guest through qemu with
>>>>>>>>>>>>>> enabling virtgpu, and then suspend and resume guest. We can
>>>>>>>>>>>>>> find that the guest kernel comes back, but the display doesn't.
>>>>>>>>>>>>>> It just shown a black
>>>>>>>>>>>> screen.
>>>>>>>>>>>>>> That is because during suspending, guest called into qemu
>>>>>>>>>>>>>> and qemu destroyed all display resources and reset
>>>>>>>>>>>>>> renderer. This made the display gone after guest resumed.
>>>>>>>>>>>>>> So, I add a new command for virtio-gpu that when guest is
>>>>>>>>>>>>>> suspending, it will notify qemu and set
>>>>>>>>>>>>>> parameter(preserver_resource) to 1 and then qemu will
>>>>>>>>>>>>>> preserve resources, and when resuming, guest will notify
>>>>>>>>>>>>>> qemu to set parameter to 0, and then qemu will keep the
>>>>>>>>>>>>>> normal
>>>> actions.
>>>>>>>>>>>>>> That can
>>>>>>>>>> help guest's display come back.
>>>>>>>>>>>>>> When I upstream above implementation, Parav and MST
>> suggest
>>>>>>>>>>>>>> me
>>>>>>>> to
>>>>>>>>>>>> use
>>>>>>>>>>>>>> the PM capability to fix this problem instead of adding a
>>>>>>>>>>>>>> new command or state bit.
>>>>>>>>>>>>>> Now, I have tried the PM capability of virtio-gpu, it can't
>>>>>>>>>>>>>> be used to solve this problem.
>>>>>>>>>>>>>> The reason is:
>>>>>>>>>>>>>> during guest suspending, it will write D3 state through PM
>>>>>>>>>>>>>> cap, then I can save the resources of virtio-gpu on Qemu
>>>>>>>>>>>>>> side(set preserver_resource to 1), but during the process
>>>>>>>>>>>>>> of resuming, the state of PM cap will be cleared by qemu
>>>>>>>>>>>>>> resetting(qemu_system_wakeup-> qemu_devices_reset->
>>>>>>>>>>>>>> virtio_vga_base_reset-> virtio_gpu_gl_reset), it causes
>>>>>>>>>>>>>> that when guest reads state from PM cap, it will find the
>>>>>>>>>>>>>> virtio-gpu has already been D0 state,
>>>>>>>>>>>>> This behavior needs to be fixed. As the spec listed out, "
>>>>>>>>>>>>> This 2-bit field is
>>>>>>>>>>>> used both to determine the current power state of a Function"
>>>>>>>>>>>> Do you mean it is wrong to reset PM cap when vritio_gpu reset
>>>>>>>>>>>> in current qemu code? Why?
>>>>>>>>>>> Because PM is implementing the support from D3->d0 transition
>>>>>>>>>>> and if it
>>>>>>>>>> device in D3, the device needs to respond that it is in D3 to
>>>>>>>>>> match the PCI spec.
>>>>>>>>>>>
>>>>>>>>>>>> Shouldn't all device states, including PM registers, be reset
>>>>>>>>>>>> during the process of virtio-gpu reset?
>>>>>>>>>>> If No_Soft_Reset== 0, no device context to be preserved.
>>>>>>>>>>> If No_Soft_Reset == 1, device to preserve minimal registers
>>>>>>>>>>> and other
>>>>>>>>>> things listed in pci spec.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> So device needs to return D3 in the PowerState field.  This is
>> must.
>>>>>>>>>>>> But current behavior is a kind of soft reset, I
>>>>>>>>>>>> think(!PCI_PM_CTRL_NO_SOFT_RESET). The pm cap is reset by
>>>> qemu
>>>>>> is
>>>>>>>>>>>> reasonable.
>>>>>>>>>>> What you described is, you need to do No_Soft_Reset=1, so
>>>>>>>>>>> please set the
>>>>>>>>>> cap accordingly to achieve the restore.
>>>>>>>>>>>
>>>>>>>>>>> Also in your case the if the QEMU knows that it will have to
>>>>>>>>>>> resume the
>>>>>>>>>> device soon.
>>>>>>>>>>> Hence, it can well prepare the gpu context before resuming the
>> vcpus.
>>>>>>>>>>> In such case, the extra register wouldnât be necessary.
>>>>>>>>>>> Having PM control register is anyway needed to drive the
>>>>>>>>>>> device
>>>>>> properly.
>>>>>>>>>> Even as you said, the reset behavior of PM in the current QEMU
>>>>>>>>>> code is incorrect, and even if it is modified, it also cannot
>>>>>>>>>> fix this problem with
>>>>>> S3.
>>>>>>>>> You meant D3?
>>>>>>>>>
>>>>>>>>>> Because there is twice device reset during resuming.
>>>>>>>>>> First is when we trigger a resume command to qemu, qemu will
>>>>>>>>>> reset device(qemu_system_wakeup-> qemu_devices_reset->
>>>>>>>>>> virtio_vga_base_reset-> virtio_gpu_gl_reset).
>>>>>>>>> A device implementation that supports PM should not reset the
>> device.
>>>>>>>> Please fix it.
>>>>>>>> Do you mean the devices_reset behavior of qemu shouldn't happen
>>>>>>>> when a device has PM cap?
>>>>>>> Right. Device reset should not happen on PM capable device that
>>>>>>> offers
>>>>>> no_soft_reset==1.
>>>>>> But current qemu code, no_soft_reset==0.
>>>>>>
>>>>> So please fix the qemu when you are adding the functionality of
>>>>> restoring
>>>> the device context on PM.
>>>> My patch didn't add the functionality of restoring the device context on
>> PM.
>>>> Is no_soft_reset must set 1 when a device has PM cap?
>>>
>>> It the device wants to expose the ability to save and restore context, then
>> yes, no_soft_reset = 1, as defined by the PCI spec.
>>>
>>>> It should also be
>>>> allowed for situations that are not equal to 1, right?
>>> I donât see the need of it. It would not be aligned to no_soft_reset bit already
>> defined bit.
>>>
>>>> There should be devices that has PM cap but no_soft_reset is 0
>>>> because that devices can't store and restore the device context on
>>>> d3->d0 transition, like current qemu code.
>>> Right, PM cap with no_soft_reset=0, will not restore the context on d3->d0
>> as today.
>>>
>>>> So the fix should be feasible for all situations that whether the
>>>> no_soft_reset is
>>>> 1 or not.
>>> I donât follow the "fix should be feasible".
>> I mean the fix that to solve the problem(virtio-gpu's display resources are
>> destroyed during S3) should also be feasible to the situation that
>> no_soft_reset == 0.
>>
>>> QEMU reset the device on d3->d0 transition when no_soft_reset = 0 seems
>> fine because it follows the defined pci spec.
>> Right.
>> Now that you also think the current implementation of QEMU is reasonable,
>> the fix to this problem(display resources are destroyed during S3) should be
>> applicable to the current QEMU code.
> Yes.
> 
> To conclude our discussion,
> 
> Display resources should be only restored when following conditions are met.
> 1. PCI PM cap is reported.
> 2. PCI PM cap has non_soft_reset=1
> 3. virtio guest driver do not perform reset when transport offers a restore capability using #1 and #2.
> 
> Do you agree? Yes?
Yes, I think this problem (display resources are destroyed during S3) can be sorted to two situations:
First is what you said above, in this situation, the devices_reset of qemu is unreasonable, if a device has PM cap and non_soft_reset=1, qemu should not do resetting.
Second is without #1 or #2, the reset behavior is fine. My patch is to fix this situation, that sending a new virtio-gpu command to notify qemu to prevent destroying display resources during S3.

> 
>> If using PM cap, it cannot solve the problem encountered in current QEMU
>> code.
>> I think you may not have had a detailed understanding of my modifications. I
>> will resend the latest modifications on the Linux kernel and QEMU to
>> upstream.
>> Welcome to review.

-- 
Best regards,
Jiqian Chen.

Follow-Ups:
- RE: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>

References:
- RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>
- Re: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: "Chen, Jiqian" <Jiqian.Chen@amd.com>
- RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>
- Re: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: "Chen, Jiqian" <Jiqian.Chen@amd.com>
- RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: "Chen, Jiqian" <Jiqian.Chen@amd.com>
- RE: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: "Chen, Jiqian" <Jiqian.Chen@amd.com>
- RE: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: "Chen, Jiqian" <Jiqian.Chen@amd.com>
- RE: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>