virtio-comment message

Subject: RE: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES

From: Parav Pandit <parav@nvidia.com>
To: "Chen, Jiqian" <Jiqian.Chen@amd.com>
Date: Tue, 16 Jan 2024 08:27:20 +0000

> From: Chen, Jiqian <Jiqian.Chen@amd.com>
> Sent: Tuesday, January 16, 2024 1:51 PM
> 
> On 2024/1/16 15:19, Parav Pandit wrote:
> >
> >> From: Chen, Jiqian <Jiqian.Chen@amd.com>
> >> Sent: Tuesday, January 16, 2024 12:08 PM
> >>
> >> On 2024/1/15 19:10, Parav Pandit wrote:
> >>>
> >>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
> >>>> Sent: Monday, January 15, 2024 4:37 PM
> >>>>
> >>>> On 2024/1/15 18:52, Parav Pandit wrote:
> >>>>>
> >>>>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
> >>>>>> Sent: Monday, January 15, 2024 4:17 PM
> >>>>>>
> >>>>>> On 2024/1/15 17:46, Parav Pandit wrote:
> >>>>>>>
> >>>>>>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
> >>>>>>>> Sent: Monday, January 15, 2024 3:10 PM
> >>>>>>>
> >>>>>>>>> Display resources should be only restored when following
> >>>>>>>>> conditions are
> >>>>>>>> met.
> >>>>>>>>> 1. PCI PM cap is reported.
> >>>>>>>>> 2. PCI PM cap has non_soft_reset=1 3. virtio guest driver do
> >>>>>>>>> not perform reset when transport offers a restore
> >>>>>>>> capability using #1 and #2.
> >>>>>>>>>
> >>>>>>>>> Do you agree? Yes?
> >>>>>>>> Yes, I think this problem (display resources are destroyed
> >>>>>>>> during
> >>>>>>>> S3) can be sorted to two situations:
> >>>>>>>> First is what you said above, in this situation, the
> >>>>>>>> devices_reset of qemu is unreasonable, if a device has PM cap
> >>>>>>>> and non_soft_reset=1, qemu should not do resetting.
> >>>>>>>
> >>>>>>>> Second is without #1 or #2, the reset behavior is fine. My
> >>>>>>>> patch is to fix this situation, that sending a new virtio-gpu
> >>>>>>>> command to notify qemu to prevent destroying display resources
> during S3.
> >>>>>>>
> >>>>>>> I still have hard time following "My patch is to fix this situation...".
> >>>>>>>
> >>>>>>> When #1 and #2 is not done, there is nothing to restore.
> >>>>>>> Driver should not send some new virtio specific command when #1
> >>>>>>> and
> >>>>>>> #2 is
> >>>>>> not there.
> >>>>>>> Instead, if the device wants to restore the context, all #1, #2
> >>>>>>> and
> >>>>>>> #3 should
> >>>>>> be done to implement the restore functionality.
> >>>>>> When #1 and #2 is done. The "device reset behavior" should not
> >>>>>> happen. And then the display resources are not destroyed.
> >>>>>> I didnât say send command to restore context.
> >>>>>
> >>>>>> I mean when #1 and #2 is not done. Driver and qemu both reset
> >>>>>> devices when resuming, at that time,
> >>>>> Above behavior is as per the spec guidelines. Hence it is fine.
> >>>>>
> >>>>>> we need a method to prevent the display resources from destroying.
> >>>>> I disagree to above as it is not as per the spec guidelines.
> >>>>> Just follow #1, #2 and #3 to not destroy the display resource destroying.
> >>>> I agree that follow #1, #2 and #3 will not destroy display resource.
> >>>> So the question goes back: If there is no #2 and you say that the
> >>>> reset behavior complies with the spec guidelines, how can I make
> >>>> the virtio gpu work properly with display resources destroyed?
> >>> Implement #2. :)
> >> We can't simply implement #2 if a device doesn't have #2,
> > How do you define "We" above?
> > Isn't we == device?
> >
> >> see Section 7.5.2.2
> >> in PCI Express Spec, about No_Soft_Reset:
> >> " If a VF implements the Power Management Capability, the VF's value
> >> of this field must be identical to the associated PF's value. "
> >> What's more, a device doesn't have #2 is allowed, we should consider
> >> how to solve the two situations(No_Soft_Reset=1 and No_Soft_Reset=0),
> >> rather than simply considering implementing #2 for the device when it does
> not have #2.
> > A device implementation (sw or hw) that wants to offer maintain the
> function context will implement No_Soft_reset=1.
> >
> >> Also in PCI Express Spec:
> >> " Functional context is required to be maintained by Functions in the
> >> D3Hot state if the No_Soft_Reset field in the PMCSR is Set. In this
> >> case, System Software is not required to re-initialize the Function
> >> after a transition from D3Hot to D0 (the Function will be in the
> >> D0active state). If the No_Soft_Reset bit is Clear, functional
> >> context is not required to be maintained by the Function in the D3Hot
> state."
> > Right. It is NOT required to maintain. Hence, lets not maintain.
> >
> >> This description corresponds to the two situations we are discussing.
> >> First is a device has #2(No_Soft_Reset=1), the device resetting will
> >> not happen, then the display resources of virtio-gpu problem will not
> happen.
> > Right.
> >
> >> But in Second, a device doesn't have #2(No_Soft_Reset=0), it is fine
> >> for system to do device resetting, at this time, the display
> >> resources of virtio-gpu problem will happen, we should also consider
> >> a method to solve that problem, but not to implement #2.
> > No, because as wrote in the spec, ", functional context is not required to be
> maintained". Hence no need to maintain.
> >
> >>
> >>>
> >>>> What my patch do is not to prevent resetting, is to prevent
> >>>> destroying resources during resetting. Gerd Hoffmann has agreed to
> >>>> this point in my qemu patch reviewing.
> >>>> Even if I use PM cap, I still need to prevent destroying resources
> >>>> during resetting.
> >>> Do you mean when virtio device is reset, you want to prevent?
> >> I want to protect display resources, not to prevent the whole process
> >> of device resetting.
> > This is contradicting.
> > When you reset the device it will reset the functional context as well as
> guided by pci.
> >
> >> Also in PCI Express Spec:
> >> "Note that a Function's software driver participates in the process
> >> of transitioning the Function from D0 to D3Hot. It contributes to the
> >> process by saving any functional state that would otherwise be lost
> >> with removal of main power, and otherwise preparing the Function for the
> transition to D3Hot."
> >> I let guest driver to notify qemu to protect the display resources
> >> that will be loss during S3, it is also compliant with Spec.
> > This is already done by the guest PCI driver using the PM control bits. Right?
> > Only thing needed is to fix the virtio layer to honor the PCI PM capabilities
> and skip resetting the device.
> >
> >>
> >>> If so, that is yet additional virtio spec hack that must be avoided
> >>> as reset flow
> >> should be one.
> >>> Please do #3 to avoid resetting the device.
> >> The same reason as above. According to the regulations of the spec,
> >> we should allow for the occurrence of two situations(No_Soft_Reset=1
> >> and
> >> No_Soft_Reset=0) and provide solutions for each situation, rather
> >> than simply categorizing them into one.
> >>
> > No_soft_reset=0 has no obligation to restore the functional context.
> >
> > It seems to me that you are trying to maintain the function context EVEN on
> device reset for the motivation of not modifying the guest driver.
> > If that is the motivation, is certainly not right to work around things that
> way.
> > I hope my interpretation is wrong. :)
> You are wrong, I am not for the motivation of not modifying the guest driver.
Ok. good.

> Because guest driver doesn't have enough information to recreate all the
> display resources, this has been discussed and agreed upon in my qemu patch
> review.
Ok. I am also in favor of restoring the resources on d3->d0.
We are aligned for #1 and #3.

> I just want to ask you if a device has no #2 (No_Soft_Reset=0), how do you
> solve the display resources of virtio-gpu destroyed problem? 
By implementing #2.

(Not to> implement #2, because this situation exists,
Which situation exists? Device implementation is missing #2?
If so, lets improve the device implementation to do #2.

 is reasonable, and complies with
> the Spec. Shouldn't we also consider how to modify code to make the virtio-
> gpu work properly in this situation?  You can't say to directly implement #2 to
> eliminate the existence of this situation, can you?)
I didnât follow, what is the problem in implementing #2?

References:
- RE: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: "Chen, Jiqian" <Jiqian.Chen@amd.com>
- RE: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: "Chen, Jiqian" <Jiqian.Chen@amd.com>
- RE: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: "Chen, Jiqian" <Jiqian.Chen@amd.com>
- RE: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: "Chen, Jiqian" <Jiqian.Chen@amd.com>
- RE: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: "Chen, Jiqian" <Jiqian.Chen@amd.com>
- RE: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] RE: [PATCH v6 1/1] content: Add new feature VIRTIO_F_PRESERVE_RESOURCES
  - From: "Chen, Jiqian" <Jiqian.Chen@amd.com>