OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: RE: [virtio-comment] RE: [PATCH V2 3/6] virtio: dont reset vqs when SUSPEND


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Thursday, November 16, 2023 3:40 PM
> 
> On 11/16/2023 1:35 AM, Parav Pandit wrote:
> >
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Monday, November 13, 2023 2:53 PM
> >>
> >> On 11/10/2023 2:31 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Friday, November 10, 2023 11:52 AM
> >>>>
> >>>> On 11/9/2023 6:15 PM, Parav Pandit wrote:
> >>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>>>> Sent: Thursday, November 9, 2023 3:28 PM
> >>>>>>
> >>>>>> On 11/9/2023 1:46 AM, Michael S. Tsirkin wrote:
> >>>>>>> On Tue, Nov 07, 2023 at 05:27:23PM +0800, Zhu, Lingshan wrote:
> >>>>>>>> On 11/6/2023 5:49 PM, Michael S. Tsirkin wrote:
> >>>>>>>>> On Fri, Nov 03, 2023 at 06:34:34PM +0800, Zhu Lingshan wrote:
> >>>>>>>>>> When SUSPEND is set, device states and virtqueue states
> >>>>>>>>>> should be stablized, therefore the driver should not reset
> >>>>>>>>>> vqs when SUSPEND is set in device status.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> >>>>>>>>>> ---
> >>>>>>>>>>       content.tex | 3 +++
> >>>>>>>>>>       1 file changed, 3 insertions(+)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/content.tex b/content.tex index bcc9d4b..060b5c2
> >>>>>>>>>> 100644
> >>>>>>>>>> --- a/content.tex
> >>>>>>>>>> +++ b/content.tex
> >>>>>>>>>> @@ -444,6 +444,9 @@ \subsubsection{Virtqueue
> >>>>>>>>>> Reset}\label{sec:Basic
> >>>>>> Facilities of a Virtio Device /
> >>>>>>>>>>       The device MUST reset any state of a virtqueue to the
> >>>>>>>>>> default
> >> state,
> >>>>>>>>>>       including the available state and the used state.
> >>>>>>>>>> +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set in
> >>>>>>>>>> +\field{device status}, the driver SHOULD NOT reset any
> virtqueues.
> >>>>>>>>>> +
> >>>>>>>>>>       \drivernormative{\paragraph}{Virtqueue Reset}{Basic
> >>>>>>>>>> Facilities of a
> >>>>>> Virtio Device / Virtqueues / Virtqueue Reset / Virtqueue Reset}
> >>>>>>>>>>       After the driver tells the device to reset a queue, the
> >>>>>>>>>> driver MUST verify that
> >>>>>>>>> Seems somewhat arbitrary and breaks the claim that the feature
> >>>>>>>>> is orthogonal and can have uses besides migration.
> >>>>>>>> when suspended, the device is frozen.
> >>>>>>>> The driver is aware of this process and so should not reset the
> >>>>>>>> vqs I
> >> think.
> >>>>>>> Again that is only true because you want to use it for migration.
> >>>>>>> But then you can't claim it's a generic facility.
> >>>>>> I don't get it. The device status is a basic facility.
> >>>>>>
> >>>>>> We need to SUSPEND the device by setting SUSPEND bit, to
> >>>>>> stabilize the device states for migration.
> >>>>> Is the PCI's PM time not enough to suspend the device?
> >>>>> For large device I could imagine it could be short.
> >>>> As you see, PCI PM, so this is a layer violation, virtio should be
> >>>> self contained,
> >>> If you think it is layer violation, than suspend bit for sure is not
> >>> needed. PCI
> >> PM interface should suspend/resume the device on D0<->D3 state
> transitions.
> >> Doesn't make sense logically, because it is layer violation, so you
> >> want it to be worse? For example, virito writes 0 to device status to reset a
> device, not by PCI.
> > All these layer violation thing is just abstract to me.
> > Your argument contradicts with your fellow author and yourself.
> I don't see how, we keep telling you virtio should be self contained, and suspend
> by PCI PM is a layer volition, this is a fact, right?
I donât see PCI PM as any layer violation. It is used by hundreds of industry devices.
You might want to ask PCI-SIG to eliminate PCI PM and get their feedback.

> >
> > I donât want to make it worse.
> > If you think its layer violation, just depend on the PCI PM, no need to include
> new suspend bit.
> Again, virtio should be self-contained, not layer volited, for example, we reset
> virito devices by writing 0 to device status, not by PCI FLR.
It is not for layer adherence or layer violation purpose.

It exists because it gives much flexibility in device implementation to not depend on FLR or PM timings.

> >
> >>>> and what about MMIO and CCW?
> >>> They have largely lacked the richness of PCI transport. So those
> >>> transport
> >> needs to evolve.
> >> I am not sure CCW and MMIO maintainers want to hear this.
> >>> Otherwise, PCI offers rich transport facilities compared to MMIO,
> >>> hence, it will
> >> continue wider use.
> >> you know this SUSPEND bit work fine on all transport, right? Because
> >> device_status is transport independent.
> > I want to emphasize that I am not against the suspend bit as long as it is guest
> driver controlled without interfering the device migration flow (like rest of the
> state).
> When migrate a device, it is the host who suspends the device. The reason is
> the live migration process should be transparent to the guest, so we should
> suspend the guest first, then suspend the device(by host).
This miss the fundamental point that I explained in the first paragraph of theory of operation.
I.e. hypervisor administers the device even when the guest is running.
This is the usually referreed as pre-copy phase.
During this stage large part of the device context read and write is done on src and dst hypervisor respectively.
This cuts down majority of the downtime.

> >
> > The practical reason for suspending functionality under guest control is, that
> resuming/suspending the large device can take time.
> > So let it be in guest driver control. No need to muddy with device migration
> flow.
> The time cost is reasonable in O(N) no matter how you suspend/resume the
> device.
In the proposed approach using admin commands, it is nearly O(1) for N devices and M queues per each.

> >
> >>>> This should be a basic facility.
> >>> Other transport can also offer like PCI.
> >> Do you want to work for these transport? Implementing the new
> >> features as PCI?
> > Not presently as PCI as more features than rest of the two.
> > What I read about ccw is: " S/390 based virtual machines support neither PCI
> nor MMIO".
> >
> > And I also read, "The IBM System/390 is a discontinued mainframe product
> family implementing".
> >
> > So I donât know who needs to extend ccw.
> > And if one needs, those maintainers will extend it to match to PCI standard.
> So these features are even not planned, so don't depend on them.
My series does not depend on ccw or mmio.
Not sure what you mean by donât depend on them.

> >
> >>>>> In that case if there is suspend the device available, it will be
> >>>>> used by the
> >>>> guest driver itself, hypervisor wouldnât know about it when those
> >>>> registers are not trapped.
> >>>>> So we need two ways to suspend.
> >>>>> One is guest visible, and guest controlled.
> >>>>> Second is hypervisor control to fulfill the device migration needs.
> >>>> The guest can eve reset the device.
> >>>>> So if you can please take a look if the proposed admin command to
> >>>> freeze/stop mode can be used in the emulated register case or not.
> >>>>> It helps to have the suspend bit in guest control as well
> >>>>> with/without
> >>>> emulation mode.
> >>>> Parav, please believe I have read your series, I didn't comment
> >>>> there because I want to avoid further conflicts/debating, we have
> >>>> done these
> >> enough.
> >>> I believe the series posted in v3 can support vdpa use case as well.
> >>> So I will progress to post v4.
> >>>
> >>>> As explained before, freeze/stop the device by PCI is a layer violation.
> >>> I am afraid, we have different vision.
> >>> I donât see any layer violation.
> >>> Suspend is enough in the PCI PM.
> >>> Our vision is more aligned with rest of the hypervisor knobs that
> >>> owns the
> >> migration framework.
> >> I think I have explained, virito builds on other transport and it
> >> should be self- contained, so far so good.
> > Virtio without any transport binding is just blank paper discussion.
> virtio is built on some transports, but not bind to any.
Not sure what you mean not bind to any.

Virtio objects has transport binding such as pci for driver notifications, device config, q enable etc.

> >
> >>>> And device status can be pass-through(without emulation, just map
> >>>> it to
> >>>> guest) to the guest or trapped(trap and emulate by the hypervisor,
> >>>> for example set_status in vDPA).
> >>> When it is pass-through, it is controlled by the guest, so for
> >>> example, if the
> >> guest resets the device, hypervisor has lost the control of migration context
> etc.
> >>> Hence, hypervisor needs a channel which is not guest owned.
> >>>
> >>> Same channel can work when trap+emulation is done.
> >> It is the guest owns the device, it can reset the device, once reset,
> >> the device context are cleared.
> > Hypervisor do not have the ability to read/write the device context. It lost the
> channel as hypervisor is not involved in trap+emulation.
> > So it is not helpful in one use case.
> >
> > Admin commands can work even with trap+emulation mode.
> >
> > What is missing, that should be added?
> as explained above, when live migration, the guest should be suspended first, at
> this point, the host owns the device, it has access to the device.
This is the missing piece for you for a long time.
The proposed admin command approach, the guest is not suspend first and accessed by the hypervisor.

Think of this as PML for device context... (just for analogy purpose).

> >
> >>>>>> This can also be used for debugging I think.
> >>>>> As Michael listed, a dedicated debug interface is usually more
> >>>>> useful instead
> >>>> of in-band.
> >>>> re-using another facility without extra efforts is not a bad thing anyway.
> >>> I just donât see how a suspend bit some debug feature.
> >>> Almost everything with that regard is a debug feature to me.
> >> suspend then check the device states?
> > You already suspended the device, so device state is already changed.
> > All debug information is changed, so not useful now.
> When suspended, the device should keep and stabilize its device states, at least
> in my series it should behave like this.



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]