virtio-comment message

Subject: Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP

From: Cornelia Huck <cohuck@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Date: Tue, 22 Dec 2020 16:54:31 +0100

On Tue, 22 Dec 2020 20:51:04 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2020/12/22 äå8:14, Cornelia Huck wrote:
> > On Tue, 22 Dec 2020 15:30:31 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >  
> >> On 2020/12/22 äå2:50, Halil Pasic wrote:  
> >>> On Tue, 22 Dec 2020 10:36:41 +0800
> >>> Jason Wang <jasowang@redhat.com> wrote:
> >>>     
> >>>> On 2020/12/22 äå5:33, Halil Pasic wrote:  
> >>>>> On Fri, 18 Dec 2020 12:23:02 +0800
> >>>>> Jason Wang <jasowang@redhat.com> wrote:
> >>>>>     
> >>>>>> This patch introduces a new status bit DEVICE_STOPPED. This will be
> >>>>>> used by the driver to stop and resume a device. The main user will be
> >>>>>> live migration support for virtio device.
> >>>>>>     
> >>>>> Can you please provide some more background information, or point
> >>>>> me to the appropriate discussion?
> >>>>>
> >>>>> I mean AFAIK migration already works without this driver initiated
> >>>>> drain. What is the exact motivation? What about the big picture? I
> >>>>> guess some agent in the guest would have to make the driver issue
> >>>>> the DEVICE_STOP.  
> >>>> This is not necessary if the datapath is done inside qemu and when
> >>>> migration is initiated by qemu itself.
> >>>>
> >>>> But it's a must for using virtio-device as a backend for emulated virtio
> >>>> devices (e.g vhost-vDPA). In this case, qemu needs to stop the device
> >>>> then it can safely synchronize the state from them.
> >>>>     
> >>> You say, in this case qemu needs to stop the device, which makes sense
> >>> (it also has to do this when the datapath is implemented in qemu), but
> >>> AFAIU DEVICE_STOPPED is initiated by the guest and not by qemu. I'm
> >>> confused.  
> >>
> >> It's initiated by Qemu. Guest is unware of live migration.  
> > But isn't setting DEVICE_STOPPED a _driver_ initiated process? That
> > sounds like "guest" to me.  
> 
> 
> I think we need clarification on the "driver".

Looking through the spec, I think we have never properly defined what
we mean by "device" or "driver". Not sure what a good definition would
look like. I would consider a "device" an entity that is present and is
configured/used by a "driver". The "driver" is the means to actually
configure/use a "device", i.e. the "initiating" part (even though some
actions are initiated by the device once the driver has started
interacting with it).

> Think the following setup:
> 
> Guest driver <-> Qemu <-> vhost-vdpa <-> vDPA parent (virtio-pci vDPA 
> driver)
> 
> It's the Qemu that initiates the stop, and the request is forwarded to 
> virtio-pci vDPA driver. So the process is transparent to the guest (driver).

So it's

Guest driver <-> QEMU <-> vhost-vdpa <-> vDPA parent
|<-- guest -->|<-------------- host -------------->|

but the vDPA parent is itself acting as a "driver" if you take my
attempt at a definition above?

(Wouldn't that mean that the device in QEMU is configured/used from two
entities?)

> 
> 
> >  
> >>  
> >>> I'm still curious about how the different components in the stack
> >>> (guest OS, qemu, vdpa-vhost in host kernel, the PCI function) are
> >>> supposed to interact.  
> >>
> >> It works like:
> >>
> >>   From Qemu point of view, vhost-vDPA is just another type of vhost
> >> backend. Qemu needs to stop virtio (vhost) before it can do migration.
> >> So we require vDPA devices to have the ability of stopping or pausing
> >> its datapath. If the vDPA device is by chance the virtio-PCI device, it
> >> needs an interface for receiving stop/resume command from the driver.
> >>
> >> So the devce stop/resume command was sent from Qemu to vhost-VDPA, then
> >> to vDPA parent which could be a virtio-PCI device in this case.  
> > But QEMU implements the _device_, not the driver, doesn't it?  
> 
> 
> The device is implemented with the co-operation between Qemu and 
> vhost-vDPA. During migration qemu need to stop the virtio-net device, 
> then vhost must be stopped as well.

Yes, it makes sense that any vhost parts need to be stopped as well.

> 
> 
> > And IIUC,
> > vhost-VDPA and the vDPA parent are also on the device side. I feel like
> > I'm missing something essential here.  
> 
> 
> Virtio-PCI driver could be a vDPA parent as well in this case. So we 
> need stop the virtio-pci device.

Who is the "driver" in that case?

> 
> 
> >  
> >>  
> >>>     
> >>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>>>>> ---
> >>>>>>     content.tex | 26 ++++++++++++++++++++++++--
> >>>>>>     1 file changed, 24 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/content.tex b/content.tex
> >>>>>> index 61eab41..4392b60 100644
> >>>>>> --- a/content.tex
> >>>>>> +++ b/content.tex
> >>>>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
> >>>>>>     \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
> >>>>>>       drive the device.
> >>>>>>     
> >>>>>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
> >>>>>> +  indicates that the device has been stopped by the driver.
> >>>>>> +  
> >>>>> AFAIU it is not only about indicating stopped, but also requesting to be
> >>>>> stopped.
> >>>>>
> >>>>> More importantly, that must not be set immediately, in a sense that the
> >>>>> one side initiates some action by requesting the bit to be set, and the
> >>>>> other side must not set the bit before the action is performed.  
> >>>> Yes.
> >>>>
> >>>>     
> >>>>> We also
> >>>>> seem to assume that every device implementation is capable of performing
> >>>>> this trick.  
> >>>> A dedicated feature bit is introduced for this.
> >>>>     
> >>> This is not about the feature bit, but about the mechanism. But your
> >>> subsequent answers explain, that this is nothing unusual, and then
> >>> we should be fine.
> >>>        
> >>>>> Is it for hardware devices (e.g. PCI) standard to request an
> >>>>> operation by writing some value into a register, and get feedback bout
> >>>>> a non-completion by reading different value that written,  
> >>>> This is not ununsal in other devices. And in fact, the FEATURES_OK works
> >>>> like this:
> >>>>
> >>>> """
> >>>>
> >>>> The device MUST NOT offer a feature which requires another feature which
> >>>> was not offered. The device SHOULD accept any valid subset of features
> >>>> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device
> >>>> status bit when the driver writes it.
> >>>>
> >>>> """
> >>>>     
> >>> Thanks for the pointer. I intend to have another look at how FEATURES_OK
> >>> works, and how similar this is to DEVICE_STOPPED.  
> >>
> >> My understanding is that for both of them, driver can try to set the bit
> >> by writing to the status register but it's the device that decide when
> >> to set the bit.  
> > I think there's a difference: For FEATURES_OK, the driver can read back
> > the status and knows that the feature combination is not accepted if it
> > is not set. For DEVICE_STOPPED, it does not really know whether the
> > device has started to stop yet. It only knows that the device is
> > actually done stopping when DEVICE_STOPPED is set.  
> 
> 
> The only difference is that device may need sometime to be stopped. 
> FEATURES_OK might not be the best example. Let's see how reset work 
> which is more similar to DEVICE_STOPPED:
> 
> """
> 
> After writing 0 to device_status, the driver MUST wait for a read of 
> device_status to return 0 before reinitializing the device.
> 
> """

Hm, but that is PCI-specific writing of fields in the configuration
structure, right? CCW uses a different method for resetting (an
asynchronous channel command; channel commands create an interrupt when
done); "write 0 to status for reset" is not universal.

Which makes me think: is using the device status a good universal
method for stopping the device? If I look at it only from the CCW
perspective, I'd have used a new channel command with a payload of
stop/resume and only reflected the stopped state in the device status
(i.e. read-only from the driver, and only changed by the device when it
transitions to/from the stopped state.) Could PCI use a new field for
that?

> 
> >
> > Do we need a different mechanism for the device to signal the driver
> > that the device has actually been stopped? If the driver sees the
> > STOPPED status after reading back, it knows that the device has
> > acknowledged the request and is now stopping down. If it is not set, it
> > means that the device has not honoured the request. Same for clearing
> > it to resume.  
> 
> 
> I don't see much difference. Device reset doesn't have such notification 
> and driver may simply poll or use a timer.

See above; CCW does not really use the device status in the same way. I
think a notification would be useful.

Follow-Ups:
- Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  - From: Jason Wang <jasowang@redhat.com>

References:
- [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  - From: Jason Wang <jasowang@redhat.com>
- Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  - From: Halil Pasic <pasic@linux.ibm.com>
- Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  - From: Jason Wang <jasowang@redhat.com>
- Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  - From: Halil Pasic <pasic@linux.ibm.com>
- Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  - From: Jason Wang <jasowang@redhat.com>
- Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  - From: Cornelia Huck <cohuck@redhat.com>
- Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  - From: Jason Wang <jasowang@redhat.com>