
virtio-comment message



Subject: Re: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration


On Tue, Oct 10, 2023 at 02:09:27PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, October 10, 2023 7:30 PM
> 
> > > The hypervisor driver composes the vPCI device. So there isn't a need to
> > > migrate the pci state.
> > > Only exception is VIRTIO_PCI_CAP_PCI_CFG, which is covered in this v1.
> > >
> > 
> > yes but what seems implicit is that device is in some reasonable state when
> > this thing happens. e.g. are there no limitations at all e.g. in which order
> > things happen? can you really first configure virtio then pci config? for sure?
> > 
> First pci config is setup, like bus master enable etc.
> After that point, the device is handed to virtio things.
> 
> From device context write perspective, I doubt the order matters.
> For example, if pci bus master and msix are enabled after device context restore or before would not matter much.
> As long as they are done before making the device mode to active.

whatever the requirements, document them.
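
As a minimal sketch of the one ordering rule stated above (PCI setup and
device context restore may happen in either order, but all of it must be
done before the device mode is made active), something like the following
could serve as a starting point for the text. The class and method names
are hypothetical and are not taken from the patch series.

```python
class MemberDevice:
    """Hypothetical model of a member device being prepared on the destination."""

    REQUIRED = {"bus_master", "msix", "device_context"}

    def __init__(self):
        self.active = False
        self.done = set()  # setup steps completed so far

    def enable_bus_master(self):
        self.done.add("bus_master")

    def enable_msix(self):
        self.done.add("msix")

    def restore_device_context(self):
        self.done.add("device_context")

    def set_active(self):
        # Only rule taken from the thread: every step must be finished
        # before the device mode is changed to active.
        missing = self.REQUIRED - self.done
        if missing:
            raise RuntimeError(f"cannot activate, missing: {sorted(missing)}")
        self.active = True


# PCI setup first, then device context restore.
a = MemberDevice()
a.enable_bus_master()
a.enable_msix()
a.restore_device_context()
a.set_active()

# Device context restore first, then PCI setup: equally accepted.
b = MemberDevice()
b.restore_device_context()
b.enable_msix()
b.enable_bus_master()
b.set_active()
```

Whether further constraints exist (for example on VIRTIO_PCI_CAP_PCI_CFG
accesses) is exactly what the spec text would need to spell out.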

> > > > > > >
> > > > > > > > >and device configuration space may change. \\
> > > > > > > > > +\hline
> > > > > > > >
> > > > > > > > I still don't get why we need a "stop" state in the middle.
> > > > > > > >
> > > > > > > All pci devices which belong to a single guest VM are not
> > > > > > > stopped atomically.
> > > > > > > Hence, one device which is in freeze mode, may still receive
> > > > > > > driver notifications from other pci device,
> > > > > >
> > > > > > Device may choose to ignore those notifications, no?
> > > > > >
> > > > > > > or it may experience a read from the shared memory and get
> > > > > > > garbage data.
> > > > > >
> > > > > > Could you give me an example for this?
> > > > > >
> > > > > Section 2.10 Shared Memory Regions.
> > > > >
> > > > > > > And things can break.
> > > > > > > Hence the stop mode ensures that all the devices get enough
> > > > > > > chance to stop themselves, and later, when frozen, to not
> > > > > > > change anything internally.
> > > > > > >
> > > > > > > > > +0x2   & Freeze &
> > > > > > > > > + In this mode, the member device does not accept any
> > > > > > > > > +driver notifications,
> > > > > > > >
> > > > > > > > This is too vague. Is the device allowed to be frozen in
> > > > > > > > the middle of any virtio or PCI operations?
> > > > > > > >
> > > > > > > > For example, in the middle of feature negotiation etc. It
> > > > > > > > may cause implementation specific sub-states which can't be
> > > > > > > > migrated easily.
> > > > > > > >
> > > > > > > Yes. It is allowed in the middle of feature negotiation, for sure.
> > > > > > > It is a passthrough device, hence the hypervisor layer does not
> > > > > > > get to see the sub-state.
> > > > > > >
> > > > > > > Not sure why you comment, why it cannot be migrated easily.
> > > > > > > The device context already covers this sub-state.
> > > > > >
> > > > > > 1) driver writes driver_features
> > > > > > 2) driver sets FEATURES_OK
> > > > > >
> > > > > > 3) device receives driver_features
> > > > > > 4) device validates driver_features
> > > > > > 5) device clears FEATURES_OK
> > > > > >
> > > > > > 6) driver reads status and realizes FEATURES_OK has been cleared
> > > > > >
> > > > > > Is it valid to be frozen in the middle of the above?
> > > > > No. Device mode is frozen when the hypervisor is sure that no more
> > > > > access by the guest will be done.
> > > > > What can happen between #2 and #3 is that device mode may change to stop.
> > > > > And in stop mode, device context would capture #5 or #4, depending
> > > > > on where the device is at that point.
> > > > >
> > > > > > >
> > > > > > > > And what's more, the above state machine seems to be virtio
> > > > > > > > specific, but you don't explain the interaction with the
> > > > > > > > device status state machine.
> > > > > > > First, above is not a state machine.
> > > > > >
> > > > > > So how do readers know if a state can go to another state and when?
> > > > > >
> > > > > Not sure what you mean by reader. Can you please explain.
> > > > >
> > > > > > > Second, it is not virtio specific.
> > > > > >
> > > > > > It's somehow for sure, for example you said device context need
> > > > > > to be preserved. And as far as I see the device context is all
> > > > > > virtio specific in patch 3.
> > > > > >
> > > > > Sure, device context is virtio specific. :) Device context will
> > > > > reflect if things changed in the stop mode.
> > > > >
> > > > > > > It is present in leading OS that has fundamental requirement
> > > > > > > to support P2P devices.
> > > > > >
> > > > > > If it's PCI specific, instead of trying to do a workaround in
> > > > > > virtio, why not invent a mechanism there?
> > > > > >
> > > > > It is not a workaround in virtio.
> > > > > It is the way pci p2p devices work for which one needs to be
> > > > > receptive to handle the interaction.
> > > > >
> > > > >
> > > > > > > Third, it is not interacting with the _actual_ device status.
> > > > > > >
> > > > > > > In "SUSPEND" patch-5, you already asked this question. I
> > > > > > > assume you asked again so that this series is complete.
> > > > > > >
> > > > > > > > For example,
> > > > > > > > what happens if the driver wants to reset but the device is
> > > > > > > > in stop mode? You told me it is addressed in your series but
> > > > > > > > looks not. Once you try to describe that, you're actually
> > > > > > > > trying to connect states between the two state machines.
> > > > > > > >
> > > > > > > As listed in the definition of the stop mode, the device does
> > > > > > > not act on the incoming writes, it only keeps track of its
> > > > > > > internal device context change as part of this.
> > > > > >
> > > > > > So only the driver notification is allowed but not config write?
> > > > > > What's the consideration for allowing driver notification?
> > > > > >
> > > > > Because for most practical purposes, peer device wants to queue
> > > > > blk, net other requests and not do device configuration.
> > > > >
> > > > > Do you know any device configuration space which is RW?
> > > > > For net and blk I recall it as RO?
> > > >
> > > > No it isn't. Pls look at the spec if you need to check that ;)
> > > >
> > > Ok. will check. But regardless, it is fine, because when STOP is done, config
> > > writes should not occur anyway.
> > 
> > 
> > i don't see a statement like this but maybe i missed it.
> >
> I am missing it, will add.
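
For reference, one possible reading of the per-mode behaviour discussed in
this thread, as a rough sketch only. The 0x2 value for Freeze is from the
quoted patch; the values for Active and Stop, the event names and the
handle() helper are assumptions made for illustration and do not come from
the series.

```python
from enum import IntEnum


class DeviceMode(IntEnum):
    ACTIVE = 0x0  # assumed value, not quoted in this message
    STOP = 0x1    # assumed value, not quoted in this message
    FREEZE = 0x2  # value from the quoted patch table


def handle(mode, event):
    """Describe how a device in `mode` treats `event` under this reading."""
    if mode is DeviceMode.ACTIVE:
        return "process"
    if mode is DeviceMode.STOP:
        if event == "driver_notification":
            # Peer devices may still notify; the device does not act on
            # the notification, it only tracks the device context change.
            return "record in device context"
        # Per the reply above, config writes should not occur once STOP
        # is done; the statement is still to be added to the spec.
        return "not expected"
    # FREEZE: no driver notifications accepted, nothing changes internally.
    return "not accepted"


for mode in DeviceMode:
    print(mode.name, "->", handle(mode, "driver_notification"))
```

A table like this in the spec would make the missing statement about
config writes in stop mode easy to spot.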


