Subject: Re: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
On Tue, Oct 10, 2023 at 02:09:27PM +0000, Parav Pandit wrote:
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, October 10, 2023 7:30 PM
> > > The hypervisor driver composes the vPCI device. So there isn't a
> > > need to migrate the pci state.
> > > Only exception is VIRTIO_PCI_CAP_PCI_CFG, which is covered in this v1.
> >
> > yes but what seems implicit is that device is in some reasonable state
> > when this thing happens. e.g. are there no limitations at all e.g. in
> > which order things happen? can you really first configure virtio then
> > pci config? for sure?
> >
> First pci config is setup, like bus master enable etc.
> After that point, the device is handed to virtio things.
>
> From device context write perspective, I doubt the order matters.
> For example, if pci bus master and msix are enabled after device context
> restore or before would not matter much.
> As long as they are done before making the device mode to active.

whatever the requirements, document them.

> > > > > > > > > and device configuration space may change. \\
> > > > > > > > > +\hline
> > > > > > > > I still don't get why we need a "stop" state in the middle.
> > > > > > > All pci devices which belong to a single guest VM are not
> > > > > > > stopped atomically.
> > > > > > > Hence, one device which is in freeze mode, may still receive
> > > > > > > driver notifications from other pci device,
> > > > > > Device may choose to ignore those notifications, no?
> > > > > > > or it may experience a read from the shared memory and get
> > > > > > > garbage data.
> > > > > > Could you give me an example for this?
> > > > > Section 2.10 Shared Memory Regions.
> > > > > > > And things can break.
> > > > > > > Hence the stop mode ensures that all the devices get enough
> > > > > > > chance to stop themselves, and later when frozen, to not
> > > > > > > change anything internally.
> > > > > > > > > +0x2 & Freeze &
> > > > > > > > > + In this mode, the member device does not accept any
> > > > > > > > > +driver notifications,
> > > > > > > > This is too vague. Is the device allowed to be frozen in
> > > > > > > > the middle of any virtio or PCI operations?
> > > > > > > >
> > > > > > > > For example, in the middle of feature negotiation etc. It
> > > > > > > > may cause implementation specific sub-states which can't be
> > > > > > > > migrated easily.
> > > > > > > Yes. it is allowed in middle of feature negotiation, for sure.
> > > > > > > It is passthrough device, hence hypervisor layer do not get to
> > > > > > > see sub-state.
> > > > > > >
> > > > > > > Not sure why you comment, why it cannot be migrated easily.
> > > > > > > The device context already covers this sub-state.
> > > > > > 1) driver writes driver_features
> > > > > > 2) driver sets FEATURES_OK
> > > > > >
> > > > > > 3) device receive driver_features
> > > > > > 4) device validating driver_features
> > > > > > 5) device clears FEATURES_OK
> > > > > >
> > > > > > 6) driver read status and realize FEATURES_OK is being cleared
> > > > > >
> > > > > > Is it valid to be frozen in the middle of the above?
> > > > > No. device mode is frozen when hypervisor is sure that no more
> > > > > access by the guest will be done.
> > > > > What can happen between #2 and #3, is device mode may change to stop.
> > > > > And in stop mode, device context would capture #5 or #4, depending
> > > > > on where the device is at that point.
> > > > > > > > And what's more, the above state machine seems to be virtio
> > > > > > > > specific, but you don't explain the interaction with the
> > > > > > > > device status state machine.
> > > > > > > First, above is not a state machine.
> > > > > > So how do readers know if a state can go to another state and when?
> > > > > Not sure what you mean by reader. Can you please explain.
> > > > > > > Second, it is not virtio specific.
> > > > > > It's somehow for sure, for example you said device context need
> > > > > > to be preserved. And as far as I see the device context is all
> > > > > > virtio specific in patch 3.
> > > > > Sure, device context is virtio specific. :) Device context will
> > > > > reflect if things changed in the stop mode.
> > > > > > > It is present in leading OS that has fundamental requirement
> > > > > > > to support P2P devices.
> > > > > > If it's PCI specific, instead of trying to do a workaround in
> > > > > > virtio, why not invent a mechanism there?
> > > > > It is not a workaround in virtio.
> > > > > It is the way pci p2p devices work for which one needs to be
> > > > > receptive to handle the interaction.
> > > > > > > Third, it is not interacting with the _actual_ device status.
> > > > > > >
> > > > > > > In "SUSPEND" patch-5, you already asked this question. I
> > > > > > > assume you asked again so that this series is complete.
> > > > > > > > For example,
> > > > > > > > what happens if the driver wants to reset but the device is
> > > > > > > > in stop mode? You told me it is addressed in your series but
> > > > > > > > looks not. Once you try to describe that, you're actually
> > > > > > > > trying to connect states between the two state machines.
> > > > > > > As listed in the definition of the stop mode, the device does
> > > > > > > not act on the incoming writes, it only keeps track of its
> > > > > > > internal device context change as part of this.
> > > > > > So only the driver notification is allowed but not config write?
> > > > > > What's the consideration for allowing driver notification?
> > > > > Because for most practical purposes, peer device wants to queue
> > > > > blk, net other requests and not do device configuration.
> > > > >
> > > > > Do you know any device configuration space which is RW?
> > > > > For net and blk I recall it as RO?
> > > > No it isn't. Pls look at the spec if you need to check that ;)
> > > Ok. will check. But regardless, it is fine, because when STOP is done,
> > > config writes should not occur anyway.
> >
> > i don't see a statement like this but maybe i missed it.
>
> I am missing it, will add.
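
The mode semantics argued over in this thread can be sketched as a toy model. This is only an illustration under stated assumptions, not the behaviour the patch series defines: the class and method names are hypothetical, the numeric values for Active and Stop are assumed (only 0x2 for Freeze is quoted above), and "PCI setup" stands in for whatever ordering requirements end up being documented.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    ACTIVE = 0x0  # assumed value; not stated in the thread
    STOP = 0x1    # assumed value; not stated in the thread
    FREEZE = 0x2  # value quoted from the patch table


@dataclass
class MemberDevice:
    """Hypothetical model of a member device under migration."""
    mode: Mode = Mode.FREEZE
    pci_setup_done: bool = False  # e.g. bus master / MSI-X enabled
    processed: list = field(default_factory=list)  # requests acted upon
    tracked: list = field(default_factory=list)    # recorded in device context only

    def notify(self, request):
        """Model a driver (or peer device) notification."""
        if self.mode is Mode.ACTIVE:
            self.processed.append(request)
            return "processed"
        if self.mode is Mode.STOP:
            # Not acted on; only the device context reflects the change.
            self.tracked.append(request)
            return "tracked"
        # FREEZE: notifications are not accepted at all.
        return "ignored"

    def set_mode(self, new_mode):
        # Ordering constraint mentioned in the thread: PCI-level setup
        # has to be done before the device mode is made active.
        if new_mode is Mode.ACTIVE and not self.pci_setup_done:
            raise RuntimeError("PCI setup must complete before ACTIVE")
        self.mode = new_mode


dev = MemberDevice()
dev.pci_setup_done = True
dev.set_mode(Mode.ACTIVE)
print(dev.notify("req0"))
dev.set_mode(Mode.STOP)
print(dev.notify("req1"))
dev.set_mode(Mode.FREEZE)
print(dev.notify("req2"))
```

The model deliberately has no transition table, since whether these modes form a state machine is itself disputed above; it only captures how a notification is treated in each mode.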