
virtio-comment message



Subject: Re: [virtio-comment] Re: [PATCH V2 0/6] introduce basic facilities for virito live migration




On 11/7/2023 4:01 PM, Michael S. Tsirkin wrote:
On Fri, Nov 03, 2023 at 06:34:31PM +0800, Zhu Lingshan wrote:
This series introduces basic facilities to support
virtio live migration, including:

1) a new SUSPEND bit in the device status,
which is used to suspend the device so that the device states
and virtqueue states are stabilized.

2) virtqueue state and its accessor, to get and set last_avail_idx
and last_used_idx of virtqueues.

3) dirty page tracking

So looking at this from 100ft:
- SUSPEND bit looks like something that might have value as a generic
   component. For example, maybe for NUMA balancing we could suspend,
   quickly copy ring to a different node and resume.  However current
   restrictions make it very limited, e.g.  apparently you can't change
   config space while suspended.
Maybe we don't need to change the source-side config space.
SUSPEND the device to stabilize the device config,
so that the hypervisor can fetch reliable device state;
then, if any changes are required, make the modifications
on the destination side before setting SUSPEND there.
   As another example, changing config while suspended might be
   needed e.g. for net announcements.
I think the link announcement should happen after the device is back alive, not before.
   Also, do we want to suspend individual
   queues then? what exactly happens with config changes while suspended
   that would happen otherwise is also unclear. Also as is, proposal is
   very light on detail. Other patches in the series make it look like
   there are more assumptions made about e.g. how vq enters the
   suspended state.
I am not sure we need an interface to suspend an individual vq; SUSPEND suspends all vqs.

I am not sure I fully follow you. If you find anything I should add, or have any suggestions, please let me know. I should certainly provide more details in the cover letter;
I will add the live migration process to the V3 cover letter.
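
To make the ordering discussed above concrete (set SUSPEND first, fetch state only afterwards), here is a minimal sketch in C. It is an illustration under stated assumptions, not the proposed interface: `struct hyp_dev`, `hyp_suspend()`, `hyp_fetch_config()` and the value chosen for `HYP_CONFIG_S_SUSPEND` are all hypothetical, and the single `config_generation` field merely stands in for "device config".

```c
#include <stdbool.h>
#include <stdint.h>

/* DRIVER_OK status bit as defined by the virtio specification. */
#define VIRTIO_CONFIG_S_DRIVER_OK 4u
/* ASSUMPTION: placeholder value; this thread does not state the bit position. */
#define HYP_CONFIG_S_SUSPEND 16u

/* Hypothetical model of a device; not a real driver or device API. */
struct hyp_dev {
    uint8_t  status;
    uint32_t config_generation; /* stand-in for the device config */
};

/* Request suspension; only meaningful for a device that is already live. */
static bool hyp_suspend(struct hyp_dev *dev)
{
    if (!(dev->status & VIRTIO_CONFIG_S_DRIVER_OK))
        return false;
    dev->status |= HYP_CONFIG_S_SUSPEND;
    return true;
}

/* Refuse to hand out a snapshot unless SUSPEND is set, so that the
 * value read cannot change underneath the hypervisor. */
static bool hyp_fetch_config(const struct hyp_dev *dev, uint32_t *out)
{
    if (!(dev->status & HYP_CONFIG_S_SUSPEND))
        return false;
    *out = dev->config_generation;
    return true;
}
```

In this model a fetch attempted before `hyp_suspend()` fails, which is the property the SUSPEND bit is meant to give the hypervisor.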

- virtqueue state proposal looks very vague. A couple of 16 bit indices
   are insufficient to fully describe internal vq state at an arbitrary
   time. Some assumptions seem to be made that make this possible and
   yes, these would need to be stated and/or lifted.
   Preferably lifted since another use-case proposed was debugging -
   you do not, while debugging, want to depend on device following
   a complex set of assumptions.
I see two kinds of vq state:
1) State on the device, i.e. the device-internal state.
As I see it, this is the avail idx, the used idx and the in-flight descriptors.
2) State in guest memory. This part migrates with the guest memory.

I may be missing something; please let me know what I should add to the vq state,
and I will improve it.
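
As a rough illustration of the device-side state listed above, the following C sketch bundles the two indices with a list of in-flight descriptor heads. It assumes a split virtqueue with free-running 16-bit indices; `struct hyp_vq_state`, `HYP_MAX_INFLIGHT` and both helpers are hypothetical names, not part of the proposal.

```c
#include <stdbool.h>
#include <stdint.h>

#define HYP_MAX_INFLIGHT 8 /* arbitrary bound chosen for this sketch */

/* Hypothetical snapshot of device-internal virtqueue state. */
struct hyp_vq_state {
    uint16_t last_avail_idx; /* next avail ring entry the device will read */
    uint16_t last_used_idx;  /* next used ring entry the device will write */
    uint16_t num_inflight;   /* buffers fetched but not yet marked used */
    uint16_t inflight_heads[HYP_MAX_INFLIGHT];
};

/* The two indices give the COUNT of outstanding buffers (modulo 2^16),
 * but not which descriptor heads they are. */
static uint16_t hyp_vq_outstanding(const struct hyp_vq_state *s)
{
    return (uint16_t)(s->last_avail_idx - s->last_used_idx);
}

/* Sanity check: the recorded in-flight list must match the index gap. */
static bool hyp_vq_state_consistent(const struct hyp_vq_state *s)
{
    return s->num_inflight <= HYP_MAX_INFLIGHT &&
           s->num_inflight == hyp_vq_outstanding(s);
}
```

The index difference only recovers how many buffers are outstanding; if the device may complete buffers out of order, the identities in `inflight_heads` have to be carried separately, which is one way to read the concern that two 16-bit indices alone are insufficient.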
- dirty page tracking as described does not seem practical for
   many systems.  increasing page size x8 is just being nasty
   towards other network users. CAS + retry could be a solution,
   but this then needs to be documented thoroughly, and it appears
   this is not what the author expects to implement
   anyway - instead, there's an assumption that platform itself
   will support dirty tracking. By itself, this is not
   an impossible assumption - will possibly result in a cheaper,
   slower device. why not have an option like this?
   I would probably just drop it from this proposal completely.
   Also, tracking memory on the device means we'll lose state
   around reset. Solving that could be tricky. Finally,
   dependence on PASID can not be removed apparently.
   So maybe, people who want to track memory changes on the
   device itself should just bite the bullet and use
   admin vq in the PF.
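
For readers unfamiliar with the "CAS + retry" idea mentioned above, here is a minimal software model in C11. It assumes one bit per 4 KiB page packed into 64-bit words; the layout, the names `hyp_mark_dirty()` and `hyp_is_dirty()`, and `HYP_PAGE_SHIFT` are assumptions made for illustration. A real device would have to issue the equivalent atomic operation over the bus, which this sketch does not model.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define HYP_PAGE_SHIFT 12 /* ASSUMPTION: 4 KiB pages */

/* Set the dirty bit for the page containing addr without clobbering
 * neighbouring bits that another writer may be setting concurrently. */
static void hyp_mark_dirty(_Atomic uint64_t *bitmap, uint64_t addr)
{
    uint64_t pfn = addr >> HYP_PAGE_SHIFT;
    _Atomic uint64_t *word = &bitmap[pfn / 64];
    uint64_t bit = UINT64_C(1) << (pfn % 64);
    uint64_t old = atomic_load(word);

    /* Retry until our bit is visible; on failure the CAS reloads 'old'
     * with the current word, so concurrent updates are preserved. */
    while (!(old & bit)) {
        if (atomic_compare_exchange_weak(word, &old, old | bit))
            break;
    }
}

/* Report whether the page containing addr has been marked dirty. */
static bool hyp_is_dirty(_Atomic uint64_t *bitmap, uint64_t addr)
{
    uint64_t pfn = addr >> HYP_PAGE_SHIFT;
    return (atomic_load(&bitmap[pfn / 64]) >> (pfn % 64)) & 1u;
}
```

Packing one bit per page is what makes the read-modify-write necessary; a byte per page would allow plain stores, at the cost of a bitmap eight times larger.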






