
virtio-comment message



Subject: Re: [PATCH V2 0/6] introduce basic facilities for virito live migration


On Fri, Nov 03, 2023 at 06:34:31PM +0800, Zhu Lingshan wrote:
> This series introduces basic facilities to support
> virtio live migration, includes:
> 
> 1)a new SUSPEND bit in the device status
> Which is used to suspend the device, so that the device states
> and virtqueue states are stabilized.
> 
> 2)virtqueue state and its accessor, to get and set last_avail_idx
> and last_used_idx of virtqueues.
> 
> 3)dirty page tracking


So looking at this from 100ft:
- The SUSPEND bit looks like something that might have value as a generic
  component. For example, for NUMA balancing we could suspend, quickly
  copy the ring to a different node, and resume. However, the current
  restrictions make it very limited: e.g. apparently you can't change
  config space while suspended, yet changing config while suspended
  might be needed, e.g. for net announcements.
  Also, do we want to suspend individual queues then? What exactly
  happens to config changes that would otherwise occur while the device
  is suspended is also unclear. And as is, the proposal is very light on
  detail: other patches in the series make it look like more assumptions
  are being made, e.g. about how a vq enters the suspended state.

- The virtqueue state proposal looks very vague. A couple of 16-bit
  indices are insufficient to fully describe internal vq state at an
  arbitrary time. Some assumptions seem to be made that make this
  possible, and yes, these would need to be stated and/or lifted.
  Preferably lifted, since another use case proposed was debugging:
  while debugging, you do not want to depend on the device following
  a complex set of assumptions.
  
- Dirty page tracking as described does not seem practical for
  many systems. Increasing page size x8 is just being nasty
  towards other network users. CAS + retry could be a solution,
  but then this needs to be documented thoroughly, and it appears
  this is not what the author expects to implement anyway. Instead,
  there's an assumption that the platform itself will support dirty
  tracking. By itself, this is not an impossible assumption, and it
  will possibly result in a cheaper, slower device. Why not have an
  option like this?
  I would probably just drop it from this proposal completely.
  Also, tracking memory on the device means we'll lose state
  around reset. Solving that could be tricky. Finally, the
  dependence on PASID apparently cannot be removed.
  So maybe people who want to track memory changes on the
  device itself should just bite the bullet and use
  the admin vq in the PF.
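
To make the "CAS + retry" remark on dirty tracking concrete, here is a
minimal CPU-side sketch of setting one bit in a shared dirty bitmap with
compare-and-swap and retry, so that two writers touching neighbouring bits
in the same byte do not lose each other's update. This is only an analogue:
a real device would need the equivalent atomic operation on its bus, and
nothing here is taken from the patch series. The names dirty_bitmap_set,
dirty_bitmap_test and PAGE_SHIFT, and the 4 KiB page size, are assumptions
made for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

#define PAGE_SHIFT 12u /* assumption: 4 KiB pages, one bit per page */

/* Hypothetical helper: mark the page containing addr as dirty. */
static void dirty_bitmap_set(_Atomic uint8_t *bitmap, uint64_t addr)
{
    uint64_t pfn = addr >> PAGE_SHIFT;
    _Atomic uint8_t *byte = &bitmap[pfn / 8];
    uint8_t mask = (uint8_t)(1u << (pfn % 8));
    uint8_t old = atomic_load(byte);

    /*
     * CAS + retry: if another writer changed the byte in between, the
     * exchange fails, `old` is reloaded with the current value, and we
     * try again. Stop early if the bit is already set.
     */
    while (!(old & mask) &&
           !atomic_compare_exchange_weak(byte, &old, (uint8_t)(old | mask)))
        ;
}

/* Hypothetical helper: report whether the page containing addr is dirty. */
static int dirty_bitmap_test(_Atomic uint8_t *bitmap, uint64_t addr)
{
    uint64_t pfn = addr >> PAGE_SHIFT;

    return (atomic_load(&bitmap[pfn / 8]) >> (pfn % 8)) & 1;
}
```

The point of the retry loop is that the writer never stores a whole byte
blindly, which is what would otherwise force either a coarser tracking
granularity or lost updates.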

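On the virtqueue state point above, one possible reading of the gap can be
sketched as follows. It assumes the two indices have their usual meaning
(the next available entry the device will fetch, and the next used entry it
will write); the struct layout and the helper vq_in_flight are hypothetical
and not from the patches. The two indices give the number of buffers in
flight, but not which of those buffers the device has already processed,
and that matters whenever buffers can complete out of order.

```c
#include <stdint.h>

/* The two 16-bit indices named in the cover letter; layout is assumed. */
struct vq_state {
    uint16_t last_avail_idx; /* next available entry the device will fetch */
    uint16_t last_used_idx;  /* next used entry the device will write */
};

/*
 * Hypothetical helper: number of buffers fetched but not yet reported as
 * used. The subtraction is done modulo 2^16, so index wrap-around is
 * handled by the cast.
 */
static uint16_t vq_in_flight(const struct vq_state *s)
{
    return (uint16_t)(s->last_avail_idx - s->last_used_idx);
}
```

When vq_in_flight() returns 0 the two indices say everything there is to
say. When it is non-zero, something else has to pin down the state of the
in-flight buffers, which is presumably where the unstated assumptions
come in.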



-- 
MST


