virtio message

Subject: Re: [virtio] New virtio balloon...
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Rusty Russell <rusty@au1.ibm.com>
Date: Sun, 2 Feb 2014 18:21:14 +0200

On Fri, Jan 31, 2014 at 04:01:39PM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > Also copy virtio-dev since this is clearly implementation ...
> >
> > On Thu, Jan 30, 2014 at 07:34:30PM +1030, Rusty Russell wrote:
> >> Hi,
> >> 
> >>         I tried to write a new balloon driver; it's completely untested
> >> (as I need to write the device).  The protocol is basically two vqs, one
> >> for the guest to send commands, one for the host to send commands.
> >> 
> >> Some interesting things come out:
> >> 1) We do need to explicitly tell the host where the page is we want.
> >>    This is required for compaction, for example.
> >> 
> >> 2) We need to be able to exceed the balloon target, especially for page
> >>    migration.  Thus there's no mechanism for the device to refuse to
> >>    give us the pages.
> >> 
> >> 3) The device can offer multiple page sizes, but the driver can only
> >>    accept one.  I'm not sure if this is useful, as guests are either
> >>    huge page backed or not, and returning sub-pages isn't useful.
> >> 
> >> Linux demo code follows.
> >> 
> >> Cheers,
> >> Rusty.
> >
> > More comments:
> > 	- for projects like auto-ballooning that Luiz works on,
> > 	  it's not nice that to swap page 1 for page 2
> > 	  you have to inflate then deflate
> > 	  besides overhead this confuses the host:
> > 	  imagine you tell QEMU to increase target,
> > 	  meanwhile guest inflates temporarily,
> > 	  QEMU thinks okay done, now you suddenly deflate.
> 
> I originally allowed the host to deny the deflate, which was why I
> reversed it.  Then I realized that was a bad idea.  I can switch it back.

I think the explicit swap that you suggested sounds better.
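
To illustrate the accounting problem, here is a rough sketch of the host's
view of a balloon. Everything in it (struct balloon_model, the balloon_*
helpers, the fixed-size pfn array) is hypothetical and made up for this
example; none of it comes from Rusty's demo code or from the spec. The point
is only that inflate-then-deflate changes the page count in between, while an
explicit swap never does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the host's bookkeeping; not from the patch. */
#define BALLOON_MAX_PAGES 16

struct balloon_model {
    uint64_t pfn[BALLOON_MAX_PAGES]; /* guest page frames held in the balloon */
    size_t count;                    /* pages currently held */
    size_t target;                   /* size the host asked for */
};

/* Add one page to the balloon; fails if the model is full. */
static bool balloon_inflate(struct balloon_model *b, uint64_t pfn)
{
    assert(b != NULL);
    if (b->count == BALLOON_MAX_PAGES)
        return false;
    b->pfn[b->count++] = pfn;
    return true;
}

/* Take one page back out of the balloon; fails if it is not there. */
static bool balloon_deflate(struct balloon_model *b, uint64_t pfn)
{
    assert(b != NULL);
    for (size_t i = 0; i < b->count; i++) {
        if (b->pfn[i] == pfn) {
            b->pfn[i] = b->pfn[--b->count];
            return true;
        }
    }
    return false;
}

/* Explicit swap: replace old_pfn with new_pfn; count is never touched. */
static bool balloon_swap(struct balloon_model *b, uint64_t old_pfn,
                         uint64_t new_pfn)
{
    assert(b != NULL);
    for (size_t i = 0; i < b->count; i++) {
        if (b->pfn[i] == old_pfn) {
            b->pfn[i] = new_pfn;
            return true;
        }
    }
    return false;
}

/* What a host would check to decide the guest is "done". */
static bool balloon_target_reached(const struct balloon_model *b)
{
    assert(b != NULL);
    return b->count >= b->target;
}
```

With a target of 2 and one page already in the balloon, inflating a second
page just to migrate makes balloon_target_reached() return true until the
following deflate; balloon_swap() leaves the count at 1 throughout.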

> > 	- what's the status of page returned from balloon?
> > 	  is it zeroed or can it have old data in there?
> > 	  I think in practice Linux will sometimes map in a zero page,
> > 	  so guest can save cycles and avoid zeroing it out.
> > 	  I think we should tell this to guest when returning
> > 	  pages.
> 
> QEMU may not know, since the kernel may not tell it.

Depends on what QEMU does.
I think the kernel always gives us zeroed pages when we allocate
memory; they must be initialized, otherwise it's an information leak.


>  We should assume
> nothing, and let the guest zero if it needs to.  Seems like a premature
> optimization.

Possibly.

> > 	- I am guessing EXTRA_MEM is for uses like the ones proposed by
> > 	  Frank Swiderski from google that inflate/deflate balloon
> >           whenever guest wants (look for "Add a page cache-backed balloon
> > 	  device driver").
> >
> >           this is useful but - we need to distinguish pages
> > 	  like this from regular inflate.
> > 	  it's not just counter and host needs a way to know
> > 	  that its target is reached
> 
> The driver needs to explicitly ask for pages in that region.

OK so we'll have an extra flag for that?


> > 	- do we even want to allow guest not telling host when it wants
> > 	  to reuse the page?
> > 	  if yes, I think this should be per-page somehow: when balloon
> > 	  is inflated guest should tell host whether it
> > 	  expects to use this page.
> 
> I decided against it.  Making that optional got us into a mess, so now
> it's compulsory.  That also fits better with the idea of a negative
> balloon.
> 
> > So I think we should accommodate these uses, and so we want the following flags:
> >
> > 	- WEAK_TARGET (that's the EXTRA_MEM but I think done in a better way)
> >           flag that specifies pages do not count against target,
> > 	  can be taken out of balloon.
> > 	  EXTRA_MEM suggests there's an upper limit on balloon size
> > 	  but IMHO that's just extra work for host: host does not care
> > 	  I think, give it as much as you want.
> > 	  set by guest, used by host
> 
> I think that Daniel really does want more memory than the guest starts
> with.  And I think he still wants to use the balloon to control it.
> Daniel?
> 
> > 	- TELL_HOST flag that specifies guest will tell host before using pages
> > 	  (that's VIRTIO_BALLOON_F_MUST_TELL_HOST
> > 	  at the moment, listed here for completeness)
> > 	  set by guest, used by host
> 
> Dislike.
> 
> > 	- ZEROED
> > 	  flag that specifies that page returned to guest
> > 	  is zeroed
> > 	  set by host, used by guest
> 
> I think that's silly.  Under Linux the guest doesn't need to know it's
> zeroed or not, it just frees the page.

Yes, but it's possible that Linux will try to zero the page right
after the free. It won't be too hard to set a flag saying it's
zeroed when we free it.
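
As a sketch of what I mean on the guest side, assuming a per-page flag: the
names below (DEMO_PAGE_F_ZEROED, struct demo_returned_page,
demo_prepare_page) are invented for illustration and are not part of any
existing virtio header or of the proposed driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u
/* Hypothetical per-page flag: host says the page content is all zeroes. */
#define DEMO_PAGE_F_ZEROED (1u << 0)

/* Hypothetical description of a page handed back from the balloon. */
struct demo_returned_page {
    uint8_t *data;   /* guest mapping of the page */
    uint32_t flags;  /* flags reported by the host for this page */
};

/*
 * Make sure the page is zeroed before reuse.
 * Returns true if the guest had to do the zeroing itself,
 * false if the host's flag let it skip the memset.
 */
static bool demo_prepare_page(struct demo_returned_page *p)
{
    assert(p != NULL && p->data != NULL);
    if (p->flags & DEMO_PAGE_F_ZEROED)
        return false;
    memset(p->data, 0, DEMO_PAGE_SIZE);
    p->flags |= DEMO_PAGE_F_ZEROED;
    return true;
}
```

If the host never sets the flag, the guest behaves exactly as it would with
no flag at all, so the cost of supporting it is small.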


> > Each of the flags can be just a feature flag, and then
> > if we want a mix of them host can create multiple
> > balloon devices with different flags, and guest looks for best
> > balloon for its purposes.
> >
> > Alternatively flags can be set and reported per page.
> >
> >
> > A couple of other suggestions:
> >
> > - how to accommodate memory pressure in guest?
> >   Let's add a field telling host how hard we
> >   want our memory back
> 
> That's very hard to define across guests.  Should we be using stats for
> that instead?  In fact, should we allow gratuitous stats sending,
> instead of a simple NEED_MEM flag?
> 
> > - assume you want to over-commit host and start
> >   inflating balloon.
> >   If low on memory it might be better for guest to
> >   wait a bit before inflating.
> >   Also, if host asks for a lot of memory a ton of
> >   allocations will slow guest significantly.
> >   But for guest to do the right thing we need host to tell guest what
> >   are its memory and time constraints.
> >   Let's add a field telling guest how hard we
> >   want it to give us memory (e.g. time limit)
> 
> We can't have intelligence at both ends, I think.  We've chosen a
> host-led model, so we should stick to that

I'm saying let's control the speed of allocations from the host;
that's still host-led, isn't it?
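
For example, if the host passed a deadline along with the target, the guest
could pace itself with something as simple as the following. This is only a
sketch under that assumption: the "tick" unit and the demo_pages_this_tick
helper are hypothetical, and no such field exists in the current protocol.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical pacing helper: given how many pages are still owed to the
 * host and how many ticks remain before the host's deadline, return how
 * many pages to inflate in the current tick so the work is spread evenly.
 */
static uint64_t demo_pages_this_tick(uint64_t remaining_pages,
                                     uint64_t remaining_ticks)
{
    /* Deadline already reached: hand over everything that is left. */
    if (remaining_ticks == 0)
        return remaining_pages;
    /* Round up so the last tick never has more left than its share. */
    return (remaining_pages + remaining_ticks - 1) / remaining_ticks;
}
```

The host still decides both the amount and the time limit; the guest only
spreads the allocations so that it does not stall on one large burst.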

> unless someone has an
> implementation which proves it's worth doing otherwise.
> 
> Cheers,
> Rusty.

Follow-Ups:
- Re: [virtio] New virtio balloon...
  - From: Daniel Kiper <daniel.kiper@oracle.com>
- Re: [virtio] New virtio balloon...
  - From: Rusty Russell <rusty@au1.ibm.com>
References:
- Re: [virtio] New virtio balloon...
  - From: Rusty Russell <rusty@au1.ibm.com>