OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-dev message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio] New virtio balloon...


On Mon, Feb 03, 2014 at 01:37:17PM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Fri, Jan 31, 2014 at 04:01:39PM +1030, Rusty Russell wrote:
> >> I originally allowed the host to deny the deflate, which was why I
> >> reversed it.  Then I realized that was a bad idea.  I can switch it back.
> >
> > I think explicit swap that you suggested sounds better to me.
> 
> OK, I've added this.  It takes 2N PFNS, though Linux only uses 2.
> 
> >> > 	- what's the status of page returned from balloon?
> >> > 	  is it zeroed or can it have old data in there?
> >> > 	  I think in practice Linux will sometimes map in a zero page,
> >> > 	  so guest can save cycles and avoid zeroing it out.
> >> > 	  I think we should tell this to guest when returning
> >> > 	  pages.
> >> 
> >> QEMU may not know, since the kernel may not tell it.
> >
> > Depends on what QEMU does.
> > I think kernel always gives us zero pages when we allocate
> > memory, they must be initialized otherwise it's an information leak.
> 
> MADV_DONTNEED is a bit of a mess.  Under Linux it will zero pages,
> under POSIX (POSIX_MADV_DONTNEED) it will keep them intact.  Under
> FreeBSD we'd really want MADV_FREE, but linux doesn't support it.
> 
> Let's not design this for today's QEMU.
> 
> >>  We should assume
> >> nothing, and let the guest zero if it needs to.  Seems like a premuture
> >> optimization.
> >
> > Possibly.
> >
> >> > 	- I am guessing EXTRA_MEM is for uses like the ones proposed by
> >> > 	  Frank Swiderski from google that inflate/deflate balloon
> >> >           whenever guest wants (look for "Add a page cache-backed balloon
> >> > 	  device driver").
> >> >
> >> >           this is useful but - we need to distinguish pages
> >> > 	  like this from regular inflate.
> >> > 	  it's not just counter and host needs a way to know
> >> > 	  that it's target is reached
> >> 
> >> The driver needs to explicitly ask for pages in that region.
> >
> > OK so we'll have an extra flag for that?
> 
> No, I mean that the driver explicitly requests every page by PFN.  If it
> requests a page in the extramem region, it will get one.

Maybe I misunderstand. What is EXTRA_MEM is aid of?

> >> > 	- do we even want to allow guest not telling host when it wants
> >> > 	  to reuse the page?
> >> > 	  if yes, I think this should be per-page somehow: when balloon
> >> > 	  is inflated guest should tell host whether it
> >> > 	  expects to use this page.
> >> 
> >> I decided against it.  Making that optional got us into a mess, so now
> >> it's compulsory.  That also fits better with the idea of a negative
> >> balloon.
> >> 
> >> > So I think we should accomodate these uses, and so we want the following flags:
> >> >
> >> > 	- WEAK_TARGET (that's the EXTRA_MEM but I think done in a better way)
> >> >           flag that specifies pages do not count against target,
> >> > 	  can be taken out of balloon.
> >> > 	  EXTRA_MEM suggests there's an upper limit on balloon size
> >> > 	  but IMHO that's just extra work for host: host does not care
> >> > 	  I think, give it as much as you want.
> >> > 	  set by guest, used by host
> >> 
> >> I think that Daniel really does want more memory than the guest starts
> >> with.  And I think he still wants to use the balloon to control it.
> >> Daniel?
> >> 
> >> > 	- TELL_HOST flag that specifies guest will tell host before using pages
> >> > 	  (that's VIRTIO_BALLOON_F_MUST_TELL_HOST
> >> > 	  at the moment, listed here for completeness)
> >> > 	  set by guest, used by host
> >> 
> >> Dislike.
> >> 
> >> > 	- ZEROED
> >> > 	  flag that specifies that page returned to guest
> >> > 	  is zeroed
> >> > 	  set by host, used by guest
> >> 
> >> I think that's silly.  Under Linux the guest doesn't need to know it's
> >> zeroed or not, it just frees the page.
> >
> > Yes but it's possible that linux will try to zero page right
> > after free. It won't be too hard to set a flag that it's
> > zeroed when we free it.
> 
> We could, but I don't see that this is a bottleneck.  See above.
> 
> >> > - how to accomodate memory pressure in guest?
> >> >   Let's add a field telling host how hard do we
> >> >   want our memory back
> >> 
> >> That's very hard to define across guests.  Should we be using stats for
> >> that instead?  In fact, should we allow gratuitous stats sending,
> >> instead of a simple NEED_MEM flag?
> >> 
> >> > - assume you want to over-commit host and start
> >> >   inflating balloon.
> >> >   If low on memory it might be better for guest to
> >> >   wait a bit before inflating.
> >> >   Also, if host asks for a lot of memory a ton of
> >> >   allocations will slow guest significantly.
> >> >   But for guest to do the right thing we need host to tell guest what
> >> >   are its memory and time contraints.
> >> >   Let's add a field telling guest how hard do we
> >> >   want it to give us memory (e.g. time limit)
> >> 
> >> We can't have intelligence at both ends, I think.  We've chosen a
> >> host-led model, so we should stick to that
> >
> > I'm saying let's control speed of allocations from host,
> > that's still host-led?
> 
> You want the guest to wait a bit, and control the rate at which it
> allocates memory.  If that's what we want, let's get the host to delay
> telling it to inflate, and then inflate slowly.  Otherwise we have to
> debug both host and guest sides when we hit performance problems.

I wonder if a single bit wil be enough.

> 
> I changed the STATS_REPLY to STATS, and included a "want more mem"
> flag.  The implication is that the host compare stats across different
> guests.
> 
> Cheers,
> Rusty.
> diff --git a/drivers/virtio/virtio_balloon2.c b/drivers/virtio/virtio_balloon2.c
> index 93f13e7c561d..6c0151f12f8b 100644
> --- a/drivers/virtio/virtio_balloon2.c
> +++ b/drivers/virtio/virtio_balloon2.c
> @@ -39,12 +39,15 @@ struct gcmd_give_pages {
>  	__le64 pages[256];
>  };
>  
> -struct gcmd_need_mem {
> -	__le64 type; /* VIRTIO_BALLOON_GCMD_NEED_MEM */
> +struct gcmd_exchange_pages {
> +	__le64 type; /* VIRTIO_BALLOON_GCMD_EXCHANGE_PAGES */
> +	__le64 from_balloon;
> +	__le64 to_balloon;
>  };
>  
> -struct gcmd_stats_reply {
> +struct gcmd_stats {
>  	__le64 type; /* VIRTIO_BALLOON_GCMD_STATS_REPLY */
> +	__le64 need_more;
>  	struct virtio_balloon_statistic stats[VIRTIO_BALLOON_S_NR];
>  };
>  
> @@ -90,8 +93,8 @@ struct virtio_balloon {
>  		__le64 type;
>  		struct gcmd_get_pages get_pages;
>  		struct gcmd_give_pages give_pages;
> -		struct gcmd_need_mem need_mem;
> -		struct gcmd_stats_reply stats_reply;
> +		struct gcmd_exchange_pages exchange_pages;
> +		struct gcmd_stats stats;
>  	} gcmd;
>  
>  	union hcmd {
> @@ -382,19 +385,11 @@ static int virtballoon_migratepage(struct address_space *mapping,
>  	if (!mutex_trylock(&vb->lock))
>  		return -EAGAIN;
>  
> -	/* Try to get the page out of the balloon. */
> -	vb->gcmd.get_pages.type = cpu_to_le64(VIRTIO_BALLOON_GCMD_GET_PAGES);
> -	vb->gcmd.get_pages.pages[0] = page_to_pfn(page) << PAGE_SHIFT;
> -	if (!send_gcmd(vb, offsetof(struct gcmd_get_pages, pages[1]))) {
> -		err = -EIO;
> -		goto unlock;
> -	}
> -
> -	/* Now put newpage into balloon. */
> -	vb->gcmd.give_pages.type = cpu_to_le64(VIRTIO_BALLOON_GCMD_GIVE_PAGES);
> -	vb->gcmd.give_pages.pages[0] = page_to_pfn(newpage) << PAGE_SHIFT;
> -	if (!send_gcmd(vb, offsetof(struct gcmd_give_pages, pages[1]))) {
> -		/* We leak a page here, but only happens if balloon broken. */
> +	vb->gcmd.exchange_pages.type =
> +		cpu_to_le64(VIRTIO_BALLOON_GCMD_EXCHANGE_PAGES);
> +	vb->gcmd.exchange_pages.from_balloon = page_to_pfn(page) << PAGE_SHIFT;
> +	vb->gcmd.exchange_pages.to_balloon = page_to_pfn(newpage) << PAGE_SHIFT;
> +	if (!send_gcmd(vb, sizeof(vb->gcmd.exchange_pages))) {
>  		err = -EIO;
>  		goto unlock;
>  	}
> diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
> index cdca2934668a..925d79ad5c90 100644
> --- a/include/uapi/linux/virtio_balloon.h
> +++ b/include/uapi/linux/virtio_balloon.h
> @@ -48,21 +48,36 @@ struct virtio_balloon_statistic {
>  };
>  
>  /* Guest->host command queue. */
> -/* Ask the host for more pages.
> -   Followed by array of 1 or more readable le64 pageaddr's. */
> +
> +/*
> + * Ask the host for more pages.
> + * Followed by array of 1 or more readable le64 pageaddr's.
> + */
>  #define VIRTIO_BALLOON_GCMD_GET_PAGES	((__le64)0)
> -/* Give the host more pages.
> -   Followed by array of 1 or more readable le64 pageaddr's */
> +/*
> + * Give the host more pages.
> + * Followed by array of 1 or more readable le64 pageaddr's
> + */
>  #define VIRTIO_BALLOON_GCMD_GIVE_PAGES	((__le64)1)
> -/* Dear host: I need more memory. */
> -#define VIRTIO_BALLOON_GCMD_NEEDMEM	((__le64)2)
> -/* Dear host: here are your stats.
> - * Followed by 0 or more struct virtio_balloon_statistic structs. */
> -#define VIRTIO_BALLOON_GCMD_STATS_REPLY	((__le64)3)
> +/*
> + * Exchange pages in the ballon.
> + * Followed by array of 2N readable le64 pageaddr's.  First N: to extract from
> + * balloon, next N: to add to the balloon
> +*/
> +#define VIRTIO_BALLOON_GCMD_EXCHANGE_PAGES ((__le64)2)
> +/*
> + * Stats, and optional request for memory.
> + * __le64: 0 if we don't want target increased, 1 if we do.
> + * Followed by 0 or more struct virtio_balloon_statistic structs.
> + */
> +#define VIRTIO_BALLOON_GCMD_STATS	((__le64)3)
>  
>  /* Host->guest command queue. */
> -/* Followed by s64 of new balloon target size (only negative if
> - * VIRTIO_BALLOON_F_EXTRA_MEM). */
> +
> +/*
> + * Followed by s64 of new balloon target size (only negative if
> + * VIRTIO_BALLOON_F_EXTRA_MEM).
> + */
>  #define VIRTIO_BALLOON_HCMD_SET_BALLOON	((__le64)0x8000)
>  /* Ask for statistics */
>  #define VIRTIO_BALLOON_HCMD_GET_STATS	((__le64)0x8001)
>


One other comment: for NUMA VMs we might want
to have separate counters for each node,
and separate memory pressure notification.

Maybe just separate balloon for each node?

-- 
MST


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]