[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [virtio] New virtio balloon...
On Mon, Feb 03, 2014 at 01:37:17PM +1030, Rusty Russell wrote: > "Michael S. Tsirkin" <mst@redhat.com> writes: > > On Fri, Jan 31, 2014 at 04:01:39PM +1030, Rusty Russell wrote: > >> I originally allowed the host to deny the deflate, which was why I > >> reversed it. Then I realized that was a bad idea. I can switch it back. > > > > I think explicit swap that you suggested sounds better to me. > > OK, I've added this. It takes 2N PFNS, though Linux only uses 2. > > >> > - what's the status of page returned from balloon? > >> > is it zeroed or can it have old data in there? > >> > I think in practice Linux will sometimes map in a zero page, > >> > so guest can save cycles and avoid zeroing it out. > >> > I think we should tell this to guest when returning > >> > pages. > >> > >> QEMU may not know, since the kernel may not tell it. > > > > Depends on what QEMU does. > > I think kernel always gives us zero pages when we allocate > > memory, they must be initialized otherwise it's an information leak. > > MADV_DONTNEED is a bit of a mess. Under Linux it will zero pages, > under POSIX (POSIX_MADV_DONTNEED) it will keep them intact. Under > FreeBSD we'd really want MADV_FREE, but linux doesn't support it. > > Let's not design this for today's QEMU. > > >> We should assume > >> nothing, and let the guest zero if it needs to. Seems like a premuture > >> optimization. > > > > Possibly. > > > >> > - I am guessing EXTRA_MEM is for uses like the ones proposed by > >> > Frank Swiderski from google that inflate/deflate balloon > >> > whenever guest wants (look for "Add a page cache-backed balloon > >> > device driver"). > >> > > >> > this is useful but - we need to distinguish pages > >> > like this from regular inflate. > >> > it's not just counter and host needs a way to know > >> > that it's target is reached > >> > >> The driver needs to explicitly ask for pages in that region. > > > > OK so we'll have an extra flag for that? > > No, I mean that the driver explicitly requests every page by PFN. If it > requests a page in the extramem region, it will get one. Maybe I misunderstand. What is EXTRA_MEM is aid of? > >> > - do we even want to allow guest not telling host when it wants > >> > to reuse the page? > >> > if yes, I think this should be per-page somehow: when balloon > >> > is inflated guest should tell host whether it > >> > expects to use this page. > >> > >> I decided against it. Making that optional got us into a mess, so now > >> it's compulsory. That also fits better with the idea of a negative > >> balloon. > >> > >> > So I think we should accomodate these uses, and so we want the following flags: > >> > > >> > - WEAK_TARGET (that's the EXTRA_MEM but I think done in a better way) > >> > flag that specifies pages do not count against target, > >> > can be taken out of balloon. > >> > EXTRA_MEM suggests there's an upper limit on balloon size > >> > but IMHO that's just extra work for host: host does not care > >> > I think, give it as much as you want. > >> > set by guest, used by host > >> > >> I think that Daniel really does want more memory than the guest starts > >> with. And I think he still wants to use the balloon to control it. > >> Daniel? > >> > >> > - TELL_HOST flag that specifies guest will tell host before using pages > >> > (that's VIRTIO_BALLOON_F_MUST_TELL_HOST > >> > at the moment, listed here for completeness) > >> > set by guest, used by host > >> > >> Dislike. > >> > >> > - ZEROED > >> > flag that specifies that page returned to guest > >> > is zeroed > >> > set by host, used by guest > >> > >> I think that's silly. Under Linux the guest doesn't need to know it's > >> zeroed or not, it just frees the page. > > > > Yes but it's possible that linux will try to zero page right > > after free. It won't be too hard to set a flag that it's > > zeroed when we free it. > > We could, but I don't see that this is a bottleneck. See above. > > >> > - how to accomodate memory pressure in guest? > >> > Let's add a field telling host how hard do we > >> > want our memory back > >> > >> That's very hard to define across guests. Should we be using stats for > >> that instead? In fact, should we allow gratuitous stats sending, > >> instead of a simple NEED_MEM flag? > >> > >> > - assume you want to over-commit host and start > >> > inflating balloon. > >> > If low on memory it might be better for guest to > >> > wait a bit before inflating. > >> > Also, if host asks for a lot of memory a ton of > >> > allocations will slow guest significantly. > >> > But for guest to do the right thing we need host to tell guest what > >> > are its memory and time contraints. > >> > Let's add a field telling guest how hard do we > >> > want it to give us memory (e.g. time limit) > >> > >> We can't have intelligence at both ends, I think. We've chosen a > >> host-led model, so we should stick to that > > > > I'm saying let's control speed of allocations from host, > > that's still host-led? > > You want the guest to wait a bit, and control the rate at which it > allocates memory. If that's what we want, let's get the host to delay > telling it to inflate, and then inflate slowly. Otherwise we have to > debug both host and guest sides when we hit performance problems. I wonder if a single bit wil be enough. > > I changed the STATS_REPLY to STATS, and included a "want more mem" > flag. The implication is that the host compare stats across different > guests. > > Cheers, > Rusty. > diff --git a/drivers/virtio/virtio_balloon2.c b/drivers/virtio/virtio_balloon2.c > index 93f13e7c561d..6c0151f12f8b 100644 > --- a/drivers/virtio/virtio_balloon2.c > +++ b/drivers/virtio/virtio_balloon2.c > @@ -39,12 +39,15 @@ struct gcmd_give_pages { > __le64 pages[256]; > }; > > -struct gcmd_need_mem { > - __le64 type; /* VIRTIO_BALLOON_GCMD_NEED_MEM */ > +struct gcmd_exchange_pages { > + __le64 type; /* VIRTIO_BALLOON_GCMD_EXCHANGE_PAGES */ > + __le64 from_balloon; > + __le64 to_balloon; > }; > > -struct gcmd_stats_reply { > +struct gcmd_stats { > __le64 type; /* VIRTIO_BALLOON_GCMD_STATS_REPLY */ > + __le64 need_more; > struct virtio_balloon_statistic stats[VIRTIO_BALLOON_S_NR]; > }; > > @@ -90,8 +93,8 @@ struct virtio_balloon { > __le64 type; > struct gcmd_get_pages get_pages; > struct gcmd_give_pages give_pages; > - struct gcmd_need_mem need_mem; > - struct gcmd_stats_reply stats_reply; > + struct gcmd_exchange_pages exchange_pages; > + struct gcmd_stats stats; > } gcmd; > > union hcmd { > @@ -382,19 +385,11 @@ static int virtballoon_migratepage(struct address_space *mapping, > if (!mutex_trylock(&vb->lock)) > return -EAGAIN; > > - /* Try to get the page out of the balloon. */ > - vb->gcmd.get_pages.type = cpu_to_le64(VIRTIO_BALLOON_GCMD_GET_PAGES); > - vb->gcmd.get_pages.pages[0] = page_to_pfn(page) << PAGE_SHIFT; > - if (!send_gcmd(vb, offsetof(struct gcmd_get_pages, pages[1]))) { > - err = -EIO; > - goto unlock; > - } > - > - /* Now put newpage into balloon. */ > - vb->gcmd.give_pages.type = cpu_to_le64(VIRTIO_BALLOON_GCMD_GIVE_PAGES); > - vb->gcmd.give_pages.pages[0] = page_to_pfn(newpage) << PAGE_SHIFT; > - if (!send_gcmd(vb, offsetof(struct gcmd_give_pages, pages[1]))) { > - /* We leak a page here, but only happens if balloon broken. */ > + vb->gcmd.exchange_pages.type = > + cpu_to_le64(VIRTIO_BALLOON_GCMD_EXCHANGE_PAGES); > + vb->gcmd.exchange_pages.from_balloon = page_to_pfn(page) << PAGE_SHIFT; > + vb->gcmd.exchange_pages.to_balloon = page_to_pfn(newpage) << PAGE_SHIFT; > + if (!send_gcmd(vb, sizeof(vb->gcmd.exchange_pages))) { > err = -EIO; > goto unlock; > } > diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h > index cdca2934668a..925d79ad5c90 100644 > --- a/include/uapi/linux/virtio_balloon.h > +++ b/include/uapi/linux/virtio_balloon.h > @@ -48,21 +48,36 @@ struct virtio_balloon_statistic { > }; > > /* Guest->host command queue. */ > -/* Ask the host for more pages. > - Followed by array of 1 or more readable le64 pageaddr's. */ > + > +/* > + * Ask the host for more pages. > + * Followed by array of 1 or more readable le64 pageaddr's. > + */ > #define VIRTIO_BALLOON_GCMD_GET_PAGES ((__le64)0) > -/* Give the host more pages. > - Followed by array of 1 or more readable le64 pageaddr's */ > +/* > + * Give the host more pages. > + * Followed by array of 1 or more readable le64 pageaddr's > + */ > #define VIRTIO_BALLOON_GCMD_GIVE_PAGES ((__le64)1) > -/* Dear host: I need more memory. */ > -#define VIRTIO_BALLOON_GCMD_NEEDMEM ((__le64)2) > -/* Dear host: here are your stats. > - * Followed by 0 or more struct virtio_balloon_statistic structs. */ > -#define VIRTIO_BALLOON_GCMD_STATS_REPLY ((__le64)3) > +/* > + * Exchange pages in the ballon. > + * Followed by array of 2N readable le64 pageaddr's. First N: to extract from > + * balloon, next N: to add to the balloon > +*/ > +#define VIRTIO_BALLOON_GCMD_EXCHANGE_PAGES ((__le64)2) > +/* > + * Stats, and optional request for memory. > + * __le64: 0 if we don't want target increased, 1 if we do. > + * Followed by 0 or more struct virtio_balloon_statistic structs. > + */ > +#define VIRTIO_BALLOON_GCMD_STATS ((__le64)3) > > /* Host->guest command queue. */ > -/* Followed by s64 of new balloon target size (only negative if > - * VIRTIO_BALLOON_F_EXTRA_MEM). */ > + > +/* > + * Followed by s64 of new balloon target size (only negative if > + * VIRTIO_BALLOON_F_EXTRA_MEM). > + */ > #define VIRTIO_BALLOON_HCMD_SET_BALLOON ((__le64)0x8000) > /* Ask for statistics */ > #define VIRTIO_BALLOON_HCMD_GET_STATS ((__le64)0x8001) > One other comment: for NUMA VMs we might want to have separate counters for each node, and separate memory pressure notification. Maybe just separate balloon for each node? -- MST
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]