

Subject: Re: [virtio] [OASIS Issue Tracker] Created: (VIRTIO-28) Implement new balloon device (ID 13)


On Tue, Oct 29, 2013 at 10:44:53AM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > OK there are some more issues to consider around deflate.
> >
> >
> > 1.  On KVM, we actually want to change QEMU so that pagefaults don't
> > work either.  Specifically, we want to skip pages in the balloon for
> > migration.
> > However, migration is done in userspace while pagefaults
> > are done in kernel.
> > I think the implication is that
> > -	 you should be able to ask guest to inflate balloon
> > 	 with pages that can be paged in (when you don't want to migrate
> > 	 and want max local performance) or with pages that can not be paged in
> > 	(when you want to migrate faster), dynamically, not through a
> > 	device feature
> > -	 "will notify before use" feature should be per a bunch or pages actually.
>
> I am always reluctant to implement a spec for things which don't exist.
> This is the cause of the current "negative feature" mess with
> VIRTIO_BALLOON_F_MUST_TELL_HOST.
>
> So if we *ever* want to ask for pages, let's make the driver always
> ask for pages.  You place a buffer in the queue and the device fills it
> with page addresses you can now use.

You mean PFNs?
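
i.e. something like this, where the driver posts a device-writable array
and the number of entries handed back is implied by the used length?
Just a sketch to check my reading - none of these names come from the
spec or from an existing driver:

	/* Sketch of the "driver always asks for pages" model; all names
	 * and the capacity below are made up for illustration. */
	#include <stdint.h>
	#include <stddef.h>

	#define ASK_PAGES_MAX 256		/* made-up capacity of one request */

	struct balloon_page_ask {
		uint64_t pfn[ASK_PAGES_MAX];	/* filled in by the device */
	};

	/* 'used_len' is the length the device reports for the buffer;
	 * the number of PFNs granted is implied by it. */
	size_t balloon_pages_granted(size_t used_len)
	{
		return used_len / sizeof(uint64_t);
	}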

> See below.
>
> > 2. Assuming we need the deflate command,
> >   I think it's a problem that we need to allocate memory for deflate.
> >   Can't we just stick page pointers in VQ directly?
>
> That's an internal implementation issue; we can avoid the allocation
> fairly trivially using a static buffer.
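
Right, a single preallocated message reused for every deflate would do,
e.g. (rough sketch, the batch size and names are made up):

	/* Rough sketch of the "static buffer" idea: one preallocated
	 * deflate message, so nothing is allocated on the path that
	 * returns pages to the guest. */
	#include <stdint.h>

	#define DEFLATE_BATCH 64		/* made-up batch size */

	static struct {
		uint32_t type;			/* page-list message type */
		uint64_t page[DEFLATE_BATCH];
	} deflate_msg;				/* reused for every deflate */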
>
> > If we address these, I think Xen PV becomes simple, you just
> > never ask for pageable-in pages, and you ignore them if guest
> > gives them to you.
>
> Simple, yes.
>
> But even with this design, there's no way for the Xen PV model to
> implement this without a xen-specific hook in the driver.
>
> At the very least, "get_a_page_for_the_balloon()" and
> "return_a_page_from_the_balloon()" will need to do Xen-specific stuff.

Yep, it looks like even HVM needs some Xen-specific hooks.
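
I was thinking about something like a per-platform ops table behind
get_a_page_for_the_balloon()/return_a_page_from_the_balloon(), roughly
(pure sketch, these names do not exist in any tree):

	/* Pure sketch: a hook table the balloon driver would call
	 * instead of doing plain page allocation itself. */
	struct page;				/* opaque here */

	struct balloon_page_ops {
		struct page *(*get_page)(void);		/* grab a page to put into the balloon */
		void (*put_page)(struct page *page);	/* hand a page back to the guest */
	};

	/* A Xen build would install callbacks that also tell the
	 * hypervisor about the page; a KVM build could use plain
	 * allocations. */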

> We *could* argue that's outside the scope of the spec, but we should
> note it.

I am not sure right now.

> Cheers,
> Rusty.
>
> This revises the previous proposal.  It renames the queues for
> more clarity, and insists that you request pages.
>
> diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
> index 426a3cb..173130e 100644
> --- a/virtio-v1.0-wd01-part1-specification.txt
> +++ b/virtio-v1.0-wd01-part1-specification.txt
> @@ -3165,7 +3165,7 @@ to communicate guest memory statistics to the host.
>
>  100.2.4.5.2. Virtqueues
>  ------------------
> -  0:inputq. 1:outputq.
> +  0:fromdevq. 1:todevq.
>
>
>  100.2.4.5.3. Feature bits
> @@ -3187,14 +3187,14 @@ page size of the host (eg. 12 for 4096-byte pages).
>  -----------------------------
>
>  1. At least one struct virtio_balloon_request buffer should be placed
> -   in the inputq.
> +   in the fromdevq.
>
>  2. The balloon starts empty (size 0).
>
>  100.2.4.5.6. Device Operation
>  ------------------------
>
> -The device is driven by receipt of a command in the input queue:
> +The device is driven by receipt of a command in the fromdevq queue:
>
>  	struct virtio_balloon_req {
>  #define VIRTIO_BALLOON_REQ_RESIZE	0
> @@ -3211,7 +3211,7 @@ The device is driven by receipt of a command in the input queue:
>     target, the guest may withdraw pages.
>
>  2. To add pages to the balloon, the physical addresses of the pages
> -   are sent using the output queue.  The number of pages is implied in
> +   are sent using the todevq queue.  The number of pages is implied in
>     the message length, and each page value must be a multiple of the
>     page size indicated in struct virtio_balloon_config.
>
> @@ -3221,15 +3221,24 @@ The device is driven by receipt of a command in the input queue:
>  		u64 page[];
>  	};
>
> -3. To withdraw a page from the balloon, it can simply be accessed.
> -   The contents at this point will be undefined.  The device should
> +3. To withdraw pages from the balloon, the same structure should be
> +   placed in the todevq queue, with the page array writable:
> +
> +	struct virtio_balloon_pages {
> +#define VIRTIO_BALLOON_REQ_PAGES	2
> +		u32 type; // VIRTIO_BALLOON_REQ_PAGES
> +		u64 page[];

What is the size of this array?

> +	};
> +
> +   The device may not fill the entire page array.  The contents
> +   of the pages received will be undefined.  The device should
>     keep count of how many pages remain in the balloon so it can
>     correctly respond to future resize requests.

What happens if the driver requests more pages than are in the balloon?
Are we going to support such cases? I am asking in the context
of memory hotplug support.
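
For my own understanding, here is how I read the proposed inflate and
withdraw flow from the driver side. The virtio_balloon_pages layout is
taken from your diff, but the fixed array size (my guess at the answer
to the size question above), the queue helpers and all other names are
made up for illustration:

	/* Sketch only: not a proposed implementation. */
	#include <stdint.h>
	#include <stddef.h>

	#define VIRTIO_BALLOON_REQ_PAGES	2
	#define BALLOON_BATCH			64	/* made-up array size */

	struct virtio_balloon_pages {
		uint32_t type;			/* VIRTIO_BALLOON_REQ_PAGES */
		uint64_t page[BALLOON_BATCH];
	};

	/* Hypothetical stand-ins for the todevq transport. */
	void todevq_send_readable(const void *buf, size_t len);	/* inflate path */
	size_t todevq_send_writable(void *buf, size_t len);	/* withdraw path, returns bytes written by device */

	/* Inflate: after a resize request asks for a bigger balloon, send
	 * the addresses of the pages we gave up; the count is implied by
	 * the message length. */
	void balloon_inflate(const uint64_t *addrs, size_t n)
	{
		struct virtio_balloon_pages msg = { .type = VIRTIO_BALLOON_REQ_PAGES };
		size_t i;

		for (i = 0; i < n && i < BALLOON_BATCH; i++)
			msg.page[i] = addrs[i];
		todevq_send_readable(&msg, offsetof(struct virtio_balloon_pages, page)
				     + i * sizeof(msg.page[0]));
	}

	/* Withdraw: post the same structure with the page array writable;
	 * the device may fill only part of it, so the driver has to look
	 * at the returned length. */
	size_t balloon_withdraw(uint64_t *addrs, size_t n)
	{
		struct virtio_balloon_pages msg = { .type = VIRTIO_BALLOON_REQ_PAGES };
		size_t filled, got, i;

		filled = todevq_send_writable(&msg, sizeof(msg));
		if (filled <= offsetof(struct virtio_balloon_pages, page))
			return 0;
		got = (filled - offsetof(struct virtio_balloon_pages, page))
		      / sizeof(msg.page[0]);
		for (i = 0; i < got && i < n; i++)
			addrs[i] = msg.page[i];
		return i;
	}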

Daniel

