Subject: Re: [virtio] [OASIS Issue Tracker] Created: (VIRTIO-28) Implement new balloon device (ID 13)

"Michael S. Tsirkin" <mst@redhat.com> writes:
> OK there are some more issues to consider around deflate.
> 1.  On KVM, we actually want to change QEMU so that pagefaults don't
> work either.  Specifically, we want to skip pages in the balloon for
> migration.
> However, migration is done in userspace while pagefaults
> are done in kernel.
> I think the implication is that 
> -	 you should be able to ask guest to inflate balloon
> 	 with pages that can be paged in (when you don't want to migrate
> 	 and want max local performance) or with pages that can not be paged in
> 	(when you want to migrate faster), dynamically, not through a
> 	device feature
> -	 "will notify before use" feature should be per a bunch of pages actually.

I am always reluctant to implement a spec for things which don't exist.
This is the cause of the current "negative feature" mess with

So if we *ever* want to ask for pages, let's make the driver always
ask for pages.  You place a buffer in the queue and the device fills it
with the page addresses you can now use.
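For illustration only, here is a minimal, self-contained sketch of the "driver always asks" model. It assumes a hypothetical `fake_device_fill()` that stands in for the device end of the virtqueue. The driver offers a writable array of page addresses, the device writes as many as it chooses, and the driver works out the page count from the number of bytes written, which is what a used-ring length reports. `BATCH`, `request_pages()` and the fake addresses are made up for the example and do not come from the spec or from any real driver.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH 8 /* hypothetical number of entries per request */

/* Hypothetical stand-in for the device: writes up to `avail` page
 * addresses (4096-byte aligned) into the driver's writable buffer and
 * returns the number of bytes written, as a used-ring length would. */
static size_t fake_device_fill(uint64_t *buf, size_t cap_bytes, size_t avail)
{
    size_t n = cap_bytes / sizeof(uint64_t);

    if (n > avail)
        n = avail;
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint64_t)(0x100000 + i * 4096);
    return n * sizeof(uint64_t);
}

/* Driver side: offer `cap` writable entries and return how many page
 * addresses the device actually supplied. */
static size_t request_pages(uint64_t *buf, size_t cap, size_t device_has)
{
    size_t written = fake_device_fill(buf, cap * sizeof(uint64_t), device_has);

    return written / sizeof(uint64_t); /* count is implied by the length */
}
```

Because the device decides how much to fill, the driver never has to guess which pages it may reuse. It only uses the addresses it was handed.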

See below.

> 2. Assuming we need the deflate command,
>   I think it's a problem that we need to allocate memory for deflate.
>   Can't we just stick page pointers in VQ directly?

That's an internal implementation issue; we can avoid the allocation
fairly trivially using a static buffer.
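A rough sketch of the static-buffer approach, assuming a fixed batch size. `BALLOON_BATCH`, `balloon_pfns` and `fill_static_batch()` are hypothetical names. The same file-scope array is reused for every request, so building a batch never calls an allocator. The trade-off is that only one request can use the buffer at a time, so a real driver would have to serialize access to it, for example with a lock.

```c
#include <stddef.h>
#include <stdint.h>

#define BALLOON_BATCH 256 /* hypothetical: entries per request */

/* File-scope storage reused for every request: no per-request allocation.
 * Only one request may use it at a time (a real driver would serialize). */
static uint64_t balloon_pfns[BALLOON_BATCH];

/* Copies up to BALLOON_BATCH page addresses into the static buffer and
 * returns how many were copied; the caller would then hand
 * balloon_pfns[0..n) to the virtqueue and repeat for the remainder. */
static size_t fill_static_batch(const uint64_t *pages, size_t count)
{
    size_t n = count < BALLOON_BATCH ? count : BALLOON_BATCH;

    for (size_t i = 0; i < n; i++)
        balloon_pfns[i] = pages[i];
    return n;
}
```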

> If we address these, I think Xen PV becomes simple, you just
> never ask for pageable-in pages, and you ignore them if guest
> gives them to you.

Simple, yes.

But even with this design, there's no way for the Xen PV model to
implement this without a Xen-specific hook in the driver.

At the very least, "get_a_page_for_the_balloon()" and
"return_a_page_from_the_balloon()" will need to do Xen-specific stuff.
We *could* argue that's outside the scope of the spec, but we should
note it.
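One way to keep that Xen-specific work outside the generic driver is a small table of hooks. The sketch below is hypothetical. `struct balloon_page_ops`, `BALLOON_PAGE_SIZE` and the C11 `aligned_alloc()`/`free()` defaults are illustrations only, and the member names simply mirror the two functions named above. A Xen PV port would plug in its own pair of functions, and the generic inflate/deflate loops would stay unchanged.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BALLOON_PAGE_SIZE 4096 /* assumed guest page size */

/* Hypothetical hook table; member names mirror the functions above. */
struct balloon_page_ops {
    void *(*get_a_page_for_the_balloon)(void);
    void (*return_a_page_from_the_balloon)(void *page);
};

/* Default hooks: plain page-aligned allocation, nothing platform-specific. */
static void *default_get_page(void)
{
    return aligned_alloc(BALLOON_PAGE_SIZE, BALLOON_PAGE_SIZE);
}

static void default_return_page(void *page)
{
    free(page);
}

static const struct balloon_page_ops default_ops = {
    .get_a_page_for_the_balloon = default_get_page,
    .return_a_page_from_the_balloon = default_return_page,
};

/* Generic driver code only ever calls through the table.  Returns the
 * number of pages obtained (may be fewer than n if a hook fails). */
static size_t balloon_inflate(const struct balloon_page_ops *ops,
                              void **pages, size_t n)
{
    size_t got = 0;

    while (got < n) {
        void *p = ops->get_a_page_for_the_balloon();
        if (!p)
            break;
        pages[got++] = p;
    }
    return got;
}

static void balloon_deflate(const struct balloon_page_ops *ops,
                            void **pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        ops->return_a_page_from_the_balloon(pages[i]);
}
```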


This revises the previous proposal.  It renames the queues for
clarity and requires the driver to request pages.

diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
index 426a3cb..173130e 100644
--- a/virtio-v1.0-wd01-part1-specification.txt
+++ b/virtio-v1.0-wd01-part1-specification.txt
@@ -3165,7 +3165,7 @@ to communicate guest memory statistics to the host. Virtqueues
-  0:inputq. 1:outputq.
+  0:fromdevq. 1:todevq.
 Feature bits
@@ -3187,14 +3187,14 @@ page size of the host (eg. 12 for 4096-byte pages).
 1. At least one struct virtio_balloon_request buffer should be placed
-   in the inputq.
+   in the fromdevq.
 2. The balloon starts empty (size 0).
 Device Operation
-The device is driven by receipt of a command in the input queue:
+The device is driven by receipt of a command in the fromdevq queue:
 	struct virtio_balloon_req {
@@ -3211,7 +3211,7 @@ The device is driven by receipt of a command in the input queue:
    target, the guest may withdraw pages.
 2. To add pages to the balloon, the physical addresses of the pages
-   are sent using the output queue.  The number of pages is implied in
+   are sent using the todevq queue.  The number of pages is implied in
    the message length, and each page value must be a multiple of the
    page size indicated in struct virtio_balloon_config.
@@ -3221,15 +3221,24 @@ The device is driven by receipt of a command in the input queue:
 		u64 page[];
-3. To withdraw a page from the balloon, it can simply be accessed.
-   The contents at this point will be undefined.  The device should
+3. To withdraw pages from the balloon, the same structure should be
+   placed in the todevq queue, with the page array writable:
+	struct virtio_balloon_pages {
+		u64 page[];
+	};
+   The device need not fill the entire page array.  The contents
+   of the pages received will be undefined.  The device should
    keep count of how many pages remain in the balloon so it can
    correctly respond to future resize requests.
 4. A VIRTIO_BALLOON_REQ_STATS command indicates that the driver
    should report what stats are available.
-5. To report stats, the following message is sent to the output queue.
+5. To report stats, the following message is sent to the todevq queue.
    Indeterminable stats must not be reported.
 	struct virtio_balloon_stats {
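The todevq handling in the patch can be sketched from the device's side as follows. Several assumptions apply. `struct balloon_dev`, `page_shift`, `MAX_HELD` and both handler names are hypothetical. The spec's `u64` is written as `uint64_t`. Rejecting the whole inflate message when any address is misaligned is one possible policy, not something the patch specifies. Inflate derives the page count from the message length and checks each address against the configured page size. Deflate fills as much of the writable array as it can, possibly not all of it, and updates its count so that later resize requests see the correct balloon size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HELD 1024 /* hypothetical cap on tracked pages */

/* Hypothetical device-side state.  page_shift mirrors the config
 * field giving the page size as a power of two (12 => 4096 bytes). */
struct balloon_dev {
    unsigned page_shift;
    size_t num_pages;         /* pages currently in the balloon */
    uint64_t held[MAX_HELD];  /* their guest-physical addresses */
};

/* Inflate: `len` bytes of readable u64 page addresses.  The count is
 * implied by the length; any misaligned address rejects the message. */
static bool handle_inflate(struct balloon_dev *d, const uint64_t *page, size_t len)
{
    uint64_t mask = ((uint64_t)1 << d->page_shift) - 1;
    size_t n = len / sizeof(uint64_t);

    if (len % sizeof(uint64_t) != 0 || n > MAX_HELD - d->num_pages)
        return false;
    for (size_t i = 0; i < n; i++)
        if (page[i] & mask)
            return false;
    for (size_t i = 0; i < n; i++)
        d->held[d->num_pages++] = page[i];
    return true;
}

/* Deflate: the driver offers `cap_bytes` of writable space.  The device
 * need not fill it all; it returns the number of bytes it wrote and
 * keeps num_pages accurate for future resize requests. */
static size_t handle_deflate(struct balloon_dev *d, uint64_t *page, size_t cap_bytes)
{
    size_t n = cap_bytes / sizeof(uint64_t);

    if (n > d->num_pages)
        n = d->num_pages;
    for (size_t i = 0; i < n; i++)
        page[i] = d->held[--d->num_pages];
    return n * sizeof(uint64_t);
}
```

The returned byte count plays the role of the used length, which lets the driver tell how many entries of its array are valid.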

