virtio-comment message

Subject: [RFC PATCH 1/1] virtio-balloon: Add Working Set Reporting feature


Adds the VIRTIO_BALLOON_F_WS_REPORTING feature bit.
Adds additional virtqueues and device operation details.

Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
---
 device-types/balloon/description.tex | 228 ++++++++++++++++++++++++++-
 1 file changed, 227 insertions(+), 1 deletion(-)

diff --git a/device-types/balloon/description.tex b/device-types/balloon/description.tex
index a1d9603..4ea764f 100644
--- a/device-types/balloon/description.tex
+++ b/device-types/balloon/description.tex
@@ -22,6 +22,8 @@ \subsection{Virtqueues}\label{sec:Device Types / Memory Balloon Device / Virtque
 \item[2] statsq
 \item[3] free_page_vq
 \item[4] reporting_vq
+\item[5] ws_vq
+\item[6] notification_vq
 \end{description}

   statsq only exists if VIRTIO_BALLOON_F_STATS_VQ is set.
@@ -30,6 +32,8 @@ \subsection{Virtqueues}\label{sec:Device Types / Memory Balloon Device / Virtque

   reporting_vq only exists if VIRTIO_BALLOON_F_PAGE_REPORTING is set.

+  ws_vq and notification_vq only exist if VIRTIO_BALLOON_F_WS_REPORTING is set.
+
 \subsection{Feature bits}\label{sec:Device Types / Memory Balloon Device / Feature bits}
 \begin{description}
 \item[VIRTIO_BALLOON_F_MUST_TELL_HOST (0)] Host has to be told before
@@ -48,7 +52,9 @@ \subsection{Feature bits}\label{sec:Device Types / Memory Balloon Device / Featu
     Configuration field \field{poison_val} is valid.
 \item[ VIRTIO_BALLOON_F_PAGE_REPORTING(5) ] The device has support for free
     page reporting. A virtqueue for reporting free guest memory is present.
-
+\item[ VIRTIO_BALLOON_F_WS_REPORTING(6) ] The device has support for Working Set
+    (WS) reporting. A virtqueue for reporting WS histograms is present (ws_vq) and
+    a virtqueue to receive WS-related notifications (notification_vq) is present.
 \end{description}

 \drivernormative{\subsubsection}{Feature bits}{Device Types / Memory Balloon Device / Feature bits}
@@ -86,6 +92,8 @@ \subsection{Device configuration layout}\label{sec:Device Types / Memory Balloon
     read-only by the driver.
   \field{poison_val} is available if VIRTIO_BALLOON_F_PAGE_POISON has been
     negotiated.
+  \field{ws_num_bins} is available if VIRTIO_BALLOON_F_WS_REPORTING has been
+    negotiated.

 \begin{lstlisting}
 struct virtio_balloon_config {
@@ -93,6 +101,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Memory Balloon
         le32 actual;
         le32 free_page_hint_cmd_id;
         le32 poison_val;
+        le32 ws_num_bins;
 };
 \end{lstlisting}

@@ -632,3 +641,220 @@ \subsubsection{Free Page Reporting}\label{sec:Device Types / Memory Balloon Devi
 If the VIRTIO_BALLOON_F_PAGE_POISON feature has been negotiated, the device
 MUST NOT modify the the content of a reported page to a value other than
 \field{poison_val}.
+
+\subsubsection{Working Set Reporting}\label{sec:Device Types / Memory Balloon Device / Device Operation / Working Set Reporting}
+
+A Working Set ("WS") measures what memory a computer system has recently
+used (where "recently" is application specific). In most practical systems,
+memory is viewed at the granularity of a page. An ideal system would check
+the access time for every page after every instruction, but this is not
+practical. In a realistic scenario, the idle age of a page can be defined as:
+
+\begin{lstlisting}
+    idle_age = current_system_time - time_access_bit_was_cleared
+\end{lstlisting}
+
+\field{time_access_bit_was_cleared} is a proxy for "time of last access."
+Checking (and clearing) the "accessed" bit on a page table entry is a typical
+task in operating systems, running from time to time in memory management
+activities. In this scheme, accuracy is sacrificed for improved performance
+(since less time overall is spent on scanning the memory).
+
+The Working Set consists of "bins" of pages of similar estimated idle age.
+Collecting idle ages for large sets of pages means finding convenient and
+efficient times to check the accessed bits. For all these pages, we associate
+some time \field{t} with the set, and logically consider them as "accessed
+no later than time t."
+
+The collection of "binned" sets of pages is best described as a histogram,
+where each bin has an associated idle age and all pages in the bin have been
+idle for no longer than that age.
+
+\paragraph{Memory Types: Working Set Reporting}\label{sec:Device Types / Memory Balloon Device / Device Operation / Working Set Reporting / Memory Types: Working Set Reporting}
+
+Each bin can describe more than one type of memory, reflecting the different
+types of pages tracked by an operating system. Memory types are enumerated
+in the \field{virtio_balloon_ws_memory_type} enum. To guarantee backwards
+compatibility, devices are free to ignore unrecognized WS memory type values.
+
+\begin{lstlisting}
+enum virtio_balloon_ws_memory_type {
+  VIRTIO_BALLOON_WS_ANON,
+  VIRTIO_BALLOON_WS_FILE
+};
+\end{lstlisting}
+
+The supported memory types are as follows:
+
+\begin{description}
+\item[ANON] Memory that is not backed by files.
+
+\item[FILE] This is memory that is backed by files, and represents the total
+  of both dirty and clean pages of file-backed memory.
+\end{description}
+
+\paragraph{Idle Age Units}\label{sec:Device Types / Memory Balloon Device / Device Operation / Working Set Reporting / Idle Age Units}
+
+The time unit for the idle age is specified by the guest system and reported
+by the driver. Valid types are enumerated in the
+\field{virtio_balloon_ws_age_units_type} enum.
+
+\begin{lstlisting}
+enum virtio_balloon_ws_age_units_type {
+  VIRTIO_BALLOON_WS_MILLISECONDS
+};
+\end{lstlisting}
+
+The currently supported age unit types are:
+
+\begin{description}
+  \item[MILLISECONDS] Idle ages are expressed in milliseconds. With a 64-bit
+  unsigned type, this can cover idle ages of hundreds of millions of years.
+\end{description}
+
+\paragraph{NUMA}\label{sec:Device Types / Memory Balloon Device / Device Operation / Working Set Reporting / NUMA}
+
+A 16-bit node_id is used to communicate the NUMA node associated with a bin of
+the WS report. The node_id MUST be a value between 0 and
+\field{max_numa_nodes} - 1 (inclusive). \field{max_numa_nodes} is the maximum
+number of supported NUMA nodes on the guest system.
+
+\paragraph{Working Set Report}\label{sec:Device Types / Memory Balloon Device / Device Operation / Working Set Reporting / Working Set Report}
+
+A full WS report is a variable length structure with the following layout:
+
+\begin{lstlisting}
+struct virtio_balloon_ws_report {
+        le16 node_id;
+        struct {
+                le64 idle_age;
+                le64 memory_size_bytes[2]; /* nr_types */
+        } [ws_num_bins];
+};
+\end{lstlisting}
+
+Bins are ordered by increasing \field{idle_age}. The first struct has the
+smallest \field{idle_age} value and represents the hottest memory, i.e. all
+memory in this bin has an idle age of at most \field{idle_age}. The bin with
+the next largest \field{idle_age} refers to memory that has an idle age greater
+than that of the previous bin, but less than or equal to the \field{idle_age}
+of the current bin, and so on. The sequence of struct values MUST be in order
+of increasing \field{idle_age}. The last struct always has an \field{idle_age}
+value of LONG_LONG_MAX, since it represents the oldest memory with no upper
+bound on idle age.
+
+The driver MAY send WS Reports at its discretion, typically in times of memory
+pressure. For NUMA systems, a complete report consists of the above array for
+one NUMA node. The driver MAY provide a sequence of reports, one for each NUMA
+node.
+
+\paragraph{Virtqueue Usage}\label{sec:Device Types / Memory Balloon Device / Device Operation / Working Set Reporting / Virtqueue Usage}
+
+Notifications are sent from the device to the driver via the notification
+virtqueue. The notification virtqueue is different from other virtqueues in
+that the driver creates an input buffer of the appropriate size and then
+signals the device that the buffer is available. When the device chooses to
+send a notification, it fills the buffer with the appropriate message (and any
+additional data) and notifies the driver. The driver is then responsible for
+reading the notification, taking appropriate action, and then presenting a new
+empty buffer back to the device for the next notification.
+
+Each valid notification has an associated value in the
+\field{virtio_balloon_ws_operation_type} enum.
+
+\begin{lstlisting}
+enum virtio_balloon_ws_operation_type {
+  VIRTIO_BALLOON_WS_REQUEST = 1,
+  VIRTIO_BALLOON_WS_CONFIG = 2
+};
+\end{lstlisting}
+
+The first data in the buffer is a 16-bit tag with a valid operation type. The
+data that is placed in the buffer after the operation identifier value depends
+on the operation provided.
+
+The current notification operations are:
+\begin{description}
+\item[WS Request] The device requests that the driver send a current WS Report.
+  No additional data is required after this identifier.
+\item[WS Config] This message supplies the required configuration information
+  for receiving future WS Reports. After this operation identifier, the
+  following data MUST be in the buffer:
+\end{description}
+
+\begin{lstlisting}
+struct virtio_balloon_ws_config {
+    struct {
+        le64 idle_age;
+    } [ws_num_bins - 1];
+    le64 refresh;
+    le64 report;
+    le16 age_units_type;
+};
+\end{lstlisting}
+
+The first \field{ws_num_bins} - 1 values are the interval values, provided in
+increasing order. They are the expected \field{idle_age} values for each bin in
+the reported histogram. Conceptually, the \field{idle_age} value represents an
+upper (closed) boundary on the time since last access for all memory associated
+with that bin. The last bin has no maximum value and simply contains "the
+coldest" memory.
+
+The next value is the refresh_threshold, which indicates an upper bound on
+how old the WS Report may be. It can be useful for the driver to send a
+cached WS Report collected at some point in the recent past, rather than
+collecting the data for a fresh report with each transmission. The time
+referred to via this value indicates how old such a cached report may be. Note
+the distinction: "idle age" measures time since the last reference for some
+amount of memory with respect to a moment in time; "staleness" is how far in
+the past that instant is allowed to be. The driver MUST NOT send a WS Report
+that represents the guest state older than the refresh threshold.
+
+The next value is the report_threshold. It is the rate-limiting mechanism that
+indicates a lower bound on the time between reports. After sending a WS Report,
+the driver MUST NOT send another WS Report until report_threshold units of
+time have expired.
+
+The final value is the \field{virtio_balloon_ws_age_units_type}, which provides
+the units of the previous \field{ws_num_bins} + 1 values.
+
+The driver MUST NOT begin sending WS reports until it receives an initial
+\field{WS_CONFIG} message via the notification virtqueue. The device MAY send
+additional \field{WS_CONFIG} notifications. The number of bins is fixed, but
+bin intervals, refresh threshold, and report threshold can be changed.
+
+The allowed range for \field{ws_num_bins} is set via these values:
+\begin{lstlisting}
+#define VIRTIO_BALLOON_WS_MAX_NUM_BINS  16
+#define VIRTIO_BALLOON_WS_MIN_NUM_BINS  2
+\end{lstlisting}
+
+The ws_vq virtqueue transmits the WS report from the driver to the device.
+This virtqueue functions in a way that is similar to the stats virtqueue.
+The reporting proceeds as follows:
+\begin{enumerate}
+\item The driver collects the WS information into a new buffer.
+\item The driver adds the buffer to the virtqueue and notifies the device.
+\item The device pops the buffer and consumes the WS report.
+\end{enumerate}
+
+The driver determines when to send the WS report, although the device may send
+requests for a report (via WS_REQUEST) at any time. The typical situation is
+to send the WS report during times of memory pressure, informing the host of
+what memory is currently in use, with the notion that the host might trigger
+a balloon deflation.
+
+\drivernormative{\paragraph}{Working Set Reporting}{Device Types / Memory Balloon Device / Device Operation / Working Set Reporting}
+
+Normative statements in this section apply if the
+VIRTIO_BALLOON_F_WS_REPORTING feature has been negotiated.
+
+The driver MUST NOT report the WS until the WS_CONFIG message is received from
+the device.
+The driver MAY report a "cached" WS, that is, a report representing the state
+of the system at some recent time in the past. The maximum "staleness" of the
+WS report is given by the refresh_threshold, from above.
+
+The driver SHOULD honor the requested idle age units if it is able, but it MAY
+choose other units if the requested units are not supported in the guest. In
+that case, the driver MAY supply bin intervals, report and refresh thresholds
+of its choosing. Once the device begins receiving WS reports in the
+non-requested units, it can then follow up with a subsequent WS_CONFIG
+specifying desired interval and threshold values in units that the guest system
+supports.
+
-- 
2.40.1.606.ga4b1b128d6-goog
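
For readers following the binning rules in the patch, here is a minimal C
sketch of how a page's idle age could map to a bin, and how the idle_age
sequence of a report could be sanity-checked. It is not part of the patch or
of either implementation: the function names are hypothetical, the bin limits
are taken from the VIRTIO_BALLOON_WS_MIN_NUM_BINS/MAX_NUM_BINS defines, and
LONG_LONG_MAX is assumed to be equivalent to INT64_MAX.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the index of the bin that a page with the
 * given idle age falls into. `bounds` holds the ws_num_bins - 1 increasing
 * idle_age upper bounds from WS_CONFIG. Each upper bound is closed; the
 * last bin (index ws_num_bins - 1) has no upper bound. */
size_t ws_bin_index(const uint64_t *bounds, size_t ws_num_bins,
                    uint64_t idle_age)
{
    size_t i;

    for (i = 0; i + 1 < ws_num_bins; i++) {
        if (idle_age <= bounds[i])
            return i;
    }
    return ws_num_bins - 1; /* coldest memory */
}

/* Hypothetical check on the idle_age values of one report: the bin count
 * is within the allowed range, the ages are strictly increasing, and the
 * last age is the "no upper bound" sentinel (assumed to be INT64_MAX). */
int ws_report_ages_valid(const uint64_t *ages, size_t ws_num_bins)
{
    size_t i;

    if (ws_num_bins < 2 || ws_num_bins > 16)
        return 0;
    for (i = 0; i + 1 < ws_num_bins; i++) {
        if (ages[i] >= ages[i + 1])
            return 0;
    }
    return ages[ws_num_bins - 1] == (uint64_t)INT64_MAX;
}
```

With bounds of {1000, 5000, 30000} ms and four bins, an idle age of exactly
1000 ms lands in bin 0 and anything above 30000 ms lands in bin 3.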

On Mon, May 15, 2023 at 2:33 PM T.J. Alumbaugh <talumbau@google.com> wrote:
>
> This is a proposed spec expansion for a Working Set Reporting feature
> in the balloon with driver patch here:
>
> https://lore.kernel.org/linux-mm/20230509185419.1088297-1-yuanchu@google.com/
>
> with device implementation here:
>
> https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg02503.html
>
> It describes the requirements for a VIRTIO_F_WS_REPORTING feature bit
> on the balloon device.
>
> Motivation
> ==========
> When we have a system with overcommitted memory and 1 or more VMs, we
> seek to get both timely and accurate information on overall memory
> utilization in order to drive appropriate reclaim activities. For
> example, in some client device use cases a VM might need a significant
> fraction of the overall memory for a period of time, but then enter a
> quiet period that results in a large number of cold pages in the guest.
>
> The balloon device has a number of features to assist in sharing memory
> resources amongst the guests and host (e.g free page hinting, stats, free page
> reporting). As mentioned in slide 12 in [1], the balloon doesn't have a good
> mechanism to drive the reclaim of guest cache. Our use case includes both
> typical page cache as well as "application caches" with memory that should be
> discarded in times of system-wide memory pressure.
>
> Working Set Reporting
> =====================
>
> Working Set reporting in the balloon provides:
>
>  - an accurate picture of current memory utilization in the guest
>  - event driven reporting (with configurable rate limiting) to deliver reports
>    during times of memory pressure.
>
> The reporting mechanism can be combined with a domain-specific balloon policy
> to drive the separate reclaim activities in a coordinated fashion.
>
> TODOs:
> ======
>
>  - There are some small differences between this spec and the
>    implementation in the data exchange protocol in the device. We wanted to
>    get feedback on this diff at an early stage though, rather than get every
>    piece nailed down with precision.
>
> References:
>
> [1] https://kvmforum2020.sched.com/event/eE4U/virtio-balloonpmemmem-managing-guest-memory-david-hildenbrand-michael-s-tsirkin-red-hat
>
> T.J. Alumbaugh (1):
>   virtio-balloon: Add Working Set Reporting feature
>
>  device-types/balloon/description.tex | 228 ++++++++++++++++++++++++++-
>  1 file changed, 227 insertions(+), 1 deletion(-)
>
> --
> 2.40.1.606.ga4b1b128d6-goog

