virtio-dev message

Subject: Re: [PATCH] add virtio-pmem device spec


On Tue,  9 Jul 2019 13:41:34 +0530
Pankaj Gupta <pagupta@redhat.com> wrote:

> This patch proposes a virtio specification for new
> virtio pmem device. Virtio pmem is a paravirtualized
> device which solves two problems:
> 
> - Provides emulation of persistent memory on host regular
>   (non NVDIMM) storage.
> - Allows the guest to bypass the page cache.
> 
> This is changed version from previous RFC [1], incorporated
> changes suggested by Stefan, Michael & Cornerlia.
> 
> [1] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
> 
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  conformance.tex |  23 ++++++++++-
>  content.tex     |   1 +
>  virtio-pmem.tex | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 140 insertions(+), 2 deletions(-)
>  create mode 100644 virtio-pmem.tex

(...)

> diff --git a/virtio-pmem.tex b/virtio-pmem.tex
> new file mode 100644
> index 0000000..fa64c6b
> --- /dev/null
> +++ b/virtio-pmem.tex
> @@ -0,0 +1,118 @@
> +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
> +
> +virtio pmem is an emulated persistent memory device. The 
> +device is plugged to the guest using virtio bus.

"virtio bus" sounds too Linux-specific. Maybe just say "virtio pmem is
an emulated persistent memory devices using virtio."?

> +
> +Device works as fake nvdimm device when emulated on host regular

s/Device/The device/

(more of these below)

> +(non NVDIMM) device. Device provides a virtio based asynchronous
> +flush mechanism to persists the guest writes. This avoids the 

s/persists/persist/

> +need of separate caching inside the guest and host side caching
> +is used. Under memory pressure, host makes efficient memory reclaim

s/host/the host/

> +decisions on uniform view of memory.
> +
> +\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID}
> +  27
> +
> +\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues}
> +\begin{description}
> +\item[0] req_vq
> +\end{description}
> +
> +\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature bits}
> +
> +There are currently no feature bits defined for this device.
> +
> +\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout}
> +
> +\begin{lstlisting}
> +struct virtio_pmem_config {
> +	uint64_t start;
> +	uint64_t size;
> +};
> +\end{lstlisting}
> +
> +\field{start} contains the physical address of the start of the persistent memory range.
> +\field{size} in bytes length of address range.

"contains the length of the address range in bytes"?

> +
> +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device / Device Initialization}
> +
> +Device hot-plugs physical memory to guest address space. Persistent memory device
> +is emulated at host side.
> +
> +\begin{enumerate}
> +  \item The driver reads the physical start address from \field{start}.
> +  \item The driver reads the length of the persistent memory range from \field{size}.
> +  \end{enumerate}
> +
> +\devicenormative{\subsubsection}{Device Initialization}{Device Types / PMEM Device / Device Initialization}
> +
> +Host memory region MUST be mapped to guest address space in a 

s/Host memory region/The host memory region/

> +way so that updates are visible to other processes mapping the
> +same memory region. 
> +
> +\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver / Driver Initialization}
> +
> +Memory stores to the persistent memory range are not guaranteed to be
> +persistent without further action. An explicit flush command is
> +required to ensure persistence. The req_vq is used to perform flush
> +commands.
> +
> +\drivernormative{\subsubsection}{Driver Initialization: flush callback}{Device Types / PMEM Driver / Driver Initialization / flush callback}
> +
> +Driver MUST implement a virtio based flush callback, otherwise driver

s/Driver/The driver/

> +will fail to register.

I'm not sure what a "virtio based flush callback" is supposed to mean,
though?

Also, what does it mean if a driver "fails to register"? I don't think
you can translate this concept to the spec, it seems to be an
implementation detail.
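
It might also help to spell out what actually travels over req_vq.
Something like the following is what I'd expect, but note the patch does
not define these structures, so the names and layout below are purely my
guess:

    /* Hypothetical request/response layout for req_vq -- not defined in
     * this patch, shown only to illustrate the flush flow. */
    struct virtio_pmem_req {
        uint32_t type;    /* VIRTIO_PMEM_REQ_TYPE_FLUSH */
    };

    struct virtio_pmem_resp {
        uint32_t ret;     /* 0 on success, non-zero on failure */
    };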

> +
> +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver / Driver Operation}
> +
> +The VIRTIO_PMEM_REQ_TYPE_FLUSH command persists memory writes that were performed

Maybe "all memory writes to the persistent memory range", to be more
clear?

> +before the command was submitted. Once the command completes the

s/the/those/, maybe?

> +writes are guaranteed to be persistent.
> +
> +\drivernormative{\subsubsection}{Driver Operation: Virtqueue command}{Device Types / PMEM Driver / Driver Operation / Virtqueue command}
> +
> +The driver MUST submit a VIRTIO_PMEM_REQ_TYPE_FLUSH command after performing
> +memory writes that require persistence.
> +
> +The driver MUST wait for the VIRTIO_PMEM_REQ_TYPE_FLUSH command to complete before
> +assuming previous writes are persistent.
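
To make the ordering requirement concrete, the driver-side flow I read
out of this is roughly as follows (again only an illustrative sketch with
made-up helper names, reusing the hypothetical req/resp layout from
above -- not spec text):

    /* Sketch: persist prior writes to the virtio pmem range.
     * submit_and_wait() stands in for placing the buffers on req_vq and
     * waiting for the device to use them; it is not a real API. */
    int flush_example(struct virtio_pmem_dev *dev)
    {
        struct virtio_pmem_req req = { .type = VIRTIO_PMEM_REQ_TYPE_FLUSH };
        struct virtio_pmem_resp resp = { 0 };

        /* 1. memory writes to the pmem range have already been issued */

        /* 2. submit the flush request on req_vq and wait for completion */
        if (submit_and_wait(dev, &req, sizeof(req), &resp, sizeof(resp)) < 0)
            return -1;

        /* 3. only now may those writes be assumed persistent */
        return resp.ret == 0 ? 0 : -1;
    }
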
> +
> +\subsection{Device Operations}\label{sec:Device Types / PMEM Driver / Device Operation}
> +
> +\devicenormative{\subsubsection}{Device Operations: Virtqueue flush}{Device Types / PMEM Device / Device Operation / Virtqueue flush}
> +
> +Device SHOULD handle multiple flush requests simultaneously using
> +corresponding host flush mechanisms.
> +
> +\devicenormative{\subsubsection}{Device operations: Virtqueue return}{Device Types / PMEM Device / Device Operation / Virtqueue return}
> +
> +Device SHOULD return integer '0' for success and '1' for failure.
> +These errors are converted to corresponding error codes by guest as
> +per architecture.

I think what the driver actually does with the 0/1 return values is
already an implementation detail. Also, why 'SHOULD'? The driver needs
to be sure that it won't interpret success as failure, and vice versa.
At least "return 0 on success, !0 otherwise" needs to be a MUST, I
think.
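
On the device side, what I picture is something along these lines (sketch
only; fsync() on a backing file is just one possible "corresponding host
flush mechanism"):

    /* Sketch: device-side handling of VIRTIO_PMEM_REQ_TYPE_FLUSH when
     * the memory range is backed by a host file (backing_fd). */
    uint32_t handle_flush(int backing_fd)
    {
        if (fsync(backing_fd) != 0)   /* flush host caches for the backing file */
            return 1;                 /* failure */
        return 0;                     /* success */
    }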

> +
> +\subsection{Possible security implications}\label{sec:Device Types / PMEM Device / Possible Security Implications}
> +
> +If two devices actually share the same memory, this creates a 

Maybe "Two devices actually sharing the same memory creates a..."?

> +potential information leak as access patterns of one driver could
> +be observable by another driver.
> +
> +This can happen for example if two devices are implemented in software
> +by a hyper-visor, and two drivers are parts of VMs running on the 
> +hyper-visor. In this case, the timing of access to device memory

I think "hypervisor" (without the hyphen) is the more common spelling
here.

> +might leak information about access patterns from one VM to another.
> +
> +This can include, but might not be limited to:
> +\begin{enumerate}
> +\item Configurations sharing a single region of device memory (even in a read-only configuration)
> +\item Configurations with a shared cache between devices (e.g. linux page cache)

s/linux/Linux/

> +\item Configurations with memory deduplication techniques such as KSM similar side-channels 
> + might be present if the device memory is shared with another system, e.g. information about
> + the hyper-visor/host page cache might leak into a VM guest.

I'm not quite able to make sense of this sentence -- is there a
semicolon missing after 'KSM'?

> +\end{enumerate}
> +
> +\subsection{Countermeasures}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures}
> +Solution is to avoid sharing resources between devices.
> +\begin{enumerate}
> +\item Each VM must have its own device memory, not shared with any other VM or process. 
> +\item If VM workload is a special application and there is no risk, its okay to share the device memory.  

s/VM workload/the VM workload/
s/its/it is/

> +\item Don't allow host cache eviction from VM when device memory is shared with other VM or host process. 
> +\end{enumerate}

Generally, I think this has moved a lot in the right direction; some
general nits from my side, and some remaining things that sound a bit
too Linux-specific.

