[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: [RFC] virtio-pmem: PMEM device spec
This patch proposes a virtio specification for new virtio pmem device. Virtio pmem is a paravirtualized device which allows the guest to bypass page cache. Previous posting of kernel driver is [1] and Qemu device is [2]. Specification has introduction of virtio pmem device with other implementation details. I have also listed concerns with page cache side channel attacks in previous kernel driver posting [1] and possible countermeasures based on discussion [4]. I have also created an virtio-spec issue [5] for this. Request to provide feedback on device specification. [1] https://lkml.org/lkml/2019/1/9/471 [2] https://marc.info/?l=qemu-devel&m=153555721901824&w=2 [3] https://lkml.org/lkml/2019/1/9/541 [4] https://lkml.org/lkml/2019/2/4/1151 [5] https://github.com/oasis-tcs/virtio-spec/issues/38 Signed-off-by: Pankaj Gupta <pagupta@redhat.com> --- conformance.tex | 14 ++++++ content.tex | 3 ++ virtio-pmem.tex | 134 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 151 insertions(+) create mode 100644 virtio-pmem.tex diff --git a/conformance.tex b/conformance.tex index ad7e82e..2133a7c 100644 --- a/conformance.tex +++ b/conformance.tex @@ -181,6 +181,20 @@ A socket driver MUST conform to the following normative statements: \item \ref{drivernormative:Device Types / Socket Device / Device Operation / Device Events} \end{itemize} +\subsection{PMEM Driver Conformance}\label{sec:Conformance / Driver Conformance / PMEM Driver Conformance} + +A PMEM driver MUST conform to the following normative statements: + +\begin{itemize} +\item \ref{devicenormative:Device Types / PMEM Device / Device Initialization} +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Direct access} +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Virtio flush} +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Operation / Virtqueue command} +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue flush} +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue return} +\end{itemize} + + \section{Device Conformance}\label{sec:Conformance / Device Conformance} A device MUST conform to the following normative statements: diff --git a/content.tex b/content.tex index ede0ef6..6077d32 100644 --- a/content.tex +++ b/content.tex @@ -2634,6 +2634,8 @@ Device ID & Virtio Device \\ \hline 24 & Memory device \\ \hline +25 & PMEM device \\ +\hline \end{tabular} Some of the devices above are unspecified by this document, @@ -5595,6 +5597,7 @@ descriptor for the \field{sense_len}, \field{residual}, \input{virtio-input.tex} \input{virtio-crypto.tex} \input{virtio-vsock.tex} +\input{virtio-pmem.tex} \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} diff --git a/virtio-pmem.tex b/virtio-pmem.tex new file mode 100644 index 0000000..04e07bb --- /dev/null +++ b/virtio-pmem.tex @@ -0,0 +1,134 @@ +\section{PMEM Device}\label{sec:Device Types / PMEM Device} + +The virtio pmem is a fake persistent memory (NVDIMM) device +used to bypass the guest page cache and provide a virtio +based asynchronous flush mechanism. This avoids the need +of a separate page cache in guest and keeps page cache only +in the host. Under memory pressure, the host makes use of +effecient memory reclaim decisions for page cache pages +of all the guests. This helps to reduce the memory footprint +and fit more guests in the host system. + +\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID} + 25 + +\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues} +\begin{description} +\item[0] req_vq +\end{description} + +\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature bits} + +There are currently no feature bits defined for this device. + +\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout} + +\begin{lstlisting} +struct virtio_pmem_config { + le64 start; + le64 size; +}; +\end{lstlisting} + +\field{start} contains the guest physical address of the start of the +physical address range to be hotplugged into the guest address space +using the pmem API. +\field{size} contains the length of this address range. + +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device / Device Initialization} + +Device hotplugs physical memory to guest address space. Persistent memory device +is emulated with file backed memory at host side. + +\begin{enumerate} +\item Guest vpmem start is read from \field{start}. +\item Guest vpmem end is read from \field{size}. +\end{enumerate} + +\devicenormative{\subsubsection}{Device Initialization}{Device Types / PMEM Device / Device Initialization} + +File backed memory SHOULD be memory mapped to guest address space with SHARED +memory mapping. + +\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver / Driver Initialization} + +Driver hotplugs the physical memory and registers associated +region with the pmem API. Also, configures a flush callback +function with the corresponding region. + +\drivernormative{\subsubsection}{Driver Initialization: Filesystem direct access}{Device Types / PMEM Driver / Driver Initialization / Direct access} + +Driver SHOULD enable filesystem direct access operations for +read/write on the device. + +\drivernormative{\subsubsection}{Driver Initialization: Virtio flush}{Device Types / PMEM Driver / Driver Initialization / Virtio flush} + +Driver SHOULD add implement a virtio based flush callback. +Driver SHOULD disable other FLUSH/SYNC mechanisms for the device +when virtio flush is configured. + +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver / Driver Operation} +\drivernormative{\subsubsection}{Driver Operation: Virtqueue command}{Device Types / PMEM Driver / Driver Operation / Virtqueue command} + +Driver SHOULD send VIRTIO_FLUSH command on request virtqueue, +blocks guest userspace process utill host completes fsync/flush. +Driver SHOULD handle multiple fsync requests on files present +on the device. + +\subsection{Device Operations}\label{sec:Device Types / PMEM Driver / Device Operation} + +\devicenormative{\subsubsection}{Device Operations}{Device Types / PMEM Device / Device Operation / Virtqueue flush} + +Device SHOULD handle multiple flush requests simultaneously using +host filesystem fsync/flush call. + +\devicenormative{\subsubsection}{Device operations}{Device Types / PMEM Device / Device Operation / Virtqueue return} + +Device SHOULD return integer '0' for success and '-1' for failure. +These errors are converted to corresponding error codes by guest as +per architecture. + +\subsection{Possible security implications}\label{sec:Device Types / PMEM Device / Possible Security Implications} + +There could be potential security implications depending on how +memory mapped host backing file is used. By default device emulation +is done with SHARED mapping. There is a contract between guest and host +process to access same backing file for read/write operations. + +If a malicious guest or host userspace map the same backing file, +attacking process can make use of known cache side channel attacks +to predict the current state of shared page cache page. If both +attacker and victim somehow execute same shared code after a +flush/evict call, with difference in execution timing attacker +could infer another guest local data or host data. Though this is +not easy and same challenges exist as with bare metal host system +when userspace share same backing file. + +\subsection{Countermeasures}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures} + +\subsubsection{ With SHARED mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / SHARED} + +If device backing backing file is shared with multiple guests or host +processes, this may act as a metric for page cache side channel attack. +As a counter measure every guest should have its own(not shared with +another guest) SHARED backing file and gets populated a per host process +page cache pages. + +\subsubsection{ With PRIVATE mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / PRIVATE} +There maybe be chances of side channels attack with PRIVATE +memory mapping similar to SHARED with read-only shared mappings. +PRIVATE is not used for virtio pmem making this usecase +irrelevant. + +\subsubsection{ Workload specific mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Workload} +For SHARED mapping, if workload is single application inside +guest and there is no risk with sharing of data between guests. +Guest sharing same backing file with SHARED mapping can be +used as a valid configuration. + +\subsubsection{ Prevent cache eviction}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Cache eviction} +Don't allow cache evict from guest filesystem trim/discard command +with virtio pmem. This rules out any possibility of evict-reload +page cache side channel attacks if backing disk is shared(SHARED) +with mutliple guests. Though if we use per device backing file with +shared mapping this countermeasure is not required. -- 2.14.3
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]