[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [virtio-dev] [PATCH RESEND] virtio-pmem: PMEM device spec
Hi Stefan, Thank you for reviewing this. > > Posting virtio specification for virtio pmem device. Virtio pmem is a > > paravirtualized device which allows the guest to bypass page cache. > > Virtio pmem kernel driver is merged in Upstream Kernel 5.3. Also, Qemu > > device is merged in Qemu 4.1. > > > > Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> > > --- > > Sorry, It took me long time to get back on this. There is > > an enhancement to this spec by "Taylor Stark" CCed in the list. > > Request for feedback and merging. > > > > RFC is posted here [1] > > [1] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html > > I skimmed through the review comments but pretty much reviewed this > patch from scratch. Feel free to ignore questions that others have > already raised. > > > > > conformance.tex | 19 ++++++- > > content.tex | 1 + > > virtio-pmem.tex | 132 ++++++++++++++++++++++++++++++++++++++++++++++++ > > 3 files changed, 150 insertions(+), 2 deletions(-) > > create mode 100644 virtio-pmem.tex > > > > diff --git a/conformance.tex b/conformance.tex > > index 94d7a06..818ddda 100644 > > --- a/conformance.tex > > +++ b/conformance.tex > > @@ -31,7 +31,8 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} > > \ref{sec:Conformance / Driver Conformance / Sound Driver Conformance}, > > \ref{sec:Conformance / Driver Conformance / Memory Driver Conformance}, > > \ref{sec:Conformance / Driver Conformance / I2C Adapter Driver Conformance} or > > -\ref{sec:Conformance / Driver Conformance / SCMI Driver Conformance}. > > +\ref{sec:Conformance / Driver Conformance / SCMI Driver Conformance}, > > +\ref{sec:Conformance / Driver Conformance / PMEM Driver Conformance}. > > > > \item Clause \ref{sec:Conformance / Legacy Interface: Transitional Device and Transitional Driver Conformance}. > > \end{itemize} > > @@ -55,7 +56,8 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} > > \ref{sec:Conformance / Device Conformance / Sound Device Conformance}, > > \ref{sec:Conformance / Device Conformance / Memory Device Conformance}, > > \ref{sec:Conformance / Device Conformance / I2C Adapter Device Conformance} or > > -\ref{sec:Conformance / Device Conformance / SCMI Device Conformance}. > > +\ref{sec:Conformance / Device Conformance / SCMI Device Conformance}, > > +\ref{sec:Conformance / Device Conformance / PMEM Driver Conformance}. > > > > \item Clause \ref{sec:Conformance / Legacy Interface: Transitional Device and Transitional Driver Conformance}. > > \end{itemize} > > @@ -301,6 +303,19 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} > > \item \ref{drivernormative:Device Types / SCMI Device / Device Operation / Setting Up eventq Buffers} > > \end{itemize} > > > > +\conformance{\subsection}{PMEM Driver Conformance}\label{sec:Conformance / Driver Conformance / PMEM Driver Conformance} > > + > > +A PMEM driver MUST conform to the following normative statements: > > + > > +\begin{itemize} > > +\item \ref{devicenormative:Device Types / PMEM Device / Device Initialization} > > +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Direct access} > > +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Virtio flush} > > +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Operation / Virtqueue command} > > +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue flush} > > +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue return} > > +\end{itemize} > > + > > \conformance{\section}{Device Conformance}\label{sec:Conformance / Device Conformance} > > > > A device MUST conform to the following normative statements: > > diff --git a/content.tex b/content.tex > > index ceb2562..6acc785 100644 > > --- a/content.tex > > +++ b/content.tex > > @@ -6583,6 +6583,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device > > \input{virtio-mem.tex} > > \input{virtio-i2c.tex} > > \input{virtio-scmi.tex} > > +\input{virtio-pmem.tex} > > > > \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} > > > > diff --git a/virtio-pmem.tex b/virtio-pmem.tex > > new file mode 100644 > > index 0000000..a2b888e > > --- /dev/null > > +++ b/virtio-pmem.tex > > @@ -0,0 +1,132 @@ > > +\section{PMEM Device}\label{sec:Device Types / PMEM Device} > > + > > +The virtio pmem is a fake persistent memory (NVDIMM) device > > s/fake/virtual/ or drop "fake"? If the device persists data correctly > then it's not fake. Yes, Agree. > > > +used to bypass the guest page cache and provide a virtio > > +based asynchronous flush mechanism.This avoids the need > > +of a separate page cache in guest and keeps page cache only > > +in the host. Under memory pressure, the host makes use of > > +effecient memory reclaim decisions for page cache pages > > s/effecient/efficient/ noted. > > > +of all the guests. This helps to reduce the memory footprint > > +and fit more guests in the host system. > > This explains the motivation for the device. It would also be nice to > explain the nature of the device: > > The virtio pmem device provides access to byte-addressable persistent > memory. The persist memory is directly accessible as a Shared Memory > Region. Data written to this memory is made persistent by separately > sending a flush command. Writes that have been flushed are preserved > across device reset and power failure. I was not sure we need to add motivation as well in the spec. But yes, will add. > > > + > > +\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID} > > + 27 > > + > > +\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues} > > +\begin{description} > > +\item[0] req_vq > > +\end{description} > > + > > +\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature bits} > > + > > +There are currently no feature bits defined for this device. > > + > > +\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout} > > + > > +\begin{lstlisting} > > +struct virtio_pmem_config { > > + le64 start; > > + le64 size; > > +}; > > +\end{lstlisting} > > + > > +\begin{description} > > +\item[\field{start}] contains the start address from the guest physical address range > > +to be hotplugged into the guest address space using the pmem API. > > + > > +\item[\field{size}] contains the length of this address range. > > +\end{description} > > Please use a Shared Memory Region instead. This is a relatively new > addition to the VIRTIO device model that seems like a good fit for this > device: > https://github.com/oasis-tcs/virtio-spec/blob/master/shared-mem.tex David has answered this. > > > + > > +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device / Device Initialization} > > + > > +Device hotplugs physical memory to guest address space. Persistent memory device > > +is emulated with file backed memory at host side. > > This paragraph can be dropped. The device could be implemented in a > number of ways and it's beyond the scope of the specification. But spec is for current implementation? As we have more devices or ways, we can amend this. > > > + > > +\begin{enumerate} > > +\item Guest vpmem start is read from \field{start}. > > +\item Guest vpmem end is read from \field{size}. > > +\end{enumerate} > > Please avoid the terms "host" and "guest". VIRTIO devices can also be > implemented in hardware and used without guests. The spec usually uses > the terms "device" and "driver" instead. Agree. > > > + > > +\devicenormative{\subsubsection}{Device Initialization}{Device Types / PMEM Device / Device Initialization} > > + > > +File backed memory MUST be memory mapped to guest address space with SHARED > > +memory mapping. > > This is a device implementation detail that is beyond the scope of the > specification. The VIRTIO spec is concerned with the driver/device > interface, not with the implementation details of the device. Already noted this. > > > + > > +\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver / Driver Initialization} > > + > > +Driver hotplugs the physical memory and registers associated region with the pmem API. > > The pmem API is a Linux kernel implementation detail. Instead you could > say something like: > > The driver determines the start address and size of the persist memory > region in preparation for reading or writing data. Sure. > > > +Also, configures a flush callback function with the corresponding region. > > "callback function" is a driver implementation detail that's beyond the > scope of the specification. Instead you could say something like: > > The driver initializes req_vq in preparation for making flush > requests. Sure. > > > + > > +\drivernormative{\subsubsection}{Driver Initialization: Filesystem direct access}{Device Types / PMEM Driver / Driver Initialization / Direct access} > > + > > +Driver MUST enable filesystem direct access operations for read/write on the device. > > This is beyond the scope of the VIRTIO specification because it's a > Linux guest kernel detail. noted. > > > + > > +\drivernormative{\subsubsection}{Driver Initialization: Virtio flush}{Device Types / PMEM Driver / Driver Initialization / Virtio flush} > > + > > +Driver MUST implement a virtio based flush callback. > > Driver implementation detail. > > > + > > +Driver MUST disable other FLUSH/SYNC mechanisms for the device when virtio flush is configured. > > What does this mean? This will be removed as per earlier feedback. > > > + > > +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver / Driver Operation} > > +\drivernormative{\subsubsection}{Driver Operation: Virtqueue command}{Device Types / PMEM Driver / Driver Operation / Virtqueue command} > > + > > +Driver MUST send VIRTIO_FLUSH command on request virtqueue, allows guest userspace process to perform IO operations asynchronously. > > VIRTIO_FLUSH has not been defined. o.k > > "guest userspace process" is beyond the scope of the driver/device > interface and therefore not relevant to the VIRTIO specification. This > could be rephrased: > > The driver MUST send and wait for the successful completion of a > VIRTIO_PMEM_FLUSH command on req_vq in order to ensure previously > written data will persist across device reset and power failure. > > > + > > +Driver SHOULD handle multiple fsync requests on files present on the device. > > The concept of "files" is beyond the scope of the driver/device > interface. We only deal with the memory region in this specification, so > I think the scenario you're describing is when multiple writes have been > performed and there are several flush commands in flight. yes. noted this as per earlier feedback. > > > + > > +\subsection{Device Operations}\label{sec:Device Types / PMEM Driver / Device Operation} > > + > > +\devicenormative{\subsubsection}{Device Operations}{Device Types / PMEM Device / Device Operation / Virtqueue flush} > > + > > +Device SHOULD handle multiple flush requests simultaneously using host filesystem fsync or flush call. > > Same thing as above. > > Missing: > > The device MUST ensure that all writes made before a flush request > will persist across device reset and power failure before completing > the flush request. Sure. > > > + > > +\devicenormative{\subsubsection}{Device operations}{Device Types / PMEM Device / Device Operation / Virtqueue return} > > + > > +Device MUST return integer "0" for success and "-1" for failure. > > +These errors are converted to corresponding error codes by guest > > +as per architecture. > > This sentence about guest error codes is outside the scope of the VIRTIO > specification. Yes. w > > > + > > +\subsection{Possible security implications}\label{sec:Device Types / PMEM Device / Possible Security Implications} > > + > > +There could be potential security implications depending on how > > +memory mapped host backing file is used. By default device emulation > > +is done with SHARED mapping. There is a contract between guest and host > > +process to access same backing file for read/write operations. > > + > > +If a malicious guest or host userspace map the same backing file, > > +attacking process can make use of known cache side channel attacks > > +to predict the current state of shared page cache page. If both > > +attacker and victim somehow execute same shared code after a > > +flush/evict call, with difference in execution timing attacker > > +could infer another guest local data or host data. Though this is > > +not easy and same challenges exist as with bare metal host system > > +when userspace share same backing file. > > This is important information but needs to be phrased without referring > to "host"/"host" and only with respect to the driver/device interface > (not the host/guest kernel, applications, etc). Sure will do, Thank you! > > The same applies to the rest of the security points below. Yes > > > + > > +\subsection{Countermeasures}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures} > > + > > +\subsubsection{ With SHARED mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / SHARED} > > + > > +If device backing backing file is shared with multiple guests or host > > +processes, this may act as a metric for page cache side channel attack. > > +As a counter measure every guest should have its own(not shared with > > +another guest) SHARED backing file and gets populated a per host process > > +page cache pages. > > + > > +\subsubsection{ With PRIVATE mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / PRIVATE} > > +There maybe be chances of side channels attack with PRIVATE > > +memory mapping similar to SHARED with read-only shared mappings. > > +PRIVATE is not used for virtio pmem making this usecase > > +irrelevant. > > + > > +\subsubsection{ Workload specific mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Workload} > > +For SHARED mapping, if workload is single application inside > > +guest and there is no risk with sharing of data between guests. > > +Guest sharing same backing file with SHARED mapping can be > > +used as a valid configuration. > > + > > +\subsubsection{ Prevent cache eviction}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Cache eviction} > > +Don't allow cache evict from guest filesystem trim/discard command > > +with virtio pmem. This rules out any possibility of evict-reload > > +page cache side channel attacks if backing disk is shared(SHARED) > > +with mutliple guests. Though if we use per device backing file with > > +shared mapping this countermeasure is not required. > > -- > > 2.25.1 > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org > >
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]