OASIS Mailing List Archives

 



virtio-dev message



Subject: Re: [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window


On Mon, Jun 24, 2019 at 02:58:08PM +0100, Stefan Hajnoczi wrote:
> On Tue, Jun 18, 2019 at 09:41:25PM -0400, Michael S. Tsirkin wrote:
> > On Wed, Feb 20, 2019 at 12:46:13PM +0000, Stefan Hajnoczi wrote:
> > > Describe how shared memory region ID 0 is the DAX window and how
> > > FUSE_SETUPMAPPING maps file ranges into the window.
> > > 
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > ---
> > > Note that this depends on the shared memory resource specification
> > > extension that David Gilbert is working on.
> > > https://lists.oasis-open.org/archives/virtio-comment/201901/msg00000.html
> > > 
> > > The FUSE_SETUPMAPPING message is part of the virtio-fs Linux patches:
> > > https://gitlab.com/virtio-fs/linux/blob/virtio-fs/include/uapi/linux/fuse.h
> > > ---
> > >  virtio-fs.tex | 25 +++++++++++++++++++++++++
> > >  1 file changed, 25 insertions(+)
> > > 
> > > diff --git a/virtio-fs.tex b/virtio-fs.tex
> > > index 5df5b9c..abb1e48 100644
> > > --- a/virtio-fs.tex
> > > +++ b/virtio-fs.tex
> > > @@ -157,6 +157,31 @@ The driver MUST submit FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET reques
> > >  
> > >  The driver MUST anticipate that request queues are processed concurrently with the hiprio queue.
> > >  
> > > +\subsubsection{Device Operation: DAX Window}\label{sec:Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> > > +
> > > +FUSE\_READ and FUSE\_WRITE requests transfer file contents between the
> > > +driver-provided buffer and the device.  In cases where data transfer is
> > > +undesirable, the device can map file contents into the DAX window shared memory
> > > +region.  The driver then accesses file contents directly in device-owned memory
> > > +without a data transfer.
> > > +
> > > +Shared memory region ID 0 is called the DAX window.  The driver maps a file
> > > +range into the DAX window using the FUSE\_SETUPMAPPING request.  The mapping is
> > > +removed using the FUSE\_REMOVEMAPPING request.
> > 
> > I don't see FUSE\_SETUPMAPPING or FUSE\_REMOVEMAPPING  under
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fuse.h
> > Is it just me?
> 
> They are not upstream yet and can be found here:
> 
> https://gitlab.com/virtio-fs/linux/blob/virtio-fs/include/uapi/linux/fuse.h#L384
> 
> There is a chicken-and-egg problem.  Linux should merge this once the
> spec has been accepted.  The spec makes reference to a new FUSE command
> that is being added to Linux.  :D
> 
> I suggest we break it by merging the VIRTIO spec change first.  There
> won't be a spec release so soon anyway and we can revert it in case
> there are issues in Linux.  Miklos, the FUSE maintainer, is well aware of
> virtio-fs and contributes to it, so it's unlikely that Linux will reject
> these commands.
> 
> > > +
> > > +After FUSE\_SETUPMAPPING has completed successfully the file range is accessible
> > > +from the DAX window at the offset provided by the driver in the request.
> > 
> > Dgilbert's patches describing shared memory say that
> > the legal ways to set up mappings are all implementation-dependent.
> > How does driver know which attributes to use for the
> > mapping?
> 
> Two different types of mappings:
> 1. The DAX window shared memory region described by DaveG's spec.
> 2. The file mappings established using FUSE_SETUPMAPPING.
> 
> The virtio_fs.ko driver maps the DAX window, e.g. from a PCI BAR in an
> implementation-defined way.  virtio_pci_*.c in Linux will have to help
> out with the implementation-specific details here.
> 
> The only flags currently supported by FUSE_SETUPMAPPING are READ and
> WRITE.  This depends on the file's access mode.  There is nothing
> implementation-specific in FUSE_SETUPMAPPING.
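To make the request concrete, here is a minimal C sketch of how a driver might fill in a FUSE_SETUPMAPPING request, deriving the READ/WRITE flags only from the file's open access mode as described above. The struct layout and the FUSE_SETUPMAPPING_FLAG_* values are modeled on the virtio-fs tree's fuse.h as we recall it; treat them as an assumption to check against the header linked in this thread. setupmapping_flags() and make_setupmapping() are hypothetical helper names.

```c
#include <fcntl.h>
#include <stdint.h>

/* ASSUMED layout, modeled on the virtio-fs tree's fuse.h; verify before use. */
#define FUSE_SETUPMAPPING_FLAG_WRITE (1ull << 0)
#define FUSE_SETUPMAPPING_FLAG_READ  (1ull << 1)

struct fuse_setupmapping_in {
    uint64_t fh;      /* handle returned by an earlier FUSE_OPEN */
    uint64_t foffset; /* start of the range within the file */
    uint64_t len;     /* length of the range to map */
    uint64_t flags;   /* FUSE_SETUPMAPPING_FLAG_* */
    uint64_t moffset; /* destination offset inside the DAX window (shm region 0) */
};

/* Hypothetical helper: mapping permissions follow the file's access mode only. */
static uint64_t setupmapping_flags(int open_flags)
{
    switch (open_flags & O_ACCMODE) {
    case O_RDONLY:
        return FUSE_SETUPMAPPING_FLAG_READ;
    case O_WRONLY:
        return FUSE_SETUPMAPPING_FLAG_WRITE;
    default: /* O_RDWR */
        return FUSE_SETUPMAPPING_FLAG_READ | FUSE_SETUPMAPPING_FLAG_WRITE;
    }
}

/* Hypothetical helper: build the request body for one file range. */
static struct fuse_setupmapping_in make_setupmapping(uint64_t fh, int open_flags,
                                                     uint64_t foffset, uint64_t len,
                                                     uint64_t moffset)
{
    struct fuse_setupmapping_in in = {
        .fh = fh,
        .foffset = foffset,
        .len = len,
        .flags = setupmapping_flags(open_flags),
        .moffset = moffset,
    };
    return in;
}
```

For example, make_setupmapping(7, O_RDONLY, 0, 2u << 20, 4u << 20) would describe a read-only 2 MiB mapping of file handle 7 placed at offset 4 MiB in the window (values invented for illustration). Nothing in the request depends on how the transport exposes the window, which matches the point that FUSE_SETUPMAPPING itself is not implementation-specific.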

Sorry - I'm being unclear.
The guest driver maps parts of the PCI BAR.
What are the attributes of this mapping?
This is unrelated to FUSE_SETUPMAPPING: the
mapping is created by creating PTEs and such
within the guest, not through virtio.


> > Also, we recently had a discussion about DAX support on hosts
> > and safety wrt crashes. Do we need to expose this
> > information to guests maybe?
> 
> No.  Although virtio-fs uses the DAX subsystem, it does not use NVDIMM's
> persistence model (e.g. CPU cache flush for persistence).  FUSE_FSYNC is
> sent when persistence is required.  Therefore virtio-fs is still using
> the traditional file/block persistence model.  No changes necessary for
> power failure, etc.
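The persistence model described above can be illustrated from the guest application's side: stores through a shared mapping are not durable until the application explicitly asks for it. This is a generic POSIX sketch, not virtio-fs code, and write_mapped_and_sync() is a hypothetical helper. On a DAX-enabled virtio-fs mount, the msync()/fsync() calls are the point at which, per this thread, the guest would send FUSE_FSYNC instead of relying on NVDIMM-style CPU cache flushes.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: set the file at `path` to exactly `len` bytes, write
 * `data` through a shared mapping, then request persistence explicitly.
 * Returns 0 on success, -1 on error. `len` must be non-zero. */
static int write_mapped_and_sync(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(p, data, len);              /* plain stores into the mapping */
    int rc = msync(p, len, MS_SYNC);   /* write dirty mapped data back to the file */
    if (rc == 0)
        rc = fsync(fd);                /* explicit durability request */
    munmap(p, len);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

The design point is that durability is requested through the file API, so the host can implement it with an ordinary fsync of the backing file.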
> 
> > Finally, do we want to have a way to express that the filesystem
> > only allows RO mappings?
> 
> Thanks for this idea.  I'm discussing it with the FUSE community because
> mount -o ro with FUSE currently doesn't involve the file system daemon.
> 
> > > +
> > > +\devicenormative{\paragraph}{Device Operation: DAX Window}{Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> > > +
> > > +The device MUST allow mappings that completely or partially overlap existing mappings within the DAX window.
> > 
> > 
> > Any alignment requirements?
> 
> Good point.  There are alignment requirements and the driver has no way
> of knowing what they are.  I'll find a way to communicate them into the
> guest, either via virtio or via FUSE.
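A minimal sketch of how a device could enforce an alignment requirement while still honoring the proposed rule that mappings may completely or partially overlap existing ones. The window size, the 2 MiB alignment, and the slot-table design are hypothetical choices for illustration; the thread states that the real alignment is not yet communicated to the driver. Because every request must be aligned, the window can be tracked as fixed-size slots, and an overlapping FUSE_SETUPMAPPING simply overwrites the slots it covers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL parameters, chosen only for this sketch. */
#define WINDOW_SIZE   (16ull << 20) /* 16 MiB DAX window */
#define MAP_ALIGNMENT (2ull << 20)  /* 2 MiB mapping granularity */
#define NSLOTS        (WINDOW_SIZE / MAP_ALIGNMENT)

struct slot {
    bool used;
    uint64_t fh;      /* file handle backing this slot */
    uint64_t foffset; /* file offset backing this slot */
};

static struct slot window[NSLOTS];

static bool aligned(uint64_t v) { return v % MAP_ALIGNMENT == 0; }

static bool in_window(uint64_t moffset, uint64_t len)
{
    return moffset <= WINDOW_SIZE && len <= WINDOW_SIZE - moffset;
}

/* Reject misaligned or out-of-window requests; otherwise (re)map the covered
 * slots. Existing mappings in those slots are replaced, so complete and
 * partial overlaps are both allowed. */
static int setup_mapping(uint64_t fh, uint64_t foffset, uint64_t moffset, uint64_t len)
{
    if (len == 0 || !aligned(foffset) || !aligned(moffset) || !aligned(len))
        return -1;
    if (!in_window(moffset, len))
        return -1;
    for (uint64_t i = 0; i < len / MAP_ALIGNMENT; i++) {
        struct slot *s = &window[moffset / MAP_ALIGNMENT + i];
        s->used = true;
        s->fh = fh;
        s->foffset = foffset + i * MAP_ALIGNMENT;
    }
    return 0;
}

/* Clear the slots covered by an aligned range (FUSE_REMOVEMAPPING). */
static int remove_mapping(uint64_t moffset, uint64_t len)
{
    if (len == 0 || !aligned(moffset) || !aligned(len) || !in_window(moffset, len))
        return -1;
    memset(&window[moffset / MAP_ALIGNMENT], 0,
           (size_t)(len / MAP_ALIGNMENT) * sizeof(struct slot));
    return 0;
}
```

Whatever the real granularity turns out to be, the driver needs to learn it, since a request that is valid on one host could be rejected by another.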
> 
> > Also, with no limit on mappings, it looks like the guest can use up lots of
> > host VMAs quickly. Shouldn't there be a limit on the # of mappings?
> 
> The VM can only deteriorate its own performance, right?

Only if QEMU is put in a container where virtual memory is
limited.
It's generally not a good idea to have a design where
the only way for the host to make progress is to allocate
more memory without any limit.

If we end up in a situation where we must either kill
the guest or hit swap, neither choice is good.
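Each file range the device maps costs the host process a VMA. The sketch below illustrates that under our own assumptions; it is not how QEMU implements the DAX window. It reserves a PROT_NONE window, maps one-page file ranges into it with MAP_FIXED (separated by unmapped-permission gaps so they cannot merge), and counts lines of /proc/self/maps before and after. Linux also caps VMAs per process through vm.max_map_count (65530 by default), so an unbounded number of guest-requested mappings eventually makes mmap() fail with ENOMEM. count_vmas() and vma_growth_for_mappings() are hypothetical helpers; the code is Linux-specific.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count VMAs of the calling process: /proc/self/maps has one line per VMA. */
static long count_vmas(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;
    long n = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            n++;
    fclose(f);
    return n;
}

/* Emulate a device mapping `nmaps` one-page file ranges into a reserved
 * window, leaving a PROT_NONE page between neighbours. Returns how many VMAs
 * the process gained, or -1 on error. */
static long vma_growth_for_mappings(const char *path, long nmaps)
{
    long page = sysconf(_SC_PAGESIZE);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)(nmaps * page)) != 0) {
        close(fd);
        return -1;
    }
    size_t winlen = (size_t)(2 * nmaps * page);
    char *win = mmap(NULL, winlen, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (win == MAP_FAILED) {
        close(fd);
        return -1;
    }

    long before = count_vmas();
    for (long i = 0; i < nmaps; i++) {
        void *p = mmap(win + 2 * i * page, (size_t)page, PROT_READ,
                       MAP_SHARED | MAP_FIXED, fd, (off_t)(i * page));
        if (p == MAP_FAILED) {
            munmap(win, winlen);
            close(fd);
            return -1;
        }
    }
    long after = count_vmas();

    munmap(win, winlen); /* removes the window and every file mapping inside it */
    close(fd);
    return after - before;
}
```

With 64 mappings the window, initially one VMA, is split into roughly 128, which is the host-side growth a guest can drive when the number of mappings is unbounded.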




> We haven't seen catastrophic problems that bring the system to its
> knees.

Because you are not running malicious guests?

>  But we're aware that increasing the number of VMAs slows down the
> lookup.  There is currently no imposed limit.
> 
> Ideas have been discussed to avoid using (so many) VMAs but it seems
> like that will take some time to develop and get upstream.  This will
> not affect the virtio specification because the device interface doesn't
> need to know about this.
> 
> Stefan


One way to address this is to expose the maximum # of mappings
in the config space.
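A minimal sketch of what that could look like on the driver side. The first two fields follow the virtio-fs device configuration layout as we understand it (verify against the spec); max_mappings is a HYPOTHETICAL field for this proposal, with 0 meaning "no limit advertised". The budget helpers are likewise hypothetical; for simplicity they count each FUSE_SETUPMAPPING into a previously unused range as one mapping and ignore overlaps.

```c
#include <stdbool.h>
#include <stdint.h>

struct virtio_fs_config_sketch {
    uint8_t tag[36];
    uint8_t num_request_queues[4]; /* le32 */
    uint8_t max_mappings[4];       /* le32, HYPOTHETICAL: 0 = no limit advertised */
};

/* Config space fields are little-endian; decode byte by byte. */
static uint32_t le32_decode(const uint8_t b[4])
{
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

struct dax_mapping_budget {
    uint32_t limit;  /* from config space; 0 means unlimited */
    uint32_t active; /* mappings the driver currently holds */
};

static void budget_init(struct dax_mapping_budget *b,
                        const struct virtio_fs_config_sketch *cfg)
{
    b->limit = le32_decode(cfg->max_mappings);
    b->active = 0;
}

/* Call before sending FUSE_SETUPMAPPING for a new range. On false, the driver
 * would first reclaim an old range with FUSE_REMOVEMAPPING. */
static bool budget_take(struct dax_mapping_budget *b)
{
    if (b->limit != 0 && b->active >= b->limit)
        return false;
    b->active++;
    return true;
}

/* Call after FUSE_REMOVEMAPPING completes. */
static void budget_release(struct dax_mapping_budget *b)
{
    if (b->active > 0)
        b->active--;
}
```

The host then only needs to reject requests beyond the advertised limit, so its VMA usage stays bounded no matter what the guest does.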

