virtio-dev message

Subject: Re: [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window


On Tue, Jun 25, 2019 at 10:55:15AM +0100, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Mon, Jun 24, 2019 at 02:58:08PM +0100, Stefan Hajnoczi wrote:
> > > On Tue, Jun 18, 2019 at 09:41:25PM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Feb 20, 2019 at 12:46:13PM +0000, Stefan Hajnoczi wrote:
> > > > > Describe how shared memory region ID 0 is the DAX window and how
> > > > > FUSE_SETUPMAPPING maps file ranges into the window.
> > > > > 
> > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > ---
> > > > > Note that this depends on the shared memory resource specification
> > > > > extension that David Gilbert is working on.
> > > > > https://lists.oasis-open.org/archives/virtio-comment/201901/msg00000.html
> > > > > 
> > > > > The FUSE_SETUPMAPPING message is part of the virtio-fs Linux patches:
> > > > > https://gitlab.com/virtio-fs/linux/blob/virtio-fs/include/uapi/linux/fuse.h
> > > > > ---
> > > > >  virtio-fs.tex | 25 +++++++++++++++++++++++++
> > > > >  1 file changed, 25 insertions(+)
> > > > > 
> > > > > diff --git a/virtio-fs.tex b/virtio-fs.tex
> > > > > index 5df5b9c..abb1e48 100644
> > > > > --- a/virtio-fs.tex
> > > > > +++ b/virtio-fs.tex
> > > > > @@ -157,6 +157,31 @@ The driver MUST submit FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET reques
> > > > >  
> > > > >  The driver MUST anticipate that request queues are processed concurrently with the hiprio queue.
> > > > >  
> > > > > +\subsubsection{Device Operation: DAX Window}\label{sec:Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> > > > > +
> > > > > +FUSE\_READ and FUSE\_WRITE requests transfer file contents between the
> > > > > +driver-provided buffer and the device.  In cases where data transfer is
> > > > > +undesirable, the device can map file contents into the DAX window shared memory
> > > > > +region.  The driver then accesses file contents directly in device-owned memory
> > > > > +without a data transfer.
> > > > > +
> > > > > +Shared memory region ID 0 is called the DAX window.  The driver maps a file
> > > > > +range into the DAX window using the FUSE\_SETUPMAPPING request.  The mapping is
> > > > > +removed using the FUSE\_REMOVEMAPPING request.
> > > > 
> > > > I don't see FUSE\_SETUPMAPPING or FUSE\_REMOVEMAPPING  under
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fuse.h
> > > > Is it just me?
> > > 
> > > They are not upstream yet and can be found here:
> > > 
> > > https://gitlab.com/virtio-fs/linux/blob/virtio-fs/include/uapi/linux/fuse.h#L384
> > > 
> > > There is a chicken-and-egg problem.  Linux should merge this once the
> > > spec has been accepted.  The spec makes reference to a new FUSE command
> > > that is being added to Linux.  :D
> > > 
> > > I suggest we break it by merging the VIRTIO spec change first.  There
> > > won't be a spec release so soon anyway and we can revert it in case
> > > there are issues in Linux.  Miklos, the FUSE maintainer, is well aware of
> > > virtio-fs and contributes to it, so it's unlikely that Linux will reject
> > > these commands.
> > > 
> > > > > +
> > > > > +After FUSE\_SETUPMAPPING has completed successfully the file range is accessible
> > > > > +from the DAX window at the offset provided by the driver in the request.
> > > > 
> > > > Dgilbert's patches describing shared memory say that
> > > > the legal ways to set up mappings are all implementation-dependent.
> > > > How does driver know which attributes to use for the
> > > > mapping?
> > > 
> > > Two different types of mappings:
> > > 1. The DAX window shared memory region described by DaveG's spec.
> > > 2. The file mappings established using FUSE_SETUPMAPPING.
> > > 
> > > The virtio_fs.ko driver maps the DAX window, e.g. from a PCI BAR in an
> > > implementation-defined way.  virtio_pci_*.c in Linux will have to help
> > > out with the implementation-specific details here.
> > > 
> > > The only flags currently supported by FUSE_SETUPMAPPING are READ and
> > > WRITE.  This depends on the file's access mode.  There is nothing
> > > implementation-specific in FUSE_SETUPMAPPING.
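
As a rough illustration of "this depends on the file's access mode", the sketch below derives the mapping flags from the flags a file was opened with. It is only a sketch under stated assumptions: the names `MAP_FLAG_READ`, `MAP_FLAG_WRITE` and `mapping_flags` and the bit values are hypothetical, and the real definitions are whatever the virtio-fs fuse.h linked above provides.

```python
import os

# Hypothetical flag bits, for this sketch only. The real names and values
# are defined in the virtio-fs fuse.h and may differ.
MAP_FLAG_WRITE = 1 << 0
MAP_FLAG_READ = 1 << 1


def mapping_flags(open_flags: int) -> int:
    """Derive READ/WRITE mapping flags from a file's open(2) access mode."""
    access = open_flags & os.O_ACCMODE  # strip O_CREAT, O_TRUNC, etc.
    if access == os.O_RDONLY:
        return MAP_FLAG_READ
    if access == os.O_WRONLY:
        return MAP_FLAG_WRITE
    if access == os.O_RDWR:
        return MAP_FLAG_READ | MAP_FLAG_WRITE
    raise ValueError(f"unexpected access mode: {access:#o}")
```

Nothing in this depends on how the DAX window itself is mapped (PCI BAR or otherwise), which is the point made above.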
> > 
> > Sorry - I'm being unclear.
> > The guest driver maps parts of the PCI BAR.
> > What are the attributes of this mapping?
> > This is unrelated to FUSE_SETUPMAPPING things -
> > mapping is created by creating PTEs and such
> > within guest, not by virtio things.
> 
> By attributes you mean... memory ordering, cacheability etc?

I mean non-cacheable, writeback, or write-combining.
A standard mmap of a PCI BAR is non-cacheable.

> > 
> > > > Also, we recently had a discussion about DAX support on hosts
> > > > and safety wrt crashes. Do we need to expose this
> > > > information to guests maybe?
> > > 
> > > No.  Although virtio-fs uses the DAX subsystem, it does not use NVDIMM's
> > > persistence model (e.g. CPU cache flush for persistence).  FUSE_FSYNC is
> > > sent when persistence is required.  Therefore virtio-fs is still using
> > > the traditional file/block persistence model.  No changes necessary for
> > > power failure, etc.
> > > 
> > > > Finally, do we want to have a way to express that the filesystem
> > > > only allows RO mappings?
> > > 
> > > Thanks for this idea.  I'm discussing it with the FUSE community because
> > > mount -o ro with FUSE currently doesn't involve the file system daemon.
> > > 
> > > > > +
> > > > > +\devicenormative{\paragraph}{Device Operation: DAX Window}{Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> > > > > +
> > > > > +The device MUST allow mappings that completely or partially overlap existing mappings within the DAX window.
> > > > 
> > > > 
> > > > Any alignment requirements?
> > > 
> > > Good point.  There are alignment requirements and the driver has no way
> > > of knowing what they are.  I'll find a way to communicate them into the
> > > guest, either via virtio or via FUSE.
> > > 
> > > > Also, with no limit on mappings, it looks like guest can use up lots of
> > > > host VMAs quickly. Shouldn't there be a limit on # of mappings?
> > > 
> > > The VM can only deteriorate its own performance, right?
> > 
> > Only if QEMU is put in a container where virtual memory is
> > limited.
> > It's generally not a good idea to have a design where the only way
> > for the host to make progress is to allocate more memory
> > without any limit.
> > 
> > If we are in a situation where we need to either kill
> > the guest or hit swap, none of the choices is good.
> 
> There is a bound; it's cache region size / page size - so
> that's ~1M mappings worst case (e.g. 4GB cache, 4kB page size)
> That limit can be brought down if we impose a larger granularity
> somewhere (and the reality is our kernel uses 2MB mapping chunks I
> think).
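
To make the arithmetic above concrete, here is a minimal sketch of the worst case: one mapping per granule across the whole window. The function name `max_mappings` and the constants are illustrative assumptions. Only the example figures (4GB window, 4kB pages, 2MB chunks) come from the paragraph above.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def max_mappings(window_size: int, granule: int) -> int:
    """Worst-case number of distinct mappings: one per granule in the window."""
    if window_size < 0 or granule <= 0:
        raise ValueError("window_size must be >= 0 and granule must be > 0")
    return window_size // granule


if __name__ == "__main__":
    # 4GB cache with 4kB pages: the ~1M worst case quoted above.
    print(max_mappings(4 * GIB, 4 * KIB))  # 1048576
    # Same window with 2MB mapping chunks.
    print(max_mappings(4 * GIB, 2 * MIB))  # 2048
```

So a coarser granularity cuts the worst case from about a million mappings to a couple of thousand for the same window size.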
> 
> > > We haven't seen catastrophic problems that bring the system to its
> > > knees.
> > 
> > Because you are not running malicious guests?
> 
> Hmm, I didn't realise a process having an excessive number of mappings
> could harm any other process.
> 
> Dave

Well, it allocates resources on the host. If you don't
contain qemu, then even just allocating virtual memory
can make the host swap, right? If you do contain it, then
qemu will get killed instead, but then you need to tell the
guest what not to do so as not to get qemu killed.


> > >  But we're aware that increasing the number of VMAs slows down the
> > > lookup.  There is currently no imposed limit.
> > > 
> > > Ideas have been discussed to avoid using (so many) VMAs but it seems
> > > like that will take some time to develop and get upstream.  This will
> > > not affect the virtio specification because the device interface doesn't
> > > need to know about this.
> > > 
> > > Stefan
> > 
> > 
> > One way to address this is to expose the # of mappings
> > in the config space.
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

