OASIS Mailing List Archives

virtio-comment message



Subject: Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions


On Fri, 15 Feb 2019 15:14:25 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * David Hildenbrand (david@redhat.com) wrote:
> > On 15.02.19 15:02, Dr. David Alan Gilbert wrote:
> > > * David Hildenbrand (david@redhat.com) wrote:
> > >> On 15.02.19 14:50, Dr. David Alan Gilbert wrote:
> > >>> * Cornelia Huck (cohuck@redhat.com) wrote:
> > >>>> On Fri, 15 Feb 2019 13:33:06 +0100
> > >>>> David Hildenbrand <david@redhat.com> wrote:
> > >>>>
> > >>>>> On 15.02.19 13:28, Cornelia Huck wrote:
> > >>>>>> On Fri, 15 Feb 2019 12:26:00 +0100
> > >>>>>> David Hildenbrand <david@redhat.com> wrote:
> > >>>>>>   
> > >>>>>>> Probing is always ugly. But I think we can add something like
> > >>>>>>>  the x86 PCI hole between 3 and 4 GB after our initial boot memory.
> > >>>>>>> So there, we would have a memory region just like e.g. x86 has.  
> > >>>>>>
> > >>>>>> A special region is probably the best way out of this pickle. We would
> > >>>>>> only need the discovery ccw for virtio, then.
> > >>>>>>   
> > >>>>>>>
> > >>>>>>> This should even work with other mechanisms I am working on. E.g.
> > >>>>>>> for memory devices, we will add yet another memory region above
> > >>>>>>> the special PCI region.
> > >>>>>>>
> > >>>>>>> The layout of the guest would then be something like
> > >>>>>>>
> > >>>>>>> [0x000000000000000]
> > >>>>>>> ... Memory region containing RAM
> > >>>>>>> [ram_size         ]
> > >>>>>>> ... Memory region for e.g. special PCI devices
> > >>>>>>> [ram_size +1 GB   ]
> > >>>>>>> ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
> > >>>>>>> [maxram_size - ram_size + 1GB]
> > >>>>>>>
> > >>>>>>> We would have to create proper page tables for guest backing that take
> > >>>>>>> care of the new guest size (not just ram_size). Also, to the guest we
> > >>>>>>> would indicate "maximum ram size == ram_size" so it does not try to
> > >>>>>>> probe the "special" memory.  
> > >>>>>>
> > >>>>>> Hm... so that would be:
> > >>>>>> - 0..ram_size: just like it is handled now
> > >>>>>> - ram_size..ram_size + 1GB: guest does not treat it as ram, but does
> > >>>>>>   build page tables for it
> > >>>>>> - ram_size + 1GB..maxram_size: for whatever memory devices do with it
> > >>>>>>
> > >>>>>> How does the guest probe this? (SCLP?) Or does the guest simply know
> > >>>>>> via some kind of probable feature that there's a 1GB region there?  
> > >>>>>
> > >>>>> As the guest only "knows" ram, there is a "maximum ram size" specified
> > >>>>> via SCLP. An unmodified guest will not probe beyond that.
> > >>>>
> > >>>> Nod.
> > >>>>
> > >>>>> The parts of the 1GB used by a device should be communicated via the
> > >>>>> paravirtualized device I guess. PCI bars don't really fit I assume, so
> > >>>>> we might need some virtio-ccw thingy (you're the expert :)) on top. That
> > >>>>> is one part to be clarified.
> > >>>>>
> > >>>>> I guess the guest does not need to know about the whole 1GB, only per
> > >>>>> device about the used part. We can then build page tables in the guest
> > >>>>> for that part when plugging.
> > >>>>
> > >>>> Hm. With my proposal, the guest would get a list of region addresses
> > >>>> from the device via a new ccw. It could then proceed to set up page
> > >>>> tables for it and start to use it. As long as it is aware that the
> > >>>> addresses it will get are beyond max_ram, that should be fine, I think.
> > >>>
> > >>> Which is the same as my virtio-mmio proposal; the host gets to put it
> > >>> wherever it sees fit (outside ram) and you've just got a way of
> > >>> telling the guest where it lives.
> > >>>
> > >>> Davidh's 1GB window is pretty much how older PCs worked I think;
> > >>> the problem is that 1GB is never enough and you still need a way
> > >>> to enumerate what devices are where, so it doesn't help you.
> > >>> (Our current virtio-fs dax mappings we're using are a few GB).
> > >>>
> > >>
> > >> How does that work on x86? You cannot suddenly move stuff into the
> > >> memory device memory region and potentially mess with DIMMs to be
> > >> plugged later. QEMU wise, this sounds wrong.
> > > 
> > > Because it's PCI based, it becomes the guest's problem - the guest
> > > sets the PCI BARs which set the GPA of the PCI devices;  I assume
> > > there's some protection that happens if it gets mapped over RAM (?!)
> > > 
> > > I think that varies by firmware as well, with EFI mapping
> > > them differently from our bios.
> > > I think the guest knows the total number of DIMM slots and max-ram
> > > limit, so knows where not-to-map.
> > 
> > On s390x, we have to define the size of the host->guest page table when
> > starting the guest. So we need some upper limit.
> 
> That's OK; x86 also has that because they have a limited physical
> and virtual address size [which may or may not be correctly passed to
> the guest!].
> 
> > Mapping anywhere, I
> > really don't like. Letting the guest define the mapping, I really don't
> > like.
> 
> Well it's OK to have a hole for it, but letting the guest choose where
> those mappings go in the hole is the norm for PCI (there are
> exceptions).
> 
> > We can of course switch the order of mappings
> > 
> > [0x000000000000000      ]
> > ... Memory region containing RAM
> > [ram_size         	]
> > ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
> > [maxram_size - ram_size ]
> > ... Memory region for e.g. special PCI/CCW devices
> > [                    TBD]
> > 
> > We can size TBD in a way that we e.g. max out the current page table
> > size before having to switch to more levels.
> 
> Yes, that's fine to set some upper limit; you've just got to make sure
> that the hypervisor knows where it can put stuff and if the guest
> does PCI that it knows where it's allowed to put stuff and as long
> as the two don't overlap everyone is happy.
> 
> [We should probably take this level of detail off this list - it's
> parsecs away from the detail of virtio]

If you do take the detailed discussion off this list, please keep me in the
loop.

Regards,
Halil
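
For readers following the quoted thread, the second layout proposed above
(RAM, then memory devices, then a region for special PCI/CCW devices) can be
sketched in a few lines of Python. This is an illustration only, not QEMU
code. The names Region, guest_layout and window_allowed are hypothetical, the
addr_limit parameter stands in for the "TBD" upper bound, and the sketch
assumes the memory device region spans ram_size..maxram_size.

```python
from dataclasses import dataclass
from typing import List

GiB = 1 << 30


@dataclass(frozen=True)
class Region:
    name: str
    start: int  # inclusive guest physical address
    end: int    # exclusive guest physical address

    def overlaps(self, other: "Region") -> bool:
        # Two half-open ranges overlap if each starts before the other ends.
        return self.start < other.end and other.start < self.end


def guest_layout(ram_size: int, maxram_size: int, addr_limit: int) -> List[Region]:
    """Build the three-region layout (assumed boundaries, see lead-in).

    addr_limit is a hypothetical stand-in for the "TBD" upper bound, e.g.
    the largest address the current page table size can cover.
    """
    if not 0 < ram_size <= maxram_size <= addr_limit:
        raise ValueError("expected 0 < ram_size <= maxram_size <= addr_limit")
    return [
        Region("ram", 0, ram_size),
        Region("memory-devices", ram_size, maxram_size),
        Region("special-devices", maxram_size, addr_limit),
    ]


def window_allowed(layout: List[Region], window: Region,
                   placed: List[Region]) -> bool:
    """Check that a device window lies inside the special region and does
    not overlap any window that has already been placed there."""
    special = next(r for r in layout if r.name == "special-devices")
    inside = special.start <= window.start and window.end <= special.end
    return inside and not any(window.overlaps(p) for p in placed)


if __name__ == "__main__":
    layout = guest_layout(4 * GiB, 16 * GiB, 1 << 42)
    for r in layout:
        print(f"{r.name:16s} [{r.start:#x}, {r.end:#x})")
```

Whoever picks the addresses (the hypervisor, or the guest when it programs
PCI BARs) would run a check like window_allowed before placing a window, so
that the two sides never hand out overlapping ranges.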


