Subject: Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
On 15.02.19 12:07, Cornelia Huck wrote:
> On Thu, 14 Feb 2019 09:43:10 -0800
> Frank Yang <lfy@google.com> wrote:
>
>> On Thu, Feb 14, 2019 at 8:37 AM Dr. David Alan Gilbert
>> <dgilbert@redhat.com> wrote:
>>
>>> * Cornelia Huck (cohuck@redhat.com) wrote:
>>>> On Wed, 13 Feb 2019 18:37:56 +0000
>>>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>>>>
>>>>> * Cornelia Huck (cohuck@redhat.com) wrote:
>>>>>> On Wed, 16 Jan 2019 20:06:25 +0000
>>>>>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>>>>>>
>>>>>>> So these are all moving this 1/3 forward - has anyone got comments on
>>>>>>> the transport specific implementations?
>>>>>>
>>>>>> No comment on pci or mmio, but I've hacked something together for ccw.
>>>>>> Basically, one sense-type ccw for discovery and a control-type ccw for
>>>>>> activation of the regions (no idea if we really need the latter), both
>>>>>> available with ccw revision 3.
>>>>>>
>>>>>> No idea whether this will work this way, though...
>>>>>
>>>>> That sounds (from a shm perspective) reasonable; can I ask why the
>>>>> 'activate' is needed?
>>>>
>>>> The activate interface is actually what I'm most unsure about; maybe
>>>> Halil can chime in.
>>>>
>>>> My basic concern is that we don't have any idea how the guest will use
>>>> the available memory. If the shared memory areas are supposed to be
>>>> mapped into an inconvenient place, the activate interface gives the
>>>> guest a chance to clear up that area before the host starts writing to
>>>> it.
>>>
>>> I'm expecting the host to map it into an area of GPA that is out of the
>>> way - it doesn't overlap with RAM.
>
> My issue here is that I'm not sure how to model something like that on
> s390...
>
>>> Given that, I'm not sure why the guest would have to do any 'clear up' -
>>> it probably wants to make a virtual mapping somewhere, but again that's
>>> up to the guest to do when it feels like it.
>>
>> This is what we do with Vulkan as well.
>>>> I'm not really enthusiastic about that interface... for one, I'm not
>>>> sure how this plays out at the device type level, which should not
>>>> really concern itself with transport-specific handling.
>>>
>>> I'd expect the host side code to give an area of memory to the transport
>>> and tell it to map it somewhere (in the QEMU terminology a MemoryRegion,
>>> I think).
>
> My main issue is the 'somewhere'.
>
>> I wonder if this could help: the way we're running Vulkan at the moment,
>> what we do is add the concept of a MemoryRegion with no actual backing:
>>
>> https://android-review.googlesource.com/q/topic:%22qemu-user-controlled-hv-mappings%22+(status:open%20OR%20status:merged)
>>
>> and it would be connected to the entire PCI address space on the shared
>> memory address space realization. So it's kind of like a sparse or
>> deferred MemoryRegion.
>>
>> When the guest actually wants to map a subregion associated with the
>> host memory, on the host side we can call the hypervisor to map the
>> region, by giving the device implementation access to
>> KVM_SET_USER_MEMORY_REGION and its analogs.
>>
>> This has the advantage of a smaller contact area between shm and QEMU:
>> the device-level stuff can operate at a separate layer from
>> MemoryRegions, which are more transport level.
>
> That sounds like an interesting concept, but I'm not quite sure how it
> would help with my problem. Read on for more explanation below...
>
>>> Similarly in the guest, I'm expecting the driver for the device to
>>> ask for a pointer to a region with a particular ID, and that goes
>>> down to the transport code.
>>>
>>>> Another option would be to map these into a special memory area that
>>>> the guest won't use for its normal operation... the original s390
>>>> (non-ccw) virtio transport mapped everything into two special pages
>>>> above the guest memory, but that was quite painful, and I don't think
>>>> we want to go down that road again.
>>>
>>> Can you explain why?
>
> The background here is that s390 traditionally does not have any
> concept of memory-mapped I/O. IOW, you don't just write to or read from
> a special memory area; instead, I/O operations use special instructions.
>
> The mechanism I'm trying to extend here is channel I/O: the driver
> builds a channel program with commands that point to guest memory areas
> and hands it to the channel subsystem (which means, in our case, the
> host) via a special instruction. The channel subsystem and the device
> (the host, in our case) translate the memory addresses and execute the
> commands. The one place where we write shared memory directly in the
> virtio case is the virtqueues -- which are allocated in guest memory,
> so the guest decides which memory addresses are special. Accessing the
> config space of a virtio device via the ccw transport does not
> read/write a memory location directly, but instead uses a channel
> program that performs the read/write.
>
> For pci, the memory accesses are mapped to special instructions:
> reading or writing the config space of a pci device does not perform
> reads or writes of a memory location, either; the driver uses special
> instructions to access the config space (which are also
> interpreted/emulated by QEMU, for example).
>
> The old s390 (pre-virtio-ccw) virtio transport had to rely on the
> knowledge that there were two pages containing the virtqueues etc.
> right above the normal memory (probed by checking whether accessing
> that memory gave an exception or not). The main problems were that this
> was inflexible (the guest had no easy way to find out how many
> 'special' pages were present, other than trying to access them), and
> that it was different from whatever other mechanisms are common on s390.

Probing is always ugly. But I think we can add something like the x86
PCI hole between 3 and 4 GB after our initial boot memory. So there, we
would have a memory region just like e.g.
x86 has. This should even work with other mechanisms I am working on.
E.g. for memory devices, we will add yet another memory region above the
special PCI region. The layout of the guest would then be something like:

[0x000000000000000] ... memory region containing RAM
[ram_size         ] ... memory region for e.g. special PCI devices
[ram_size + 1 GB  ] ... memory region for memory devices
                        (virtio-pmem, virtio-mem ...)
[maxram_size - ram_size + 1GB]

We would have to create proper page tables for guest backing that take
care of the new guest size (not just ram_size). Also, to the guest we
would indicate "maximum ram size == ram_size" so it does not try to
probe the "special" memory. As we are using paravirtualized features
here, this should be fine. Unmodified guests will never touch/probe
anything beyond ram_size.

> We might be able to come up with another scheme, but I wouldn't hold my
> breath. Would be great if someone else with s390 knowledge could chime
> in here.

-- 

Thanks,

David / dhildenb