Subject: Re: [virtio-dev] Memory sharing device




On Tue, Feb 12, 2019 at 8:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
On Tue, Feb 12, 2019 at 06:50:29PM -0800, Frank Yang wrote:
>
>
> On Tue, Feb 12, 2019 at 11:06 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
>   On Tue, Feb 12, 2019 at 09:26:10AM -0800, Frank Yang wrote:
>   > BTW, the other unique aspect is that the ping messages allow a _host_ pointer
>   > to serve as the lump of shared memory;
>   > then there is no need to track buffers in the guest kernel and the device
>   > implementation can perform specialized buffer space management.
>   > Because it is also host pointer shared memory, it is also physically contiguous
>   > and there is no scatterlist needed to process the traffic.
>
>   Yes, at the moment virtio descriptors all pass addresses guest to host.
>
>   The ability to reverse that was part of the vhost-pci proposal a while ago.
>   BTW, that also at least originally had the ability to tunnel
>   multiple devices over a single connection.
>
> Can there be a similar proposal for virtio-pci without vhost?
>
>   There was nothing wrong with the proposals, I think; they
>   just had to be polished a bit before making it into the spec.
>   And that tunneling was dropped, but I think it can be brought back
>   if desired; we just didn't see a use for it.
>
>
> Thinking about it more, I think vhost-pci might be too much for us due to the
> vhost requirement (sockets and IPC, while we desire a highly process-local
> solution).

I agree because the patches try to document a bunch of stuff.
But I really just mean taking the host/guest interface
part from there.

So, are you referring to the new ideas that vhost-pci introduces, minus the socket IPC/inter-VM communication, with the vhost server living in the same process as qemu?
That sounds like something we could build for qemu (Stefan?) that talks to a virtio-pci-user (?) backend with a similar set of command line arguments.

> But there's nothing preventing us from having the same reversals for virtio-pci
> devices without vhost, right?

Right. I think that if you build something such that vhost pci
can be an instance of it on top, then it would have
a lot of value.

I'd be very eager to chase this down. The more interop with existing virtual PCI concepts the better.

> That's kind of what's being proposed with the shared memory stuff at the
> moment, though it is not a device type by itself yet (Arguably, it should be).
>
>
>   How about that? That sounds close to what you were looking for,
>   does it not? That would be something to look into -
>   if your ideas can be used to implement a virtio device
>   backend by code running within a VM, that would be very interesting.
>
>
> What about a device type, say, virtio-metapci,

I have to say I really dislike that name. It's basically just saying "I'm
not telling you what it is." Let's try to figure it out. It looks like,
although it's not a vsock device, it's also trying to support creating
channels, with support for passing two types of messages (data and
control), as well as some shared memory access. And it also has
enumeration, so opened channels can be tied to what? Strings? PCI device
IDs?

I think we can build this relying on PCI device ids, assuming there are still device IDs readily available.
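
For the enumeration piece, my mental model is just the existing virtio-over-PCI ID
scheme (vendor ID 0x1af4, with modern devices enumerating at PCI device ID 0x1040 plus
the virtio device type), so a new device type gets a PCI identity essentially for free.
A quick sketch of that mapping:

    #include <stdint.h>

    #define VIRTIO_PCI_VENDOR_ID      0x1af4  /* virtio vendor ID */
    #define VIRTIO_PCI_MODERN_ID_BASE 0x1040  /* modern devices: 0x1040 + virtio type */

    static inline uint16_t virtio_modern_pci_device_id(uint16_t virtio_type)
    {
        return (uint16_t)(VIRTIO_PCI_MODERN_ID_BASE + virtio_type);
    }

    /* e.g. vsock (virtio type 19) enumerates as PCI 1af4:1053; a newly
     * reserved type would slot into the same scheme. */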

Then the vsock device was designed for this problem space. It might not
be a good fit for you e.g. because of some vsock baggage it has. But one
of the complex issues it does address is controlling host resource usage
so guest socket can't DOS host or starve other sockets by throwing data
at host. Things might slow down but progress will be made. If you are
building a generic kind of message exchange you could do worse than copy
that protocol part.

That's a good point and I should make sure we've captured that.
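
For concreteness, here's my reading of the credit scheme we'd want to mirror. This is a
sketch of the bookkeeping only (the buf_alloc/fwd_cnt style accounting vsock uses), not
a literal copy of its structures:

    #include <stdint.h>

    /* Each side advertises buf_alloc (its receive buffer size) and fwd_cnt
     * (bytes it has consumed); the sender may only put on the wire what fits
     * in the peer's remaining space, so the guest can't starve the host. */
    struct credit_state {
        uint32_t peer_buf_alloc; /* receive space the peer advertised */
        uint32_t peer_fwd_cnt;   /* bytes the peer reports having consumed */
        uint32_t tx_cnt;         /* bytes we have sent so far */
    };

    static uint32_t tx_credit(const struct credit_state *s)
    {
        uint32_t in_flight = s->tx_cnt - s->peer_fwd_cnt; /* unsigned wrap is fine */
        return s->peer_buf_alloc - in_flight;
    }

    /* Before queuing 'len' bytes: send now, or wait for a credit update. */
    static int can_send(const struct credit_state *s, uint32_t len)
    {
        return tx_credit(s) >= len;
    }
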
I don't think the question of why not vsock was generally addressed all
that well. There's discussion of sockets and poll, but that has nothing
to do with virtio, which is a host/guest interface. If you are basically
happy with the host/guest interface but want to bind a different driver
to it, with some minor tweaks, we could create a virtio-wsock which is
just like virtio-vsock but has a different id, and use that as a
starting point. Go wild: build a different driver for it.

The virtio-wsock notion also sounds good, though (also from Stefan's comments) I'd want to clarify
how we would define such a device type
so that it is pure in terms of the host/guest interface (i.e., not assuming sockets in either the guest or the host),
but also doesn't, at the implementation level, require the existing v(w)sock implementation
to change to accommodate non-socket-based guest/host interfaces.
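
As a strawman for the "same starting point, different id" idea, the device-facing delta
could be as small as a new ID next to vsock's, carrying the config layout over unchanged;
the ID value below is made up for illustration, not a request for assignment:

    #include <stdint.h>

    #define VIRTIO_ID_VSOCK  19  /* existing vsock device ID */
    #define VIRTIO_ID_WSOCK  63  /* hypothetical; a real ID would be reserved via the TC */

    /* Config space could start out identical to virtio-vsock's
     * (a single guest_cid field) and diverge only if needed. */
    struct virtio_wsock_config {
        uint64_t guest_cid;
    };

The guest driver binding is what changes; the transport and device model stay as they are.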

> that relies on virtio-pci for
> device enumeration and shared memory handling
> (assuming it's going to be compatible with the host pointer shared memory
> implementation),
> so there's no duplication of the concept of device enumeration nor shared
> memory operations.
> But, it works in terms of the ping / event virtqueues, and relies on the host
> hypervisor to dispatch to device implementation callbacks.

All the talk about dispatch and device implementation is just adding to
confusion. This isn't something that belongs in virtio spec anyway, and
e.g. qemu is unlikely to add an in-process plugin support just for this.

A plugin system of some type is what we think is quite valuable
for decoupling device functionality from QEMU.
Keeping everything in the same process is also attractive because no IPC is needed.
If there were a lightweight, cross-platform way to get the same function-pointer-style
dispatch across processes (something in the spirit of dlsym or LoadLibrary, but targeting
another process), that could work too, though yes, it would need to be integrated into qemu.
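
To make the in-process variant concrete, here's roughly what the decoupling could look
like on the host side; the ops table and entry-point name are ours and purely
illustrative, only dlopen/dlsym are real APIs:

    #include <dlfcn.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical callback table a device plugin would export. */
    struct virtio_dev_ops {
        int  (*create_instance)(uint32_t handle);
        void (*destroy_instance)(uint32_t handle);
        void (*on_ping)(uint32_t handle, uint64_t offset, uint64_t size,
                        uint32_t events);
    };

    typedef const struct virtio_dev_ops *(*get_ops_fn)(void);

    static const struct virtio_dev_ops *load_device_plugin(const char *path)
    {
        void *lib = dlopen(path, RTLD_NOW | RTLD_LOCAL);
        if (!lib) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return NULL;
        }
        /* "virtio_device_get_ops" is a made-up, agreed-upon entry point. */
        get_ops_fn get_ops = (get_ops_fn)dlsym(lib, "virtio_device_get_ops");
        return get_ops ? get_ops() : NULL;
    }

Everything stays a function-pointer call in the same address space, which is the property
we care about; the open question is only how such a thing would be integrated into qemu.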

> A potential issue is that such a metapci device would share the same device id
> namespace as other virtio-pci devices...but maybe that's OK?

That's a vague question.
The same device and vendor id needs to imply that the same driver works.
I think you're using terminology that doesn't match virtio:
the words "device" and "driver" have a specific meaning, and that
doesn't include things like implementation callbacks.


> If this can build on virtio-pci, I might be able to come up with a spec that
> assumes virtio-pci as the transport,
> and assumes (via the WIP host memory sharing work) that host memory can be used
> as buffer storage.
> The difference is that it will not contain most of the config virtqueue stuff
> (except maybe for create/destroy instance),
> and it should also work with the existing ecosystem around virtio-pci.
>

I still can't say from the above whether it's in scope for virtio or not.
All the talk about blobs and controlling both host and guest sounds
out of scope. But it could be that there are pieces that are
in scope, and you would use them for whatever vendor-specific
thing you need.

And I spent a lot of time on this by now.

So could you maybe try to extract specifically the host/guest interface
things that you miss? I got the part where you want to take a buffer
within a BAR and pass it to the guest. But beyond that I didn't get a lot. E.g.
who is sending most data? Host? Guest? Both? There are control messages;
are these coming from the guest? Do you want to know when the guest is done
with the buffer the host allocated? Can you take a buffer away from the guest?

Thanks for taking the time to evaluate; we very much appreciate it
and want to resolve the issues you're having!

To answer the other questions:
- We expect most data to be sent by the guest, except in the cases of image readback,
where the host will write a lot of data.
- The control messages (ping) are driven by the guest only; at most, the guest can asynchronously wait on a long host operation that it triggered.
Hence, in the spec, the events argument is guest-driven and revents is populated by the host (see the sketch after this list).
- It is out of scope to know when the guest is done with the host buffer.
- Buffers will be owned by the host, and the guest will not own any buffers under the current scheme.
This is because it is up to the host-side implementation to decide how to back new memory allocations from the guest.
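
To make the events/revents point concrete, the ping as currently proposed is shaped
roughly like this (the events/revents names follow the draft; the exact layout here is
illustrative, not final):

    #include <stdint.h>

    /* Guest -> host ping. The buffer it refers to lives in host-owned shared
     * memory exposed to the guest, so the guest names it by offset rather
     * than by a guest-physical scatterlist. */
    struct ping {
        uint64_t offset;   /* offset into the host-owned shared memory region */
        uint64_t size;     /* size of the region this operation touches */
        uint32_t events;   /* written by the guest: what it is asking/waiting for */
        uint32_t revents;  /* populated by the host: what has completed */
    };
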
OTOH, all the callback discussion is really irrelevant for the virtio TC.
If things can't be described without this, they are out of scope
for virtio.

We can describe the current proposal for virtio without explicitly naming callbacks
on the host, but it would push that kind of implementation burden to the host qemu,
so I thought it would be a good idea to lay things out end to end
in a way that would be concretely implementable.
>
>   --
>   MST
>

