Subject: Re: [virtio-dev] [PATCH] snd: Add virtio sound device specification
Hi,

On 04.12.2019 13:52, Gerd Hoffmann wrote:
>>>> For example, in case of GUEST_MEM the request could be followed by a buffer sg-list.
>>>
>>> I'm not convinced guest_mem is that useful. host_mem allows giving the guest access to the buffers used by the host's sound hardware, which is probably what you need if the MSG transport can't handle the latency requirements you have.
>>
>> Actually, it might be pretty useful. If a device is not capable of sharing host memory with a guest but is still capable of using guest shared memory, then there's a good use case for that:
>
> Note that "hostmem" only means that the buffer is allocated by the host (i.e. the device). Whether the allocation is backed by actual host sound buffers or just normal RAM depends on the device implementation and maybe the host sound hardware capabilities. QEMU would probably use normal RAM due to the extra indirection caused by the sound backend interface. I don't think there is a use case which guestmem can handle but hostmem cannot.
There may be a special security requirement like "a guest must not have any access to host memory" (especially in case of a type-1 hypervisor).
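For illustration only, here is a minimal sketch of what a "request followed by a buffer sg-list" might look like on the wire in the guest_mem case. The structure and field names below are hypothetical and not part of the proposed spec; they just show the idea of a small request header followed by entries describing guest-allocated buffer chunks.

```c
#include <stdint.h>

/* Hypothetical request header placed in the first descriptor; the name
 * and fields are made up for illustration and are not spec-defined. */
struct snd_guest_mem_xfer {
    uint32_t stream_id;    /* PCM stream the shared buffer belongs to */
    uint32_t nr_entries;   /* number of sg entries that follow */
};

/* One entry of the buffer sg-list appended after the header: the
 * guest-physical address and length of one chunk of the buffer. */
struct snd_sg_entry {
    uint64_t addr;
    uint32_t len;
    uint32_t padding;
};
```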
>> [...] when a buffer is mapped into a user space application. It assumes that the driver is not involved in frame transfer at all (thus, it cannot queue buffers into a virtqueue and send notifications to the device).
>
> The sound data can actually be in userspace-mapped buffers even for the message transport. Adding a buffer to the virtqueue (which actually stores pointer(s) to the buffer pages) effectively moves the application position then.
>> That's how we implemented a PoC for the previous spec version.
>
> But, yes, userspace must do a syscall for that, which is not needed when updating the position in a shared page.
Yes, that's the trickiest part. Also, sharing memory implies sharing two things: the buffer itself and a page containing the position counter. And it goes in both directions (the host shares a buffer and the hardware position, or the guest shares a buffer and the application position). It seems the overall solution could benefit from treating these as four independent entities. For example, the message-based transport could gain some benefits if the host provided its hardware pointer in a shared page.
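As a minimal sketch of what such a shared "position page" could look like (the layout, names, and atomics below are assumptions for illustration, not part of the proposed spec): each side owns one counter, so either direction can be observed by plain polling, without a syscall or a notification.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared page mapped by both guest and host; field names
 * are made up for illustration and are not spec-defined. */
struct snd_shared_positions {
    _Atomic uint32_t app_pos;   /* written by the application/driver side */
    _Atomic uint32_t hw_pos;    /* written by the device/host side */
};

/* Guest side: advance the application position after producing/consuming
 * `bytes` in the shared ring buffer; no syscall or vmexit is needed. */
static inline void app_advance(struct snd_shared_positions *p,
                               uint32_t bytes, uint32_t buffer_bytes)
{
    uint32_t pos = atomic_load_explicit(&p->app_pos, memory_order_relaxed);
    atomic_store_explicit(&p->app_pos, (pos + bytes) % buffer_bytes,
                          memory_order_release);
}

/* Host side: poll how far the application position is ahead of the
 * hardware position (both counters are kept below buffer_bytes). */
static inline uint32_t app_readable(struct snd_shared_positions *p,
                                    uint32_t hw_pos, uint32_t buffer_bytes)
{
    uint32_t app = atomic_load_explicit(&p->app_pos, memory_order_acquire);
    return (app + buffer_bytes - hw_pos) % buffer_bytes;
}
```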
>> But if that memory is shared with the device (along with some piece of memory containing the application position as well), then it's possible to implement quite a simple poller in the device. And it would be pretty well aligned with the common software mixer workflow (like in PA or QEMU), where the mixer invokes a client's callbacks for reading/writing the next piece of data. The device would only need to check the application position and directly read from/write to the shared buffer.
>
> Note that it is possible to store things in the virtqueue without notifying the device and expect the device to poll the virtqueue instead. Likewise for the other direction. That allows operating the queue without vmexits and might work more efficiently with lots of small buffers. Of course, device and driver must negotiate whether they want to use polling or notifications.
Yes, and shared memory only makes things simpler here.
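As a side note on the polling option mentioned above: the split-ring flags VIRTQ_AVAIL_F_NO_INTERRUPT and VIRTQ_USED_F_NO_NOTIFY come from the virtio specification, but the surrounding structs and helpers below are a simplified sketch, not a real driver/device implementation. The driver can hint that it does not need used-buffer notifications, and the device can hint that it does not need to be kicked, so both sides may poll instead.

```c
#include <stdint.h>

/* Split-ring suppression flags as defined by the virtio specification. */
#define VIRTQ_AVAIL_F_NO_INTERRUPT 1  /* driver -> device: no used-buffer notifications needed */
#define VIRTQ_USED_F_NO_NOTIFY     1  /* device -> driver: no available-buffer kicks needed */

/* Simplified ring headers; real layouts also contain the ring arrays. */
struct virtq_avail_hdr { uint16_t flags; uint16_t idx; };
struct virtq_used_hdr  { uint16_t flags; uint16_t idx; };

/* Driver side: poll the used ring instead of taking interrupts. */
static void driver_enable_polling(volatile struct virtq_avail_hdr *avail)
{
    avail->flags |= VIRTQ_AVAIL_F_NO_INTERRUPT;
}

/* Device side: poll avail->idx for new buffers instead of waiting for kicks. */
static void device_enable_polling(volatile struct virtq_used_hdr *used)
{
    used->flags |= VIRTQ_USED_F_NO_NOTIFY;
}
```

These flags are best-effort hints, so an implementation still has to tolerate the occasional notification; that fits the point above that the polling mode needs to be agreed on by both sides.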
>>>> But you described exactly the "blockable" case: an I/O request is completed not immediately but upon some condition (the buffer is full). In case of message-based transport, both the device and the driver will have their own buffers.
>>>
>>> Well, no. The device doesn't need any buffers, it can use the buffers submitted by the driver. Typical workflow:
>>>
>>> (1) The driver puts a bunch of empty buffers into the rx (record/read) virtqueue (each being period_bytes in size).
>>> (2) The driver starts recording.
>>> (3) The device fills the first buffer with recorded sound data.
>>> (4) When the buffer is full, the device returns it to the driver and takes the next one from the virtqueue to continue recording.
>>> (5) The driver takes the filled buffer and does whatever it wants to do with the data (typically pass it on to the userspace app).
>>> (6) The driver submits a new empty buffer to the virtqueue to make sure the device doesn't run out of buffers.
>>>
>>> So, it's not a "here is a buffer, fill it please", "here is the next, ..." ping-pong game between driver and device. There is a queue with multiple buffers instead, and the device fills them one by one.
>>
>> Then, should we make this pattern mandatory?
>
> It's pretty standard for virtio. The network receive queue works fundamentally the same way, for example, except that network buffers might actually be only half-filled, because you get network packets whose size you don't know beforehand instead of a constant stream of sound samples which you can easily pack into buffers as you like.
Except for the end-of-capture-stream case. Can we use the len field in virtq_used_elem for storing the actual captured size then?
> But, yes, it probably makes sense to explicitly say so in the specs. If in doubt, be more verbose ;)
I think that would be the best decision here. Such a pattern may be a common thing in the virtio world, but people still might have their own interpretations, especially if it's their first experience implementing something.
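To make the pattern described above concrete, here is a driver-side sketch of the capture flow. It is simplified, user-level C pseudocode: the queue/stream functions are placeholders, not a real driver API, and the period size is an arbitrary example. It also shows the point about partially filled buffers: the number of bytes actually written by the device would be taken from the len field of virtq_used_elem.

```c
#include <stdint.h>
#include <stdlib.h>

/* From the virtio spec: element the device places on the used ring. */
struct virtq_used_elem { uint32_t id; uint32_t len; };

#define NUM_PERIODS   8       /* buffers kept in flight, driver's choice */
#define PERIOD_BYTES  4096    /* would come from the stream parameters */

/* Placeholder operations; a real driver would use its virtqueue API. */
extern void queue_add_rx_buffer(void *buf, uint32_t len);             /* make a buffer available */
extern int  queue_get_used(struct virtq_used_elem *elem, void **buf); /* reap a completed buffer */
extern void stream_start(void);
extern void deliver_to_userspace(const void *data, uint32_t bytes);

static void capture_loop(void)
{
    /* (1) Pre-fill the rx queue with empty period-sized buffers. */
    for (int i = 0; i < NUM_PERIODS; i++)
        queue_add_rx_buffer(malloc(PERIOD_BYTES), PERIOD_BYTES);

    /* (2) Start recording. */
    stream_start();

    for (;;) {
        struct virtq_used_elem elem;
        void *buf;

        /* (4)/(5) The device returns each buffer once it is full (or at the
         * end of the stream, possibly only partially filled); elem.len holds
         * the number of bytes actually written by the device. */
        if (queue_get_used(&elem, &buf) < 0)
            continue;   /* wait for the next completion */

        deliver_to_userspace(buf, elem.len);

        /* (6) Recycle the buffer so the device never runs out. */
        queue_add_rx_buffer(buf, PERIOD_BYTES);
    }
}
```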
>>>> And for capturing, these buffers might be filled at a different speed. For example, in order to improve latency, the device could complete requests immediately and fill in buffers with whatever it has at the moment.
>>>
>>> Latency obviously depends on period_bytes. If the driver cares about latency it should simply work with lots of small buffers instead of a few big ones.
>>
>> Well, smaller buffers would mean a higher rate of hypercalls/traps but better latency. And larger buffers would mean fewer hypercalls/traps but worse latency.
>
> Note that it depends on the use case whether long latencies are actually worse. Some use cases don't care much (music playback). For some use cases it is good enough to know the exact latency so you can take it into account for synchronization (video playback). Some use cases are pretty much impossible with long latencies (games, VoIP).
By the way, there is a good point here. Maybe it makes sense for the device to somehow report its current/additional latency value? Like, as part of a response to the set_params request?
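For a rough sense of scale of the period_bytes/latency trade-off (format and sizes below are chosen arbitrarily for the example): with 16-bit stereo at 48 kHz one frame is 4 bytes, so 4096-byte periods hold about 21.3 ms of audio, while 1024-byte periods hold about 5.3 ms.

```c
#include <stdio.h>

/* Example only: buffer duration implied by period_bytes for a given format. */
int main(void)
{
    const unsigned rate = 48000;          /* frames per second */
    const unsigned frame_bytes = 2 * 2;   /* 16-bit samples, 2 channels */
    const unsigned period_bytes[] = { 1024, 4096, 16384 };

    for (unsigned i = 0; i < 3; i++) {
        double ms = (double)period_bytes[i] / frame_bytes / rate * 1000.0;
        printf("period_bytes=%5u -> %6.1f ms per buffer\n", period_bytes[i], ms);
    }
    return 0;
}
```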
>> There's always a trade-off, and an implementer might choose whatever fits better.
>
> Sure. But you choose once, when configuring the stream, not for each individual buffer, right? So the driver will:
>
> (1) set-params (including period_bytes), taking into account what userspace asked for.
> (2) queue buffers for the stream, each being period_bytes in size.
> (3) start the stream.
Originally, we introduced period_bytes to enable the period notification feature (which is optional). At least, it was not supposed to be used for slicing a buffer into pieces (although the driver may choose to work that way).

--
Anton Yakovlev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0
E-Mail: anton.yakovlev@opensynergy.com