Subject: Re: [virtio-dev] [PATCH] snd: Add virtio sound device specification

On 12.11.2019 16:16, Liam Girdwood wrote:
On Tue, 2019-11-12 at 13:45 +0100, Anton Yakovlev wrote:
On 11.11.2019 16:20, Liam Girdwood wrote:
On Tue, 2019-10-29 at 10:42 +0100, Anton Yakovlev wrote:
Hi Liam,

Thank you for comments! Please, take a look at our answers below.

Apologies, been travelling. More comments below.

On 28.10.2019 17:05, Liam Girdwood wrote:
On Tue, 2019-09-24 at 15:43 +0100, Mikhail Golubev wrote:
From: Anton Yakovlev <Anton.Yakovlev@opensynergy.com>


diff --git a/virtio-sound.tex b/virtio-sound.tex
new file mode 100644
index 0000000..68a606e
--- /dev/null
+++ b/virtio-sound.tex
@@ -0,0 +1,405 @@
+\section{Sound Device}\label{sec:Device Types / Sound Device}
+The virtio sound card is a virtual audio device supporting output
+and input PCM streams. All device control requests are placed into
+the control virtqueue, all I/O requests are placed into the PCM
+virtqueue.

Where are stream position updates placed? These are usually
synchronous from the driver and enable the player application to
write or read data from the correct locations in the buffers. This
is latency-sensitive data.

It happens in an interrupt handler. Since PCM frame transfer is
message-based, the only possible way to figure out the amount of
consumed or captured frames is to take a look into a message
response.

Ok, this is what I thought. This won't work for low latency audio
use cases since the buffers and position responses are queued (voice
calls, some gaming, some system notifications like keypress or
volume changes).

But it's not different from typical HW. A driver receives an
interrupt once a DMA transaction is finished (i.e. some part of data
was copied to/from the host). And the hardware pointer value usually
is extrapolated based on some device state.

Modern HW can usually provide the HW position to the nearest word. Some
older/bad HW provides to the nearest period though and this is when we
have to estimate.

Native (non virtual) audio device drivers typically have no latency
reading the stream position, and it's read at the same time as the
system clock. Reading the same clocks together is important as it
means the relationship between the two is deterministic and not
subject to any jitter (which can be introduced by reading the stream
position over a virtqueue).

A virtio driver could do something similar: it could either return
the "real" pointer value (i.e. what it internally uses) or could
show a fake "extrapolated" value (like frame_rate * elapsed_time).
Although, it's a different story.
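As a rough sketch of the extrapolation idea mentioned above (a driver-side estimate between message responses; the function and its name are illustrative, not part of the proposal):

```c
#include <stdint.h>

/* Estimate the current hw position from the last reported position
 * plus the frames elapsed at the nominal rate since that report:
 * position ~= last_frames + rate * elapsed_time. */
static uint64_t extrapolate_hw_ptr(uint64_t last_frames,
                                   uint32_t rate_hz,
                                   uint64_t elapsed_ns)
{
    return last_frames + (rate_hz * elapsed_ns) / 1000000000ull;
}
```

The accuracy of such an estimate depends on how well the device clock matches the nominal rate, which is exactly why reading a real position together with the system time matters.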

The virtio driver should return the real stream position alongside
the real common system time (of when the stream position was read).

Do you propose to share the stream position via shared memory? Or will it be read with a special request?

However, it's probably tolerable for most other use cases. What use
cases have you tried in your testing ?

All typical cases: multimedia (video + audio or audio only), network
streaming, video calls, system events, volume changing and so on.

By the way, could you define the "low latency" term that we are
talking about? Just to be on the same page.

Low latency audio is when any application or processing can render
data very close to the HW stream R/W position (typically low or
sub-millisecond). This needs zero copy buffering to HW on the
host/VM/user side and also timely and accurate stream positions.


+\subsection{Device configuration layout}\label{sec:Device Types /
Sound Device / Device configuration layout}
+/* supported PCM sample formats */
+enum {

24 bit.

These kinds of samples (18/20/24-bit) were discussed here earlier
and raised some concerns (regarding support in a guest, testing and
so on), so they were removed for simplicity.

From the other side, if we are up to adding these, maybe it's better
not to enumerate format names and use some kind of generic
description instead? Something like

struct {
    kind; /* S/U/F */
    ...
};

It allows describing formats like 24 and 24_3 in a unified way and
adding future formats without changing the protocol. What do you
think?
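For illustration, here is a minimal sketch of how such a generic descriptor could look and how it would distinguish S24 (24 valid bits in a 32-bit container) from S24_3 (packed 3 bytes). All names and fields here are invented for the sketch, not taken from the patch:

```c
#include <stdint.h>

/* Hypothetical generic sample format descriptor: instead of naming
 * every format, describe its properties. */
enum {
    PCM_FMT_SIGNED = 0,  /* S */
    PCM_FMT_UNSIGNED,    /* U */
    PCM_FMT_FLOAT,       /* F */
};

struct pcm_fmt_desc {
    uint8_t kind;       /* one of the enum values above */
    uint8_t bits;       /* significant bits per sample */
    uint8_t phys_bits;  /* container width in bits */
};

/* S24: 24 valid bits inside a 32-bit container;
 * S24_3: the same 24 bits packed into 3 bytes. */
static const struct pcm_fmt_desc fmt_s24   = { PCM_FMT_SIGNED, 24, 32 };
static const struct pcm_fmt_desc fmt_s24_3 = { PCM_FMT_SIGNED, 24, 24 };

/* Bytes per frame for a given descriptor and channel count. */
static unsigned int pcm_frame_bytes(const struct pcm_fmt_desc *f,
                                    unsigned int channels)
{
    return channels * (f->phys_bits / 8);
}
```

With this encoding a new format is just a new (kind, bits, phys_bits) triple, so the protocol itself never needs new named constants.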

I would recommend copying the exact formats used by ALSA; these
formats are industry standard (although naming will be different on
other OSes).

I mean, there are two options:

1. We can go with defining format names. But then we will need to
update the protocol and (perhaps) existing drivers.
2. We can develop some generic way to report supported formats (as
well as rates). It would not require touching the spec on every
update.

+/* supported PCM frame rates */
+enum {
+    VIRTIO_SND_PCM_RATE_8000 = 0,
+    VIRTIO_SND_PCM_RATE_176400,

It may be best to use the same rates and formats from the ALSA
definitions as this covers any holes (like rates that are not in the
above list).

These rates are from ALSA definitions. And what formats should be
added?

Rate, Knot - I would copy the ALSA supported rates and formats, just
like with the channel maps.

See a comment above.

See comment above :)

+/* a device configuration space */
+struct virtio_snd_config {
+    struct virtio_pcm_config {
+        struct virtio_pcm_stream_config output;
+        struct virtio_pcm_stream_config input;
+    } pcm;

I assume this is sent to configure PCM capabilities?

Yes, the minimum amount of information required to set up a PCM
stream in any kind of guest.
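As a sketch of one possible capability encoding (the bitmap fields and names below are assumptions for illustration, not the posted layout), the device could report supported formats and rates as bitmaps that the guest intersects with its own:

```c
#include <stdint.h>

/* Assumed per-stream capability block: each supported format/rate
 * sets one bit, so a guest can intersect device and guest masks. */
struct pcm_stream_caps {
    uint64_t formats;      /* 1ull << format enum value */
    uint64_t rates;        /* 1ull << rate enum value */
    uint8_t  channels_min;
    uint8_t  channels_max;
};

/* Check whether a given capability bit is set in a mask. */
static int stream_supports(uint64_t mask, unsigned int bit)
{
    return (mask >> bit) & 1;
}
```

A guest would AND the device masks with its own supported set before choosing stream parameters.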

What about buffer and period size capabilities ?

In this specification we assume that the buffer size is up to the
guest side. Also, period size is an ALSA-only concept and should not
be mentioned here at all (we are not tied only to ALSA).

Period size == fragment size == block size (probably a few other
naming conventions too); we should be able to tell the device VM our
period/block/fragment size so we can configure the HW accordingly.
This can also be used to make our stream copying more efficient.

Then let's talk about playback. As I understand, you are describing
a case when:

1. an ALSA application sets some period size
2. fills the whole buffer with non-silence frames
3. upon every period notification, provides the next period-sized
part of data

It's one of the use cases, but quite rare.

Ehm, it's bog standard file R/W (non mmap()) audio and extremely common
on Linux and derivatives.

Step 4 is

4) Unblock and wake userspace R/W for next period/block/fragment.

More often (especially if we are talking about latency control and
low latency audio) things will be as follows:

1. an application fills the whole buffer with silence
2. suppresses period notifications
3. starts the stream and almost immediately seeks to near the hw
position
4. puts non-silence frames there (size is set according to desired
latency)
5. provides new frames at some interval (usually in 5-10 ms)
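Steps 3 and 4 of that flow can be sketched as a small helper (names and the safeguard idea are illustrative only): fresh frames are written a small safeguard ahead of the current hw pointer, wrapping within the ring buffer:

```c
#include <stdint.h>

/* Position (in frames) where fresh audio is written: just ahead of
 * the current hw pointer by a small safeguard, modulo buffer size.
 * The safeguard covers scheduling jitter between reading the hw
 * pointer and completing the write. */
static uint64_t rewrite_offset(uint64_t hw_ptr,
                               uint64_t safeguard_frames,
                               uint64_t buffer_frames)
{
    return (hw_ptr + safeguard_frames) % buffer_frames;
}
```

The smaller the safeguard, the lower the latency, but the more accurate and timely the hw pointer reads must be.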

So this describes mmap() audio where we block by poll()ing, read the
HW ptr, and then write new frames.

This proposal needs to support both operating modes, and I can see
nothing in virtio, given some minor changes to this proposal, that
would prevent that?

Yes, nothing prevents that. I explained why it was not here from the
start.

It's how software mixers usually work: PulseAudio, the Windows
in-kernel mixer, the new Android system mixer and so on. Also, you
can find such an approach in low latency player applications (when
they work directly with the device).

The point is they do not rely on notifications from the driver and
use their own timers. The only thing they query is the current hw
position.

The point is we could have period notifications from the device. But
it seems they will be useful only in a minority of cases. Also, for
example, the Windows audio driver has no such concept at all.

Windows does not need to use periodic mode, but a lot of Linux
userspace will use it, and this currently will break those
applications.

It does not break anything, since the driver must notify on every elapsed period in any case. And that is exactly what happens in an interrupt handler.

Since xrun conditions are higher level concepts, we decided to leave
such issues to the guest application itself. It helps to make the
overall design simpler. And it seems there's not so much we can do
if an xrun happened on the device side.

We can inform the guest. The guest userspace can then take any
action if needed.

Then, we need the third queue for notifications.

Why, this should go in the queue that's used for stream position ?

Then the I/O queue will already multiplex three things: read requests, write requests and notifications. The question is how rational that is.
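To illustrate that multiplexing, each message on a shared I/O queue would need a type tag so the device can dispatch it. The header below is a sketch; none of these names come from the patch:

```c
#include <stdint.h>

/* Assumed message types if read requests, write requests and period
 * notifications all share one queue. */
enum {
    SND_MSG_PCM_READ = 0,
    SND_MSG_PCM_WRITE,
    SND_MSG_PCM_NOTIFY,
};

/* Assumed common header placed in front of every queued message. */
struct snd_io_hdr {
    uint32_t type;       /* one of the SND_MSG_* values */
    uint32_t stream_id;  /* which PCM stream the message targets */
};

/* Dispatch helper: map a type tag to a human-readable name. */
static const char *snd_msg_name(uint32_t type)
{
    switch (type) {
    case SND_MSG_PCM_READ:   return "read";
    case SND_MSG_PCM_WRITE:  return "write";
    case SND_MSG_PCM_NOTIFY: return "notify";
    default:                 return "unknown";
    }
}
```

The alternative under discussion is a dedicated third virtqueue for notifications, which avoids the tag but costs another queue.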

2) Zero copy of audio data and stream positions. Essential for low
latency audio use cases (if supported by HW, hypervisor, VM
configuration).

What do you mean by zero-copy? Shared memory between device and
driver sides?

Yes, but I see this is now a separate thread as it's used by others
too.

A shared memory based approach was proposed in the very first
version of the spec (and we are rooting for this). Then there was
discussion and it was postponed for some future virtio device. One
of the reasons: it's not suitable for architectures with
non-coherent memory between host and guest.

I'm rooting for this too. I do think we need to be a bit more flexible
so we can deal with non coherent architectures via a SW virtio copy and
support coherent architectures via zero copy.

We should be good as long as we can leave some configuration
space/types in the stream config to allow mapping of operating modes
(zero copy, periodic, mmap etc). The device driver should also send
its capabilities during init so that guests will know what use cases
and modes are supported.

Yes, we can use device feature bits for this.
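A sketch of how such feature bits might be checked after negotiation; the bit names and numbers below are purely illustrative, not proposed values:

```c
#include <stdint.h>

/* Hypothetical feature bits advertising supported operating modes. */
enum {
    SND_F_MSG_IO = 0,    /* message-based (copying) I/O */
    SND_F_ZERO_COPY,     /* shared-memory zero-copy streams */
    SND_F_PERIOD_NOTIFY, /* period-elapsed notifications */
};

/* Check whether a negotiated feature word contains a given feature. */
static int snd_has_feature(uint64_t features, unsigned int bit)
{
    return (features >> bit) & 1;
}
```

During init the device offers its feature word, the driver acknowledges the subset it supports, and both sides then know which modes are usable.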

Btw, please don't take my comments as trying to block your proposal.
I am behind this work, I'll likely be one of the first users. My
intention is to help provide something that is both flexible enough
to cater for all current OS users/use cases on HDA-like hardware and
has scope for future enhancements.

As we are! :)



To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

Anton Yakovlev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin
