virtio-dev message

Subject: Re: [virtio-dev] [PATCH] snd: Add virtio sound device specification
From: Liam Girdwood <liam.r.girdwood@linux.intel.com>
To: Anton Yakovlev <anton.yakovlev@opensynergy.com>, Mikhail Golubev <Mikhail.Golubev@opensynergy.com>, "virtio-dev@lists.oasis-open.org" <virtio-dev@lists.oasis-open.org>
Date: Tue, 12 Nov 2019 15:16:28 +0000
On Tue, 2019-11-12 at 13:45 +0100, Anton Yakovlev wrote:
> On 11.11.2019 16:20, Liam Girdwood wrote:
> > On Tue, 2019-10-29 at 10:42 +0100, Anton Yakovlev wrote:
> > > Hi Liam,
> > > 
> > > Thank you for comments! Please, take a look at our answers below.
> > 
> > Apologies, been travelling. More comments below.
> > 
> > > 
> > > 
> > > On 28.10.2019 17:05, Liam Girdwood wrote:
> > > > On Tue, 2019-09-24 at 15:43 +0100, Mikhail Golubev wrote:
> > > > > From: Anton Yakovlev <Anton.Yakovlev@opensynergy.com>
> > > > > 
> > > > > 

snip

> > > > > diff --git a/virtio-sound.tex b/virtio-sound.tex
> > > > > new file mode 100644
> > > > > index 0000000..68a606e
> > > > > --- /dev/null
> > > > > +++ b/virtio-sound.tex
> > > > > @@ -0,0 +1,405 @@
> > > > > +\section{Sound Device}\label{sec:Device Types / Sound
> > > > > Device}
> > > > > +
> > > > > +The virtio sound card is a virtual audio device supporting
> > > > > output
> > > > > and input PCM
> > > > > +streams. All device control requests are placed into the
> > > > > control
> > > > > virtqueue, all
> > > > > +I/O requests are placed into the PCM virtqueue.
> > > > 
> > > > Where are stream position updates placed ? These are usually
> > > > synchronous from the driver and enable the player application
> > > > to
> > > > render
> > > > or read data from the correct locations in the buffers. This is
> > > > latency
> > > > sensitive data.
> > > 
> > > It happens in an interrupt handler. Since PCM frame transfer is
> > > message based,
> > > the only possible way to figure out consumed or captured amount
> > > of
> > > frames is
> > > to take a look into a message response.
> > > 
> > 
> > Ok, this is what I thought. This wont work for low latency audio
> > use
> > cases since the buffers and position responses are queued (voice
> > calls,
> > some gaming, some system notifications like keypress or volume
> > change).
> 
> But it's not different from typical HW. A driver receives an
> interrupt once 
> DMA transaction is finished (i.e. some part of data was copied
> to/from host). 
> And hardware pointer value usually is extrapolated based on some
> internal 
> device state. 

Modern HW can usually provide the HW position to the nearest word. Some
older/bad HW provides to the nearest period though and this is when we
have to estimate.

Native (non virtual) audio device drivers typically have no latency
reading the stream position and it's read at the same time as the
system clock. Reading the same clocks together is important as it means
the relationship between the two is deterministic and not subject to
any jitter (which can be introduced by reading stream position over
virtio).  

> A virtio driver could do something similar: it could either 
> return "real" pointer value (i.e. what it internally uses) or could
> show fake 
> "extrapolated" value (like frame_rate * elappsed_time). Although,
> it's a 
> different story.
> 
> 

Virtio driver should return real stream position alongside real common
system time (of when stream position was read). 

> > However, it's probably tolerable for most other use cases. What use
> > cases have you tried in your testing ?
> 
> All typical cases: multimedia (video + audio or audio only), network 
> streaming, video calls, system events, volume changing and so on.
> 
> By the way, could you define the "low latency" term that we are
> talking about? 
> Just to be on the same wave.
> 

Low latency audio is when any application or processing can render data
very close to the HW stream R/W position (typically low or sub
millisecond). This needs zero copy buffering to HW on host/VM/user side
and also timely and accurate stream positions. 

> es
> > > > > /
> > > > > Sound Device / Device configuration layout}
> > > > > +
> > > > > +\begin{lstlisting}
> > > > > +/* supported PCM sample formats */
> > > > > +enum {
> > > > > +    VIRTIO_SND_PCM_FMT_S8 = 0,
> > > > > +    VIRTIO_SND_PCM_FMT_U8,
> > > > > +    VIRTIO_SND_PCM_FMT_S16,
> > > > > +    VIRTIO_SND_PCM_FMT_U16,
> > 
> > 24 bit.
> 
> These kinds of sample (18/20/24-bit) were discussed here earlier and
> caused 
> some concerns (regarding supporting in a guest, testing and so on),
> so they 
> were removed for simplicity.
> 
>  From other side, if we are up to add these, maybe it's better not to
> hardcode 
> format names and use some kind of generic description instead?
> Something like
> 
> struct {
>    kind; /* S/U/F */
>    significant_bits;
>    storage_bits;
> }
> 
> It allows to describe formats like 24 and 24_3 in unified way and to
> add 
> future formats without changing the protocol. What do you think?
> 

I would recommend copying the exact formats used by ALSA, these formats
are industry standard (although naming will be different on other
OSes).

> 
> > > > > +    VIRTIO_SND_PCM_FMT_S32,
> > > > > +    VIRTIO_SND_PCM_FMT_U32,
> > > > > +    VIRTIO_SND_PCM_FMT_FLOAT,
> > > > > +    VIRTIO_SND_PCM_FMT_FLOAT64
> > > > > +};
> > > > > +
> > > > > +/* supported PCM frame rates */
> > > > > +enum {
> > > > > +    VIRTIO_SND_PCM_RATE_8000 = 0,
> > > > > +    VIRTIO_SND_PCM_RATE_11025,
> > > > > +    VIRTIO_SND_PCM_RATE_16000,
> > > > > +    VIRTIO_SND_PCM_RATE_22050,
> > > > > +    VIRTIO_SND_PCM_RATE_32000,
> > > > > +    VIRTIO_SND_PCM_RATE_44100,
> > > > > +    VIRTIO_SND_PCM_RATE_48000,
> > > > > +    VIRTIO_SND_PCM_RATE_64000,
> > > > > +    VIRTIO_SND_PCM_RATE_88200,
> > > > > +    VIRTIO_SND_PCM_RATE_96000,
> > > > > +    VIRTIO_SND_PCM_RATE_176400,
> > > > > +    VIRTIO_SND_PCM_RATE_192000
> > > > 
> > > > It may be best to use the same rates and format from the ALSA
> > > > spec,
> > > > this covers any holes (like rates that are not in the above
> > > > list).
> > > 
> > > These rates are from ALSA definitions. And what formats should be
> > > added?
> > > 
> > 
> > Rate, Knot - I would copy ALSA supported rates and formats just
> > like
> > with the channel maps.
> 
> See a comment above.
> 

See comment above :)
> 
> > > > 
> > > > > +
> > > > > +/* a device configuration space */
> > > > > +struct virtio_snd_config {
> > > > > +    struct virtio_pcm_config {
> > > > > +        struct virtio_pcm_stream_config output;
> > > > > +        struct virtio_pcm_stream_config input;
> > > > > +    } pcm;
> > > > > +};
> > > > > +\end{lstlisting}
> > > > 
> > > > I assuming this is sent to configure PCM capabilities ?
> > > 
> > > Yes, minimum amount of information required to setup a PCM stream
> > > in
> > > any kind
> > > of guest.
> > > 
> > > 
> > > > What about buffer and period size capabilities ?
> > > 
> > > In this specification we assume that buffer size is up to guest
> > > decision.
> > > Also, period size is ALSA-only conception and should not be
> > > mentioned
> > > here at
> > > all (we are not stick only to ALSA).
> > > 
> > > 
> > 
> > Period size == fragment size == block size (probably a few other
> > naming
> > conventions too), we should be able to tell the device VM our
> > period/block/fragment size so we can configure the HW
> > appropriately.
> > This can also be used to make our stream copying more efficient
> > too.
> 
> Then let's talk about playback. As I can understand, you are
> describing a 
> case, when
> 1. ALSA application sets some period size
> 2. fills the whole buffer with non-silence frames
> 3. upon every period notification, provides next period-sized part of
> data
> 
> It's one of use cases, but quite rare. 

Ehm, it's bog standard file R/W (non mmap()) audio and extremely common
on Linux and derivatives.

Step 4 is

4) Unblock and wake userspace R/W for next period/block/fragment.

> More often (especially if we are 
> talking about latency control and low latency audio) things will be
> 1. an application fills the whole buffer with silence
> 2. suppresses period notifications
> 3. starts stream and almost immediatly seeks to near hw position
> 4. puts non-silence frames there (size is set according to desired
> latency)
> 5. provides new frames at some interval (usually in 5-10ms)

So this describes mmap() audio where we block by poll()ing, read HW
ptr, and the write new frames. 

This proposal needs to support both operating modes and there is
nothing I can see with virtio and some minor changes to this proposal
that would prevent that ?

> 
> It's how software mixers usually work: PulseAudio, Windows in-kernel
> mixer, 
> new Android system mixer and so on. Also, you can find such approach
> in low 
> latency player applications (when they work directly with device).
> 
> The point is they do not rely on notifications from the driver and
> use their 
> own timers. The only thing they querying is current hw position.
> 
> The point is we could have period notifications from the device. But
> it seems 
> they will be useful only in minority of cases. Also, for example,
> Windows 
> audio driver has no such concept at all.

Windows does not need to use periodic mode, but a lot of Linux
userspace will use it and this will currently will break those
applications. 

> > > 
> > > Since xrun conditions are higher level conceptions, we decided to
> > > delegate
> > > such issues to guest application itself. It helps to make overall
> > > design
> > > simpler. And it seems there's not so much we can do if xrun
> > > condition
> > > is
> > > happened on the device side.
> > 
> > We can inform the guest. The guest userspace can then take any
> > remedial
> > action if needed.
> 
> Then, we need the third queue for notifications.

Why, this should go in the queue that's used for stream position ?

> 
> 
> > > 
> > > 
> > > > 2) Zero copy of audio data and stream positions. Essential for
> > > > any
> > > > low
> > > > latency audio use cases (if supported by HW, hypervisor, VM
> > > > framework).
> > > 
> > > What do you mean by zero-copy? Shared memory between device and
> > > driver sides?
> > 
> > Yes, but I see this is now a separate thread as it used by others
> > too.
> 
> Shared memory based approach was proposed in the very first version
> of the 
> spec (and we are rooting for this). Then there was discussion and it
> was 
> postponed for some future virtio device. One of the reason - it's not
> suitable 
> for arch with non-coherent memory between host and guest.

I'm rooting for this too. I do think we need to be a bit more flexible
so we can deal with non coherent architectures via a SW virtio copy and
support coherent architectures via zero copy. 

We should be good as long as we can leave some configuration
space/types in the stream config to allow mapping of operating modes
(zero copy, periodic, mmap etc). The device driver should also send
it's capabilities during init so that guests will know what use cases
and modes are supported.

Btw, please don't take my comments as trying to block your proposal. I
am behind this work, I'll likely be one of the first users. My
intention is to help provide something that is both flexible enough to
cater for all current OS users/use cases on HDA like hardware and has
scope for future enhancements.

Thanks

Liam
Follow-Ups:
- Re: [virtio-dev] [PATCH] snd: Add virtio sound device specification
  - From: Anton Yakovlev <anton.yakovlev@opensynergy.com>
References:
- Re: [virtio-dev] [PATCH] snd: Add virtio sound device specification
  - From: Liam Girdwood <liam.r.girdwood@linux.intel.com>
- Re: [virtio-dev] [PATCH] snd: Add virtio sound device specification
  - From: Anton Yakovlev <anton.yakovlev@opensynergy.com>