virtio-dev message

Subject: Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
From: Alexandre Courbot <acourbot@chromium.org>
To: Alexander Gordeev <alexander.gordeev@opensynergy.com>
Date: Fri, 12 May 2023 13:09:16 +0900
On Thu, May 11, 2023 at 5:50âPM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> On 08.05.23 06:55, Alexandre Courbot wrote:
> > On Fri, May 5, 2023 at 8:55âPM Alexander Gordeev
> > <alexander.gordeev@opensynergy.com> wrote:
> >>
> >> On 03.05.23 16:04, Cornelia Huck wrote:
> >>> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
> >>>
> >>>> On 27.04.23 15:16, Alexandre Courbot wrote:
> >>>>> But in any case, that's irrelevant to the guest-host interface, and I
> >>>>> think a big part of the disagreement stems from the misconception that
> >>>>> V4L2 absolutely needs to be used on either the guest or the host,
> >>>>> which is absolutely not the case.
> >>>>
> >>>> I understand this, of course. I'm arguing, that it is harder to
> >>>> implement it, get it straight and then maintain it over years. Also it
> >>>> brings limitations, that sometimes can be workarounded in the virtio
> >>>> spec, but this always comes at a cost of decreased readability and
> >>>> increased complexity. Overall it looks clearly as a downgrade compared
> >>>> to virtio-video for our use-case. And I believe it would be the same for
> >>>> every developer, that has to actually implement the spec, not just do
> >>>> the pass through. So if we think of V4L2 UAPI pass through as a
> >>>> compatibility device (which I believe it is), then it is fine to have
> >>>> both and keep improving the virtio-video, including taking the best
> >>>> ideas from the V4L2 and overall using it as a reference to make writing
> >>>> the driver simpler.
> >>>
> >>> Let me jump in here and ask another question:
> >>>
> >>> Imagine that, some years in the future, somebody wants to add a virtio
> >>> device for handling video encoding/decoding to their hypervisor.
> >>>
> >>> Option 1: There are different devices to chose from. How is the person
> >>> implementing this supposed to pick a device? They might have a narrow
> >>> use case, where it is clear which of the devices is the one that needs to
> >>> be supported; but they also might have multiple, diverse use cases, and
> >>> end up needing to implement all of the devices.
> >>
> >> I think in this case virtio-v4l2 should be used as a compatibility
> >> device exclusively. This means discouraging increasing its complexity
> >> even more with more patches in the spec. virtio-video should eventually
> >> cover all the use-cases of V4L2, so I think it is reasonable to use it
> >> in both complex use-cases and in simple use-cases, where there is no
> >> decoder/encoder V4L2 device on the host.
> >>
> >>> Option 2: There is one device with various optional features. The person
> >>> implementing this can start off with a certain subset of features
> >>> depending on their expected use cases, and add to it later, if needed;
> >>> but the upfront complexity might be too high for specialized use cases.
> >
> > I don't see that many negociable features we can provide for a
> > decoder/encoder device - at least not many that are not considered
> > basic (like guest buffers). In terms of provided features for codecs
> > virtio-video and virtio-v4l2 are essentially equivalent.
>
> Actually I see a lot of potential in using the virtio feature flag
> negotiation for virtio-video:
>
> 1. We already have some feature flags related to memory management.
>
> 2. I think it would be great to take V4L2 controls negotiation and turn
> it into the feature flags negotiation. I really like this idea and I'd
> like to implement it. Not all the controls at once, of course. Still it
> would be very easy to port more given the existing process. They
> correspond well enough to each other, I think. This way we don't need to
> introduce something like the VIDIOC_QUERYCTRL/VIDIOC_QUERY_EXT_CTRL, we
> don't need two mechanisms for feature negotiations (like it would be
> with virtio-v4l2, right?), also all the features would be in one place.
> Then we can directly reference some enums from V4L2, like
> v4l2_mpeg_video_h264_profile or v4l2_mpeg_video_h264_level. That's what
> I call taking the best from V4L2.
>
> 3. We can have things, that V4L2 doesn't support in their stateful UAPI.
> For example, dequeuing output buffers in decoder order.

... provided the host can do that (i.e. has a stateless decoder
interface). What is the use-case for this btw?

>
> > The question is more: do we want a decoder/encoder specification and
> > another one for cameras or do we want something that covers video
> > devices in general. And is it desirable to reuse an already existing
> > protocol for communication between a non-privileged entity and a
> > privileged one (V4L2) or define our own?
>
> Please note, that virtio-video could be extended to support everything
> V4L2 does in the future. Including cameras if necessary. So it can cover
> video devices in general too.

That's a very, very bold claim, and if you do you will inevitably need
to add things that are not relevant to the codec case, which is your
complaint about V4L2. Not to mention the many new pages of spec that
this will require.

>
> > About the latter point, Alex BennÃe mentioned that it was difficult to
> > find the engineering time to define virtio-camera. And virtio-video
> > has also not been particularly fast to come together. virtio-v4l2
> > basically serves us both on a plate for a lower effort.
>
> If we talk only about codecs, the effort is lower only in case you have
> V4L2 codecs on the host. Otherwise the effort seems higher.

The effort is lower in terms of spec writing, and also lower if your
guest is Linux as you can support all video devices with a single
driver. That's a very large portion of the virtio users here.

> I also hope to be able to update virtio-video at a faster pace. Please
> let me try.

Please hold on a bit - there are two things here.

1) I'd like to settle the virtio-v4l2/virtio-video argument first to
make sure we don't get two things clashing head-on. As far as codecs
are concerned we certainly don't need both. Cornelia, I think we'll
need you to make a call on this, or at least tell us what you need to
make the call. If it helps I can send a draft of what the virtio-v4l2
spec would look like, it should be relatively short.

2) I (and other ChromeOS contributors) have been driving this spec so
far and while I think virtio-v4l2 is a better solution, I have not
said I would give up on virtio-video if virtio-v4l2 was not adopted
and will keep iterating on it in that case.

That said, your contributions are of course welcome if you have stuff
written down that you want to include. The current virtio-video spec
is available here:

https://github.com/Gnurou/virtio-video-spec

I'm writing it in Markdown (virtio-video.md) to avoid dealing with
LaTeX directly and use a pandoc filter to convert it before submission
- there is a Makefile that takes care of that. Feel free to send a
pull request with changes you have worked on, including your
Signed-off-by in the patches so I can carry it on into the v7 patch if
we go that route.

> This is understandable, that working on the specs takes time. That's why
> I'm all for making room for everybody to work. I think eventually
> virtio-video or virtio-video + virtio-camera can replace virtio-v4l2. I
> strongly believe this is a better solution for the long term.
>
> >> In this case I'd prefer to have the simpler device first, that is the
> >> current virtio-video, then to add features incrementally using feature
> >> flags and taking into account the virtualization context. V4L2 is a
> >> complex thing from a different context. They already tried to carve out
> >> some of the use-cases like stateful decoder/encoder API, but this work
> >> is not finished (struct v4l2_buffer can serve as an evidence). This is
> >> like dissecting a monolith. Also it has to be patched to make it more
> >> appropriate for virtualization (we can see this in Alexandre's PoC already).
> >>
> >>> Leaving concrete references to V4L2 out of the picture, we're currently
> >>> trying to decide whether our future will be more like Option 1 or Option
> >>> 2, with their respective trade-offs.
> >>
> >> I'd like to rely on opinions of people, who know more about virtio
> >> development and goals. I would be happy to present or reiterate my
> >> arguments to anyone interested if necessary.
> >>
> >>> I'm slightly biased towards Option 2; does it look feasible at all, or
> >>> am I missing something essential here? (I had the impression that some
> >>> previous confusion had been cleared up; apologies in advance if I'm
> >>> misrepresenting things.)
> >>
> >> Indeed some of the previous confusion has been cleared up. But not the
> >> key thing. Alexandre still claims, that this patched V4L2 UAPI pass
> >> through is only marginally more complex, for example. I don't agree with
> >> this and I have evidence. We haven't finished discussing this evidence.
> >
> > Are you talking about v4l2_buffer?
> >
> > https://www.kernel.org/doc/html/v4.9/media/uapi/v4l/buffer.html#struct-v4l2-buffer
> >
> > I think you implied that some of its fields were not relevant for
> > video decoding or encoding, which if you examine them is again
> > incorrect. That also answers your question of why the stateful decoder
> > spec did not mention the valid fields - because it is documented on
> > this page, which tells exactly which fields the driver/device are
> > expected to set for each queue.
>
> I'm talking about struct v4l2_buffer, yes, but not only. Also about
> struct v4l2_plane, enum v4l2_buf_type, the buffer flags, enum
> v4l2_memory (but this one is comparable to virtio-video), timecodes.
> For example, the way the fields in the struct v4l2_buffer and struct
> v4l2_plane are filled and interpreted depends a lot on the type. Here is
> the enum v4l2_buf_type:
>
> enum v4l2_buf_type {
>          V4L2_BUF_TYPE_VIDEO_CAPTURE        = 1,
>          V4L2_BUF_TYPE_VIDEO_OUTPUT         = 2,
>          V4L2_BUF_TYPE_VIDEO_OVERLAY        = 3,
>          V4L2_BUF_TYPE_VBI_CAPTURE          = 4,
>          V4L2_BUF_TYPE_VBI_OUTPUT           = 5,
>          V4L2_BUF_TYPE_SLICED_VBI_CAPTURE   = 6,
>          V4L2_BUF_TYPE_SLICED_VBI_OUTPUT    = 7,
>          V4L2_BUF_TYPE_VIDEO_OUTPUT_OVERLAY = 8,
>          V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE = 9,
>          V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE  = 10,
>          V4L2_BUF_TYPE_SDR_CAPTURE          = 11,
>          V4L2_BUF_TYPE_SDR_OUTPUT           = 12,
>          V4L2_BUF_TYPE_META_CAPTURE         = 13,
>          V4L2_BUF_TYPE_META_OUTPUT          = 14,
>          /* Deprecated, do not use */
>          V4L2_BUF_TYPE_PRIVATE              = 0x80,
> };
>
> Of these 14 cases we only need 2 for the codecs. Right?
> Also the flags. There are 22 of them. Are they all needed too? I don't
> think so. We only have like 5 in virtio-video at the moment.

Upon reading the V4L2 spec it is pretty clear which queue types are
valid for each device, and anything other than CAPTURE_MPLANE and
OUTPUT_MPLANE is unlikely to be used anyway.

We are really splitting hairs here and this looks like a wild goose
chase for software purity - even within virtio-video there are already
flags that only make sense for an encoder, and you cannot remove them
without defining new more specific structures and complicating things
overall. If you extend virtio-video to support more use-cases, you
will end up with more of these as well. So seriously, why is this such
a big deal when the instructions on how to use these structures for
each use case are at the other end of a link click?

>
> You posted code for filling v4l2_buffer in one of your previous emails.
> What I'm trying to say is that a person, who doesn't know this in
> advance, will have a hard time writing this same code if they only have
> the virtio-v4l2 spec.

Well yes, the counterpart of virtio-v4l2 being shorter is that you
need to refer to V4L2 as well - that's actually the point, to reduce
the burden on virtio by reusing a spec that already exists and
referring to it.

Earlier in this email you mentioned reusing structs like
v4l2_mpeg_video_h264_profile in virtio-video, that creates the same
dependency to the V4L2 spec. The difference is how much we are taking
from it.
Follow-Ups:
- Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  - From: Alexander Gordeev <alexander.gordeev@opensynergy.com>
- Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  - From: Alexander Gordeev <alexander.gordeev@opensynergy.com>
References:
- Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  - From: Cornelia Huck <cohuck@redhat.com>
- Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  - From: Alexander Gordeev <alexander.gordeev@opensynergy.com>
- Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  - From: Alexandre Courbot <acourbot@chromium.org>
- Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  - From: Alexander Gordeev <alexander.gordeev@opensynergy.com>