virtio-dev message

Subject: Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
From: Alexandre Courbot <acourbot@chromium.org>
To: Alexander Gordeev <alexander.gordeev@opensynergy.com>
Date: Thu, 27 Apr 2023 22:23:52 +0900
On Thu, Apr 27, 2023 at 12:52âAM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> On 21.04.23 06:02, Alexandre Courbot wrote:
> > On Wed, Apr 19, 2023 at 4:39âPM Alexander Gordeev
> > <alexander.gordeev@opensynergy.com> wrote:
> >>
> >> On 17.04.23 16:43, Cornelia Huck wrote:
> >>> On Mon, Apr 17 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
> >>>
> >>>> OpenSynergy, the company that I work for, develops a proprietary
> >>>> hypervisor called COQOS mainly for automotive and aerospace domains. We
> >>>> have our proprietary device implementations, but overall our goal is to
> >>>> bring open standards into these quite closed domains and we're betting
> >>>> big on virtio. The idea is to run safety-critical functions like cockpit
> >>>> controller alongside with multimedia stuff in different VMs on the same
> >>>> physical board. Right now they have it on separate physical devices. So
> >>>> they already have maximum isolation. And we're trying to make this
> >>>> equally safe on a single board. The benefit is the reduced costs and
> >>>> some additional features. Of course, we also need features here, but at
> >>>> the same time security and ease of certification are among the top of
> >>>> our priorities. Nobody wants cars or planes to have security problems,
> >>>> right? Also nobody really needs DVB and even more exotic devices in cars
> >>>> and planes AFAIK.
> >>>>
> >>>> For the above mentioned reasons our COQOS hypervisor is running on bare
> >>>> metal. Also memory management for the guests is mostly static. It is
> >>>> possible to make a shared memory region between a device and a driver
> >>>> managed by device in advance. But definitely no mapping of random host
> >>>> pages on the fly is supported.
> >>>>
> >>>> AFAIU crosvm is about making Chrome OS more secure by putting every app
> >>>> in its own virtualized environment, right? Both the host and guest are
> >>>> linux. In this case I totally understand why V4L2 UAPI pass-through
> >>>> feels like a right move. I guess, you'd like to make the switch to
> >>>> virtualized apps as seemless as possible for your users. If they can't
> >>>> use their DVBs anymore, they complain. And adding the virtualization
> >>>> makes the whole thing more secure anyway. So I understand the desire to
> >>>> have the range of supported devices as broad as possible. It is also
> >>>> understandable that priorities are different with desktop
> >>>> virtualization. Also I'm not trying to diminish the great work, that you
> >>>> have done. It is just that from my perspective this looks like a step in
> >>>> the wrong direction because of the mentioned concerns. So I'm going to
> >>>> continue being a skeptic here, sorry.
> >>>>
> >>>> Of course, I don't expect that you continue working on the old approach
> >>>> now as you have put that many efforts into the V4L2 UAPI pass-through.
> >>>> So I think it is best to do the evolutionary changes in scope of virtio
> >>>> video device specification, and create a new device specification
> >>>> (virtio-v4l2 ?) for the revolutionary changes. Then I'd be glad to
> >>>> continue the virtio-video development. In fact I already started making
> >>>> draft v7 of the spec according to the comments. I hope it will be ready
> >>>> for review soon.
> >>>>
> >>>> I hope this approach will also help fix issues with virtio-video spec
> >>>> and driver development misalignment as well as V4L2 compliance issues
> >>>> with the driver. I believe the problems were caused partly by poor
> >>>> communication between us and by misalignment of our development cycles,
> >>>> not by the driver complexity.
> >>>>
> >>>> So in my opinion it is OK to have different specs with overlapping
> >>>> functionality for some time. My only concern is if this would be
> >>>> accepted by the community and the committee. How the things usually go
> >>>> here: preferring features and tolerating possible security issues or the
> >>>> other way around? Also how acceptable is having linux-specific protocols
> >>>> at all?
> >>> My main question is: What would be something that we can merge as a
> >>> spec, that would either cover the different use cases already, or that
> >>> could be easily extended to cover the use cases it does not handle
> >>> initially?
> >>>
> >>> For example, can some of the features that would be useful in crosvm be
> >>> tucked behind some feature bit(s), so that the more restricted COQOS
> >>> hypervisor would simply not offer them? (Two feature bits covering two
> >>> different mechanisms, like the current approach and the v4l2 approach,
> >>> would also be good, as long as there's enough common ground between the
> >>> two.)
> >>>
> >>> If a staged approach (adding features controled by feature bits) would
> >>> be possible, that would be my preferred way to do it.
> >>
> >> Hmm, I see several ways how we can use the feature flags:
> >> 1. Basically making two feature flags: one for the current video spec
> >> and one for the V4L2 UAPI pass through. Kind of the same as having two
> >> different specs, but within one device. Not sure which way is better.
> >> Probably having two separate devices would be easier to review and merge.
> >
> > Having two different devices with their own IDs would indeed be less
> > confusing than using feature bits.
> >
> > That being said, the whole point of proposing virtio-v4l2 is to end up
> > with *less* specification, not more. Having two concurrent and largely
> > overlapping approaches will result in fragmentation and duplicated
> > work, so my suggestion would be to decide on one or the other and
> > stick to it.
>
> Hmm, Enrico pointed out, that having virtio-v4l2 would also be good
> because of much better compatibility with Android right now. I don't
> think the specification length should be our ultimate goal. Cornelia
> said, that her ultimate goal is to have a spec everyone is happy with,
> regardless on how we arrive there. Well, I can only say, that I also
> think this should be our goal.

Try to put yourself into the shoes of someone who needs to write a new
video device using virtio. Oh, there are two ways to do it. This guest
OS only supports virtio-video. But this guest OS only supports
virtio-v4l2. Great, now you need to support two interfaces with your
device, or write two different devices.

>
> >> 2. Finding a subset of V4L2, that closely matches the current draft, and
> >> restrict everything else. A perfectly reasoned answer for this case will
> >> require a lot of work going through all the V4L2 structures I think. And
> >
> > It's pretty trivial, on the contrary. For a decoder device one would
> > just need to restrict to the structs and operations mentioned here:
> > https://www.kernel.org/doc/html/v5.18/userspace-api/media/v4l/dev-decoder.html
>
> I don't agree. But we already talk about this in other threads.
>
> >> even if we have a concrete plan, it has to be implemented first. I doubt
> >> it is possible. Based on the things, that I already know, this is going
> >> to be a compromise on security anyway, so we're not happy about that.
> >> More on that below.
> >> 3. Stop trying to simply pass the V4L2 UAPI through and focus on making
> >> the virtio video spec as close to the V4L2 UAPI as possible, but with
> >> the appropriate security model. So that the video device can be extended
> >> with a feature flag to something very close to full V4L2 UAPI. A lot of
> >> work as well, I think. And this won't allow us to simply link the V4L2
> >> UAPI in the spec and therefore reduce its size, which is Alexandre's
> >> current goal. So Alexandre and his team are not happy this way probably.
> >
> > That would indeed be reinventing the wheel and a completely pointless
> > exercise IMHO.
>
> I agree.
>
> >>   From the security point of view these are our goals from most to less
> >> important AFAIU:
> >> 1. Make the device secure. If a device is compromised, the whole
> >> physical machine is at risk. Complexity is the enemy here. It helps a
> >> lot to make the device as straightforward and easy to implement as
> >> possible. Therefore it is good to make the spec device/function-centric
> >> from this PoV.
> >
> > This seems to be the main point of contention we have right here. I do
> > not believe that V4L2 introduces significant complexity to the video
> > use-case, and certainly not to a degree that would justify writing a
> > new spec just for that.
>
> Ok, then we should try to agree on a benchmark, I think.
>
> >> 2. Ensure, that drivers are also secure at least from user-space side.
> >> Maybe from device side too.
> >
> > FWIW V4L2 has been in use for a looong time (including in Chromebooks
> > that do secure playback) and I am not aware of fundamental security
> > issues with it.
>
> Already being discussed separately...
>
> >> 3. Implementing secure playback and making sure media doesn't leak. For
> >> this case it is nice to have these object UUIDs as buffers.
> >>
> >> Please correct me if there's something wrong here.
> >>
> >> When we start looking from this perspective even things like naming
> >> video buffer queues "output" for device input and "capture" for device
> >> output are problematic. In my experience this naming scheme takes some
> >> time and for sure several coding mistakes to get used to. Unfortunately
> >> this can't be turned off with some feature flags. In contrast virtio
> >> video v6 names these queues "input" and "output". This is perfectly fine
> >> if we look from the device side. It is understandable, that Alexandre's
> >> list of differences between V4L2 UAPI and the current state of virtio
> >> video doesn't include these things. But we have to count them, I think.
> >> That's why it takes me so long to make the list. :) So throwing away
> >> this simplicity is still going to be a compromise from the security
> >> perspective, that we're not happy about.
> >>
> >> This is mostly because V4L2 UAPI brings a hard dependency on V4L2, its
> >> security model, legacy, use-cases, developers, etc. It can be changed
> >> over time, but this is a long process because this means changing the
> >> Linux UAPI. Also this means, that nothing can be removed from it, only
> >> added (if the V4L2 community agrees). For example, we can't simply add a
> >> new way of sharing buffers for potential new use-cases once we switch to
> >> V4L2 UAPI AFAIU.
> >
> > You certainly can, it would just need to be in the virtio
> > specification. And again that's a very hypothetical case.
>
> Ok, maybe it is hypothetical.  We won't know until it happens. I still
> don't like it. Overriding things in the virtio specification is not
> particularly nice. It is much easier to understand when things are kept
> together, not as patches.
>
> >> The V4L2 community can simply reject changes because
> >> this is a UAPI after all. We kind of have a weak dependency already,
> >> because the only driver implementation is based on V4L2 and we'd like to
> >> keep the spec as close to V4L2 as possible, but it is not the same
> >> thing. So at the moment it looks like the V4L2 UAPI proposal is not
> >> super flexible. Alexandre said, that we can simply not implement some of
> >> the ioctls. Well, this definitely doesn't cover all the complexity like
> >> the structures and other subtle details.
> >
> > Mmm? Where did I say that? That sounds like a misunderstanding.
>
> Here is the quote from your email on March 17:
>
> > V4L2 has a larger UAPI surface because it manages more kinds of
> > devices, but drivers only need to implement the ioctls they need. For
> > the rest, they just return -ENOTTY, and evil actors are hopefully kept
> > at bay.
>
>
> >> Also adding the feature flags would probably defeat the stated purpose
> >> of switching to V4L2 UAPI anyway: the simplicity of the spec and of the
> >> V4L2 driver.
> >>
> >> So I have a lot of doubts about the feasibility of adding feature flags.
> >> If Alexandre and his team want the V4L2 UAPI as is, then looks like it
> >> is best to simply have two specs:
> >> 1. virtio-video for those interested in building from ground up with the
> >> security model appropriate for virtualization in mind. This is going to
> >> take time and not going to reach feature parity with V4L2 ever I think.
> >> I mean some old devices might never get support this way.
> >> 2. virtio-v4l2 as a shortcut for those interested in having feature
> >> parity with V4L2 fast. Like a compatibility layer. Probably this is
> >> going to be used in linux host + linux guest use-cases only. Maybe it
> >> gets obsoleted by the first spec in several years for most modern use-cases.
> >>
> >> Or maybe have these two cases within a single device spec as I wrote above.
> >>
> >> This makes a lot of sense to me. If V4L2 UAPI pass through is in fact
> >> only needed for compatibility, then this way we can avoid a lot of work
> >> going through all of the V4L2 and trying to find different subsets or
> >> trying to construct something, that is close to V4L2 UAPI, but doesn't
> >> compromise on the security. I'm not really interested in doing all this
> >> work because we're already more or less satisfied with the current
> >> state. We don't need feature parity with V4L2. On the other hand for
> >> Alexandre the feature-parity with V4L2 is clearly of higher priority,
> >> than all these subtle security model differences. In my opinion it also
> >> doesn't make sense to invest that much time in something, that looks
> >> like a compatibility layer. So it seems both of us are interested in
> >> avoiding all this extra work. Then I'd just prefer to have two different
> >> specs so that everyone can work according to their priorities.
> >>
> >>
> >>> Regarding the protocol: I think Linux-originating protocols (that can be
> >>> implemented on non-Linux setups) are fine, Linux-only protocols probably
> >>> not so much.
> >>
> >> Thanks for the information. Well, it looks like the V4L2 UAPI could be
> >> implemented on any platform unless it needs a completely new way of
> >> memory management since all the V4L2_MEMORY_* constants are going to be
> >> used already AFAIU.
> >>
> >>
> >>>>>>       a. So V4L2 subsystem and the current virtio-video driver are already
> >>>>>> reducing the complexity. And this seems as the right place to do this,
> >>>>>> because the complexity is caused by the amount of V4L2 use cases and its
> >>>>>> legacy. If somebody wants to use virtio-video in a Windows guest, they
> >>>>>> would prefer a simpler API, right? I think this use-case is not purely
> >>>>>> abstract at all.
> >>>>> The V4L2 subsystem is there to factorize code that can be shared
> >>>>> between drivers and manage their internal state. Our target is the
> >>>>> V4L2 UAPI, so a Windows driver needs not be concerned about these
> >>>>> details - it does what it would have done with virtio-video, and just
> >>>>> uses the V4L2 structures to communicate with the host instead of the
> >>>>> virtio-video ones.
> >>>> It can also reuse the virtio-video structures. So I think despite the
> >>>> ability to reuse V4L2 structures, having to implement a linux-specific
> >>>> interface would still be a bigger pain.
> >>> Hm. Do the v4l2 structures drag in too many adjacent things that need to
> >>> be implemented? Can we match the video-video structures from the current
> >>> proposal with some v4l2 structures and extract a common wrapper for
> >>> those that match, with a feature-bit controlled backend? It would be
> >>> fine if any of those backends supported a slightly different subset of
> >>> the common parts, as long as the parts implemented by both would be
> >>> enough to implement a working device. (Mostly thinking out loud here.)
> >>
> >> I don't think this is realistic unfortunately. On per ioctl level it is
> >> possible to disable some functionality probably, but the V4L2 structures
> >> are set in stone. We can only extend them.
> >>
> >>
> >>>>> The guest driver that I wrote is, I think, a good example of the
> >>>>> complexity you can expect in terms of guest driver size (as it is
> >>>>> pretty functional already with its 1000 and some LoCs). For the UAPI
> >>>>> complexity, the host device basically unpacks the information it needs
> >>>>> and rebuilds the V4L2 structures before calling into the host device,
> >>>>> and I don't see this process as more complex that the unpacking of
> >>>>> virtio-video structs which we also did in crosvm.
> >>>> Unfortunately our hypervisor doesn't support mapping random host pages
> >>>> in the guest. Static allocations of shared memory regions are possible.
> >>>> But then we have to tell V4L2 to allocate buffers there. Then we'll need
> >>>> a region per virtual device. This is just very tedious and inflexible.
> >>>> That's why we're mainly interested in having the guest pages sharing in
> >>>> the virtio video spec.
> >>> This really sounds like you'll want a different approach -- two
> >>> mechanisms covered by two feature bits might indeed be the way to go.
> >>
> >> Well, basically this is the way we have it now. I'm not sure what is
> >> Alexandre's plan with the V4L2 UAPI approach. And if this is going to be
> >> solved, the solution already doesn't look future-proof anyway unfortunately.
> >>
> >>
> >>>>> I hope I have somehow addressed your points. The main point here is to
> >>>>> discuss whether the V4L2 UAPI is a suitable transport for guest/host
> >>>>> accelerated codec work, regardless of what the guest or host
> >>>>> ultimately uses as UAPI. The goal of the PoC is to demonstrate that
> >>>>> this is a viable solution. This PoC is largely simplified by the fact
> >>>>> that V4L2 is used all along the way, but this is irrelevant - yes,
> >>>>> actual devices will likely talk to other APIs and maintain more state,
> >>>>> like a virtio-video device would do. What I want to demonstrate is
> >>>>> that we can send encoding work and receive a valid stream, and that it
> >>>>> is not costly, and only marginally more complex than our virtio-video
> >>>>> spec attempts.
> >>>>>
> >>>>> ... and we can support cameras too, but that's just a convenient
> >>>>> side-effect, not the ultimate solution to the camera virtualization
> >>>>> problem (that's for the camera folks to decide).
> >>>> Thanks for your answer!
> >>> Thanks everyone -- do you think the "two feature bits to cover different
> >>> approaches, but using a common infrastructure" idea could work? If yes,
> >>> I think that's the direction we should take. If we can implement this
> >>> with just one feature bit, that might also be a good route to extend it
> >>> later, but I'm not familiar enough with the whole infrastructure to make
> >>> any judgement here.
> >>
> >> Thanks for your suggestions. Hopefully we end up with a good solution.
> >
> > To summarize my position:
> >
> > * I am still not convinced that V4L2 is lacking from a security
> > perspective. It would take just one valid example to change my mind
> > (and no, the way the queues are named is not valid). And btw, if it
> > really introduces security issues, then this makes it invalid for
> > inclusion in virtio entirely, just not OpSy's hypervisor.
>
> Already being discussed separately...
>
> > * Having two overlapping specifications for video is overkill and will
> > just fragment virtio (as tempting as it is, I won't link to XKCD). I
> > strongly advise against that.
>
> I think they're not going to create more problems, than virtio-blk,
> virtio-scsi and virtio-fs, for example.

At least these devices work at different layers, that makes them more
justifiable. virtio-video and virtio-v4l2 are just going to provide
the same API for video devices, only with different structures and
commands.

> The decision can be made like this:
> 1. You have a V4L2 device, you don't need any more processing, just want
> it inside a Linux/Android VM => use virtio-v4l2.
> 2. You don't have a V4L2 device, or your host is not Linux, or your
> maybe your guest is not Linux/Android, or you want some extra processing
> on the host (say you have a third-party proprietary library or whatever)
> => use virtio-video.

That would make sense if 2. could not be done just as easily by also
using virtio-v4l2, which I believe it can be.
Follow-Ups:
- Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  - From: Alexander Gordeev <alexander.gordeev@opensynergy.com>
References:
- Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  - From: Alexander Gordeev <alexander.gordeev@opensynergy.com>
- Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  - From: Cornelia Huck <cohuck@redhat.com>
- Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  - From: Alexander Gordeev <alexander.gordeev@opensynergy.com>
- Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  - From: Alexandre Courbot <acourbot@chromium.org>
- Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  - From: Alexander Gordeev <alexander.gordeev@opensynergy.com>