virtio-dev message

Subject: Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
From: Enrico Granata <egranata@google.com>
To: Alexander Gordeev <alexander.gordeev@opensynergy.com>
Date: Wed, 19 Apr 2023 14:34:15 -0700
Inlined

Thanks,
- Enrico

Thanks,
- Enrico


On Wed, Apr 19, 2023 at 12:39âAM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> On 17.04.23 16:43, Cornelia Huck wrote:
> > On Mon, Apr 17 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
> >
> >> Hi Alexandre,
> >>
> >> Thanks for you letter! Sorry, it took me some time to write an answer.
> >>
> >> First of all I'd like to describe my perspective a little bit because it
> >> seems, that in many cases we (and other people writing their feedbacks)
> >> simply have very different priorities and background.
> > Thank you for describing the environment you want to use this in, this
> > helps to understand the different use cases.
>
> Yeah, I hope too. I should have done that earlier. I think Dmitry
> described our use-case, but it was several years ago, so nobody
> remembers of course. We should reiterate all the use-cases once in a while.
>
>
> >> OpenSynergy, the company that I work for, develops a proprietary
> >> hypervisor called COQOS mainly for automotive and aerospace domains. We
> >> have our proprietary device implementations, but overall our goal is to
> >> bring open standards into these quite closed domains and we're betting
> >> big on virtio. The idea is to run safety-critical functions like cockpit
> >> controller alongside with multimedia stuff in different VMs on the same
> >> physical board. Right now they have it on separate physical devices. So
> >> they already have maximum isolation. And we're trying to make this
> >> equally safe on a single board. The benefit is the reduced costs and
> >> some additional features. Of course, we also need features here, but at
> >> the same time security and ease of certification are among the top of
> >> our priorities. Nobody wants cars or planes to have security problems,
> >> right? Also nobody really needs DVB and even more exotic devices in cars
> >> and planes AFAIK.
> >>
> >> For the above mentioned reasons our COQOS hypervisor is running on bare
> >> metal. Also memory management for the guests is mostly static. It is
> >> possible to make a shared memory region between a device and a driver
> >> managed by device in advance. But definitely no mapping of random host
> >> pages on the fly is supported.
> >>
> >> AFAIU crosvm is about making Chrome OS more secure by putting every app
> >> in its own virtualized environment, right? Both the host and guest are
> >> linux. In this case I totally understand why V4L2 UAPI pass-through
> >> feels like a right move. I guess, you'd like to make the switch to
> >> virtualized apps as seemless as possible for your users. If they can't
> >> use their DVBs anymore, they complain. And adding the virtualization
> >> makes the whole thing more secure anyway. So I understand the desire to
> >> have the range of supported devices as broad as possible. It is also
> >> understandable that priorities are different with desktop
> >> virtualization. Also I'm not trying to diminish the great work, that you
> >> have done. It is just that from my perspective this looks like a step in
> >> the wrong direction because of the mentioned concerns. So I'm going to
> >> continue being a skeptic here, sorry.
> >>
> >> Of course, I don't expect that you continue working on the old approach
> >> now as you have put that many efforts into the V4L2 UAPI pass-through.
> >> So I think it is best to do the evolutionary changes in scope of virtio
> >> video device specification, and create a new device specification
> >> (virtio-v4l2 ?) for the revolutionary changes. Then I'd be glad to
> >> continue the virtio-video development. In fact I already started making
> >> draft v7 of the spec according to the comments. I hope it will be ready
> >> for review soon.
> >>
> >> I hope this approach will also help fix issues with virtio-video spec
> >> and driver development misalignment as well as V4L2 compliance issues
> >> with the driver. I believe the problems were caused partly by poor
> >> communication between us and by misalignment of our development cycles,
> >> not by the driver complexity.
> >>
> >> So in my opinion it is OK to have different specs with overlapping
> >> functionality for some time. My only concern is if this would be
> >> accepted by the community and the committee. How the things usually go
> >> here: preferring features and tolerating possible security issues or the
> >> other way around? Also how acceptable is having linux-specific protocols
> >> at all?
> > My main question is: What would be something that we can merge as a
> > spec, that would either cover the different use cases already, or that
> > could be easily extended to cover the use cases it does not handle
> > initially?
> >
> > For example, can some of the features that would be useful in crosvm be
> > tucked behind some feature bit(s), so that the more restricted COQOS
> > hypervisor would simply not offer them? (Two feature bits covering two
> > different mechanisms, like the current approach and the v4l2 approach,
> > would also be good, as long as there's enough common ground between the
> > two.)
> >
> > If a staged approach (adding features controled by feature bits) would
> > be possible, that would be my preferred way to do it.
>
> Hmm, I see several ways how we can use the feature flags:
> 1. Basically making two feature flags: one for the current video spec
> and one for the V4L2 UAPI pass through. Kind of the same as having two
> different specs, but within one device. Not sure which way is better.
> Probably having two separate devices would be easier to review and merge.

I agree with this. It may be worth renaming the specs to something,
say virtio-vencdec and virtio-v4l2 </bikeshedding>
Having one spec for what is really two devices forked by flags may
superficially seem simpler and less controversial (everyone gets what
they want, right?) but it feels bolted on.

> 2. Finding a subset of V4L2, that closely matches the current draft, and
> restrict everything else. A perfectly reasoned answer for this case will
> require a lot of work going through all the V4L2 structures I think. And
> even if we have a concrete plan, it has to be implemented first. I doubt
> it is possible. Based on the things, that I already know, this is going
> to be a compromise on security anyway, so we're not happy about that.
> More on that below.

To be fair, this could be done on the device side, right? I am not
saying it would be trivial, but the device implementation could
restrict the subset of V4L2 it accepts to whatever it feels is "safe"
and limit the rest.

> 3. Stop trying to simply pass the V4L2 UAPI through and focus on making
> the virtio video spec as close to the V4L2 UAPI as possible, but with
> the appropriate security model. So that the video device can be extended
> with a feature flag to something very close to full V4L2 UAPI. A lot of
> work as well, I think. And this won't allow us to simply link the V4L2
> UAPI in the spec and therefore reduce its size, which is Alexandre's
> current goal. So Alexandre and his team are not happy this way probably.
>
>  From the security point of view these are our goals from most to less
> important AFAIU:
> 1. Make the device secure. If a device is compromised, the whole
> physical machine is at risk. Complexity is the enemy here. It helps a
> lot to make the device as straightforward and easy to implement as
> possible. Therefore it is good to make the spec device/function-centric
> from this PoV.
> 2. Ensure, that drivers are also secure at least from user-space side.
> Maybe from device side too.
> 3. Implementing secure playback and making sure media doesn't leak. For
> this case it is nice to have these object UUIDs as buffers.
>
> Please correct me if there's something wrong here.
>
> When we start looking from this perspective even things like naming
> video buffer queues "output" for device input and "capture" for device
> output are problematic. In my experience this naming scheme takes some
> time and for sure several coding mistakes to get used to. Unfortunately
> this can't be turned off with some feature flags. In contrast virtio
> video v6 names these queues "input" and "output". This is perfectly fine
> if we look from the device side. It is understandable, that Alexandre's
> list of differences between V4L2 UAPI and the current state of virtio
> video doesn't include these things. But we have to count them, I think.
> That's why it takes me so long to make the list. :) So throwing away
> this simplicity is still going to be a compromise from the security
> perspective, that we're not happy about.
>
> This is mostly because V4L2 UAPI brings a hard dependency on V4L2, its
> security model, legacy, use-cases, developers, etc. It can be changed
> over time, but this is a long process because this means changing the
> Linux UAPI. Also this means, that nothing can be removed from it, only
> added (if the V4L2 community agrees). For example, we can't simply add a
> new way of sharing buffers for potential new use-cases once we switch to
> V4L2 UAPI AFAIU. The V4L2 community can simply reject changes because
> this is a UAPI after all. We kind of have a weak dependency already,
> because the only driver implementation is based on V4L2 and we'd like to
> keep the spec as close to V4L2 as possible, but it is not the same
> thing. So at the moment it looks like the V4L2 UAPI proposal is not
> super flexible. Alexandre said, that we can simply not implement some of
> the ioctls. Well, this definitely doesn't cover all the complexity like
> the structures and other subtle details.
>
> Also adding the feature flags would probably defeat the stated purpose
> of switching to V4L2 UAPI anyway: the simplicity of the spec and of the
> V4L2 driver.
>
> So I have a lot of doubts about the feasibility of adding feature flags.
> If Alexandre and his team want the V4L2 UAPI as is, then looks like it
> is best to simply have two specs:
> 1. virtio-video for those interested in building from ground up with the
> security model appropriate for virtualization in mind. This is going to
> take time and not going to reach feature parity with V4L2 ever I think.
> I mean some old devices might never get support this way.
> 2. virtio-v4l2 as a shortcut for those interested in having feature
> parity with V4L2 fast. Like a compatibility layer. Probably this is
> going to be used in linux host + linux guest use-cases only. Maybe it
> gets obsoleted by the first spec in several years for most modern use-cases.

This latter device also makes a lot of sense for Android (I know,
Linux based) where there are nifty implementations of a number of HALs
that will just work if you assume V4L2 and will not (or will not
without rework) if you assume anything else.

>
> Or maybe have these two cases within a single device spec as I wrote above.
>
> This makes a lot of sense to me. If V4L2 UAPI pass through is in fact
> only needed for compatibility, then this way we can avoid a lot of work
> going through all of the V4L2 and trying to find different subsets or
> trying to construct something, that is close to V4L2 UAPI, but doesn't
> compromise on the security. I'm not really interested in doing all this
> work because we're already more or less satisfied with the current
> state. We don't need feature parity with V4L2. On the other hand for
> Alexandre the feature-parity with V4L2 is clearly of higher priority,
> than all these subtle security model differences. In my opinion it also
> doesn't make sense to invest that much time in something, that looks
> like a compatibility layer. So it seems both of us are interested in
> avoiding all this extra work. Then I'd just prefer to have two different
> specs so that everyone can work according to their priorities.
>
>
> > Regarding the protocol: I think Linux-originating protocols (that can be
> > implemented on non-Linux setups) are fine, Linux-only protocols probably
> > not so much.
>
> Thanks for the information. Well, it looks like the V4L2 UAPI could be
> implemented on any platform unless it needs a completely new way of
> memory management since all the V4L2_MEMORY_* constants are going to be
> used already AFAIU.
>
>
> >>>>      a. So V4L2 subsystem and the current virtio-video driver are already
> >>>> reducing the complexity. And this seems as the right place to do this,
> >>>> because the complexity is caused by the amount of V4L2 use cases and its
> >>>> legacy. If somebody wants to use virtio-video in a Windows guest, they
> >>>> would prefer a simpler API, right? I think this use-case is not purely
> >>>> abstract at all.
> >>> The V4L2 subsystem is there to factorize code that can be shared
> >>> between drivers and manage their internal state. Our target is the
> >>> V4L2 UAPI, so a Windows driver needs not be concerned about these
> >>> details - it does what it would have done with virtio-video, and just
> >>> uses the V4L2 structures to communicate with the host instead of the
> >>> virtio-video ones.
> >> It can also reuse the virtio-video structures. So I think despite the
> >> ability to reuse V4L2 structures, having to implement a linux-specific
> >> interface would still be a bigger pain.
> > Hm. Do the v4l2 structures drag in too many adjacent things that need to
> > be implemented? Can we match the video-video structures from the current
> > proposal with some v4l2 structures and extract a common wrapper for
> > those that match, with a feature-bit controlled backend? It would be
> > fine if any of those backends supported a slightly different subset of
> > the common parts, as long as the parts implemented by both would be
> > enough to implement a working device. (Mostly thinking out loud here.)
>
> I don't think this is realistic unfortunately. On per ioctl level it is
> possible to disable some functionality probably, but the V4L2 structures
> are set in stone. We can only extend them.
>
>
> >>> The guest driver that I wrote is, I think, a good example of the
> >>> complexity you can expect in terms of guest driver size (as it is
> >>> pretty functional already with its 1000 and some LoCs). For the UAPI
> >>> complexity, the host device basically unpacks the information it needs
> >>> and rebuilds the V4L2 structures before calling into the host device,
> >>> and I don't see this process as more complex that the unpacking of
> >>> virtio-video structs which we also did in crosvm.
> >> Unfortunately our hypervisor doesn't support mapping random host pages
> >> in the guest. Static allocations of shared memory regions are possible.
> >> But then we have to tell V4L2 to allocate buffers there. Then we'll need
> >> a region per virtual device. This is just very tedious and inflexible.
> >> That's why we're mainly interested in having the guest pages sharing in
> >> the virtio video spec.
> > This really sounds like you'll want a different approach -- two
> > mechanisms covered by two feature bits might indeed be the way to go.
>
> Well, basically this is the way we have it now. I'm not sure what is
> Alexandre's plan with the V4L2 UAPI approach. And if this is going to be
> solved, the solution already doesn't look future-proof anyway unfortunately.
>
>
> >>> I hope I have somehow addressed your points. The main point here is to
> >>> discuss whether the V4L2 UAPI is a suitable transport for guest/host
> >>> accelerated codec work, regardless of what the guest or host
> >>> ultimately uses as UAPI. The goal of the PoC is to demonstrate that
> >>> this is a viable solution. This PoC is largely simplified by the fact
> >>> that V4L2 is used all along the way, but this is irrelevant - yes,
> >>> actual devices will likely talk to other APIs and maintain more state,
> >>> like a virtio-video device would do. What I want to demonstrate is
> >>> that we can send encoding work and receive a valid stream, and that it
> >>> is not costly, and only marginally more complex than our virtio-video
> >>> spec attempts.
> >>>
> >>> ... and we can support cameras too, but that's just a convenient
> >>> side-effect, not the ultimate solution to the camera virtualization
> >>> problem (that's for the camera folks to decide).
> >> Thanks for your answer!
> > Thanks everyone -- do you think the "two feature bits to cover different
> > approaches, but using a common infrastructure" idea could work? If yes,
> > I think that's the direction we should take. If we can implement this
> > with just one feature bit, that might also be a good route to extend it
> > later, but I'm not familiar enough with the whole infrastructure to make
> > any judgement here.
>
> Thanks for your suggestions. Hopefully we end up with a good solution.
>
>
> --
> Alexander Gordeev
> Senior Software Engineer
>
> OpenSynergy GmbH
> Rotherstr. 20, 10245 Berlin
>
> Phone: +49 30 60 98 54 0 - 88
> Fax: +49 (30) 60 98 54 0 - 99
> EMail: alexander.gordeev@opensynergy.com
>
> www.opensynergy.com
>
> Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
> GeschÃftsfÃhrer/Managing Director: RÃgis Adjamah
>
>
> Please mind our privacy notice<https://www.opensynergy.com/datenschutzerklaerung/privacy-notice-for-business-partners-pursuant-to-article-13-of-the-general-data-protection-regulation-gdpr/> pursuant to Art. 13 GDPR. // Unsere Hinweise zum Datenschutz gem. Art. 13 DSGVO finden Sie hier.<https://www.opensynergy.com/de/datenschutzerklaerung/datenschutzhinweise-fuer-geschaeftspartner-gem-art-13-dsgvo/>
Follow-Ups:
- Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  - From: Alexander Gordeev <alexander.gordeev@opensynergy.com>
References:
- Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  - From: Alexander Gordeev <alexander.gordeev@opensynergy.com>
- Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  - From: Cornelia Huck <cohuck@redhat.com>
- Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  - From: Alexander Gordeev <alexander.gordeev@opensynergy.com>