OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-dev message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification


On 17.04.23 16:43, Cornelia Huck wrote:
On Mon, Apr 17 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:

Hi Alexandre,

Thanks for you letter! Sorry, it took me some time to write an answer.

First of all I'd like to describe my perspective a little bit because it
seems, that in many cases we (and other people writing their feedbacks)
simply have very different priorities and background.
Thank you for describing the environment you want to use this in, this
helps to understand the different use cases.

Yeah, I hope too. I should have done that earlier. I think Dmitry
described our use-case, but it was several years ago, so nobody
remembers of course. We should reiterate all the use-cases once in a while.


OpenSynergy, the company that I work for, develops a proprietary
hypervisor called COQOS mainly for automotive and aerospace domains. We
have our proprietary device implementations, but overall our goal is to
bring open standards into these quite closed domains and we're betting
big on virtio. The idea is to run safety-critical functions like cockpit
controller alongside with multimedia stuff in different VMs on the same
physical board. Right now they have it on separate physical devices. So
they already have maximum isolation. And we're trying to make this
equally safe on a single board. The benefit is the reduced costs and
some additional features. Of course, we also need features here, but at
the same time security and ease of certification are among the top of
our priorities. Nobody wants cars or planes to have security problems,
right? Also nobody really needs DVB and even more exotic devices in cars
and planes AFAIK.

For the above mentioned reasons our COQOS hypervisor is running on bare
metal. Also memory management for the guests is mostly static. It is
possible to make a shared memory region between a device and a driver
managed by device in advance. But definitely no mapping of random host
pages on the fly is supported.

AFAIU crosvm is about making Chrome OS more secure by putting every app
in its own virtualized environment, right? Both the host and guest are
linux. In this case I totally understand why V4L2 UAPI pass-through
feels like a right move. I guess, you'd like to make the switch to
virtualized apps as seemless as possible for your users. If they can't
use their DVBs anymore, they complain. And adding the virtualization
makes the whole thing more secure anyway. So I understand the desire to
have the range of supported devices as broad as possible. It is also
understandable that priorities are different with desktop
virtualization. Also I'm not trying to diminish the great work, that you
have done. It is just that from my perspective this looks like a step in
the wrong direction because of the mentioned concerns. So I'm going to
continue being a skeptic here, sorry.

Of course, I don't expect that you continue working on the old approach
now as you have put that many efforts into the V4L2 UAPI pass-through.
So I think it is best to do the evolutionary changes in scope of virtio
video device specification, and create a new device specification
(virtio-v4l2 ?) for the revolutionary changes. Then I'd be glad to
continue the virtio-video development. In fact I already started making
draft v7 of the spec according to the comments. I hope it will be ready
for review soon.

I hope this approach will also help fix issues with virtio-video spec
and driver development misalignment as well as V4L2 compliance issues
with the driver. I believe the problems were caused partly by poor
communication between us and by misalignment of our development cycles,
not by the driver complexity.

So in my opinion it is OK to have different specs with overlapping
functionality for some time. My only concern is if this would be
accepted by the community and the committee. How the things usually go
here: preferring features and tolerating possible security issues or the
other way around? Also how acceptable is having linux-specific protocols
at all?
My main question is: What would be something that we can merge as a
spec, that would either cover the different use cases already, or that
could be easily extended to cover the use cases it does not handle
initially?

For example, can some of the features that would be useful in crosvm be
tucked behind some feature bit(s), so that the more restricted COQOS
hypervisor would simply not offer them? (Two feature bits covering two
different mechanisms, like the current approach and the v4l2 approach,
would also be good, as long as there's enough common ground between the
two.)

If a staged approach (adding features controled by feature bits) would
be possible, that would be my preferred way to do it.

Hmm, I see several ways how we can use the feature flags:
1. Basically making two feature flags: one for the current video spec
and one for the V4L2 UAPI pass through. Kind of the same as having two
different specs, but within one device. Not sure which way is better.
Probably having two separate devices would be easier to review and merge.
2. Finding a subset of V4L2, that closely matches the current draft, and
restrict everything else. A perfectly reasoned answer for this case will
require a lot of work going through all the V4L2 structures I think. And
even if we have a concrete plan, it has to be implemented first. I doubt
it is possible. Based on the things, that I already know, this is going
to be a compromise on security anyway, so we're not happy about that.
More on that below.
3. Stop trying to simply pass the V4L2 UAPI through and focus on making
the virtio video spec as close to the V4L2 UAPI as possible, but with
the appropriate security model. So that the video device can be extended
with a feature flag to something very close to full V4L2 UAPI. A lot of
work as well, I think. And this won't allow us to simply link the V4L2
UAPI in the spec and therefore reduce its size, which is Alexandre's
current goal. So Alexandre and his team are not happy this way probably.

From the security point of view these are our goals from most to less
important AFAIU:
1. Make the device secure. If a device is compromised, the whole
physical machine is at risk. Complexity is the enemy here. It helps a
lot to make the device as straightforward and easy to implement as
possible. Therefore it is good to make the spec device/function-centric
from this PoV.
2. Ensure, that drivers are also secure at least from user-space side.
Maybe from device side too.
3. Implementing secure playback and making sure media doesn't leak. For
this case it is nice to have these object UUIDs as buffers.

Please correct me if there's something wrong here.

When we start looking from this perspective even things like naming
video buffer queues "output" for device input and "capture" for device
output are problematic. In my experience this naming scheme takes some
time and for sure several coding mistakes to get used to. Unfortunately
this can't be turned off with some feature flags. In contrast virtio
video v6 names these queues "input" and "output". This is perfectly fine
if we look from the device side. It is understandable, that Alexandre's
list of differences between V4L2 UAPI and the current state of virtio
video doesn't include these things. But we have to count them, I think.
That's why it takes me so long to make the list. :) So throwing away
this simplicity is still going to be a compromise from the security
perspective, that we're not happy about.

This is mostly because V4L2 UAPI brings a hard dependency on V4L2, its
security model, legacy, use-cases, developers, etc. It can be changed
over time, but this is a long process because this means changing the
Linux UAPI. Also this means, that nothing can be removed from it, only
added (if the V4L2 community agrees). For example, we can't simply add a
new way of sharing buffers for potential new use-cases once we switch to
V4L2 UAPI AFAIU. The V4L2 community can simply reject changes because
this is a UAPI after all. We kind of have a weak dependency already,
because the only driver implementation is based on V4L2 and we'd like to
keep the spec as close to V4L2 as possible, but it is not the same
thing. So at the moment it looks like the V4L2 UAPI proposal is not
super flexible. Alexandre said, that we can simply not implement some of
the ioctls. Well, this definitely doesn't cover all the complexity like
the structures and other subtle details.

Also adding the feature flags would probably defeat the stated purpose
of switching to V4L2 UAPI anyway: the simplicity of the spec and of the
V4L2 driver.

So I have a lot of doubts about the feasibility of adding feature flags.
If Alexandre and his team want the V4L2 UAPI as is, then looks like it
is best to simply have two specs:
1. virtio-video for those interested in building from ground up with the
security model appropriate for virtualization in mind. This is going to
take time and not going to reach feature parity with V4L2 ever I think.
I mean some old devices might never get support this way.
2. virtio-v4l2 as a shortcut for those interested in having feature
parity with V4L2 fast. Like a compatibility layer. Probably this is
going to be used in linux host + linux guest use-cases only. Maybe it
gets obsoleted by the first spec in several years for most modern use-cases.

Or maybe have these two cases within a single device spec as I wrote above.

This makes a lot of sense to me. If V4L2 UAPI pass through is in fact
only needed for compatibility, then this way we can avoid a lot of work
going through all of the V4L2 and trying to find different subsets or
trying to construct something, that is close to V4L2 UAPI, but doesn't
compromise on the security. I'm not really interested in doing all this
work because we're already more or less satisfied with the current
state. We don't need feature parity with V4L2. On the other hand for
Alexandre the feature-parity with V4L2 is clearly of higher priority,
than all these subtle security model differences. In my opinion it also
doesn't make sense to invest that much time in something, that looks
like a compatibility layer. So it seems both of us are interested in
avoiding all this extra work. Then I'd just prefer to have two different
specs so that everyone can work according to their priorities.


Regarding the protocol: I think Linux-originating protocols (that can be
implemented on non-Linux setups) are fine, Linux-only protocols probably
not so much.

Thanks for the information. Well, it looks like the V4L2 UAPI could be
implemented on any platform unless it needs a completely new way of
memory management since all the V4L2_MEMORY_* constants are going to be
used already AFAIU.


     a. So V4L2 subsystem and the current virtio-video driver are already
reducing the complexity. And this seems as the right place to do this,
because the complexity is caused by the amount of V4L2 use cases and its
legacy. If somebody wants to use virtio-video in a Windows guest, they
would prefer a simpler API, right? I think this use-case is not purely
abstract at all.
The V4L2 subsystem is there to factorize code that can be shared
between drivers and manage their internal state. Our target is the
V4L2 UAPI, so a Windows driver needs not be concerned about these
details - it does what it would have done with virtio-video, and just
uses the V4L2 structures to communicate with the host instead of the
virtio-video ones.
It can also reuse the virtio-video structures. So I think despite the
ability to reuse V4L2 structures, having to implement a linux-specific
interface would still be a bigger pain.
Hm. Do the v4l2 structures drag in too many adjacent things that need to
be implemented? Can we match the video-video structures from the current
proposal with some v4l2 structures and extract a common wrapper for
those that match, with a feature-bit controlled backend? It would be
fine if any of those backends supported a slightly different subset of
the common parts, as long as the parts implemented by both would be
enough to implement a working device. (Mostly thinking out loud here.)

I don't think this is realistic unfortunately. On per ioctl level it is
possible to disable some functionality probably, but the V4L2 structures
are set in stone. We can only extend them.


The guest driver that I wrote is, I think, a good example of the
complexity you can expect in terms of guest driver size (as it is
pretty functional already with its 1000 and some LoCs). For the UAPI
complexity, the host device basically unpacks the information it needs
and rebuilds the V4L2 structures before calling into the host device,
and I don't see this process as more complex that the unpacking of
virtio-video structs which we also did in crosvm.
Unfortunately our hypervisor doesn't support mapping random host pages
in the guest. Static allocations of shared memory regions are possible.
But then we have to tell V4L2 to allocate buffers there. Then we'll need
a region per virtual device. This is just very tedious and inflexible.
That's why we're mainly interested in having the guest pages sharing in
the virtio video spec.
This really sounds like you'll want a different approach -- two
mechanisms covered by two feature bits might indeed be the way to go.

Well, basically this is the way we have it now. I'm not sure what is
Alexandre's plan with the V4L2 UAPI approach. And if this is going to be
solved, the solution already doesn't look future-proof anyway unfortunately.


I hope I have somehow addressed your points. The main point here is to
discuss whether the V4L2 UAPI is a suitable transport for guest/host
accelerated codec work, regardless of what the guest or host
ultimately uses as UAPI. The goal of the PoC is to demonstrate that
this is a viable solution. This PoC is largely simplified by the fact
that V4L2 is used all along the way, but this is irrelevant - yes,
actual devices will likely talk to other APIs and maintain more state,
like a virtio-video device would do. What I want to demonstrate is
that we can send encoding work and receive a valid stream, and that it
is not costly, and only marginally more complex than our virtio-video
spec attempts.

... and we can support cameras too, but that's just a convenient
side-effect, not the ultimate solution to the camera virtualization
problem (that's for the camera folks to decide).
Thanks for your answer!
Thanks everyone -- do you think the "two feature bits to cover different
approaches, but using a common infrastructure" idea could work? If yes,
I think that's the direction we should take. If we can implement this
with just one feature bit, that might also be a good route to extend it
later, but I'm not familiar enough with the whole infrastructure to make
any judgement here.

Thanks for your suggestions. Hopefully we end up with a good solution.


--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
GeschÃftsfÃhrer/Managing Director: RÃgis Adjamah


Please mind our privacy notice<https://www.opensynergy.com/datenschutzerklaerung/privacy-notice-for-business-partners-pursuant-to-article-13-of-the-general-data-protection-regulation-gdpr/> pursuant to Art. 13 GDPR. // Unsere Hinweise zum Datenschutz gem. Art. 13 DSGVO finden Sie hier.<https://www.opensynergy.com/de/datenschutzerklaerung/datenschutzhinweise-fuer-geschaeftspartner-gem-art-13-dsgvo/>


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]