Subject: Re: [PATCH v1] virtio-gpu: Document new fence-passing feature
On Fri, Feb 2, 2024 at 10:16 AM Gurchetan Singh
<gurchetansingh@chromium.org> wrote:
>
>
>
>> On Thu, Feb 1, 2024 at 11:19 AM Rob Clark <robdclark@gmail.com> wrote:
>>
>> On Thu, Feb 1, 2024 at 10:08 AM Gurchetan Singh
>> <gurchetansingh@chromium.org> wrote:
>> >
>> >
>> >
>> > On Thu, Feb 1, 2024 at 9:14 AM Rob Clark <robdclark@gmail.com> wrote:
>> >>
>> >> On Thu, Feb 1, 2024 at 9:00 AM Gurchetan Singh
>> >> <gurchetansingh@chromium.org> wrote:
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Jan 29, 2024 at 9:21 PM Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:
>> >> >>
>> >> >> On 1/30/24 04:34, Gurchetan Singh wrote:
>> >> >> > On Sun, Jan 28, 2024 at 11:12 AM Dmitry Osipenko <
>> >> >> > dmitry.osipenko@collabora.com> wrote:
>> >> >> >
>> >> >> >> Document the new virtio-gpu protocol fence-passing feature that extends the 3d
>> >> >> >> submit command with support for in-fences which are passed from guest to
>> >> >> >> host for waiting. This feature allows offloading the waiting for in-fences
>> >> >> >> from guest to host, where it can be done more efficiently.
>> >> >> >>
>> >> >> >> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
>> >> >> >> ---
>> >> >> >>  device-types/gpu/description.tex | 22 +++++++++++++++++++++-
>> >> >> >>  1 file changed, 21 insertions(+), 1 deletion(-)
>> >> >> >>
>> >> >> >> diff --git a/device-types/gpu/description.tex
>> >> >> >> b/device-types/gpu/description.tex
>> >> >> >> index 443524851a05..e5e23b5c9072 100644
>> >> >> >> --- a/device-types/gpu/description.tex
>> >> >> >> +++ b/device-types/gpu/description.tex
>> >> >> >> @@ -37,6 +37,8 @@ \subsection{Feature bits}\label{sec:Device Types / GPU
>> >> >> >> Device / Feature bits}
>> >> >> >>    resources is supported.
>> >> >> >>  \item[VIRTIO_GPU_F_CONTEXT_INIT (4)] multiple context types and
>> >> >> >>  synchronization timelines supported. Requires VIRTIO_GPU_F_VIRGL.
>> >> >> >> +\item[VIRTIO_GPU_F_FENCE_PASSING (5)] passing fence IDs from guest to host
>> >> >> >> + for waiting supported. Requires VIRTIO_GPU_F_VIRGL.
>> >> >> >>  \end{description}
>> >> >> >>
>> >> >> >>  \subsection{Device configuration layout}\label{sec:Device Types / GPU
>> >> >> >> Device / Device configuration layout}
>> >> >> >> @@ -746,7 +748,25 @@ \subsubsection{Device Operation: controlq
>> >> >> >> (3d)}\label{sec:Device Types / GPU Dev
>> >> >> >>
>> >> >> >>  \item[VIRTIO_GPU_CMD_SUBMIT_3D]
>> >> >> >>  Submit an opaque command stream. The type of the command stream is
>> >> >> >> -  determined when creating a context.
>> >> >> >> +  determined when creating a context. Request data is
>> >> >> >> +  \field{struct virtio_gpu_cmd_submit}. The \field{size} field describes
>> >> >> >> +  the size of the \field{cmd_data} array in bytes. The \field{num_in_fences}
>> >> >> >> +  field is active only if VIRTIO_GPU_F_FENCE_PASSING is enabled, otherwise
>> >> >> >> +  \field{num_in_fences} is treated as zero. The array of \field{in_fences}
>> >> >> >> +  IDs is placed between the \field{num_in_fences} field and the
>> >> >> >> +  \field{cmd_data} array. The \field{in_fences} array contains virtio-gpu
>> >> >> >> +  \field{fence_id}s corresponding to the out-fence IDs of previous
>> >> >> >> +  3d submits.
>> >> >> >> +
>> >> >> >> +\begin{lstlisting}
>> >> >> >> +struct virtio_gpu_cmd_submit {
>> >> >> >> +        struct virtio_gpu_ctrl_hdr hdr;
>> >> >> >> +        le32 size;
>> >> >> >> +        le32 num_in_fences;
>> >> >> >> +        le64 in_fences[num_in_fences];
>> >> >> >> +        u32 cmd_data[size/4];
>> >> >> >> +};
>> >> >> >> +\end{lstlisting}
>> >> >> >>
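>> >> >> >> Purely as an illustration (not spec text), here is a rough guest-side
>> >> >> >> sketch of how this variable-length request could be laid out. The helper
>> >> >> >> name and buffer handling are invented, and a little-endian CPU is assumed
>> >> >> >> so that plain stores match the le32/le64 fields:
>> >> >> >>
>> >> >> >> #include <linux/virtio_gpu.h>   /* struct virtio_gpu_ctrl_hdr */
>> >> >> >> #include <stdint.h>
>> >> >> >> #include <string.h>
>> >> >> >>
>> >> >> >> /* Hypothetical helper: lay out hdr, size, num_in_fences, the in_fences[]
>> >> >> >>  * array and then cmd_data[] back to back, matching the
>> >> >> >>  * struct virtio_gpu_cmd_submit layout above. Returns the total request
>> >> >> >>  * length that would be placed in the virtqueue. */
>> >> >> >> static size_t build_submit(uint8_t *buf,
>> >> >> >>                            const struct virtio_gpu_ctrl_hdr *hdr,
>> >> >> >>                            const uint64_t *in_fences, uint32_t num_in_fences,
>> >> >> >>                            const void *cmd_data, uint32_t size)
>> >> >> >> {
>> >> >> >>         size_t off = 0;
>> >> >> >>
>> >> >> >>         memcpy(buf + off, hdr, sizeof(*hdr));
>> >> >> >>         off += sizeof(*hdr);
>> >> >> >>         memcpy(buf + off, &size, 4);                      /* le32 size */
>> >> >> >>         off += 4;
>> >> >> >>         memcpy(buf + off, &num_in_fences, 4);             /* le32 num_in_fences */
>> >> >> >>         off += 4;
>> >> >> >>         memcpy(buf + off, in_fences, 8ull * num_in_fences); /* le64 fence IDs */
>> >> >> >>         off += 8ull * num_in_fences;
>> >> >> >>         memcpy(buf + off, cmd_data, size);                /* opaque cmd stream */
>> >> >> >>         off += size;
>> >> >> >>
>> >> >> >>         return off;
>> >> >> >> }
>> >> >> >>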
>> >> >> >
>> >> >> > Don't you need VIRTIO_GPU_FLAG_FENCE_SHAREABLE somewhere too?
>> >> >>
>> >> >> We haven't made use of this flag in the experimental/prototype version
>> >> >> that was made for the native contexts. The flag is unused by both Qemu
>> >> >> and CrosVM.
>> >> >>
>> >> >> We added the Linux kernel UAPI flag VIRTGPU_EXECBUF_SHARED_FENCE and it's
>> >> >> deemed to be enough. If the UAPI flag isn't set for a job's out-fence,
>> >> >> then the guest uses guest-side synchronization for this fence when it's
>> >> >> passed as an in-fence for further jobs.
>> >> >>
>> >> >> The FENCE_SHAREABLE flag isn't needed by the nctx/virgl/venus protocols. All
>> >> >> fences are shareable by default. If the VMM fails to export a fence, it is
>> >> >> treated as if the fence was already signaled. In the case of Venus, it's up
>> >> >> to Venus to decide internally which fences can be made shareable, i.e. both
>> >> >> the guest and virglrenderer Venus implementations are responsible for
>> >> >> deciding whether a fence should be made exportable. Venus won't make the
>> >> >> out-fence exportable for internal intermediate jobs and will encode this
>> >> >> export info within the Venus protocol.
>> >> >>
>> >> >> Whether the FENCE_SHAREABLE virtio-gpu flag will be needed by gfxstream is
>> >> >> questionable. Gfxstream should be similar to Venus, i.e. you should be
>> >> >> able to add such a flag to gfxstream. We'll need to be certain that the new
>> >> >> common flag is definitely needed, by actually making use of it, before
>> >> >> specifying it in the stable virtio-gpu protocol.
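>> >> >>
>> >> >> To illustrate the guest-side decision this implies, here is a toy sketch
>> >> >> (invented names, not the actual drm/virtio kernel code): only fences whose
>> >> >> job carried the shareable UAPI flag are passed to the host by fence_id,
>> >> >> everything else is synchronized in the guest.
>> >> >>
>> >> >> #include <stdbool.h>
>> >> >> #include <stdint.h>
>> >> >> #include <stdio.h>
>> >> >>
>> >> >> /* Toy model: a fence is a host fence_id plus a "shareable" bit mirroring
>> >> >>  * whether its job was submitted with VIRTGPU_EXECBUF_SHARED_FENCE. */
>> >> >> struct toy_fence {
>> >> >>         uint64_t fence_id;
>> >> >>         bool shareable;
>> >> >> };
>> >> >>
>> >> >> /* Per in-fence decision: pass it to the host for waiting, or fall back
>> >> >>  * to guest-side synchronization before submitting the job. */
>> >> >> static void handle_in_fence(const struct toy_fence *f)
>> >> >> {
>> >> >>         if (f->shareable)
>> >> >>                 printf("pass fence_id %llu to the host as in-fence\n",
>> >> >>                        (unsigned long long)f->fence_id);
>> >> >>         else
>> >> >>                 printf("wait for fence_id %llu in the guest\n",
>> >> >>                        (unsigned long long)f->fence_id);
>> >> >> }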
>> >> >
>> >> >
>> >> > I don't think you've gotten fence passing working with display integration. Although passing external fences between two GPU instances does happen, the much more common and important case is passing a fence from the virtual GPU to the virtual display.
>> >> >
>> >> > Let me describe the use case for you:
>> >> >
>> >> > 1. The GPU creates a fence.
>> >> > 2. rutabaga exports the fence on creation via stream_renderer_export_fence or virglrenderer_export_fence --> creates a [fence_id --> descriptor] mapping when FLAG_SHAREABLE is set.
>> >> > 3. SUBMIT_CMD + CROSS_DOMAIN_SEND --> takes the fence_ids and sends a fence to the compositor.
>> >> >
>> >> > You can also refer to https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1138#note_1996999.
>> >> >
>> >> > Once you have a working prototype that shares an out_fence with the display, then we can talk about whether FLAG_SHAREABLE is needed or not. So I suggest you focus on that before upstreaming uapis + virtio protocols.
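>> >> >
>> >> > To make that flow concrete, a toy sketch of the three steps (every name
>> >> > below is invented and does not match the real rutabaga / virglrenderer /
>> >> > cross-domain APIs):
>> >> >
>> >> > #include <stdint.h>
>> >> > #include <stdio.h>
>> >> >
>> >> > /* 1. GPU work produces a fence (stand-in for a real submission). */
>> >> > static uint64_t gpu_create_fence(void) { return 7; }
>> >> >
>> >> > /* 2. Export the fence: fence_id -> descriptor mapping, created when
>> >> >  *    FLAG_SHAREABLE is set on the fence. */
>> >> > static int export_fence(uint64_t fence_id) { return (int)fence_id + 100; }
>> >> >
>> >> > /* 3. SUBMIT_CMD + CROSS_DOMAIN_SEND: hand the descriptor to the compositor. */
>> >> > static void cross_domain_send(int fd) { printf("send fd %d to compositor\n", fd); }
>> >> >
>> >> > int main(void)
>> >> > {
>> >> >         uint64_t fence_id = gpu_create_fence();
>> >> >         int fd = export_fence(fence_id);
>> >> >         cross_domain_send(fd);
>> >> >         return 0;
>> >> > }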
>> >> >
>> >>
>> >> Why would sharing with the display (via a different virtgpu ctx type) be
>> >> any different from sharing with a different GPU context? I don't
>> >> really see a problem here: either the fence fd still exists on the
>> >> host side, or you know that it is already signaled and you don't have
>> >> to wait on anything.
>> >
>> >
>> > I'm not a huge fan of the fact that the current implementation uses EGL to create the pre-signalled fence. Vulkan ... it could work, but I think you'll have to generate some sort of dummy command at startup? Shouldn't that be implemented too?
>> >
>> > Something should use the new "virgl_renderer_export_signalled_fence" API to verify everything works.
>>
>> ok, so I gather you are talking about the awkwardness of the wayland
>> protocol, namely that we can't just pass -1 for a fence? Rather than,
>> say, something using kms.
>
>
> Is there a plan to do a virtio-gpu KMS implementation? Something like RESOURCE_FLUSH (in_fence)? There are still many virtio-gpu KMS users out there.
>
>>
>>
>> I think this is just an implementation detail. At any rate, in the
>> later version of the virglrenderer implementation, which got merged,
>> there is no dependency on EGL for this. The only requirement is that
>> there was _some_ prior work which created a fence. Virglrenderer hangs on to a
>> reference to the last signalled fence until it is replaced with a
>> newer one. If a guest fence is created, by definition the host will
>> have _some_ signaled fence to hand out.
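>>
>> A rough sketch of that idea (not the actual virglrenderer code, names
>> invented): keep one fd for the most recently signalled fence and hand
>> out a dup() of it whenever an already-signalled fence needs to be
>> exported.
>>
>> #include <unistd.h>
>>
>> static int last_signalled_fd = -1;
>>
>> /* Called whenever a host fence signals: remember its fd, dropping the
>>  * previously remembered one. */
>> static void fence_signalled(int fence_fd)
>> {
>>         if (last_signalled_fd >= 0)
>>                 close(last_signalled_fd);
>>         last_signalled_fd = fence_fd;
>> }
>>
>> /* Export an already-signalled fence: the caller owns the returned fd,
>>  * or gets -1 if nothing has signalled yet. */
>> static int export_signalled_fence(void)
>> {
>>         return last_signalled_fd >= 0 ? dup(last_signalled_fd) : -1;
>> }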
>
>
> The implementation does use a static global table (virgl_fence_table) to coordinate between the virgl sync thread and the main thread? This is common in C, but discouraged in Rust (and I do have a use case in mind).
>
This is fallout from how the virglrenderer API was designed (i.e. not
having a toplevel vgpu instance object). But this seems pretty off
topic from the protocol definition. Maybe we can revisit if there
ever is a libvirglrenderer2 where we can break API compatibility.
> Spec-wise, having FLAG_SHAREABLE + another DESTROY_FENCE API would probably be the cleanest since it gives explicit lifetimes. If there were no performance concerns, we would totally do it.
>
> The concern is that DESTROY_FENCE causes excessive vm-exits as fences are attached to the surface every frame. So that's how we end up with the last signalled fence idea, and somewhat undefined lifetime/ownership semantics.
>
The semantics are in fact well defined: a fence is just an integer in
the context of a timeline; there is _no_ lifetime/ownership.
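
A toy illustration of that view (names invented): a fence is just a
sequence number on a per-context timeline, so "signalled" is a
comparison, and there is nothing to allocate or free per fence.

#include <stdbool.h>
#include <stdint.h>

struct toy_timeline {
        uint64_t last_signalled;   /* highest seqno known to have signalled */
};

static bool toy_fence_signalled(const struct toy_timeline *tl, uint64_t seqno)
{
        return seqno <= tl->last_signalled;
}
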
> Though, the guest kernel already tracks fence lifetimes through dma_fence. What if we add:
>
> - DESTROY_FENCES(fence_ids, num_fences)
> - a virtio-config ("fence_destroy_threshold"), which controls num_fences in DESTROY_FENCES
>
> When fence_destroy_threshold == 0, this would be the current proposed solution (fence lifetimes are implementation defined). However, a user can set "fence_destroy_threshold" to 100 or something like that, to cause a VM-exit every time 100 fences have been destroyed.
>
> This attempts a tunable compromise between API purity and performance concerns. WDYT?
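>
> A toy sketch of how the guest side of that could look (all names
> invented, this is not an actual driver): destroyed fence ids are
> queued and one DESTROY_FENCES command is emitted only once
> fence_destroy_threshold of them have accumulated, amortizing the
> vm-exit cost; threshold == 0 keeps the currently proposed behaviour.
>
> #include <stdint.h>
> #include <stdio.h>
>
> #define MAX_BATCH 128
>
> static uint64_t pending[MAX_BATCH];
> static uint32_t num_pending;
> static uint32_t fence_destroy_threshold = 100;   /* read from virtio config */
>
> static void send_destroy_fences(const uint64_t *ids, uint32_t n)
> {
>         printf("DESTROY_FENCES(num_fences=%u)\n", n);   /* one vm-exit */
> }
>
> static void toy_fence_destroyed(uint64_t fence_id)
> {
>         if (fence_destroy_threshold == 0)
>                 return;   /* lifetimes left implementation defined */
>
>         pending[num_pending++] = fence_id;
>         if (num_pending >= fence_destroy_threshold || num_pending == MAX_BATCH) {
>                 send_destroy_fences(pending, num_pending);
>                 num_pending = 0;
>         }
> }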
Tbh, I think the current approach is cleaner from the PoV of what a
fence is: it is just a seqno; it should not have a lifetime.
BR,
-R
>
>>
>>
>> But this is all implementation detail
>>
>> BR,
>> -R
>>
>> >
>> > Essentially, let's do our due diligence and verify the most important use case (gpu --> display) actually works.
>> >
>> >>
>> >>
>> >> BR,
>> >> -R
>> >>
>> >> >> --
>> >> >> Best regards,
>> >> >> Dmitry
>> >> >>