Subject: Re: [virtio-dev] [PATCH v1 09/12] drm/virtio: implement context init: allocate an array of fence contexts

On Tue, Sep 14, 2021 at 10:53 AM Chia-I Wu <olvaffe@gmail.com> wrote:

On Mon, Sep 13, 2021 at 6:57 PM Gurchetan Singh
<gurchetansingh@chromium.org> wrote:
>
>
>
>
> On Mon, Sep 13, 2021 at 11:52 AM Chia-I Wu <olvaffe@gmail.com> wrote:
>>
>>
>> On Mon, Sep 13, 2021 at 10:48 AM Gurchetan Singh
>> <gurchetansingh@chromium.org> wrote:
>> >
>> >
>> >
>> > On Fri, Sep 10, 2021 at 12:33 PM Chia-I Wu <olvaffe@gmail.com> wrote:
>> >>
>> >> On Wed, Sep 8, 2021 at 6:37 PM Gurchetan Singh
>> >> <gurchetansingh@chromium.org> wrote:
>> >> >
>> >> > We don't want fences from different 3D contexts (virgl, gfxstream,
>> >> > venus) to be on the same timeline. With explicit context creation,
>> >> > we can specify the number of rings each context wants.
>> >> >
>> >> > Execbuffer can specify which ring to use.
>> >> >
>> >> > Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
>> >> > Acked-by: Lingfeng Yang <lfy@google.com>
>> >> > ---
>> >> >  drivers/gpu/drm/virtio/virtgpu_drv.h   |  3 +++
>> >> >  drivers/gpu/drm/virtio/virtgpu_ioctl.c | 34 ++++++++++++++++++++++++--
>> >> >  2 files changed, 35 insertions(+), 2 deletions(-)
>> >> >
>> >> > diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
>> >> > index a5142d60c2fa..cca9ab505deb 100644
>> >> > --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
>> >> > +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
>> >> > @@ -56,6 +56,7 @@
>> >> >  #define STATE_ERR 2
>> >> >
>> >> >  #define MAX_CAPSET_ID 63
>> >> > +#define MAX_RINGS 64
>> >> >
>> >> >  struct virtio_gpu_object_params {
>> >> >  	unsigned long size;
>> >> > @@ -263,6 +264,8 @@ struct virtio_gpu_fpriv {
>> >> >  	uint32_t ctx_id;
>> >> >  	uint32_t context_init;
>> >> >  	bool context_created;
>> >> > +	uint32_t num_rings;
>> >> > +	uint64_t base_fence_ctx;
>> >> >  	struct mutex context_lock;
>> >> >  };
>> >> >
>> >> > diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
>> >> > index f51f3393a194..262f79210283 100644
>> >> > --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
>> >> > +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
>> >> > @@ -99,6 +99,11 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device *dev, void *data,
>> >> >  	int in_fence_fd = exbuf->fence_fd;
>> >> >  	int out_fence_fd = -1;
>> >> >  	void *buf;
>> >> > +	uint64_t fence_ctx;
>> >> > +	uint32_t ring_idx;
>> >> > +
>> >> > +	fence_ctx = vgdev->fence_drv.context;
>> >> > +	ring_idx = 0;
>> >> >
>> >> >  	if (vgdev->has_virgl_3d == false)
>> >> >  		return -ENOSYS;
>> >> > @@ -106,6 +111,17 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device *dev, void *data,
>> >> >  	if ((exbuf->flags & ~VIRTGPU_EXECBUF_FLAGS))
>> >> >  		return -EINVAL;
>> >> >
>> >> > +	if ((exbuf->flags & VIRTGPU_EXECBUF_RING_IDX)) {
>> >> > +		if (exbuf->ring_idx >= vfpriv->num_rings)
>> >> > +			return -EINVAL;
>> >> > +
>> >> > +		if (!vfpriv->base_fence_ctx)
>> >> > +			return -EINVAL;
>> >> > +
>> >> > +		fence_ctx = vfpriv->base_fence_ctx;
>> >> > +		ring_idx = exbuf->ring_idx;
>> >> > +	}
>> >> > +
>> >> >  	exbuf->fence_fd = -1;
>> >> >
>> >> >  	virtio_gpu_create_context(dev, file);
>> >> > @@ -173,7 +189,7 @@ static int virtio_gpu_execbuffer_ioctl(struct drm_device *dev, void *data,
>> >> >  			goto out_memdup;
>> >> >  	}
>> >> >
>> >> > -	out_fence = virtio_gpu_fence_alloc(vgdev, vgdev->fence_drv.context, 0);
>> >> > +	out_fence = virtio_gpu_fence_alloc(vgdev, fence_ctx, ring_idx);
>> >> >  	if(!out_fence) {
>> >> >  		ret = -ENOMEM;
>> >> >  		goto out_unresv;
>> >> > @@ -691,7 +707,7 @@ static int virtio_gpu_context_init_ioctl(struct drm_device *dev,
>> >> >  		return -EINVAL;
>> >> >
>> >> >  	/* Number of unique parameters supported at this time. */
>> >> > -	if (num_params > 1)
>> >> > +	if (num_params > 2)
>> >> >  		return -EINVAL;
>> >> >
>> >> >  	ctx_set_params = memdup_user(u64_to_user_ptr(args->ctx_set_params),
>> >> > @@ -731,6 +747,20 @@ static int virtio_gpu_context_init_ioctl(struct drm_device *dev,
>> >> >
>> >> >  			vfpriv->context_init |= value;
>> >> >  			break;
>> >> > +		case VIRTGPU_CONTEXT_PARAM_NUM_RINGS:
>> >> > +			if (vfpriv->base_fence_ctx) {
>> >> > +				ret = -EINVAL;
>> >> > +				goto out_unlock;
>> >> > +			}
>> >> > +
>> >> > +			if (value > MAX_RINGS) {
>> >> > +				ret = -EINVAL;
>> >> > +				goto out_unlock;
>> >> > +			}
>> >> > +
>> >> > +			vfpriv->base_fence_ctx = dma_fence_context_alloc(value);
>> >> With multiple fence contexts, we should do something about implicit fencing.
>> >>
>> >> The classic example is Mesa and the X server. When both use virgl and
>> >> the global fence context, no dma_fence_wait is needed. But when Mesa
>> >> uses venus and the ring fence context, a dma_fence_wait should be inserted.
>> >
>> >
>> > If I read your comment correctly, the use case is:
>> >
>> > context A (venus)
>> >
>> > sharing a render target with
>> >
>> > context B (Xserver backed virgl)
>> >
>> > ?
>> >
>> > In which function do you envisage dma_fence_wait(...) being inserted? Doesn't implicit synchronization mean there's no fence to share between contexts (only buffer objects)?
>>
>> Fences can be implicitly shared via reservation objects associated
>> with buffer objects.
>>
>> > It may be possible to wait on the reservation object associated with a buffer object from a different context (userspace can also do DRM_IOCTL_VIRTGPU_WAIT), but not sure if that's what you're looking for.
>>
>> Right, that's what I am looking for. Userspace expects implicit
>> fencing to work. While there is work underway to move userspace to
>> explicit fencing, it is not there yet in general, and we can't require
>> userspace to do explicit fencing or DRM_IOCTL_VIRTGPU_WAIT.
>
>
> Another option would be to use the upcoming DMA_BUF_IOCTL_EXPORT_SYNC_FILE + VIRTGPU_EXECBUF_FENCE_FD_IN (which checks the dma_fence context).
That requires the X server / compositors to be modified. For example,
venus works under Android (where there is explicit fencing) or under a
modified compositor (which does DMA_BUF_IOCTL_EXPORT_SYNC_FILE or
DRM_IOCTL_VIRTGPU_WAIT). But it does not work too well with an
unmodified X server.

Some semi-recent virgl modifications will be needed for interop regardless, such as VIRGL_CAP_V2_UNTYPED_RESOURCE (?).

I suspect there aren't too many virgl users (mostly developers).

Does the X server just pick up the latest Mesa release (including virgl/venus)? Suppose context types land in 5.16 and the userspace changes (for both venus and virgl) land in the 21.2 stable releases.

https://docs.mesa3d.org/release-calendar.html

>
> Generally, if it only requires virgl changes, userspace changes are fine since OpenGL drivers implement implicit sync in many ways. Waiting on the reservation object in the kernel is fine too, though.
I don't think we want to assume virgl to be the only consumer of
dma-bufs, even though it is the most common use case.

>
> Though venus doesn't use the NUM_RINGS param yet. Getting all permutations of context type + display integration working would take some time (patchset mostly tested with wayland + gfxstream/Android [no implicit sync]).
>
> WDYT of someone figuring out virgl/venus interop later, independently of this patchset?

I think we should understand the implications of multiple fence
contexts better, even if some changes are not included in this
patchset.

In my view, we don't need implicit fencing in most cases, and
implicit fencing should be considered a legacy path. But the X server /
compositors today happen to require it. Other drivers seem to use a
flag to control whether implicit fences are set up or waited on (e.g.,
AMDGPU_GEM_CREATE_EXPLICIT_SYNC, MSM_SUBMIT_NO_IMPLICIT, or
EXEC_OBJECT_WRITE). It seems to be the least surprising thing to do.

IMO, the easiest way is just to limit the change to userspace if possible, since implicit sync is legacy/something we want to deprecate over time.

Another option is to add something like VIRTGPU_EXECBUF_EXPLICIT_SYNC (similar to MSM_SUBMIT_NO_IMPLICIT), where the reservation objects are waited on / added to unless that flag is set. Since explicit sync will need new hypercalls/params and is a major change, that feature is expected to be independent of context types.

With that option, waiting on the reservation object would just be another bug fix + addition to 5.16 (perhaps by you), so we can proceed in parallel faster. VIRTGPU_EXECBUF_EXPLICIT_SYNC (or an equivalent) would be added later.

>
>>
>>
>>
>>
>> >
>> >>
>> >>
>> >> > +			vfpriv->num_rings = value;
>> >> > +			break;
>> >> >  		default:
>> >> >  			ret = -EINVAL;
>> >> >  			goto out_unlock;
>> >> > --
>> >> > 2.33.0.153.gba50c8fa24-goog
>> >> >
