virtio-dev message



Subject: Re: [PATCH 4/4] virtio-iommu: Clarify hot-unplug behavior


On 2023/08/11 23:20, Jean-Philippe Brucker wrote:
On Fri, Aug 11, 2023 at 07:21:29AM +0900, Akihiko Odaki wrote:
On 2023/08/11 0:10, Jean-Philippe Brucker wrote:
On Fri, Aug 04, 2023 at 03:19:27PM +0900, Akihiko Odaki wrote:
On 2023/08/04 0:32, Jean-Philippe Brucker wrote:
Since it is not obvious what the virtio-iommu device or driver should do
when an endpoint is removed [1], add some guidance in the specification.
Follow the same principle as other (hardware) IOMMU devices: clearing
endpoint ID state on endpoint removal is the responsibility of the
driver, not the IOMMU device.

On most hardware IOMMU devices, the endpoint ID state is kept in tables
managed by the driver, so intuitively the driver should be updating the
tables on hot-unplug, so that a new endpoint plugged later with the same
ID does not inherit stale translations. Besides, hardware IOMMUs have no
knowledge of endpoint removal. It is less obvious for virtio-iommu
because endpoint states are managed with ATTACH/DETACH requests, and a
virtual platform could in theory update the endpoint state when it is
removed. Existing VMMs like QEMU don't do it, and rightly expect the
driver to manage endpoint ID state like with other IOMMUs. It is not
invalid for a VMM to clean up state on endpoint removal, but a driver
shouldn't count on it.

I think it's better to say it's invalid to detach on endpoint removal.

It's certainly not a clear cut choice and I'm still hesitating whether we
should allow, deny or let the VMM choose what to do.

However, it looks like crosvm already detaches the domain when an endpoint
is unplugged:

https://github.com/google/crosvm/blob/e8b2cd080e8174948563567d395c2a42416f2807/devices/src/virtio/iommu/sys/unix.rs#L68

If I understood correctly, this function is called on PCIe hot-unplug, and
endpoint_map represents the domain attached to an endpoint ID. So the
domain is detached on endpoint removal. If a DETACH request is sent
afterwards, it will still succeed (detach_endpoint() in
devices/src/virtio/iommu.rs returns detached=true, which causes DETACH to
succeed, even if the endpoint is not in endpoint_map). But I could be
wrong, as I'm not familiar with crosvm or Rust.

That does make the decision a little easier, because if we did
retroactively specify that detaching on removal is invalid, this code
would become a bug. But in my opinion crosvm isn't doing anything wrong
here. It feels valid and even good practice to clean up all state
associated with an endpoint when it is removed, to avoid leaks and stale
objects.

In this case I think it's creating a use-after-free hazard. The guest
believes it is managing domains and assumes a domain is available until it
makes a DETACH request for the last endpoint attached to the domain.

However, if the host automatically detaches an endpoint from a domain and if
the domain has no other endpoints, the domain will be gone and the domain ID
may be reused for another domain. If the guest makes a request with the
stale domain ID before it becomes aware that the device is unplugged, the
request will be performed on some arbitrary domain and may break things.

That's an excellent point, there is a race:

         VMM                        GUEST
      Remove ep 1
        detach domain 1
        destroy domain 1
                                 ATTACH ep 2 to domain 1
      Handle ATTACH
        create domain 1

      Notify that ep 1 is gone
                                 Handle notification, DETACH ep 1

The guest tries to attach ep 2 to domain 1, which succeeds. So the
guest expects that ep 2 is now able to access the mappings that were on
domain 1. But instead it's a new empty domain.

I think that's a good justification for adding a device normative
statement, something like:

   The device SHOULD only detach endpoints from domains when requested by
   the driver; either when handling ATTACH and DETACH requests or when
   performing a device reset.


By the way, crosvm's logic to detach an endpoint on removal looks incorrect
to me. A domain may have several endpoints attached, but the code looks
like it always destroys the domain regardless of whether other endpoints
are still attached to it. I'm adding Zide Chen, who wrote the code
according to git blame, and crosvm-dev@chromium.org to CC.

Link to this thread for more context:
https://lore.kernel.org/virtio-dev/20230803153238.541803-5-jean-philippe@linaro.org/

I thought crosvm rejected attaching multiple endpoints to one domain but I
think I misread. Rejecting multiple attach would be a straightforward fix
(it's allowed by the spec), though it would prevent assigning endpoints
that cannot be isolated from each other by the hardware (the driver won't
attach those to different domains if it's made aware that they should be
in the same IOMMU group, for example if they are on a conventional PCI
bus).

Now that we have figured out that an endpoint should not be detached from a domain without a request from the driver anyway, the code to detach an endpoint on removal can simply be removed.

Regards,
Akihiko Odaki
