Subject: Re: [virtio-dev] [PATCH RESEND v4 1/1] Add virtio-iommu device specification
On 25.11.19 13:35, Jean-Philippe Brucker wrote:
Hi Jan,

On Mon, Nov 25, 2019 at 08:30:29AM +0100, Jan Kiszka wrote:
What's the impact of a fault on the device(s) under the IOMMU regime? Can they recover?

Are you asking what happens to the endpoints when the virtio-iommu encounters an internal error? Or what happens to the endpoints when their DMA transactions fail translation? I think both are equivalent to "what happens when the endpoint's memory transaction aborts?". The answer depends on the bus and the endpoint, and is out of scope. The virtio-iommu spec could state that in those cases we abort the memory transaction, but that would be too vague, since we don't know the specifics of the bus, and it isn't necessarily true (see VT-d and SMMU below).
In both cases, I'm interested in how to recover the endpoints from such errors (the virtual IOMMU too, of course, but that part seems clear). Can we describe a procedure that will always lead to a working setup again, even if simpler ways exist on certain setups? I mean, the answer shouldn't be "reset the host"...
Or will they get DEVICE_NEEDS_RESET as well?

If the endpoint is virtio, then the behavior upon a DMA fault should be specified by the virtio transport, because such a fault can happen without an IOMMU (e.g. when trying to access a physical address that isn't mapped to RAM or MMIO), or with a VT-d emulation, for example. But the endpoint isn't necessarily virtio. It can be a hardware passed-through endpoint, in which case the abort behavior depends on the physical IOMMU, which virtio-iommu knows nothing about, as well as on the physical bus and endpoint. I also wouldn't state that the whole device (or function, though we're not necessarily on PCI) needs a reset. Some devices might be able to stop only the faulting queue and leave the others running, to avoid disturbing the rest of the system.
As I said above: there should be at least one known-to-work path, and a way to signal better options when they are available in a concrete setup.
With PCI devices behind real IOMMUs, it's normal that they need a reset after having caused a fault. I'm not sure if this is described in the related specs for them, but it should be clarified for the virtual IOMMU. But this can be done on top, IMHO.

The device behaviour is generally not specified. However, their specs can say something about the bus:

* For Intel VT-d, see 7.2 and 7.2.1 (Non-Recoverable Address Translation Faults), where the spec provides various implementation examples: "Requests that encounter non-recoverable address translation faults are aborted by the remapping hardware, and typically require a reset of the device (such as through a function-level-reset) to recover and re-initialize the device to put it back into service." So requests could be aborted, but as stated later in 7.2.1, they can also be redirected to a catch-all memory location.
* For Arm SMMU, the host driver can specify for each context whether the SMMU should return an abort (Slave Error on the AMBA bus) or not (read-zero, write-ignore). The spec also says "The behavior of the client device after termination is specific to the device." (3.12.1 Terminate model)
* For AMD IOMMU, "when the IOMMU detects an I/O page fault, it target aborts the faulting request." and "the IOMMU sets the legacy PCI Signaled Target Abort bit, if applicable" (22.214.171.124 I/O Page Faults). I believe the equivalent for the PCIe bus is a Completer Abort response.

They can specify the behaviour with some precision because they also specify how the IOMMU is integrated with the system. We don't have this luxury: if the virtio-iommu is just a proxy for a physical IOMMU, we don't know how aborts are configured, and the bus may be a variant of PCI, AMBA or something else.
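A toy model may help picture the two termination styles described above for the SMMU (the abort style also matches the VT-d and AMD quotes). In this model a translation fault is always recorded for the IOMMU driver, but only the abort policy makes it visible to the endpoint; with read-zero/write-ignore the endpoint carries on unaware. This is a hypothetical Python sketch, not any real IOMMU or virtio-iommu interface: `ToyIommu`, `FaultPolicy`, `TranslationFault` and the 4 KiB page size are assumptions made purely for illustration.

```python
from enum import Enum

PAGE_SIZE = 4096  # assumption: 4 KiB pages, single address space


class FaultPolicy(Enum):
    ABORT = "abort"    # report an error back to the endpoint
    RAZ_WI = "raz_wi"  # reads return zero, writes are dropped silently


class TranslationFault(Exception):
    """Raised towards the (modelled) endpoint when the policy is ABORT."""


class ToyIommu:
    """Hypothetical IOMMU model, for illustration only."""

    def __init__(self, policy: FaultPolicy):
        self.policy = policy
        self.page_table: dict[int, int] = {}  # IOVA page number -> physical page number
        self.memory: dict[int, int] = {}      # physical address -> stored value
        self.fault_log: list[tuple[str, int]] = []

    def map(self, iova: int, phys: int) -> None:
        self.page_table[iova // PAGE_SIZE] = phys // PAGE_SIZE

    def _translate(self, iova: int):
        pfn = self.page_table.get(iova // PAGE_SIZE)
        if pfn is None:
            return None  # no mapping: translation fault
        return pfn * PAGE_SIZE + iova % PAGE_SIZE

    def _fault(self, kind: str, iova: int) -> None:
        # The fault is always recorded for the IOMMU driver ...
        self.fault_log.append((kind, iova))
        # ... but only the ABORT policy makes it visible to the endpoint.
        if self.policy is FaultPolicy.ABORT:
            raise TranslationFault(f"{kind} fault at IOVA {iova:#x}")

    def read(self, iova: int) -> int:
        phys = self._translate(iova)
        if phys is None:
            self._fault("read", iova)
            return 0  # RAZ_WI: read as zero
        return self.memory.get(phys, 0)

    def write(self, iova: int, value: int) -> None:
        phys = self._translate(iova)
        if phys is None:
            self._fault("write", iova)
            return  # RAZ_WI: write ignored
        self.memory[phys] = value
```

With `ToyIommu(FaultPolicy.RAZ_WI)`, a read from an unmapped IOVA simply returns 0 and the endpoint never learns of the fault, whereas `FaultPolicy.ABORT` raises `TranslationFault` on the same access. How the endpoint reacts to that error, and how it is brought back into service, remains outside the model, which is exactly the open question in this thread.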
Again: I'm in a virtual machine and my device just ran into a fault. How can I find out what I'm supposed to do to get it working again, in its driver and possibly also in the virtio-iommu driver?
Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux