[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [PATCH v4 1/1] Add suspend support for virtio PCI devices
On Mon, Feb 19, 2024 at 03:46:37PM +0900, David Stevens wrote: > On Fri, Feb 16, 2024 at 5:56âPM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Fri, Feb 16, 2024 at 05:24:32PM +0900, David Stevens wrote: > > > Add a virtio power management PCI capability to allow drivers to suspend > > > virtio PCI devices. This allows drivers to suspend devices at the virtio > > > level before suspending them at the PCI transport layer. This allows > > > drivers to do a two phase suspend, which prevents notifications from > > > being ignored or lost if interrupts are reconfigured at the PCI > > > transport layer immediately before or after the device is put into the > > > PCI PM D3 low power state. > > > --- > > > transport-pci.tex | 51 +++++++++++++++++++++++++++++++++++++++++++++++ > > > 1 file changed, 51 insertions(+) > > > > > > diff --git a/transport-pci.tex b/transport-pci.tex > > > index a5c6719ea871..ce77708a9b69 100644 > > > --- a/transport-pci.tex > > > +++ b/transport-pci.tex > > > @@ -188,6 +188,8 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option > > > #define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8 > > > /* Vendor-specific data */ > > > #define VIRTIO_PCI_CAP_VENDOR_CFG 9 > > > +/* Power management configuration */ > > > +#define VIRTIO_PCI_CAP_PM_CFG 10 > > > \end{lstlisting} > > > > > > Any other value is reserved for future use. > > > @@ -804,6 +806,55 @@ \subsubsection{PCI configuration access capability}\label{sec:Virtio Transport O > > > specified by some other Virtio Structure PCI Capability > > > of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}. > > > > > > +\subsubsection{Power management capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Power management capability} > > > + > > > +The VIRTIO_PCI_CAP_PM_CFG capability refers to a single byte. The > > > +driver can write to the byte to set the power state of the device, > > > +and it can read from the byte to get the current power state of the > > > +device. > > > + > > > +The valid power states are: > > > + > > > +\begin{lstlisting} > > > +/* Device is operating normally */ > > > +#define VIRTIO_PM_STATE_ACTIVE 0 > > > +/* Device operation is suspended */ > > > +#define VIRTIO_PM_STATE_SUSPENDED 1 > > > +\end{lstlisting} > > > + > > > +The device power state has no effect when \field{device status} does > > > +not have the DRIVER_OK bit set or does have the DEVICE_NEEDS_RESET bit > > > +set. > > > > > > Given this is only after DRIVER_OK, wouldn't a feature bit + status bit > > make more sense? This will make it transport-independent and simplify > > discovery. > > A feature bit + status bit would work as well. I've posted some > questions on Zhu Lingshan's patch. Thanks! I like it that your patch is more specific but maybe something can be figure out to get the best of both worlds. > > > +\devicenormative{\paragraph}{Power management capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Power management capability} > > > + > > > +A device MUST maintain its state while suspended such that all driver > > > +visible state after resuming exactly matches driver visible state > > > +before suspending. > > > + > > > +A device MUST NOT send notifications while suspended. > > > + > > > +A device MAY operate on any buffers in its virtqueue while suspended. > > > > How is this reconsiled with state matching exactly? buffers are > > driver-visible ... > > Drivers can't atomically access buffers and the PM byte, so the device > modifying a buffer while suspended is indistinguishable from the > device modifying a buffer in the window between when it is resumed by > the driver and when the driver accesses the buffer. But you are right > that the wording is contradictory. Maybe the earlier requirement could > be better phrased as: > > A device MUST maintain its state while suspended such that after the > driver resumes the device, the driver can use the device as if it had > never been suspended in the first place. I think you wording is fine, just make it clear how is the contradiction resolved. E.g. exactly matches ... with the exception of using buffers available in one of the virtqueues. > > > +A device MUST set its power state to VIRTIO_PM_STATE_ACTIVE on reset. > > > + > > > +A device SHOULD take steps to minimize its resource consumption while > > > +suspended, although what this involves is specific to the particular > > > +device implementation. > > > + > > > +\drivernormative{\paragraph}{Power management capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Power management capability} > > > + > > > +A driver MUST NOT access a suspended device's BARs corresponding to > > > +any virtio structures, except for the power management byte. > > > + > > > +A driver MAY suspend a device that has buffers in one of its > > > +virtqueues, but it MUST NOT modify any such buffers while the device > > > +is suspended. > > > + > > > +A driver MUST read from the power management byte after writing to the > > > +byte to verify that the device successfully entered the target power > > > +state. > > > > Verify how? By checking the value returned? And what should it do with the value > > does not match? > > Yes, comparing the value returned to the one it just wrote. The three > options I can think of would be to abort the suspend, reset the > device, or retry. Retry only makes sense if suspend/resume might > succeed in the future. Well actually there is a reason to retry even without a timeout - has to do with pci rules which limit how quickly device has to respond to a read. So if you want to implement suspend in firmware and not be bound to any timing limits you need retry in software. > An API is really only retry-friendly if it has > a way to set a timeout, since the consumer is what should be deciding > how long to wait but can't make that decision without a timeout. > Personally, I would lean towards not allowing timeouts here, since > it's simpler. So maybe something like this: > > After the driver writes a new value to the PM byte, if the device can > transition to the requested state, then subsequent reads of the PM > state byte MUST block until the transition completes. If the device > cannot transition to the requested state, it MUST immediately return > the prior value of the PM state byte for subsequent reads of the PM > state byte. PCI has timing limitations on how long reads can block though. So that could be a reason to retry. > In that case, the only options are to abort the suspension or to reset > the device. That's really a policy decision of the driver, so I don't > know how it would fit into this spec. > > -David > > > -David
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]