virtio-comment message

Subject: Re: [virtio-comment] comment on pci section
From: Rusty Russell <rusty@au1.ibm.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Date: Mon, 03 Feb 2014 11:36:20 +1030
"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Fri, Jan 31, 2014 at 01:10:30PM +1030, Rusty Russell wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> > On Thu, Jan 30, 2014 at 10:11:14AM +1030, Rusty Russell wrote:
>> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >> > On Wed, Jan 29, 2014 at 04:38:58PM +1030, Rusty Russell wrote:
>> >> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >> >> > Two comments received from a co-worker:
>> >> >> >
>> >> >> > 3.1 "The device MUST NOT consume buffers before DRIVER_OK ..." - I
>> >> >> > personally
>> >> >> > would like to see also something like:  "device must not generate
>> >> >> > interrupts if
>> >> >> > DRIVER_OK is not set” to better handle power management transitions.
>> >> >> 
>> >> >> That addition seems to make sense, but I don't understand the reference
>> >> >> to power management transitions?
>> >> >
>> >> > Well - ATM we do things like suspend by device reset and re-initialization,
>> >> > right? I'm guessing during e.g. resume it's important not to get
>> >> > interrupts too early. I'll check.
>> >> >
>> >> > Hmm actually we have a chapter about initialization but not cleanup.
>> >> > Might be a good addition.
>> >> 
>> >> Maybe, but it would only need to say to reset the device and you can
>> >> remove buffers safely.
>> >> 
>> >> Looking at the current Linux block driver, it drives the device before
>> >> setting DRIVER_OK (add_disk will do this to scan partitions).  Should we
>> >> make the rule that we can driver the device after FEATURES_OK instead?
>> >> In which case, perhaps we should just combine DRIVER_OK and FEATURES_OK
>> >> into one bit.
>> >
>> > Hmm we discussed this already.
>> > There are two distinct points:
>> >
>> > 1. driver has finalized device features (FEATURES_OK)
>> >
>> > after this point we start configuring the device:
>> > VQ setup etc
>> >
>> > 2. driver adds buffers, device consumes them (DRIVER_OK)
>> 
>> Yes, I think I agree.  On a similar note, we should make it clear that
>> device-specific configuration fields *can* be read before FEATURES_OK.
>> ie. if a device has a configuration field 'foo' which is only present if
>> FEATURE_FOO, and it offers FEATURE_FOO, the driver can read 'foo'
>> before deciding whether to accept FEATURE_FOO.
>> 
>> >> > Also, I note that QEMU implemented master abort bit by
>> >> > clearing DRIVER_OK - AFAIK useful to stop device in a
>> >> > standard way if driver malfunctions.
>> >> >
>> >> > Also with QEMU you can clear DRIVER_OK and this stops the device. I
>> >> > wonder if someone uses this.
>> >> 
>> >> The host should not change status bits, they should use a different
>> >> field.  This isn't explicitly stated in the spec, but it's a bit weird.
>> >
>> > Well - device already clears FEATURES_OK on configuration failure.
>> > It seems reasonable to clear DRIVER_OK to internal device failure, no?
>> 
>> The device refuses to accept the FEATURES_OK write, which is subtly
>> different from clearing the bit IMHO.
>> 
>> The status bits are for the driver to advertise its status, not the
>> device.
>
>
> I'd like to explain.
> What we are trying to implement is PCI bus master bit.
> Rules:
> - It must be set before DRIVER_OK.
> - If it set/cleared any number of time before DRIVER_OK, device will be
>   livea after DRIVER_OK
> - If it is cleared when driver sets DRIVER_OK, or if
>   it is cleared after driver sets DRIVER_OK, device stops
>   being live, you must reset to make it live again

I'm not sure what the semantics of the PCI bus master bit are on normal
devices.  I assume if you turn it off, the device stops working?  Why
would we insist on a reset to set it again?

Should we leave this interaction as undefined?

>> BTW, the struct virtio_pci_cfg_cap seems weird: why do you trigger on
>> offset read/write, rather than triggering on the actual pci_cfg_data?
>> Or am I misreading the spec?
>> 
>> Thanks,
>> Rusty.
>
> This is not what was intended.
> I see what gives you this impression:
>
> After this field is written by driver, the first length bytes in
> pci_cfg_data are written at the selected offset
> in the selected BAR.
> When this field is read by driver, length bytes at the selected offset
> in the selected BAR are read into
> pci_cfg_data.
>
> "this" should be replaced by pci_cfg_data.

Much better, it's just a simple window.

How's this?

==To access a device region, the driver writes into the capability
structure (ie. within the PCI configuration space) as follows:

\begin{itemize}
\item The driver sets the BAR to access by writing to the cap.bar field.

\item The driver sets the size of the access by writing 1, 2 or 4 to
  the cap.length field.

\item The driver sets the offset within the BAR by writing to the
  cap.offset field.  The driver MUST NOT write an offset which is not
  a multiple of cap.length (ie. all accesses must be aligned).
\end{itemsize}

At that point, the pci_cfg_data field will provide a window of size
cap.length into the given cap.bar at offset cap.offset: writes will
have the same effect as writes into the BAR, and reads will have the
same effect as reads.

The driver MUST perform reads/writes from/to pci_cfg_data of the same
width as given by cap.length.
==
(I assumed the last requirement).

Thanks,
Rusty.
Follow-Ups:
- Re: [virtio-comment] comment on pci section
  - From: "Michael S. Tsirkin" <mst@redhat.com>
References:
- Re: [virtio-comment] comment on pci section
  - From: "Michael S. Tsirkin" <mst@redhat.com>