OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

# virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]

Subject: Re: [virtio-comment] comment on pci section

• From: "Michael S. Tsirkin" <mst@redhat.com>
• To: Rusty Russell <rusty@au1.ibm.com>
• Date: Mon, 3 Feb 2014 10:36:12 +0200

On Mon, Feb 03, 2014 at 11:36:20AM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Fri, Jan 31, 2014 at 01:10:30PM +1030, Rusty Russell wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> > On Thu, Jan 30, 2014 at 10:11:14AM +1030, Rusty Russell wrote:
> >> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> >> > On Wed, Jan 29, 2014 at 04:38:58PM +1030, Rusty Russell wrote:
> >> >> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> >> >> >
> >> >> >> > 3.1 "The device MUST NOT consume buffers before DRIVER_OK ..." - I
> >> >> >> > personally
> >> >> >> > would like to see also something like:  "device must not generate
> >> >> >> > interrupts if
> >> >> >> > DRIVER_OK is not set” to better handle power management transitions.
> >> >> >>
> >> >> >> That addition seems to make sense, but I don't understand the reference
> >> >> >> to power management transitions?
> >> >> >
> >> >> > Well - ATM we do things like suspend by device reset and re-initialization,
> >> >> > right? I'm guessing during e.g. resume it's important not to get
> >> >> > interrupts too early. I'll check.
> >> >> >
> >> >> > Hmm actually we have a chapter about initialization but not cleanup.
> >> >> > Might be a good addition.
> >> >>
> >> >> Maybe, but it would only need to say to reset the device and you can
> >> >> remove buffers safely.
> >> >>
> >> >> Looking at the current Linux block driver, it drives the device before
> >> >> setting DRIVER_OK (add_disk will do this to scan partitions).  Should we
> >> >> make the rule that we can driver the device after FEATURES_OK instead?
> >> >> In which case, perhaps we should just combine DRIVER_OK and FEATURES_OK
> >> >> into one bit.
> >> >
> >> > Hmm we discussed this already.
> >> > There are two distinct points:
> >> >
> >> > 1. driver has finalized device features (FEATURES_OK)
> >> >
> >> > after this point we start configuring the device:
> >> > VQ setup etc
> >> >
> >> > 2. driver adds buffers, device consumes them (DRIVER_OK)
> >>
> >> Yes, I think I agree.  On a similar note, we should make it clear that
> >> device-specific configuration fields *can* be read before FEATURES_OK.
> >> ie. if a device has a configuration field 'foo' which is only present if
> >> FEATURE_FOO, and it offers FEATURE_FOO, the driver can read 'foo'
> >> before deciding whether to accept FEATURE_FOO.
> >>
> >> >> > Also, I note that QEMU implemented master abort bit by
> >> >> > clearing DRIVER_OK - AFAIK useful to stop device in a
> >> >> > standard way if driver malfunctions.
> >> >> >
> >> >> > Also with QEMU you can clear DRIVER_OK and this stops the device. I
> >> >> > wonder if someone uses this.
> >> >>
> >> >> The host should not change status bits, they should use a different
> >> >> field.  This isn't explicitly stated in the spec, but it's a bit weird.
> >> >
> >> > Well - device already clears FEATURES_OK on configuration failure.
> >> > It seems reasonable to clear DRIVER_OK to internal device failure, no?
> >>
> >> The device refuses to accept the FEATURES_OK write, which is subtly
> >> different from clearing the bit IMHO.
> >>
> >> The status bits are for the driver to advertise its status, not the
> >> device.
> >
> >
> > I'd like to explain.
> > What we are trying to implement is PCI bus master bit.
> > Rules:
> > - It must be set before DRIVER_OK.
> > - If it set/cleared any number of time before DRIVER_OK, device will be
> >   livea after DRIVER_OK
> > - If it is cleared when driver sets DRIVER_OK, or if
> >   it is cleared after driver sets DRIVER_OK, device stops
> >   being live, you must reset to make it live again
>
> I'm not sure what the semantics of the PCI bus master bit are on normal
> devices.  I assume if you turn it off, the device stops working?

Stop being a bus master: doing writes (including msi) and reads.

>  Why
> would we insist on a reset to set it again?
>
> Should we leave this interaction as undefined?

I don't exactly understand.

OS-es clear bus master to disable devices in
a device independent way.
Afterwards, reset must bring the device back
to a functional state (with bus master disabled)
and allow driver to re-attach.
It does not have to be the only way but in practice it is.

> >> BTW, the struct virtio_pci_cfg_cap seems weird: why do you trigger on
> >> offset read/write, rather than triggering on the actual pci_cfg_data?
> >> Or am I misreading the spec?
> >>
> >> Thanks,
> >> Rusty.
> >
> > This is not what was intended.
> > I see what gives you this impression:
> >
> > After this field is written by driver, the first length bytes in
> > pci_cfg_data are written at the selected offset
> > in the selected BAR.
> > When this field is read by driver, length bytes at the selected offset
> > in the selected BAR are read into
> > pci_cfg_data.
> >
> > "this" should be replaced by pci_cfg_data.
>
> Much better, it's just a simple window.

Not exactly. There are corner cases, see below:

> How's this?
>
> ==To access a device region, the driver writes into the capability
> structure (ie. within the PCI configuration space) as follows:
>
> \begin{itemize}
> \item The driver sets the BAR to access by writing to the cap.bar field.
>
> \item The driver sets the size of the access by writing 1, 2 or 4 to
>   the cap.length field.
>
> \item The driver sets the offset within the BAR by writing to the
>   cap.offset field.  The driver MUST NOT write an offset which is not
>   a multiple of cap.length (ie. all accesses must be aligned).
> \end{itemsize}

s/itemsize/itemize/ :)

>
> At that point, the pci_cfg_data field will provide a window of size
> cap.length into the given cap.bar at offset cap.offset: writes will
> have the same effect as writes into the BAR, and reads will have the

better "same effect and return same value as reads from the BAR"

>
> The driver MUST perform reads/writes from/to pci_cfg_data of the same
> width as given by cap.length.

I strongly think we should allow reading a different number of bytes,
for debugging.

For example if you read some register with setpci
it's nice not to have the device crash because access width
did not match length : )
cat /sys/bus/pci/devices/0000\:00\:00.0/config
is also a nice troubleshooting tool, and that
one might read configuration byte by byte.

That's why I wrote it the way I did: I wanted each read
to read exactly length bytes into
pci_cfg_data, whatever the access size.

something like the paragraph that I had:

At that point, the pci_cfg_data field will provide a window of size
cap.length into the given cap.bar at offset cap.offset:
When pci_cfg_data field is written by driver,
device MUST write the first length bytes of pci_cfg_data at the selected
offset in the selected BAR.
When pci_cfg_data field is read by driver,
device MUST read length bytes at the selected offset
in the selected BAR into the first bytes of pci_cfg_data.

Also, I started thinking we should specify whether this gateway access
to a disabled BAR should be allowed.  It would allow using device before
bus enumeration, so useful if we ever want a system device like a pci
host based on virtio.  If yes that would mean access would have a
different effect from BAR access.

> ==
> (I assumed the last requirement).
>
> Thanks,
> Rusty.


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]