[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [virtio-comment] comment on pci section
On Mon, Feb 03, 2014 at 11:36:20AM +1030, Rusty Russell wrote: > "Michael S. Tsirkin" <mst@redhat.com> writes: > > On Fri, Jan 31, 2014 at 01:10:30PM +1030, Rusty Russell wrote: > >> "Michael S. Tsirkin" <mst@redhat.com> writes: > >> > On Thu, Jan 30, 2014 at 10:11:14AM +1030, Rusty Russell wrote: > >> >> "Michael S. Tsirkin" <mst@redhat.com> writes: > >> >> > On Wed, Jan 29, 2014 at 04:38:58PM +1030, Rusty Russell wrote: > >> >> >> "Michael S. Tsirkin" <mst@redhat.com> writes: > >> >> >> > Two comments received from a co-worker: > >> >> >> > > >> >> >> > 3.1 "The device MUST NOT consume buffers before DRIVER_OK ..." - I > >> >> >> > personally > >> >> >> > would like to see also something like: "device must not generate > >> >> >> > interrupts if > >> >> >> > DRIVER_OK is not set” to better handle power management transitions. > >> >> >> > >> >> >> That addition seems to make sense, but I don't understand the reference > >> >> >> to power management transitions? > >> >> > > >> >> > Well - ATM we do things like suspend by device reset and re-initialization, > >> >> > right? I'm guessing during e.g. resume it's important not to get > >> >> > interrupts too early. I'll check. > >> >> > > >> >> > Hmm actually we have a chapter about initialization but not cleanup. > >> >> > Might be a good addition. > >> >> > >> >> Maybe, but it would only need to say to reset the device and you can > >> >> remove buffers safely. > >> >> > >> >> Looking at the current Linux block driver, it drives the device before > >> >> setting DRIVER_OK (add_disk will do this to scan partitions). Should we > >> >> make the rule that we can driver the device after FEATURES_OK instead? > >> >> In which case, perhaps we should just combine DRIVER_OK and FEATURES_OK > >> >> into one bit. > >> > > >> > Hmm we discussed this already. > >> > There are two distinct points: > >> > > >> > 1. driver has finalized device features (FEATURES_OK) > >> > > >> > after this point we start configuring the device: > >> > VQ setup etc > >> > > >> > 2. driver adds buffers, device consumes them (DRIVER_OK) > >> > >> Yes, I think I agree. On a similar note, we should make it clear that > >> device-specific configuration fields *can* be read before FEATURES_OK. > >> ie. if a device has a configuration field 'foo' which is only present if > >> FEATURE_FOO, and it offers FEATURE_FOO, the driver can read 'foo' > >> before deciding whether to accept FEATURE_FOO. > >> > >> >> > Also, I note that QEMU implemented master abort bit by > >> >> > clearing DRIVER_OK - AFAIK useful to stop device in a > >> >> > standard way if driver malfunctions. > >> >> > > >> >> > Also with QEMU you can clear DRIVER_OK and this stops the device. I > >> >> > wonder if someone uses this. > >> >> > >> >> The host should not change status bits, they should use a different > >> >> field. This isn't explicitly stated in the spec, but it's a bit weird. > >> > > >> > Well - device already clears FEATURES_OK on configuration failure. > >> > It seems reasonable to clear DRIVER_OK to internal device failure, no? > >> > >> The device refuses to accept the FEATURES_OK write, which is subtly > >> different from clearing the bit IMHO. > >> > >> The status bits are for the driver to advertise its status, not the > >> device. > > > > > > I'd like to explain. > > What we are trying to implement is PCI bus master bit. > > Rules: > > - It must be set before DRIVER_OK. > > - If it set/cleared any number of time before DRIVER_OK, device will be > > livea after DRIVER_OK > > - If it is cleared when driver sets DRIVER_OK, or if > > it is cleared after driver sets DRIVER_OK, device stops > > being live, you must reset to make it live again > > I'm not sure what the semantics of the PCI bus master bit are on normal > devices. I assume if you turn it off, the device stops working? Stop being a bus master: doing writes (including msi) and reads. > Why > would we insist on a reset to set it again? > > Should we leave this interaction as undefined? I don't exactly understand. OS-es clear bus master to disable devices in a device independent way. Afterwards, reset must bring the device back to a functional state (with bus master disabled) and allow driver to re-attach. It does not have to be the only way but in practice it is. > >> BTW, the struct virtio_pci_cfg_cap seems weird: why do you trigger on > >> offset read/write, rather than triggering on the actual pci_cfg_data? > >> Or am I misreading the spec? > >> > >> Thanks, > >> Rusty. > > > > This is not what was intended. > > I see what gives you this impression: > > > > After this field is written by driver, the first length bytes in > > pci_cfg_data are written at the selected offset > > in the selected BAR. > > When this field is read by driver, length bytes at the selected offset > > in the selected BAR are read into > > pci_cfg_data. > > > > "this" should be replaced by pci_cfg_data. > > Much better, it's just a simple window. Not exactly. There are corner cases, see below: > How's this? > > ==To access a device region, the driver writes into the capability > structure (ie. within the PCI configuration space) as follows: > > \begin{itemize} > \item The driver sets the BAR to access by writing to the cap.bar field. > > \item The driver sets the size of the access by writing 1, 2 or 4 to > the cap.length field. > > \item The driver sets the offset within the BAR by writing to the > cap.offset field. The driver MUST NOT write an offset which is not > a multiple of cap.length (ie. all accesses must be aligned). > \end{itemsize} s/itemsize/itemize/ :) > > At that point, the pci_cfg_data field will provide a window of size > cap.length into the given cap.bar at offset cap.offset: writes will > have the same effect as writes into the BAR, and reads will have the > same effect as reads. better "same effect and return same value as reads from the BAR" > > The driver MUST perform reads/writes from/to pci_cfg_data of the same > width as given by cap.length. I strongly think we should allow reading a different number of bytes, for debugging. For example if you read some register with setpci it's nice not to have the device crash because access width did not match length : ) cat /sys/bus/pci/devices/0000\:00\:00.0/config is also a nice troubleshooting tool, and that one might read configuration byte by byte. That's why I wrote it the way I did: I wanted each read to read exactly length bytes into pci_cfg_data, whatever the access size. So how about we take the itemize above, and then add something like the paragraph that I had: At that point, the pci_cfg_data field will provide a window of size cap.length into the given cap.bar at offset cap.offset: When pci_cfg_data field is written by driver, device MUST write the first length bytes of pci_cfg_data at the selected offset in the selected BAR. When pci_cfg_data field is read by driver, device MUST read length bytes at the selected offset in the selected BAR into the first bytes of pci_cfg_data. Also, I started thinking we should specify whether this gateway access to a disabled BAR should be allowed. It would allow using device before bus enumeration, so useful if we ever want a system device like a pci host based on virtio. If yes that would mean access would have a different effect from BAR access. > == > (I assumed the last requirement). > > Thanks, > Rusty.
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]