[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [virtio-comment] [PATCH RFC] virtio-pci: new configuration layout
"Michael S. Tsirkin" <mst@redhat.com> writes: > On Tue, Aug 27, 2013 at 04:24:47PM +0930, Rusty Russell wrote: >> "Michael S. Tsirkin" <mst@redhat.com> writes: >> > This is the new configuration layout. >> > >> > Notes: >> > - Everything is LE >> > - There's a feature bit that means spec 1.0 compliant. >> > - Both devices and drivers can either require the 1.0 interface >> > or try to include compatibility support. The spec isn't forcing >> > this decision. >> >> Hmm, this kind includes other changes already proposed, like the LE >> change and the framing change. I think this conceptually splits nicely: >> >> 1) Feature bit 32 proposal. >> 2) Endian change. >> 3) Framing change. >> 4) PCI layout change. > > Right - they are mostly in different parts of the document. > I put it all together so it's easy to see how we intend to > handle the transition. > So is everyone OK with keeping this in a single patch? > >> > - I kept documentation of the legacy interface around, and added notes >> > on transition inline. They are in separate sections each clearly marked >> > "Legacy Interface" so we'll be able to separate them out >> > from the final document as necessary - for now I think it's easier >> > to keep it all together. >> >> Good thinking: most of us know the current spec so it's definitely >> clearer. And makes sure we're thinking about the transition. >> >> > Only virtio PCI has been converted. >> > Let's discuss this on the meeting tonight, once we figure out PCI >> > we can do something similar for MMIO and CCW. >> >> > @@ -137,6 +139,11 @@ Feature bits are allocated as follows: >> > 24 to 31: Feature bits reserved for extensions to the queue and >> > feature negotiation mechanisms >> > >> > + 32: Feature bit must be set for any device compliant with this >> > + revision of the specification, and acknowledged by all device drivers. 
>> > +
>> > +      33 to 63: Feature bits reserved for future extensions
>> > +
>> >    For example, feature bit 0 for a network device (i.e. Subsystem
>> >    Device ID 1) indicates that the device supports checksumming of
>> >    packets.
>>
>> Why stop at 63?  If we go to a more decentralized feature-assignment
>> model, we'll run through those very fast.
>
> Then we'll just document more, but the driver needs to know where to
> stop looking for features.

Shouldn't it stop looking at the highest feature it understands?  Yes,
this creates a problem for diagnostics, but they can use a simple
heuristic (stop at the first word which is all zeroes).  Or we need a
register with an explicit limit.  If the spec says there won't be
features above 63, we'd need a revision to change that.

>> I think we should stop abusing Revision IDs, and start using them
>> to reflect device version changes as intended.
>>
>> We could reserve revision id 0 for legacy devices, however, which
>> should work nicely.
>
> Hmm I am not sure I agree - what does it buy us that feature bits
> don't already supply?

AFAICT it's a PCI question more than a virtio question.  The use of
revision IDs is left open in the PCI spec, but Microsoft's requirements
here seem reasonable:

http://msdn.microsoft.com/en-us/library/windows/hardware/gg463287.aspx

> More concerns:
>
> We are using revision ID now exactly as was intended - to disable old
> drivers.  It served us well for the 0.X-1.X transition and would be as
> useful if we ever have a 1.X->2.0 transition.

I think we'll use feature bit 64 for such a transition, now that we
don't have a feature bit limit.

> Another worry with using revision numbering for features is that
> it does not play well with downstreams.
> E.g. RHEL might want to cherry-pick a feature without implementing
> other features that happened to land in the same revision.
>
> Also Revision ID is only 8 bits - it's designed for hardware, where
> making a new revision is expensive.  In software we'll run out of it
> eventually.
This is why I think we leave it open in the standard.  If people want
to use it as another way to implement driver workarounds, they can
(probably a combination of device ID, rev id and features offered).

>> > 2.4.1.2 PCI Device Layout
>> > -------------------------
>> >
>> > -To configure the device, we use the first I/O region of the PCI
>> > -device. This contains a virtio header followed by a
>> > -device-specific region.
>> > +To configure the device,
>> > +use I/O and/or memory regions and/or PCI configuration space of the PCI device.
>> > +These contain the virtio header registers, the notification register, the
>> > +ISR status register and device-specific registers, as specified by Virtio
>> > +Structure PCI Capabilities.
>> > +
>> > +There may be different widths of accesses to the I/O region; the
>> > +“natural” access method for each field must be
>> > +used (i.e. 32-bit accesses for 32-bit fields, etc).
>> > +
>> > +PCI Device Configuration Layout includes the common configuration,
>> > +ISR, notification and device-specific configuration
>> > +structures.
>> > +
>> > +Unless explicitly specified otherwise, all multi-byte fields are little-endian.
>> > +
>> > +
>> > +2.4.1.2.1 Common configuration structure layout
>> > +-------------------------
>> > +Common configuration structure layout is documented below:
>> > +
>> > +struct virtio_pci_common_cfg {
>> > +        /* About the whole device. */
>> > +        __le32 device_feature_select;   /* read-write */
>> > +        __le32 device_feature;          /* read-only */
>> > +        __le32 guest_feature_select;    /* read-write */
>> > +        __le32 guest_feature;           /* read-write */
>> > +        __le16 msix_config;             /* read-write */
>> > +        __le16 num_queues;              /* read-only */
>> > +        __u8 device_status;             /* read-write */
>> > +        __u8 unused1;
>> > +
>> > +        /* About a specific virtqueue. */
>> > +        __le16 queue_select;            /* read-write */
>> > +        __le16 queue_size;              /* read-write, power of 2, or 0. */
>> > +        __le16 queue_msix_vector;       /* read-write */
>> > +        __le16 queue_enable;            /* read-write */
>> > +        __le16 queue_notify_off;        /* read-only */
>> > +        __le64 queue_desc;              /* read-write */
>> > +        __le64 queue_avail;             /* read-write */
>> > +        __le64 queue_used;              /* read-write */
>> > +};
>> > +
>> > +device_feature_select
>> > +
>> > +        Selects which Feature Bits the device_feature field refers to.
>> > +        Value 0x0 selects Feature Bits 0 to 31.
>> > +        Value 0x1 selects Feature Bits 32 to 63.
>> > +        All other values cause reads from device_feature to return 0.
>> > +
>> > +device_feature
>> > +
>> > +        Used by the Device to report Feature Bits to the Driver.
>> > +        Device Feature Bits are selected by device_feature_select.
>> > +
>> > +guest_feature_select
>> > +
>> > +        Selects which Feature Bits the guest_feature field refers to.
>> > +        Value 0x0 selects Feature Bits 0 to 31.
>> > +        Value 0x1 selects Feature Bits 32 to 63.
>> > +        All other values cause writes to guest_feature to be ignored,
>> > +        and reads to return 0.
>> > +
>> > +guest_feature
>> > +
>> > +        Used by the Driver to acknowledge Feature Bits to the Device.
>> > +        Guest Feature Bits are selected by guest_feature_select.
>> > +
>> > +msix_config
>> > +
>> > +        Configuration Vector for MSI-X.
>> > +
>> > +num_queues
>> > +
>> > +        Specifies the maximum number of virtqueues supported by the device.
>> > +
>> > +device_status
>> > +
>> > +        Device Status field.
>> > +
>> > +queue_select
>> > +
>> > +        Queue Select.  Selects which virtqueue the other fields refer to.
>> > +
>> > +queue_size
>> > +
>> > +        Queue Size.  On reset, specifies the maximum queue size supported
>> > +        by the hypervisor.  This can be modified by the driver to reduce
>> > +        memory requirements.  Set to 0 if this virtqueue is unused.
>> > +
>> > +queue_msix_vector
>> > +
>> > +        Queue Vector for MSI-X.
>> > +
>> > +queue_enable
>> > +
>> > +        Used to selectively prevent the host from executing requests from
>> > +        this virtqueue.
>> > +        1 - enabled; 0 - disabled.
>> > +
>> > +queue_notify_off
>> > +
>> > +        Used to calculate the offset from the start of the Notification
>> > +        structure at which this virtqueue is located.
>> > +        Note: this is *not* an offset in bytes.
>> > +        See notify_off_multiplier below.
>> > +
>> > +queue_desc
>> > +
>> > +        Physical address of Descriptor Table.
>> > +
>> > +queue_avail
>> > +
>> > +        Physical address of Available Ring.
>> > +
>> > +queue_used
>> > +
>> > +        Physical address of Used Ring.
>> > +
>> > +
>> > +2.4.1.2.2 ISR status structure layout
>> > +-------------------------
>> > +ISR status structure includes a single 8-bite ISR status field
>>
>> 8-bit
>
> Right :)
>
>> > +
>> > +2.4.1.2.3 Notification structure layout
>> > +-------------------------
>> > +Notification structure is always a multiple of 2 bytes in size.
>> > +It includes 2-byte Queue Notify fields for each virtqueue of
>> > +the device.  Note that multiple virtqueues can use the same
>> > +Queue Notify field, if necessary.
>>
>> Hmm, maybe move this down, so you can have a section which starts with
>> "If cfg_type is VIRTIO_PCI_CAP_NOTIFY_CFG" below?  That would put it
>> all together.
>
> So move PCI Device Layout to within
> PCI-specific Initialization And Device Operation?

OK, maybe not.  This is simple; we can always reshuffle after we've
made all the changes we want.

>> > +
>> > +2.4.1.2.4 Device specific structure
>> > +-------------------------
>> > +
>> > +Device specific structure is optional.
>> > +
>> > +2.4.1.2.5 Legacy Interfaces: A Note on PCI Device Layout
>> > +-------------------------
>> > +
>> > +Transitional devices should present part of the configuration
>> > +registers in a legacy configuration structure in BAR0, in the first I/O
>> > +region of the PCI device, as documented below.
>> >
>> >  There may be different widths of accesses to the I/O region; the
>> >  “natural” access method for each field in the virtio header must be
>> > -used (i.e. 32-bit accesses for 32-bit fields, etc), but the
>> > +used (i.e. 32-bit accesses for 32-bit fields, etc), but
>> > +when accessed through the legacy interface the
>> >  device-specific region can be accessed using any width accesses, and
>> >  should obtain the same results.
>> >
>> >  Note that this is possible because while the virtio header is PCI
>> > -(i.e. little) endian, the device-specific region is encoded in
>> > -the native endian of the guest (where such distinction is
>> > +(i.e. little) endian, when using the legacy interface the device-specific
>> > +region is encoded in the native endian of the guest (where such distinction is
>> >  applicable).
>> >
>> > -2.4.1.2.1 PCI Device Virtio Header
>> > -----------------------------------
>> >
>> > -The virtio header looks as follows:
>> > +When used through the legacy interface, the virtio header looks as follows:
>> >
>> > +------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
>> > | Bits       || 32                  | 32                  | 32       | 16     | 16      | 16      | 8       | 8      |
>> > @@ -661,7 +905,6 @@ The virtio header looks as follows:
>> > |            || Features bits 0:31  | Features bits 0:31  | Address  | Size   | Select  | Notify  | Status  | Status |
>> > +------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
>> >
>> > -
>> >  If MSI-X is enabled for the device, two additional fields
>> >  immediately follow this header:[5]
>> >
>> > @@ -689,25 +932,154 @@ device-specific headers:
>> > |            ||                    |
>> > +------------++--------------------+
>> >
>> > +Note that only Feature Bits 0 to 31 are accessible through the
>> > +Legacy Interface.  When used through the Legacy Interface,
>> > +Transitional Devices must assume that Feature Bits 32 to 63
>> > +are not acknowledged by the Driver.
>> > +
>> > +
>> >  2.4.1.3 PCI-specific Initialization And Device Operation
>> >  --------------------------------------------------------
>> >
>> > -The page size for a virtqueue on a PCI virtio device is defined as
>> > -4096 bytes.
>> > -
>> >  2.4.1.3.1 Device Initialization
>> >  -------------------------------
>> >
>> > -2.4.1.3.1.1 Queue Vector Configuration
>> > +This documents PCI-specific steps executed during Device Initialization.
>> > +As the first step, the driver must detect the device configuration layout
>> > +to locate configuration fields in memory, I/O or configuration space of
>> > +the device.
>> > +
>> > +2.4.1.3.1.1 Virtio Device Configuration Layout Detection
>> > +-------------------------------
>> > +
>> > +As a prerequisite to device initialization, the driver executes a
>> > +PCI capability list scan, detecting the virtio configuration layout using
>> > +Virtio Structure PCI capabilities.
>> > +
>> > +Virtio Device Configuration Layout includes the virtio configuration
>> > +header, Notification, ISR Status and device configuration structures.
>> > +Each structure can be mapped by a Base Address register (BAR) belonging
>> > +to the function, located beginning at 10h in Configuration Space,
>> > +or accessed through PCI configuration space.
>> > +
>> > +The actual location of each structure is specified using a vendor-specific
>> > +PCI capability located on the capability list in PCI configuration space
>> > +of the device.
>> > +This virtio structure capability uses the little-endian format; all bits
>> > +are read-only:
>> > +
>> > +struct virtio_pci_cap {
>> > +        __u8 cap_vndr;    /* Generic PCI field: PCI_CAP_ID_VNDR */
>> > +        __u8 cap_next;    /* Generic PCI field: next ptr. */
>> > +        __u8 cap_len;     /* Generic PCI field: capability length */
>> > +        __u8 cfg_type;    /* Identifies the structure. */
>> > +        __u8 bar;         /* Where to find it. */
>> > +        __u8 padding[3];  /* Pad to full dword. */
>> > +        __le32 offset;    /* Offset within bar. */
>> > +        __le32 length;    /* Length of the structure, in bytes. */
>> > +};
>> > +
>> > +This structure can optionally be followed by extra data, depending on
>> > +other fields, as documented below.
>> > +
>> > +The fields are interpreted as follows:
>> > +
>> > +cap_vndr
>> > +        0x09; identifies a vendor-specific capability.
>> > +
>> > +cap_next
>> > +        Link to the next capability in the capability list in the
>> > +        configuration space.
>> > +
>> > +cap_len
>> > +        Length of the capability structure, including the whole of
>> > +        struct virtio_pci_cap, and extra data, if any.
>> > +        This length might include padding, or fields unused by the driver.
>> > +
>> > +cfg_type
>> > +        Identifies the structure, according to the following table.
>> > +
>> > +        /* Common configuration */
>> > +        #define VIRTIO_PCI_CAP_COMMON_CFG 1
>> > +        /* Notifications */
>> > +        #define VIRTIO_PCI_CAP_NOTIFY_CFG 2
>> > +        /* ISR Status */
>> > +        #define VIRTIO_PCI_CAP_ISR_CFG 3
>> > +        /* Device specific configuration */
>> > +        #define VIRTIO_PCI_CAP_DEVICE_CFG 4
>> > +
>> > +        More than one capability can identify the same structure - this
>> > +        makes it possible for the device to expose multiple interfaces to
>> > +        drivers.  The order of the capabilities in the capability list
>> > +        specifies the order of preference suggested by the device; drivers
>> > +        should use the first interface that they can support.  For example,
>> > +        on some hypervisors, notifications using I/O accesses are faster
>> > +        than memory accesses.  In this case, the hypervisor can expose two
>> > +        capabilities with cfg_type set to VIRTIO_PCI_CAP_NOTIFY_CFG:
>> > +        the first one addressing an I/O BAR, the second one addressing a
>> > +        memory BAR.  The driver will use the I/O BAR if I/O resources are
>> > +        available, and fall back on the memory BAR when I/O resources are
>> > +        unavailable.
>> > +
>> > +bar
>> > +
>> > +        Values 0x0 to 0x5 specify a Base Address register (BAR) belonging
>> > +        to the function, located beginning at 10h in Configuration Space,
>> > +        and used to map the structure into Memory or I/O Space.
>> > +        The BAR is permitted to be either 32-bit or 64-bit; it can map
>> > +        Memory Space or I/O Space.
>> > +
>> > +        The value 0xF specifies that the structure is in PCI configuration
>> > +        space, inline with this capability structure, following (not
>> > +        necessarily immediately) the length field.
>>
>> Why not immediately?
>> Or how would the driver know where it is?
>
> It's at the offset.
>
> E.g. for notification we stick the multiplier after length.
> Further, we might extend virtio_pci_cap in the future,
> and we don't want to move stuff around like we
> had to with MSI-X.

So, offset is described as "/* Offset within bar. */", but if bar is
0xF, it's the offset within the config space?  This adds complexity: do
we really want this?

Cheers,
Rusty.
PS.  Sorry for the delay, took some time off.