OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio] [OASIS Issue Tracker] Created: (VIRTIO-35) race condition with multi-dword config accesses


On Fri, Oct 04, 2013 at 01:06:43PM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Tue, Sep 24, 2013 at 08:18:20PM +0930, Rusty Russell wrote:
> >> I don't think qemu supports changing a disk underneath a device anyway?
> >
> > QEMU supports resizing I think.
> >
> >> And in practice, I'm not sure it would ever change fast enough to
> >> trigger this race...
> >
> > Well it's easy to make guest very slow :)
> >
> >> So, should we try to fix it?
> >> Rusty.
> >
> > Not really sure but I thought I'd put it on the table.
> 
> OK, I got ambitious!  How's this look?
> 
> Cheers,
> Rusty.
> 
> commit 9aebe9c69be1461dd0110dfc61459a4ac7bda6f9
> Author: Rusty Russell <rusty@au1.ibm.com>
> Date:   Fri Oct 4 13:01:35 2013 +0930
> 
>     Configuration (read) atomicity.
>     
>     Aka issue VIRTIO-35.
>     
>     This is solved per transport:
>     1) PCI: use the 8 bit reserved field.
>     	Assume that if you really change that fast, you'll do it lazily on
>     	config space read.

I guess it's possible ... but isn't it easier to just make it a 32 bit field
than add this lazy update logic to devices?

>     2) MMIO: use offset 0x18.
>     	Qemu returns 0 when they read here anyway, so it's almost 100%
>     	backwards compatible (legacy drivers should still multi-read
>     	for any transport).
>     3) CCW: no transport changes.
>     	They always read/write the entire thing.  This just shows that
>     	Cornelia is smarter than I am.
>     
>     Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
> 
> diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
> index b5a5835..2945db4 100644
> --- a/virtio-v1.0-wd01-part1-specification.txt
> +++ b/virtio-v1.0-wd01-part1-specification.txt
> @@ -201,11 +201,36 @@ Interface' in the section title.
>  -------------------------
>  
>  Configuration space is generally used for rarely-changing or
> -initialization-time parameters.
> +initialization-time parameters.  Drivers must not assume reads from
> +fields greater than 32 bits wide are atomic, nor or reads from
> +multiple fields.
> +
> +Each transport provides a generation count for the configuration
> +space, which must change whenever there is a possibility that two

MUST?

> +accesses to the configuration space can see different versions of that
> +space.
> +
> +Thus drivers should read configuration space fields like so:

SHOULD?

> +
> +	u32 before, after;
> +	do {
> +		before = get_config_generation(device);
> +		// read config entry/entries.
> +		after = get_config_generation(device);
> +	} while (after != before);

BTW this implies reads must never have side
effects (PCI ISR is the only exception, isn't it?).

>  
>  Note that this space is generally the guest's native endian,
>  rather than PCI's little-endian.
>  
> +2.1.3.1.  Legacy Interface: Configuration Space
> +-------------------------
> +
> +Legacy devices did not have a configuration generation field, thus are
> +susceptible to race conditions if configuration is updated.  This
> +effects the block capacity and network mac fields; best practice is to
> +read these fields multiple times until two reads generate a consistent
> +result.
> +
>  2.1.4. Virtqueues
>  ----------------
>  
> @@ -768,7 +793,7 @@ struct virtio_pci_common_cfg {
>  	__le16 msix_config;		/* read-write */
>  	__le16 num_queues;		/* read-only */
>  	__u8 device_status;		/* read-write */
> -	__u8 unused1;
> +	__u8 config_generation;		/* read-only */
>  
>  	/* About a specific virtqueue. */
>  	__le16 queue_select;		/* read-write */
> @@ -820,6 +845,14 @@ device_status
>  	Device Status field. Writing 0 into this field resets the
>  	device.
>  
> +config_generation
> +
> +	Configuration atomicity value.  Changes every time the
> +	configuration noticeably changes.  This means the device may
> +	only change the value after a configuration read operation,
> +	but it must change if there is any risk of a device seeing an
> +	inconsistent configuration state.
> +
>  queue_select
>  
>  	Queue Select. Selects which virtqueue do other fields refer to.
> @@ -941,6 +974,9 @@ Legacy Interface. When used through the Legacy Interface,
>  Transitional Devices must assume that Feature Bits 32 to 63
>  are not acknowledged by Driver.
>  
> +As legacy devices had no configuration generation field, see "2.1.3.1.
> +Legacy Interface: Configuration Space" for workarounds.
> +
>  2.3.1.3. PCI-specific Initialization And Device Operation
>  --------------------------------------------------------
>  
> @@ -1281,6 +1317,13 @@ configuration space. The following list presents their layout:
>    must write a value to the HostFeaturesSel register before
>    reading from the HostFeatures register.
>  
> +• 0x018 | R | ConfigGeneration
> + Configuration atomicity value.
> + Changes every time the configuration noticeably changes.  This means the
> + device may only change the value after a configuration read operation,
> + but it must change if there is any risk of a device seeing an inconsistent
> + configuration state.
> +
>  • 0x020 | W | GuestFeatures
>   Flags representing device features understood and activated by
>    the driver.
> @@ -1396,6 +1439,12 @@ Writing to registers described as “R” and reading from
>  registers described as “W” is not permitted and can cause
>  undefined behavior.
>  
> +2.3.2.2.1. Legacy Interface: MMIO Device Layout
> +--------------------------
> +The ConfigGeneration field does not exist in legacy devices; fortunately
> +it would return 0 if accessed.  Nonetheless, the workarounds in
> +"2.1.3.1. Legacy Interface: Configuration Space" should still be used.
> +
>  100.3.2.2.1.  Virtqueue Layout
>  ------------------------------
>  
> @@ -1631,7 +1680,9 @@ For changing configuration information, the guest may use
>  CCW_CMD_WRITE_CONF, specifying the guest memory for the host to
>  read from.
>  
> -In both cases, the complete configuration space is transmitted.
> +In both cases, the complete configuration space is transmitted.  This
> +allows the guest to compare the new configuration space with the old
> +version, and keep a generation count internally whenever it changes.
>  
>  2.3.3.2.5. Setting Up Indicators
>  --------------------------------
> @@ -3415,12 +3466,15 @@ transmit output.
>  ---------------------------------------
>  
>  Configuration space should only be used for initialization-time
> -parameters.  It is a limited resource with no synchronization, so for
> -most uses it is better to use a virtqueue to update configuration
> -information (the network device does this for filtering,
> +parameters.  It is a limited resource with no synchronization between
> +writable fields, so for most uses it is better to use a virtqueue to update
> +configuration information (the network device does this for filtering,
>  otherwise the table in the config space could potentially be very
>  large).
>  
> +Devices must not assume that configuration fields over 32 bits wide
> +are atomically writable.
> +
>  2.7.3. What Device Number?
>  --------------------------
>  


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]