OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-dev message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-dev] Device-to-driver notification management for hardware implementations


On Thu, Nov 08, 2018 at 12:16:28PM +0000, David Riddoch wrote:
> On 05/11/18 08:53, Stefan Hajnoczi wrote:
> > On Wed, Sep 12, 2018 at 05:41:07PM +0100, David Riddoch wrote:
> > > The VIRTIO_F_NOTIFICATION_DATA feature helps hardware implementations by
> > > telling them directly how many descriptors are available in the ring, saving
> > > speculative reads.  There is a related issue with interrupts: As things
> > > stand, after writing 'used' descriptors into the ring (packed or split) the
> > > device has to read the event idx to determine whether a device-to-driver
> > > notification (interrupt) is needed.
> > > 
> > > It would be very helpful to a hardware implementation if the event idx could
> > > be written directly to the device.  Has anyone proposed an extension to
> > > address this already?  (I have some thoughts on how to do this if we don't
> > > have a proposal yet).
> > Hi David,
> > Has there been any further discussion on this topic?  It seems like a
> > useful feature to have in VIRTIO.
> Hi Stefan,
> 
> Thanks for the nudge, no further discussion.  So here are my thoughts:
> 
> To provide context here is an outline of how drivers typically process
> completions from a device.  This applies to both para-virtualised and real
> hardware.  (I think we only need to worry about packed-ring).
> 
>    disable_cb();
>  again:
>    while( budget_left && desc_is_used() )
>      handle_used();
>    if( budget_left &&enable_cb() )
>      goto again;
> 
> The budget stuff relates to napi polling, and isn't always there. But if it
> is, and budget is exhausted, then an outer loop will invoke this code again
> and so we don't want an interrupt.
> 
> In our current packed-ring design enable_cb() does roughly this:
> 
>    enable_cb_packed() {
>      guest_mem->event_suppress = last_used_idx;
>      virtio_mb();
>      return desc_is_used();
>    }
> 
> This tells the device to raise an interrupt when moving used_idx over the
> given idx.  (And you can set that idx to a position further in the ring if
> you're not interested in getting woken until a bunch of descriptors have
> been consumed).
> 
> This design is unpleasant for hardware devices, because it forces them to
> issue a read over the PCIe bus after writing used descriptors, in order to
> determine whether an interrupt is needed. Such a read adds complexity, adds
> overhead to the PCIe bus and memory system, and adds latency.
> 
> In hardware devices interrupts are typically enabled like this:
> 
>    enable_cb_hw() {
>      writel(last_used_idx, &dev_bar->int_en_doorbell);
>      return 0;
>    }
> 
> This tells the device to raise an interrupt once the descriptor at the given
> index has been used.  If it has already been used, then an interrupt is
> raised immediately, else the interrupt is enabled and raised after the used
> descriptor has been written.  The race condition is handled without having
> to check the ring again.  Note that interrupts are fire-once: For each write
> to int_en_doorbell you get a single interrupt, and disable_cb() is therefore
> a no-op.
> 
> When you emulate a device using this model, you get a vmexit every time
> int_en_doorbell is written, which is expensive.  Therefore this approach is
> not desirable when emulated.
> 
> So perhaps we should introduce an option to use this hardware-oriented
> model.  The problem is that with vDPA the implementation of a device can
> shift under the feet of a driver between hardware and emulation.  We'd like
> to get the best performance we can in both cases.
> 
> My proposal is that we add an option to use both models at the same time. 
> enable_cb() would become:
> 
> 30   enable_cb_new() {
> 31     if( int_en_doorbell_enabled )
> 32 writel(last_used_idx, p_int_en_doorbell);
> 33 guest_mem->event_suppress = last_used_idx;
> 34     virtio_mb();
> 35     return desc_is_used();
> 36   }
> 
> p_int_en_doorbell it notionally a pointer into a memory mapping onto a
> device BAR, and when backed by hardware that is exactly what it is (and the
> information in the event suppression structure is ignored).  But when backed
> by an emulated device the p_int_en_doorbell mapping should just point at
> normal host memory (which is otherwise unused).  This means that we avoid
> vmexits in the emulated case.
> 
> Comments?

Sounds fine to me.

Could you send a VIRTIO spec patch so this can be discussed in detail?

Thanks,
Stefan

Attachment: signature.asc
Description: PGP signature



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]