[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [virtio-dev] Device-to-driver notification management for hardware implementations
On Thu, Nov 08, 2018 at 12:16:28PM +0000, David Riddoch wrote: > On 05/11/18 08:53, Stefan Hajnoczi wrote: > > On Wed, Sep 12, 2018 at 05:41:07PM +0100, David Riddoch wrote: > > > The VIRTIO_F_NOTIFICATION_DATA feature helps hardware implementations by > > > telling them directly how many descriptors are available in the ring, saving > > > speculative reads. There is a related issue with interrupts: As things > > > stand, after writing 'used' descriptors into the ring (packed or split) the > > > device has to read the event idx to determine whether a device-to-driver > > > notification (interrupt) is needed. > > > > > > It would be very helpful to a hardware implementation if the event idx could > > > be written directly to the device. Has anyone proposed an extension to > > > address this already? (I have some thoughts on how to do this if we don't > > > have a proposal yet). > > Hi David, > > Has there been any further discussion on this topic? It seems like a > > useful feature to have in VIRTIO. > Hi Stefan, > > Thanks for the nudge, no further discussion. So here are my thoughts: > > To provide context here is an outline of how drivers typically process > completions from a device. This applies to both para-virtualised and real > hardware. (I think we only need to worry about packed-ring). > > disable_cb(); > again: > while( budget_left && desc_is_used() ) > handle_used(); > if( budget_left &&enable_cb() ) > goto again; > > The budget stuff relates to napi polling, and isn't always there. But if it > is, and budget is exhausted, then an outer loop will invoke this code again > and so we don't want an interrupt. > > In our current packed-ring design enable_cb() does roughly this: > > enable_cb_packed() { > guest_mem->event_suppress = last_used_idx; > virtio_mb(); > return desc_is_used(); > } > > This tells the device to raise an interrupt when moving used_idx over the > given idx. (And you can set that idx to a position further in the ring if > you're not interested in getting woken until a bunch of descriptors have > been consumed). > > This design is unpleasant for hardware devices, because it forces them to > issue a read over the PCIe bus after writing used descriptors, in order to > determine whether an interrupt is needed. Such a read adds complexity, adds > overhead to the PCIe bus and memory system, and adds latency. > > In hardware devices interrupts are typically enabled like this: > > enable_cb_hw() { > writel(last_used_idx, &dev_bar->int_en_doorbell); > return 0; > } > > This tells the device to raise an interrupt once the descriptor at the given > index has been used. If it has already been used, then an interrupt is > raised immediately, else the interrupt is enabled and raised after the used > descriptor has been written. The race condition is handled without having > to check the ring again. Note that interrupts are fire-once: For each write > to int_en_doorbell you get a single interrupt, and disable_cb() is therefore > a no-op. > > When you emulate a device using this model, you get a vmexit every time > int_en_doorbell is written, which is expensive. Therefore this approach is > not desirable when emulated. > > So perhaps we should introduce an option to use this hardware-oriented > model. The problem is that with vDPA the implementation of a device can > shift under the feet of a driver between hardware and emulation. We'd like > to get the best performance we can in both cases. > > My proposal is that we add an option to use both models at the same time. > enable_cb() would become: > > 30 enable_cb_new() { > 31 if( int_en_doorbell_enabled ) > 32 writel(last_used_idx, p_int_en_doorbell); > 33 guest_mem->event_suppress = last_used_idx; > 34 virtio_mb(); > 35 return desc_is_used(); > 36 } > > p_int_en_doorbell it notionally a pointer into a memory mapping onto a > device BAR, and when backed by hardware that is exactly what it is (and the > information in the event suppression structure is ignored). But when backed > by an emulated device the p_int_en_doorbell mapping should just point at > normal host memory (which is otherwise unused). This means that we avoid > vmexits in the emulated case. > > Comments? Sounds fine to me. Could you send a VIRTIO spec patch so this can be discussed in detail? Thanks, Stefan
Attachment:
signature.asc
Description: PGP signature
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]