OASIS Mailing List Archives
virtio-comment message



Subject: Re: [virtio-comment] Hardware friendly proposals from Intel for packed-ring-layout


On 24/08/2017 15:11, Tiwei Bie wrote:
> On Thu, Aug 24, 2017 at 02:10:34PM +0200, Paolo Bonzini wrote:
>> On 24/08/2017 13:53, Tiwei Bie wrote:
>>>
>>> * In addition to the DESC_HW flag, each virtio queue has a tail pointer
>>>     - Driver creates suitable (i.e. multiple of cacheline) descriptors,
>>>       then performs MMIO write to tail pointer.
>>
>> If I understand correctly, the tail pointer is the value that is written
>> to the MMIO register.  If that is the case, this is unfortunately bad
>> for virtualization.  Virt prefers a doorbell register where the value
>> doesn't matter.  This is because:
>>
>> 1) the value is not available directly and computing it requires
>> instruction decoding, which in turn requires walking page tables
>>
>> 2) if the value doesn't matter, the hypervisor can simply wake up a
>> userspace thread that processes the virtio queue without bothering to
>> pass the value.
>>
>> On the other hand, writing a tail pointer _before_ the MMIO write may
>> cost a cache miss.  Hence the packed ring layout proposal replaced the
>> tail pointer write with lookahead on the ring buffer's DESC_HW flags.
>> The idea is that lookahead is cheaper, because hopefully the first
>> non-DESC_HW buffer will be in the same cache line as the last DESC_HW
>> buffer.
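Here is a minimal Python sketch, not taken from the thread, that makes the trade-off above concrete. It contrasts two ways a device can learn how many descriptors are available: from a tail pointer value written to an MMIO register, or by looking ahead at the DESC_HW flags. The descriptor fields, the DESC_HW bit value, the ring size and the helper names are all assumptions for illustration. Ownership is simplified to "DESC_HW set means the device may consume it", and the draft's real handshake is not modeled.

```python
from dataclasses import dataclass

# Hypothetical values for this sketch; the draft's real encoding may differ.
DESC_HW = 0x0080
RING_SIZE = 8  # a power of two, like real virtio rings


@dataclass
class Desc:
    """Simplified packed-ring-style descriptor (addr, len, id, flags)."""
    addr: int = 0
    length: int = 0
    id: int = 0
    flags: int = 0


def avail_from_tail(head: int, tail: int) -> int:
    """Tail-pointer model: the count follows from the value the driver
    wrote to the MMIO register, which a hypervisor would have to decode."""
    return (tail - head) % RING_SIZE


def avail_from_lookahead(ring: list[Desc], head: int) -> int:
    """Lookahead model: scan from head until the first descriptor whose
    DESC_HW flag is clear; the doorbell itself carries no value."""
    n = 0
    while n < RING_SIZE and ring[(head + n) % RING_SIZE].flags & DESC_HW:
        n += 1
    return n


ring = [Desc() for _ in range(RING_SIZE)]
head = 6
for i in (6, 7, 0):  # driver publishes three descriptors, wrapping around
    ring[i].flags |= DESC_HW

print(avail_from_lookahead(ring, head))  # 3
print(avail_from_tail(head, tail=1))     # 3
```

In this simplified tail-pointer model, a completely full ring and an empty one both yield 0. The lookahead loop reads one descriptor past the last available one. That extra read is the one the lookahead idea hopes will land in an already-fetched cache line.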
> 
> Thank you so much for such a quick and detailed reply!
> 
> Yes, we know the tail pointer is a bit tricky to support in
> software, but it is really helpful for hardware implementations,
> so we would like to discuss it further.
> 
> How about making this feature switchable at runtime, so that it can
> be enabled after migrating to a hardware backend, or disabled after
> migrating to a software backend? A software backend could then keep
> using the DESC_HW-based mechanism.
> 
> These are just rough thoughts; we haven't worked out the
> implementation details yet. What are your thoughts on this?

Why is lookahead bad for hardware?  Can a PCIe device use burst reads to
retrieve many 2-byte descriptors in a single TLP transaction?
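The burst-read question can be sketched as follows. The sketch assumes 16-byte descriptors (addr, len, id, flags, as in the draft layout) and a hypothetical DESC_HW bit. It treats one burst as a single read covering a whole number of descriptors. The function names and read sizes are hypothetical. Indices wrap logically, although a real device would need two reads to cross the end of the ring. The point is only that one read covering several descriptors lets the device find the first non-DESC_HW entry without a tail pointer, and tells it when another read is needed.

```python
DESC_HW = 0x0080  # hypothetical flag bit
DESC_SIZE = 16    # bytes per descriptor (addr 8 + len 4 + id 2 + flags 2), assumed


def descs_per_read(read_bytes: int, desc_size: int = DESC_SIZE) -> int:
    """Whole descriptors covered by one contiguous read of read_bytes."""
    return read_bytes // desc_size


def count_in_burst(flags: list[int], start: int, burst: int) -> tuple[int, bool]:
    """Count available descriptors using only the flags fetched by one
    burst of `burst` descriptors starting at `start`.

    Returns (count, need_more): need_more is True when every fetched
    descriptor had DESC_HW set, so the first non-DESC_HW entry lies
    beyond this burst and another read is required.
    """
    ring_size = len(flags)
    for i in range(burst):
        if not flags[(start + i) % ring_size] & DESC_HW:
            return i, False
    return burst, True


flags = [DESC_HW] * 5 + [0] * 11  # 5 available descriptors in a 16-entry ring
print(descs_per_read(64))                              # 4: one 64-byte cache line
print(count_in_burst(flags, 0, descs_per_read(64)))   # (4, True)
print(count_in_burst(flags, 0, descs_per_read(256)))  # (5, False)
```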

>>> Indirect Chaining
>>> =================
>>>
>>> ## Current proposal
>>>
>>> * Indirect chaining is an optional feature
>>>
>>> ## New proposal
>>>
>>> * Remove this feature from this new ring layout
>>>
>>> It's very unlikely that hardware implementations would support this
>>> due to the extra latency of fetching the actual descriptors.
>>>
>>> This is a totally new ring layout, so we don't need to worry about
>>> compatibility with the old one. It's better not to include this
>>> feature in the new ring layout unless we find it necessary now.
>>
>> Indirect chaining is actually relatively common for storage devices.
>>
>> Hardware implementations are free not to support indirect chaining if it
>> hurts latency.
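This minimal sketch shows why indirect chaining adds latency for hardware. The device must first fetch the ring descriptor. Only then, in a second dependent read, can it fetch the table that descriptor points to. The flag value is borrowed from the split ring's VIRTQ_DESC_F_INDIRECT (4). The Desc fields, the dictionary standing in for guest memory, and resolve() are hypothetical. The example is a storage-style request with header, data and status segments. It also hints at why block devices like indirect descriptors: the whole request occupies one ring slot.

```python
from dataclasses import dataclass

# Same value as the split ring's VIRTQ_DESC_F_INDIRECT.
INDIRECT = 0x4


@dataclass(frozen=True)
class Desc:
    addr: int
    length: int
    flags: int = 0


def resolve(ring_desc: Desc, memory: dict[int, list[Desc]]) -> tuple[list[tuple[int, int]], int]:
    """Return the (addr, length) buffers behind one ring slot and the number
    of dependent memory reads the device needed to find them.

    `memory` is a stand-in for guest memory: it maps a table address to the
    descriptors stored there (hypothetical model).
    """
    reads = 1  # the ring descriptor itself
    if ring_desc.flags & INDIRECT:
        table = memory[ring_desc.addr]  # second read, only possible after the first
        reads += 1
        return [(d.addr, d.length) for d in table], reads
    return [(ring_desc.addr, ring_desc.length)], reads


# A storage-style request: header, data and status in a single ring slot.
table_addr = 0x1000
memory = {table_addr: [Desc(0x2000, 16), Desc(0x3000, 4096), Desc(0x4000, 1)]}

print(resolve(Desc(table_addr, 3 * 16, INDIRECT), memory))
# ([(8192, 16), (12288, 4096), (16384, 1)], 2)
print(resolve(Desc(0x5000, 512), memory))
# ([(20480, 512)], 1)
```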
> 
> We are proposing to remove it only if it's not really necessary.
> If it is really necessary, let's just keep it. :)
> 
> One problem that keeping this feature may introduce: if a
> software backend implements this feature, and a VM runs on that
> backend with the feature enabled, live-migrating the VM to a
> hardware backend that doesn't support the feature could be a
> problem. This is a general live migration problem whenever some
> features are negotiable and some types of backend don't plan to
> support them at all. Do you have any thoughts on this?

If you are preparing for live migration to a hardware backend, you can
disable indirect rings when starting the VM, even though it will start
on a software backend.
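In effect, the suggestion above masks the offered features down to what every possible migration target supports. The hedged sketch below uses the real virtio bit numbers VIRTIO_RING_F_INDIRECT_DESC (28), VIRTIO_RING_F_EVENT_IDX (29) and VIRTIO_F_VERSION_1 (32). The backend feature sets and the helper names are made up for illustration.

```python
VIRTIO_RING_F_INDIRECT_DESC = 28
VIRTIO_RING_F_EVENT_IDX = 29
VIRTIO_F_VERSION_1 = 32


def bit(n: int) -> int:
    return 1 << n


def migratable_features(current_backend: int, targets: list[int]) -> int:
    """Offer only features that the current backend and every potential
    migration target support (hypothetical helper)."""
    mask = current_backend
    for target in targets:
        mask &= target
    return mask


def negotiate(offered: int, driver_wants: int) -> int:
    """The driver can only accept features the device offered."""
    return offered & driver_wants


# Hypothetical backends: software supports indirect, hardware does not.
software = bit(VIRTIO_F_VERSION_1) | bit(VIRTIO_RING_F_INDIRECT_DESC) | bit(VIRTIO_RING_F_EVENT_IDX)
hardware = bit(VIRTIO_F_VERSION_1) | bit(VIRTIO_RING_F_EVENT_IDX)
driver = software  # the driver would happily use everything

offered = migratable_features(software, [hardware])
features = negotiate(offered, driver)
print(bool(features & bit(VIRTIO_RING_F_INDIRECT_DESC)))  # False
print(features == hardware)                                # True
```

The VM starts on the software backend but never sees indirect descriptors, so nothing it negotiated becomes unavailable after migration.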

Paolo


