Subject: Re: [virtio-comment] Hardware friendly proposals from Intel for packed-ring-layout
On 24/08/2017 15:11, Tiwei Bie wrote:
> On Thu, Aug 24, 2017 at 02:10:34PM +0200, Paolo Bonzini wrote:
>> On 24/08/2017 13:53, Tiwei Bie wrote:
>>>
>>> * In addition to the DESC_HW flag, each virtio queue has a tail pointer
>>>   - Driver creates suitable (i.e. multiple of cacheline) descriptors,
>>>     then performs MMIO write to tail pointer.
>>
>> If I understand correctly, the tail pointer is the value that is written
>> to the MMIO register. If that is the case, this is unfortunately bad
>> for virtualization. Virt prefers a doorbell register where the value
>> doesn't matter. This is because:
>>
>> 1) the value is not available directly and computing it requires
>> instruction decoding, which in turn requires walking page tables
>>
>> 2) if the value doesn't matter, the hypervisor can simply wake up a
>> userspace thread that processes the virtio queue without bothering to
>> pass the value.
>>
>> On the other hand, writing a tail pointer _before_ the MMIO write may
>> cost a cache miss. Hence the packed ring layout proposal replaced the
>> tail pointer write with lookahead on the ring buffer's DESC_HW flags.
>> The idea is that lookahead is cheaper, because hopefully the first
>> non-DESC_HW buffer will be in the same cache line as the last DESC_HW
>> buffer.
>
> Thank you so much for such quick and detailed reply!
>
> Yeah, we know it's a bit tricky to support the tail pointer in
> software. But it's really helpful for the hardware implementation.
> So we want more discussions on this.
>
> How about having this feature be switchable at runtime, so it's
> possible to be enabled after migrating to a hardware backend, or
> disabled after migrating to a software backend. So for the software
> backend, it can still use the DESC_HW based mechanism.
>
> It's just some rough thoughts, and we haven't thought about the
> implementation details. What's your thoughts on this?

Why is lookahead bad for hardware? Can a PCIe device use burst reads to
retrieve many 2-byte descriptors in a single TLP transaction?

>>> Indirect Chaining
>>> =================
>>>
>>> ## Current proposal
>>>
>>> * Indirect chaining is an optional feature
>>>
>>> ## New proposal
>>>
>>> * Remove this feature from this new ring layout
>>>
>>> It's very unlikely that hardware implementations would support this
>>> due to extra latency of fetching actual descriptors.
>>>
>>> This is a totally new ring layout, and we don't need to worry about the
>>> compatibility issues with the old one. So it's better to not include this
>>> feature in this new ring layout if we can't find it's necessary now.
>>
>> Indirect chaining is actually relatively common for storage devices.
>>
>> Hardware implementations are free not to support indirect chaining if it
>> hurts latency.
>
> We are proposing removing it if it's not really necessary.
> So if it's really necessary, let's just keep it. :)
>
> One problem that keeping this feature may introduce is that, if a
> software backend chooses to implement this feature, and a VM is
> running on this backend with this feature enabled, it could be a
> problem to be live-migrated to a hardware backend which doesn't
> support this feature. Yeah, it can be a general problem about live
> migration if we have some features be negotiable, and some types
> of backend don't plan to support them at all. Do you have any
> thoughts on this?

If you are preparing for live migration to a hardware backend, you can
disable indirect rings when starting the VM, even though it will start
on a software backend.

Paolo
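
For readers following the thread, the DESC_HW lookahead discussed above can be sketched roughly as below. This is only an illustration and not code from the proposal: the descriptor layout, the `DESC_HW` bit value, and the helper name `count_hw_descs` are all assumptions. The device side scans forward from its current position and stops at the first descriptor whose DESC_HW flag is clear, so the doorbell write never needs to carry a tail value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the actual value is not given in this thread. */
#define DESC_HW 0x0080u

/* Assumed 16-byte descriptor layout, for illustration only. */
struct desc {
    uint64_t addr;
    uint32_t len;
    uint16_t index;
    uint16_t flags;
};

/*
 * Count the consecutive descriptors, starting at `head` and wrapping at
 * `ring_size`, that still have DESC_HW set (i.e. are owned by the device).
 * A real implementation would also need memory barriers or volatile
 * accesses when reading flags written by the other side.
 */
static size_t count_hw_descs(const struct desc *ring, size_t ring_size,
                             size_t head)
{
    size_t n = 0;

    while (n < ring_size && (ring[(head + n) % ring_size].flags & DESC_HW))
        n++;
    return n;
}
```

Under the assumed 16-byte layout, four descriptors share a 64-byte cache line, so the first non-DESC_HW descriptor will often sit in a line that has already been fetched, which is the cost argument made for lookahead above.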