Subject: RE: [virtio-dev] [PATCH RFC] packed ring layout spec v5
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: 28. november 2017 15:37
>
> > If the device always returns buffers in order, then couldn't the driver skip the
> > step of reading the used.ring for read-only buffers (e.g. TX for net devices)? The
> > used.idx tells how many buffers were returned, and since they are returned in
> > the same order as the driver sent them, it knows what their indices are. This
> > would then save one cache-miss in the old structure too.
>
> True. That would be another variant to support though.
>
> I doubt it'll outperform this one but I didn't test it specifically. Care trying to
> implement it?

Agreed, and I don't see it as competing with the packed ring. However, if there are low-hanging fruits that improve performance of the non-ring structure (in at least some significant use cases), they could be worth considering as part of a rev 1.1 specification too. I'll see what can be done for a prototype.

> > And a follow-up question would then be: if a device always returns buffers in
> > order, does the v1.0 specification not require drivers to reuse descriptors in the
> > same order as they are returned? I think 3.2.1.1 implies that at least. If so,
> > wouldn't new descriptors always be placed back2back in the descriptor table
> > (contiguous memory)?
>
> You probably mean this:
> 1. Get the next free descriptor table entry, d
>
> and you interpret "next" here as "next in ring order".
>
> I'm not sure everyone follows this interpretation though.
>
> E.g. Linux does:
> static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
>                        void **ctx)
> {
> ...
>         vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
>         vq->free_head = head;
>
> So descriptors are added at head of the free list. Next is interpreted as next on
> this list. E.g. with a single request in flight, it looks like a single descriptor will
> keep getting reused.
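To illustrate the free-list behaviour described in the quoted text, here is a minimal sketch of a LIFO free list. It is not the Linux implementation: `toy_vq`, `toy_init`, `toy_get_desc`, `toy_detach` and `QSZ` are hypothetical names, each buffer is assumed to use exactly one descriptor, and endianness conversion is left out.

```c
#include <assert.h>
#include <stdint.h>

#define QSZ 8  /* hypothetical queue size */

/* Toy stand-in for the descriptor table; only the 'next' links matter here. */
struct toy_vq {
    uint16_t next[QSZ];  /* free-list links, playing the role of desc[i].next */
    uint16_t free_head;  /* index of the first free descriptor */
};

/* Chain all descriptors 0 -> 1 -> ... -> QSZ-1 onto the free list. */
static void toy_init(struct toy_vq *vq)
{
    for (uint16_t i = 0; i < QSZ; i++)
        vq->next[i] = (uint16_t)(i + 1);  /* the last link is never followed in this sketch */
    vq->free_head = 0;
}

/* Take the descriptor at the head of the free list. */
static uint16_t toy_get_desc(struct toy_vq *vq)
{
    uint16_t d = vq->free_head;
    assert(d < QSZ);  /* assumption: the caller never drains the list completely */
    vq->free_head = vq->next[d];
    return d;
}

/* Return one descriptor by pushing it onto the head of the free list (LIFO). */
static void toy_detach(struct toy_vq *vq, uint16_t head)
{
    assert(head < QSZ);
    vq->next[head] = vq->free_head;
    vq->free_head = head;
}
```

With one request in flight, repeated `toy_get_desc()` / `toy_detach()` pairs keep handing out descriptor 0, so "next" follows the free list rather than ring order.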
>

I guess there isn't an explicit enough requirement in v1.0 to claim right or wrong with regard to this. Enforcing it could, however, be made a driver requirement imposed by the new IN_ORDER feature bit. Thus the IN_ORDER feature bit for the non-ring would be defined to enforce that the descriptor indices are always processed in order by both the device and the driver.

My reason for this is to ensure that new descriptors are placed in a contiguous range of the descriptor table, which should improve the L1$ prefetcher hit rate for batching, and also provide a means for efficient DMA in case of HW offload. With knowledge of the number of elements in each buffer, it might also be possible to calculate the descriptor index range to DMA.

BR,
-Lars
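
As a rough sketch of the in-order idea above (not taken from any specification or driver), the code below allocates descriptors in ring order and, from nothing but a new used.idx value and the recorded per-buffer descriptor counts, computes the descriptor range that was returned without reading used.ring. All names (`inorder_vq`, `inorder_submit`, `inorder_reclaim`, `desc_range`, `QSZ`) are hypothetical, and the caller is assumed to make sure enough descriptors are free before submitting.

```c
#include <assert.h>
#include <stdint.h>

#define QSZ 8  /* hypothetical queue size, a power of two */

struct inorder_vq {
    uint16_t avail_idx;      /* buffers submitted so far (free-running counter) */
    uint16_t last_used_idx;  /* buffers reclaimed so far (free-running counter) */
    uint16_t next_desc;      /* next descriptor table slot to hand out */
    uint16_t oldest_desc;    /* first descriptor of the oldest outstanding buffer */
    uint16_t ndesc[QSZ];     /* descriptor count of each outstanding buffer */
};

struct desc_range {
    uint16_t first;  /* first returned descriptor index */
    uint16_t count;  /* number of descriptors; the range may wrap modulo QSZ */
};

/* Allocate n descriptors in ring order and remember n for this buffer.
 * Assumption: the caller has already checked that n descriptors are free. */
static uint16_t inorder_submit(struct inorder_vq *vq, uint16_t n)
{
    uint16_t head = vq->next_desc;

    assert(n > 0 && n <= QSZ);
    vq->ndesc[vq->avail_idx % QSZ] = n;
    vq->next_desc = (uint16_t)((vq->next_desc + n) % QSZ);
    vq->avail_idx++;
    return head;
}

/* Given the device's used.idx, work out which descriptors came back.
 * used.ring is never read: in-order completion makes the heads predictable. */
static struct desc_range inorder_reclaim(struct inorder_vq *vq, uint16_t used_idx)
{
    struct desc_range r = { vq->oldest_desc, 0 };

    while (vq->last_used_idx != used_idx) {
        r.count = (uint16_t)(r.count + vq->ndesc[vq->last_used_idx % QSZ]);
        vq->last_used_idx++;
    }
    vq->oldest_desc = (uint16_t)((vq->oldest_desc + r.count) % QSZ);
    return r;
}
```

For example, after submitting buffers of 2, 1 and 3 descriptors, seeing used.idx advance to 2 yields the range starting at index 0 with 3 descriptors. The same arithmetic is what a HW-offload device could in principle use to pick a contiguous descriptor range to DMA.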