OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-dev message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-dev] packed ring layout proposal


On Mon, Sep 19, 2016 at 09:06:19PM +0200, Paolo Bonzini wrote:
> 
> 
> On 19/09/2016 19:15, Michael S. Tsirkin wrote:
> > On Mon, Sep 19, 2016 at 02:01:57PM +0200, Paolo Bonzini wrote:
> >> The main issue with VRING_DESC_F_INDIRECT, at least as it is implemented
> >> in Linux, is the cost of allocating and freeing the indirect descriptor.
> >>  But if you know that the number of descriptor is fixed (e.g. 2 for
> >> request header + data) you can pre-allocate the descriptors with a 1:1
> >> relationship with the entries of the ring buffer.  This might be useful
> >> for the virtio-net tx ring.
> > 
> > Yes, I've been thinking the same thing. Is there any advantage
> > to have the indirect addresses pre-programmed as
> > opposed to stored in the descriptors?
> > 
> > In theory
> > 	if (desc.flags & indirect)
> > 		use(*address[i])
> > 
> > might be speculated as compared to
> > 	if (desc.flags & indirect)
> > 		use(*desc.addr)
> > which can't.
> > But OTOH this might add to cache pressure as desc is already in cache
> > anyway.
> 
> Yes, I think this would be the worst of both worlds. :)
> 
> If the driver knows it has 2 descriptors per request, it could allocate
> a big block in advance and do for example
> 
> 	struct {
> 		// pre-allocated and pre-initialized
> 		// to point to ->hdr
> 		struct virtio_desc req_desc;
> 		// only flags prefilled
> 		struct virtio_desc data_desc;
> 		struct req_header  hdr;
> 	} *indirect_data;
> 
> 	i = index++ & (size - 1);
> 	indirect_addr = &indirect_data[i];
> 	indirect_data[i]->data_desc.addr = this_data_addr;
> 	indirect_data[i]->data_desc.len = this_data_len;
> 	// ... fill in indirect_data[i]->hdr ...
> 
> 	desc[i].addr = indirect_addr;
> 	smp_wmb();
> 	desc[i].index = i | DESC_HW;
> 
> where there is no need for an array of descriptor addresses,
> &indirect_data[i].  As long as the indirect_data size is a power of 2,
> indirect_data needn't even be physically contiguous.
> 
> But I prefer all these tricks to be hidden within the driver.  It seems
> a good idea in the beginning to rig the device to second-guess what a
> driver could do, but then it makes things awkward.  (This is also why
> I'd rather get rid of VRING_DESC_F_NEXT).

Right, I'll CC dpdk list on this proposal. As dpdk uses _NEXT almost
exclusively I'd like to make sure we are not messing things up.

Maybe the right thing to do is to disallow _NEXT if _INDIRECT is
negotiated. Each device can then decide whether it wants to
use _INDIRECT or _NEXT (or both). How's that?


> > If all descriptors are known ahead of the time to be indirect we could
> > pack more of them per cache line by dropping addr (and maybe len?).
> > This might work well for virtio blk.
> 
> Nope, virtio-blk doesn't know at all the size of the sg list (for
> small-I/O workloads it is always 2, but potentially unlimited).  For the
> above ideas I was thinking of the virtio tx header (but only for
> MTU=1500, as Stephen pointed out).
> 
> Paolo

Length in the indirect descriptor itself might not be needed to figure
out s.g list if we use the VRING_DESC_F_NEXT in the direct one that it
points to.
Using a flag in the descriptor might cause a stall but I think that's
typically cheaper than a cache miss (although we are not talking
a full miss here, more like 1/8 of a miss).

-- 
MST


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]