[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [virtio-comment] Re: [PATCH RESEND] virtio-net: support setting coalescing params for multiple vqs
On Fri, Dec 22, 2023 at 2:45âPM Heng Qi <hengqi@linux.alibaba.com> wrote: > > > > å 2023/12/22 äå10:40, Jason Wang åé: > > On Thu, Dec 21, 2023 at 12:55âPM Heng Qi <hengqi@linux.alibaba.com> wrote: > >> > >> > >> å 2023/12/21 äå11:37, Jason Wang åé: > >>> On Wed, Dec 20, 2023 at 10:40âPM Heng Qi <hengqi@linux.alibaba.com> wrote: > >>>> Currently, when each time the driver attempts to update the coalescing parameters > >>>> for a vq, it needs to kick the device and wait for the ctrlq response to return. > >>>> > >>>> If a command can only update one vq parameters, when the parameters are updated > >>>> frequently (such as netdim), ctrlq on the device is kicked frequently, which will > >>>> increase the device CPU scheduling overhead, and the number and overhead of device > >>>> DMA will also increase. > >>>> > >>>> Merging multiple vq updated parameters into one command can effectively reduce > >>>> the number of kick devices and device DMA times. > >>>> > >>>> Test results show that this greatly improves the efficiency of the ctrlq in > >>>> responding to multiple vq coalescing parameter updates issued by the driver. > >>> So netdim is per virtqueue, to make use of this, you need to batch the > >>> netdim requests first. > >> Yes. I think the code looks more intuitive. > >> Please view the following PoC test code combined with the current code > >> of upstream virtio netdim > >> > >> +struct ctrl_coal_vq_entry { > >> + u16 vqn; > >> + u16 reserved; > >> + int usecs; > >> + int packets; > >> +}; > >> + > >> +struct virtio_net_ctrl_mrg_coal_vq { > >> + int num_entries; > >> + struct ctrl_coal_vq_entry coal_vqs[]; > >> +}; > >> + > >> > >> @@ -313,12 +335,18 @@ struct virtnet_info { > >> > >> struct control_buf *ctrl; > >> > >> + struct virtio_net_ctrl_mrg_coal_vq *ctrl_coal_vqs; > >> + > >> > >> > >> @@ -4390,6 +4586,7 @@ static int virtnet_alloc_queues(struct > >> virtnet_info *vi) > >> if (!vi->rq) > >> goto err_rq; > >> > >> + vi->ctrl_coal_vqs = kzalloc(sizeof(struct > >> virtio_net_ctrl_mrg_coal_vq) + vi->max_queue_pairs * sizeof(struct > >> ctrl_coal_vq_entry), GFP_KERNEL); > >> > >> > >> + struct scatterlist sg; > >> + for (i = 0; i < vi->curr_queue_pairs; i++) { > >> + vi->ctrl_coal_vqs->coal_vqs[i].vqn = rxq2vq(i); > >> + vi->ctrl_coal_vqs->coal_vqs[i].usecs = 8; > >> + vi->ctrl_coal_vqs->coal_vqs[i].packets = 16; > >> + } > >> + vi->ctrl_coal_vqs->num_entries = i; > >> + sg_init_one(&sg, vi->ctrl_coal_vqs, sizeof(struct > >> virtio_net_ctrl_mrg_coal_vq) + vi->ctrl_coal_vqs->num_entries * > >> sizeof(struct ctrl_coal_vq_entry)); > >> + vi->req_num++; > >> + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL, > >> + VIRTIO_NET_CTRL_NOTF_COAL_*VQS*_SET, &sg)) > >> > >> #define VIRTIO_NET_CTRL_NOTF_COAL_VQ_GET 3 > >> +#define VIRTIO_NET_CTRL_NOTF_COAL_VQS_SET 4 > >> > >>> And if you do that, you can batch the commands > >>> as well. > >> > >> Batch commands we have tried: > >> > >> + for (i = 0; i < vi->curr_queue_pairs; i++) { > >> + err = virtnet_pre_send_cmd(vi, &vi->vq_ctrls[i], > >> 8, 16); > >> + } > >> + > >> + if (unlikely(!virtqueue_kick(vi->cvq))) > >> + return vi->ctrl->status == VIRTIO_NET_OK; > >> + > >> + for (i = 0; i < vi->curr_queue_pairs; i++) { > >> + while (!virtqueue_is_broken(vi->cvq) && > >> !virtqueue_get_buf(vi->cvq, &tmp)) > >> + cpu_relax(); > >> + } > >> + } > >> > >> > >> + > >> +static bool virtnet_pre_send_cmd(struct virtnet_info *vi, struct > >> vq_coal *vq_ctrls, > >> + u32 max_usecs, u32 max_packets) > >> +{ > >> + struct scatterlist hdr, out, stat; > >> + struct scatterlist *sgs[4]; > >> + unsigned out_num = 0; > >> + int ret; > >> + > >> + BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); > >> + > >> + vq_ctrls->hdr.class = VIRTIO_NET_CTRL_NOTF_COAL; > >> + vq_ctrls->hdr.cmd = VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET; > >> + sg_init_one(&hdr, &vq_ctrls->hdr, sizeof(vq_ctrls->hdr)); > >> + sgs[out_num++] = &hdr; > >> + > >> + vq_ctrls->coal_vq.vqn = cpu_to_le16(0); > >> + vq_ctrls->coal_vq.coal.max_usecs = cpu_to_le32(max_usecs); > >> + vq_ctrls->coal_vq.coal.max_packets = cpu_to_le32(max_packets); > >> + sg_init_one(&out, &vq_ctrls->ho, sizeof(vq_ctrls->ho)); > >> + > >> + if (&out) > >> + sgs[out_num++] = &out; > >> + > >> + vq_ctrls->status = ~0; > >> + sg_init_one(&stat, &vq_ctrls->status, sizeof(vq_ctrls->status)); > >> + sgs[out_num] = &stat; > >> + > >> + BUG_ON(out_num + 1 > ARRAY_SIZE(sgs)); > >> + ret = virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC); > >> + if (ret < 0) { > >> + dev_warn(&vi->vdev->dev, > >> + "Failed to add sgs for command vq: %d\n.", ret); > >> + return false; > >> + } > >> > >> > >> Overall, batch reqs are sufficient. Because the current major overhead > >> is the number of DMAs. > >> For example, for a device with 256 queues, > >> > >> For the current upstream code, the overhead is 256 kicks + 256*8 DMA times. > >> The overhead of batch cmds is 1 kick + 256*8 DMA times. > > This can furtherly optimized by using IN_ORDER. The DMA can be batched as well. > > DMA batch reduces the number of DMA requests, but not the actual number > of DMAs. Well, if we do IN_ORDER, then 1 for avail idx. At most 2 DMA is needed for all descriptors. Driver can allocate an array of cvq commands. So CVQ commands could be fetched at most 2 DMA. etc. > For out DPU, how much to batch and when to end the batch also need to be > evaluated in detail. > > > > > Or even using packed virtqueue. > > > >> The overhead of batch reqs is 1 kick + 8 DMA times. > > If this is a real issue for your architecture, we should not make it > > only work for coalescing. Any per vq tweaking may suffer from that. > > YES. > > > E.g if you want to do ARFS in the future. So it should a e.g a > > container command for cvq instead of just for coalescing command > > itself. > > > > We need to make sure if the existing facility is sufficient or not. > > We considered to update the generic ctrlq batch command. But so far we > have only seen > this requirement for netdim and aRFS. Two use cases should be sufficient to justify. We do have others, for example the vlan/mac filters. > At the same time, aRFS has not yet > entered the community. > We will try to add a batch req interface for it and it may have a > dedicated flow vq to ensure that > the number of command issuances larger than netdim is responded to in time. > So we temporarily plan to advance the batch of coalescing parameter > requests first. Thanks
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]