Subject: Re: [virtio-comment] Re: [PATCH RESEND] virtio-net: support setting coalescing params for multiple vqs
On Thu, Dec 21, 2023 at 03:01:57PM +0800, Heng Qi wrote:
>
>
> On 2023/12/21 2:56 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 20, 2023 at 10:40:34PM +0800, Heng Qi wrote:
> > > Currently, each time the driver attempts to update the coalescing parameters
> > > for a vq, it needs to kick the device and wait for the ctrlq response to return.
> > But there's no fundamental reason for it to do so.
>
> Indeed, please look at the current upstream netdim code:
>
> static void virtnet_rx_dim_work(struct work_struct *work)
> {
>         struct dim *dim = container_of(work, struct dim, work);
>         struct receive_queue *rq = container_of(dim,
>                         struct receive_queue, dim);
>         struct virtnet_info *vi = rq->vq->vdev->priv;
>         struct net_device *dev = vi->dev;
>         struct dim_cq_moder update_moder;
>         int i, qnum, err;
>
>         if (!rtnl_trylock())
>                 return;
>
>         /* Each rxq's work is queued by "net_dim()->schedule_work()"
>          * in response to NAPI traffic changes. Note that dim->profile_ix
>          * for each rxq is updated prior to the queuing action.
>          * So we only need to traverse and update profiles for all rxqs
>          * in the work which is holding rtnl_lock.
>          */
>         for (i = 0; i < vi->curr_queue_pairs; i++) {            <-----------------
>                 rq = &vi->rq[i];
>                 dim = &rq->dim;
>                 qnum = rq - vi->rq;
>
>                 if (!rq->dim_enabled)
>                         continue;
>
>                 update_moder = net_dim_get_rx_moderation(dim->mode,
>                                                          dim->profile_ix);
>                 if (update_moder.usec != rq->intr_coal.max_usecs ||
>                     update_moder.pkts != rq->intr_coal.max_packets) {
>                         err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
>                                                                update_moder.usec,
>                                                                update_moder.pkts);
>                         if (err)
>                                 pr_debug("%s: Failed to send dim parameters on rxq%d\n",
>                                          dev->name, qnum);
>                         dim->state = DIM_START_MEASURE;
>                 }
>         }
>
>         rtnl_unlock();
> }
>
> > It can just as well submit multiple commands and then wait.
>
> Sending multiple commands and then waiting reduces the number of kicks,
> but it does not reduce the number of device DMAs. I already responded to
> this in a thread to Jason.
> Please check:
>
> https://lists.oasis-open.org/archives/virtio-comment/202312/msg00142.html
>
> "
> Overall, batch reqs are sufficient, because the current major overhead is
> the number of DMAs.
> For example, for a device with 256 queues:
>
> For the current upstream code, the overhead is 256 kicks + 256*8 DMAs.
> The overhead of batch cmds is 1 kick + 256*8 DMAs.
> The overhead of batch reqs is 1 kick + 8 DMAs.
>
> Here is the breakdown of the 8 DMAs:
> - Get the avail idx: 1 time
> - Pull the available ring information: 1 time
> - Pull the descs pointed to by the avail ring: 3 times
> - Pull the hdr and out bufs pointed to by the avail ring descs: 2 times
> - Write once to the status buf pointed to by the status desc: 1 time
> "
>
> Thanks.

So there's more DMA, but it's all slow path. Why do we need to micro-optimize it?
What is the overhead practically, in milliseconds?

--
MST
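For context, a minimal sketch of the "batch cmds" approach Michael suggests (queue several per-vq coalescing commands on the ctrl vq, kick once, then wait for all of them) might look like the code below. This is an illustration only, not code from the patch or the thread: the struct and helper names (coal_vq_cmd, virtnet_send_coal_vq_cmds_batched) and the caller-supplied cmds[] array are assumptions, while virtqueue_add_sgs(), virtqueue_kick() and virtqueue_get_buf() are the standard virtio APIs the driver already uses in virtnet_send_command().

/*
 * Illustrative sketch only (not from the thread): queue one
 * VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET command per vq on the ctrl vq,
 * kick once, then wait for every status byte.  The struct and helper
 * names are assumptions for this example; cmds[] must live in
 * DMA-safe memory (e.g. kmalloc'd), not on the stack.
 */
struct coal_vq_cmd {
        struct virtio_net_ctrl_hdr hdr;
        struct virtio_net_ctrl_coal_vq coal_vq;
        u8 status;
};

static int virtnet_send_coal_vq_cmds_batched(struct virtnet_info *vi,
                                             struct coal_vq_cmd *cmds, int n)
{
        struct scatterlist hdr_sg, body_sg, stat_sg, *sgs[3];
        unsigned int len;
        int i, err;

        for (i = 0; i < n; i++) {
                /* caller has already filled cmds[i].coal_vq.vqn and .coal */
                cmds[i].hdr.class = VIRTIO_NET_CTRL_NOTF_COAL;
                cmds[i].hdr.cmd = VIRTIO_NET_CTRL_NOTF_COAL_VQ_SET;
                cmds[i].status = VIRTIO_NET_ERR;

                sg_init_one(&hdr_sg, &cmds[i].hdr, sizeof(cmds[i].hdr));
                sg_init_one(&body_sg, &cmds[i].coal_vq, sizeof(cmds[i].coal_vq));
                sg_init_one(&stat_sg, &cmds[i].status, sizeof(cmds[i].status));
                sgs[0] = &hdr_sg;
                sgs[1] = &body_sg;
                sgs[2] = &stat_sg;

                /* two out sgs (hdr + coal_vq), one in sg (status) */
                err = virtqueue_add_sgs(vi->cvq, sgs, 2, 1, &cmds[i], GFP_ATOMIC);
                if (err)
                        return err;
        }

        /* one kick for the whole batch instead of one kick per command */
        virtqueue_kick(vi->cvq);

        /* busy-poll for all completions, as virtnet_send_command() does;
         * a production version would want a timeout here
         */
        for (i = 0; i < n; i++)
                while (!virtqueue_get_buf(vi->cvq, &len))
                        cpu_relax();

        for (i = 0; i < n; i++)
                if (cmds[i].status != VIRTIO_NET_OK)
                        return -EINVAL;

        return 0;
}

Even with this batching, each command in the loop still consumes its own descriptors and buffers, which is why Heng Qi's numbers above put "batch cmds" at 1 kick + 256*8 DMAs, whereas the proposed "batch reqs" format packs all per-vq entries into a single request and costs roughly one request's worth of DMAs.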