
Subject: Re: [virtio-comment] Re: [PATCH RESEND] virtio-net: support setting coalescing params for multiple vqs




On 2023/12/21 3:19 PM, Heng Qi wrote:


On 2023/12/21 3:08 PM, Michael S. Tsirkin wrote:
On Thu, Dec 21, 2023 at 03:01:57PM +0800, Heng Qi wrote:

On 2023/12/21 2:56 PM, Michael S. Tsirkin wrote:
On Wed, Dec 20, 2023 at 10:40:34PM +0800, Heng Qi wrote:
Currently, each time the driver attempts to update the coalescing parameters for a vq, it needs to kick the device and wait for the ctrlq response to return.
But there's no fundamental reason for it to do so.
Indeed, please look at the current upstream netdim code:

static void virtnet_rx_dim_work(struct work_struct *work)
{
        struct dim *dim = container_of(work, struct dim, work);
        struct receive_queue *rq = container_of(dim,
                        struct receive_queue, dim);
        struct virtnet_info *vi = rq->vq->vdev->priv;
        struct net_device *dev = vi->dev;
        struct dim_cq_moder update_moder;
        int i, qnum, err;

        if (!rtnl_trylock())
                return;

        /* Each rxq's work is queued by "net_dim()->schedule_work()"
         * in response to NAPI traffic changes. Note that dim->profile_ix
         * for each rxq is updated prior to the queuing action.
         * So we only need to traverse and update profiles for all rxqs
         * in the work which is holding rtnl_lock.
         */
        for (i = 0; i < vi->curr_queue_pairs; i++) {   <-----------------
                rq = &vi->rq[i];
                dim = &rq->dim;
                qnum = rq - vi->rq;

                if (!rq->dim_enabled)
                        continue;

                update_moder = net_dim_get_rx_moderation(dim->mode,
                                                         dim->profile_ix);
                if (update_moder.usec != rq->intr_coal.max_usecs ||
                    update_moder.pkts != rq->intr_coal.max_packets) {
                        err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
                                                               update_moder.usec,
                                                               update_moder.pkts);
                        if (err)
                                pr_debug("%s: Failed to send dim parameters on rxq%d\n",
                                         dev->name, qnum);
                        dim->state = DIM_START_MEASURE;
                }
        }

        rtnl_unlock();
}

It can just as well submit multiple commands and then wait.
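[For illustration only, a minimal sketch of that "batch cmds" idea: queue one ctrlq command per vq without kicking, kick once, then wait for all completions. struct coal_cmd and send_coal_cmds_batched() are hypothetical names, not upstream driver code; error handling and locking are omitted. Note that this reduces kicks but still uses one descriptor chain per vq.]

#include <linux/virtio.h>
#include <linux/virtio_net.h>
#include <linux/scatterlist.h>

/* Hypothetical per-command container: each in-flight command needs its
 * own header, out buffer and status byte while it sits on the ctrlq. */
struct coal_cmd {
        struct virtio_net_ctrl_hdr hdr;         /* class/cmd for this vq */
        struct virtio_net_ctrl_coal_vq coal_vq; /* vqn + coalescing params */
        virtio_net_ctrl_ack status;             /* written back by the device */
};

static int send_coal_cmds_batched(struct virtqueue *cvq,
                                  struct coal_cmd *cmds, int n)
{
        struct scatterlist hdr_sg, out_sg, stat_sg, *sgs[3];
        unsigned int len;
        int i;

        for (i = 0; i < n; i++) {
                sg_init_one(&hdr_sg, &cmds[i].hdr, sizeof(cmds[i].hdr));
                sg_init_one(&out_sg, &cmds[i].coal_vq, sizeof(cmds[i].coal_vq));
                sg_init_one(&stat_sg, &cmds[i].status, sizeof(cmds[i].status));
                sgs[0] = &hdr_sg;
                sgs[1] = &out_sg;
                sgs[2] = &stat_sg;
                /* queue the command; do not notify the device yet */
                virtqueue_add_sgs(cvq, sgs, 2, 1, &cmds[i], GFP_KERNEL);
        }

        virtqueue_kick(cvq);        /* a single kick covers all n commands */

        /* busy-wait for each completion, as virtnet_send_command() does */
        for (i = 0; i < n; i++) {
                while (!virtqueue_get_buf(cvq, &len) &&
                       !virtqueue_is_broken(cvq))
                        cpu_relax();
        }

        return 0;
}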
Sending multiple commands and then waiting reduces the number of kicks, but it does not reduce the number of device DMAs. I already responded to this in a reply to Jason; please check:

https://lists.oasis-open.org/archives/virtio-comment/202312/msg00142.html

"
Overall, batch reqs are sufficient. Because the current major overhead is
the number of DMAs.
For example, for a device with 256 queues,

For the current upstream code, the overhead is 256 kicks + 256*8 DMA times.
The overhead of batch cmds is 1 kick + 256*8 DMA times.
The overhead of batch reqs is 1 kick + 8 DMA times.

Below is 8 DMA times:
- get avail idx 1 time
- Pull available ring information 1 time
- Pull the desc pointed to by the avail ring 3 times
- Pull the hdr and out bufs pointed to by avail ring desc 2 times
- Write once to the status buf pointed to by status 1 time "

Thanks.
So there's more DMA but it's all slow path.
Why do we need to micro-optimize it?

Each of our DPUs needs to support multiple VMs. The ctrlqs of these VMs are emulated in software, which consumes the DPU's limited CPU resources.

Each DMA has to wait for the DMA hardware to complete. The DPU CPU does other work in the meantime and is rescheduled when the DMA finishes, so more DMAs mean more CPU scheduling, which hurts CPU execution efficiency.

Batching requests reduces the number of DMAs, which not only improves DPU CPU execution efficiency but also reduces DMA latency. This matters a great deal for the single-DPU, multi-VM, multi-queue scenario.


Therefore, optimizing the ctrlq response speed for netdim lets us better support multiple VMs as well as VMs with large numbers of queues.

What is the overhead practically, in milliseconds?

Latency overhead is only one aspect. More importantly, the limited CPU resources must handle command processing efficiently for more VMs and for VMs with large numbers of queues.

Thanks.






