
Subject: Re: [virtio-comment] Re: [PATCH RESEND] virtio-net: support setting coalescing params for multiple vqs




On 2023/12/21 3:19 PM, Heng Qi wrote:


On 2023/12/21 3:08 PM, Michael S. Tsirkin wrote:
On Thu, Dec 21, 2023 at 03:01:57PM +0800, Heng Qi wrote:

On 2023/12/21 2:56 PM, Michael S. Tsirkin wrote:
On Wed, Dec 20, 2023 at 10:40:34PM +0800, Heng Qi wrote:
Currently, each time the driver attempts to update the coalescing parameters for a vq, it needs to kick the device and wait for the ctrlq response to return.
But there's no fundamental reason for it to do so.
Indeed, please look at the current upstream netdim code:

static void virtnet_rx_dim_work(struct work_struct *work)
{
        struct dim *dim = container_of(work, struct dim, work);
        struct receive_queue *rq = container_of(dim,
                        struct receive_queue, dim);
        struct virtnet_info *vi = rq->vq->vdev->priv;
        struct net_device *dev = vi->dev;
        struct dim_cq_moder update_moder;
        int i, qnum, err;

        if (!rtnl_trylock())
                return;

        /* Each rxq's work is queued by "net_dim()->schedule_work()"
         * in response to NAPI traffic changes. Note that dim->profile_ix
         * for each rxq is updated prior to the queuing action.
         * So we only need to traverse and update profiles for all rxqs
         * in the work which is holding rtnl_lock.
         */
        for (i = 0; i < vi->curr_queue_pairs; i++) {   <-----------------
                rq = &vi->rq[i];
                dim = &rq->dim;
                qnum = rq - vi->rq;

                if (!rq->dim_enabled)
                        continue;

                update_moder = net_dim_get_rx_moderation(dim->mode,
                                                         dim->profile_ix);
                if (update_moder.usec != rq->intr_coal.max_usecs ||
                    update_moder.pkts != rq->intr_coal.max_packets) {
                        err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
                                                               update_moder.usec,
                                                               update_moder.pkts);
                        if (err)
                                pr_debug("%s: Failed to send dim parameters on rxq%d\n",
                                         dev->name, qnum);
                        dim->state = DIM_START_MEASURE;
                }
        }

        rtnl_unlock();
}

It can just as well submit multiple commands and then wait.
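[For illustration only, a minimal sketch of that "batch cmds" idea: queue one ctrlq command per vq without kicking, kick once, then wait for all completions. struct coal_cmd and send_coal_cmds_batched() are hypothetical names, not upstream driver code; error handling and locking are omitted. Note that this reduces kicks but still uses one descriptor chain per vq.]

#include <linux/virtio.h>
#include <linux/virtio_net.h>
#include <linux/scatterlist.h>

/* Hypothetical per-command container: each in-flight command needs its
 * own header, out buffer and status byte while it sits on the ctrlq. */
struct coal_cmd {
        struct virtio_net_ctrl_hdr hdr;         /* class/cmd for this vq */
        struct virtio_net_ctrl_coal_vq coal_vq; /* vqn + coalescing params */
        virtio_net_ctrl_ack status;             /* written back by the device */
};

static int send_coal_cmds_batched(struct virtqueue *cvq,
                                  struct coal_cmd *cmds, int n)
{
        struct scatterlist hdr_sg, out_sg, stat_sg, *sgs[3];
        unsigned int len;
        int i;

        for (i = 0; i < n; i++) {
                sg_init_one(&hdr_sg, &cmds[i].hdr, sizeof(cmds[i].hdr));
                sg_init_one(&out_sg, &cmds[i].coal_vq, sizeof(cmds[i].coal_vq));
                sg_init_one(&stat_sg, &cmds[i].status, sizeof(cmds[i].status));
                sgs[0] = &hdr_sg;
                sgs[1] = &out_sg;
                sgs[2] = &stat_sg;
                /* queue the command; do not notify the device yet */
                virtqueue_add_sgs(cvq, sgs, 2, 1, &cmds[i], GFP_KERNEL);
        }

        virtqueue_kick(cvq);        /* a single kick covers all n commands */

        /* busy-wait for each completion, as virtnet_send_command() does */
        for (i = 0; i < n; i++) {
                while (!virtqueue_get_buf(cvq, &len) &&
                       !virtqueue_is_broken(cvq))
                        cpu_relax();
        }

        return 0;
}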
Sending multiple commands and then waiting reduces the number of kicks, but it does not reduce the number of device DMAs. I already responded to this in a reply to Jason; please check:

https://lists.oasis-open.org/archives/virtio-comment/202312/msg00142.html

"
Overall, batch reqs are sufficient. Because the current major overhead is
the number of DMAs.
For example, for a device with 256 queues,

For the current upstream code, the overhead is 256 kicks + 256*8 DMA times.
The overhead of batch cmds is 1 kick + 256*8 DMA times.
The overhead of batch reqs is 1 kick + 8 DMA times.

Below is 8 DMA times:
- get avail idx 1 time
- Pull available ring information 1 time
- Pull the desc pointed to by the avail ring 3 times
- Pull the hdr and out bufs pointed to by avail ring desc 2 times
- Write once to the status buf pointed to by status 1 time "

Thanks.
So there's more DMA but it's all slow path.
Why do we need to micro-optimize it?

Each of our DPUs needs to support multiple VMs. The ctrlqs of these VMs are emulated in software, which consumes the DPU's limited CPU resources.

Each DMA has to wait for the DMA hardware to complete. The DPU CPU does other work in the meantime and is rescheduled when the DMA finishes, so more DMAs mean more CPU scheduling, which hurts CPU execution efficiency.

Batching requests reduces the number of DMAs, which not only improves DPU CPU execution efficiency but also reduces DMA latency. This matters a great deal for the single-DPU, multi-VM, multi-queue scenario.


Therefore, optimizing the ctrlq response speed for netdim lets us better support multiple VMs as well as VMs with large numbers of queues.

What is the overhead practically, in milliseconds?

Latency overhead is only one aspect. More importantly, the limited CPU resources must handle command processing efficiently for more VMs and for VMs with large numbers of queues.

Thanks.






