Subject: RE: [virtio-comment] Re: [PATCH v2 0/2] transport-pci: Introduce legacy registers access using AQ



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Monday, May 15, 2023 4:30 PM
> >
> > I am not sure if this is a real issue. Because even the legacy guests
> > have msix enabled by default. In theory yes, it can fall back to intx.
> 
> Well. I feel we should be closer to being sure it's not an issue if we are going to
> ignore it.
> some actual data here:
> 
> Even linux only enabled MSI-X in 2009.
> Of course, other guests took longer. E.g.
> a quick google search gave me this for some bsd variant (2017):
> https://twitter.com/dragonflybsd/status/834494984229421057
> 
> Many guests have tunables to disable msix. Why?
> E.g. BSD keeps maintaining it at
>      hw.virtio.pci.disable_msix
> not a real use-case and you know 100% no guests have set this to work around
> some bug e.g. in bsd MSI-X core?  How can you be sure?
> 
> 
> 
> intx is used when guests run out of legacy interrupts, these setups are not hard
> to create at all: just constrain the number of vCPUs while creating lots of
> devices.
> 
> 
> I could go on.
> 
> 
> 
> > There are few options.
> > 1. A hypervisor driver can be conservative and steal an msix of the VF
> > for transporting intx.
> > Pros: Does not need special things in device
> > Cons:
> > a. Fairly intrusive in hypervisor vf driver.
> > b. May not be ever used as guest is unlikely to fail on msix
> 
> Yea I do not like this since we are burning up msix vectors.
> More reasons: this "pass through" msix has no chance to set ISR properly since
> msix does not set ISR.
> 
> 
> > 2. Since multiple VFs intx to be serviced, one command per VF in AQ is
> > too much overhead that device needs to map a request to,
> >
> > A better way is to have an eventq of depth = num_vfs, like many other
> > virtio devices have it.
> >
> > An eventq can hold per VF interrupt entry including the isr value that
> > you suggest above.
> >
> > Something like,
> >
> > union eventq_entry {
> > 	u8 raw_data[16];
> > 	struct intx_entry {
> > 		u8 event_opcode;
> > 		u8 group_type;
> > 		u8 reserved[6];
> > 		le64 group_identifier;
> > 		u8 isr_status;
> > 	};
> > };
> >
> > This eventq resides on the owner parent PF.
> > isr_status is read on clear like today.
> 
> This is what I wrote no?
> lore.kernel.org/all/20230507050146-mutt-send-email-
> mst%40kernel.org/t.mbox.gz
> 
> 	how about a special command that is used when device would
> 	normally send INT#x? it can also return ISR to reduce latency.
> 
In response to your suggestion of an AQ command above,
I suggested an eventq whose entries carry the isr_status, which reduces latency as you suggest.
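One detail worth noting: those fields add up to 17 bytes (1 + 1 + 6 + 8 + 1), so the 16-byte raw_data view is one byte short. A rough layout sketch in the same spec-style notation, assuming the entry grows to 32 bytes (the size and the padding field are only my assumption for illustration):

struct intx_entry {
	u8 event_opcode;	/* event type, e.g. INT#x assertion */
	u8 group_type;		/* e.g. SR-IOV VFs of this PF */
	u8 reserved[6];		/* keeps group_identifier 8-byte aligned */
	le64 group_identifier;	/* identifies the VF raising INT#x */
	u8 isr_status;		/* ISR value, read-to-clear as today */
	u8 padding[15];		/* pads the entry to 32 bytes */
};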

> > May be such eventq can be useful in future for wider case.
> 
> There's no maybe here is there? Things like live migration need events for sure.
> 
> > We may have to find a different name for it as other devices has
> > device specific eventq.
> 
> We don't need a special name for it. Just use an adminq with a special
> command that is only consumed when there is an event.
This requires too many commands to be issued on the PF device,
potentially one per VF, and the device needs to keep track of the command-to-VF mapping.
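To make the overhead concrete, the model would look roughly like the below, where the driver keeps one command outstanding per VF and the device must remember which pending command belongs to which VF before it can complete it. Every name here is a hypothetical placeholder, not an existing interface:

/* Hypothetical sketch of the per-VF pending-command model; all names
 * below are illustrative placeholders. */
#include <stdint.h>

struct adminq;

/* Submit a command that the device completes only when VF 'vf' raises INT#x. */
void adminq_submit_intx_wait(struct adminq *aq, uint16_t vf);

static void arm_intx_wait_for_all_vfs(struct adminq *aq, uint16_t num_vfs)
{
	/* One outstanding command per VF: the queue depth and the device's
	 * internal bookkeeping both scale with num_vfs. */
	for (uint16_t vf = 0; vf < num_vfs; vf++)
		adminq_submit_intx_wait(aq, vf);
}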

> Note you only need to queue a command if MSI is disabled.
> Which is nice.
Yes, it is nice.
An eventq is a variation of that, where the device can keep reporting events without the extra mapping and without too many commands.

Additionally, the eventq also works for a 1.x device, which would read the ISR status register directly from the device.
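To make the flow concrete, here is a rough sketch of how the owner PF driver could service such an eventq; every name below is an illustrative placeholder rather than an existing API:

#include <stdint.h>

/* Minimal view of an eventq entry, matching the intx fields above. */
struct intx_event {
	uint8_t  event_opcode;
	uint8_t  group_type;
	uint8_t  reserved[6];
	uint64_t group_identifier;	/* which VF raised INT#x */
	uint8_t  isr_status;		/* ISR value latched (and cleared) by the device */
};

/* Placeholder hypervisor helpers, assumed only for illustration. */
struct eventq;
struct intx_event *eventq_get_completed(struct eventq *eq);
void eventq_requeue(struct eventq *eq, struct intx_event *ev);
void inject_guest_intx(uint64_t vf, uint8_t isr_status);

static void eventq_service(struct eventq *eq)
{
	struct intx_event *ev;

	/* Drain whatever the device has reported; the entry itself says
	 * which VF is interrupting, so there is no command-to-VF mapping
	 * to track on either side. */
	while ((ev = eventq_get_completed(eq)) != NULL) {
		/* isr_status arrived with the event, so no extra ISR
		 * register read is needed here. */
		inject_guest_intx(ev->group_identifier, ev->isr_status);

		/* Return the buffer so the device can post the next event
		 * without a new command being issued per interrupt. */
		eventq_requeue(eq, ev);
	}
}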

> 
> > I am inclined to differ this to a later point if one can identify the
> > real failure with msix for the guest VM.
> > So far we don't see this ever happening.
> 
> What is the question exactly?
> 
> Just have more devices than vectors,
> an intel CPU only has ~200 of these, and current drivers want to use 2 vectors
> and then fall back on INTx since that is shared.
> Extremely easy to create - do you want a qemu command line to try?
> 
An Intel CPU has 256 vectors per core (per vCPU), so that is really a lot.
One needs to connect a lot more devices to the CPU to run out of them.
So yes, I would like to try the command line that makes this fail.

> Do specific customers event use guests with msi-x disabled? Maybe no.
> Does anyone use virtio with msi-x disabled? Most likely yes.
I just feel that INTx emulation is an extremely rare/narrow case for some applications, one that may never find use on hw-based devices.

> So if we are going for legacy pci emulation let's have a comprehensive legacy
> pci emulation please where host can either enable it for a guest or deny
> completely, not kind of start running then fail mysteriously.
A driver will easily be able to fail the call on INTx configuration, failing the guest.

But let's see if we can align on the eventq/AQ scheme to make it work.

