Subject: Re: [virtio-comment] Re: [PATCH 00/11] Introduce transitional mmr pci device
On Mon, Apr 03, 2023 at 03:36:25PM +0000, Parav Pandit wrote:
> > From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> > open.org> On Behalf Of Michael S. Tsirkin
>
> > > Transport vq for legacy MMR purpose seems fine with its latency and
> > > DMA overheads.
> > > Your question was about "scalability".
> > > After your latest response, I am unclear what "scalability" means.
> > > Do you mean saving the register space in the PCI device?
> >
> > yes, that's how you used scalability in the past.
>
> Ok. I am aligned.
>
> > > If yes, then no: for legacy guests it is not required for scalability,
> > > because the legacy registers are a subset of 1.x.
> >
> > Weird. What does the guest being legacy have to do with a wish to save
> > registers on the host hardware?
>
> Because legacy has a subset of the registers of 1.x, no additional
> registers are expected on the legacy side.
>
> > You don't have so many legacy guests as modern guests? Why?
>
> This isn't true.
> There is a trade-off: up to a certain N, MMR-based register access is
> fine. This is because 1.x exposes a superset of the legacy registers.
> Beyond a certain point, the device will have difficulty doing MMR for
> both legacy and 1.x. At that point, legacy over tvq can scale better,
> but with far higher latency: an order of magnitude higher compared to
> MMR. If tvq were the only transport for these register accesses, it
> would hurt at lower scale too, due to its inherently non-register
> access nature. And scale is relative from device to device.

Wow! Why an order of magnitude?

> > > > > > And presumably it can all be done in firmware ...
> > > > > > Is there actual hardware that can't implement transport vq but
> > > > > > is going to implement the mmr spec?
> > > > >
> > > > > Nvidia and Marvell DPUs implement the MMR spec.
> > > >
> > > > Hmm, implement it in what sense exactly?
> > >
> > > I do not follow the question.
> > > The proposed series will be implemented as PCI SR-IOV devices using
> > > the MMR spec.

> > > > > Transport VQ has very high latency and DMA overheads for 2 to 4
> > > > > byte reads/writes.
> > > >
> > > > How many of these 2 byte accesses trigger from a typical guest?
> > >
> > > Mostly during VM boot time: 20 to 40 register read/write accesses.
> >
> > That is not a lot! How long does a DMA operation take then?

> > > > > And before discussing "why not that approach", let's finish
> > > > > reviewing "this approach" first.
> > > >
> > > > That's a weird way to put it. We don't want so many ways to do
> > > > legacy if we can help it.
> > >
> > > Sure, so let's finish the review of the current proposal details.
> > > At the moment:
> > > a. I don't see any visible gain from transport VQ other than the
> > > device reset part I explained.
> >
> > For example, we do not need a new range of device IDs, and existing
> > drivers can bind on the host.
>
> So, unlikely, due to the already discussed limitation of feature
> negotiation. An existing transitional driver would also look for an IO
> BAR, which is a second limitation.

Some confusion here. If you have a transitional driver you do not need a
legacy device.

> > > b. it can be a path with high latency and DMA overheads on the
> > > virtqueue for small reads/writes.
> >
> > numbers?
>
> It depends on the implementation, but at a minimum, writes and reads
> can pay an order of magnitude more, in the 10 msec range.

A single VQ roundtrip takes a minimum of 10 milliseconds? This is indeed
completely unworkable for transport vq: even the 20 to 40 boot-time
accesses you quoted would then add 200 to 400 msec to boot.
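For reference, the kind of command such an access would translate to on
a transport vq is tiny. A sketch of what I have in mind, with the field
names and layout invented purely for illustration (none of this is from
the proposed series or from any spec):

/* Hypothetical transport vq payload for one legacy register access.
 * Names and layout are illustrative only.
 */
#include <stdint.h>

struct tvq_legacy_reg_cmd {
	uint8_t  opcode;	/* e.g. 0 = read, 1 = write */
	uint8_t  size;		/* access width in bytes: 1, 2 or 4 */
	uint16_t offset;	/* offset into the legacy register block */
	uint32_t value;		/* value to write; ignored for reads */
};

struct tvq_legacy_reg_resp {
	uint8_t  status;	/* 0 = success */
	uint32_t value;		/* value returned by a read */
};

The command itself is trivial; whatever cost there is sits entirely in
the vq round trip, which is what your numbers should be measuring.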
Points:

- Even for memory mapped, you have an access take 1 millisecond?
  Extremely slow. Why?

- Why is DMA 10x more expensive? I expect it to be 2x more expensive:
  a normal read goes cpu -> device -> cpu, while DMA does
  cpu -> device -> memory -> device -> cpu.

The reason I am asking is that it is important for transport vq to have
a workable design.

But let me guess: is there a chance that you are talking about an
interrupt driven design? *That* is going to be slow, though I don't
think 10 msec, more like 10 usec. But I expect transport vq to
typically work by (adaptive?) polling, mostly avoiding interrupts,
roughly along the lines of the sketch below.
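To be concrete about the polling I mean: spin on the used index for a
bounded window and only fall back to the interrupt path if the device
takes longer. A rough sketch, with every name and the spin count made
up for illustration (this is not from the series or from any driver):

/* Poll a transport vq for completion of an outstanding command. */
#include <stdbool.h>
#include <stdint.h>

#define TVQ_POLL_SPINS 10000	/* illustrative; adapt at runtime */

struct tvq {
	volatile uint16_t *used_idx;	/* device-written used index */
	uint16_t last_seen_idx;		/* last index the driver consumed */
};

/* On x86 this would be the PAUSE instruction; a no-op elsewhere. */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("pause");
#endif
}

/* Returns true if the command completed inside the polling window. */
static bool tvq_poll_completion(struct tvq *vq)
{
	for (unsigned int i = 0; i < TVQ_POLL_SPINS; i++) {
		if (*vq->used_idx != vq->last_seen_idx) {
			vq->last_seen_idx = *vq->used_idx;
			return true;	/* no interrupt taken */
		}
		cpu_relax();
	}
	return false;	/* caller re-enables interrupts and waits */
}

The adaptive part would be growing or shrinking TVQ_POLL_SPINS based on
how often the window is missed, NAPI style. Done that way, a 2 byte
access should cost on the order of one device round trip: microseconds,
not milliseconds.

-- MST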