OASIS Mailing List Archives

 



virtio-comment message



Subject: Re: [virtio-comment] Re: [PATCH v3 6/8] admin: Add theory of operation for write recording commands


On Fri, Nov 17, 2023 at 10:52:49AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Friday, November 17, 2023 4:08 PM
> > 
> > On Fri, Nov 17, 2023 at 09:57:52AM +0000, Parav Pandit wrote:
> > >
> > > > From: virtio-comment@lists.oasis-open.org
> > > > <virtio-comment@lists.oasis-open.org> On Behalf Of Michael S. Tsirkin
> > > > Sent: Friday, November 17, 2023 3:21 PM
> > > >
> > > > On Fri, Nov 17, 2023 at 09:41:40AM +0000, Parav Pandit wrote:
> > > > >
> > > > >
> > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > Sent: Friday, November 17, 2023 3:08 PM
> > > > > >
> > > > > > On Fri, Nov 17, 2023 at 09:14:21AM +0000, Parav Pandit wrote:
> > > > > > >
> > > > > > >
> > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > Sent: Friday, November 17, 2023 2:16 PM
> > > > > > > >
> > > > > > > > In any case, you can safely assume that many users will have
> > > > > > > > migrations that take seconds to minutes.
> > > > > > >
> > > > > > > Strange, but OK. I don't see any problem with the current method.
> > > > > > > 8MB is used for a very large 1TB VM, whose migration takes minutes. Should be fine.
> > > > > >
> > > > > > The problem is simple: vendors selling devices have no idea how
> > > > > > large the VM will be. So you have to over-provision for the max VM size.
> > > > > > If there were a way to allocate that in host memory instead, that
> > > > > > would improve on this.
> > > > >
> > > > > Not sure what needs to be over-provisioned for the max VM size.
> > > > > The vendor does not know how many vCPUs will be needed either. It is
> > > > > the same kind of problem.
> > > > >
> > > > > When VM migration is started, the hypervisor supplies the individual
> > > > > tracking range to the device.
> > > > > The device allocates the necessary memory on this instruction.
> > > > >
> > > > > When a VM of a certain size is provisioned, the member device can be
> > > > > provisioned for that VM size.
> > > > > And if it cannot be provisioned, this may not be the right member
> > > > > device to use at that point in time.
> > > >
> > > > For someone who keeps arguing against adding single-bit registers
> > > > "because it does not scale", you seem very nonchalant about adding
> > > > 8 Mbytes.
> > > >
> > > There is a fundamental difference in how/when a bit is used.
> > > One wants to use a bit for a non-performance part and keep it always
> > > available, vs. the data path.
> > > It is not the same comparison.
> > >
> > > > I thought we had a nicely contained and orthogonal feature, so if
> > > > it's optional it's not a problem.
> > > It is optional as always.
> > >
> > > >
> > > > But with such costs and corner cases what exactly is the motivation
> > > > for the feature here?
> > > New-generation DPUs have memory for device data path workloads, but not
> > > for bits.
> > >
> > > > Do you have a PoC showing how this works better than e.g.
> > > > shadow VQ?
> > > >
> > > Not yet.
> > > But I don't think this can even be a criterion to consider, as a dependency
> > > on PASID is a non-starter, along with other limitations.
> > 
> > You just need a dirty bit in the PTE; whether that is tied to PASID depends very
> > much on the platform. For VT-d I think it is. And if shadow vq works as a
> > fallback, it just might be reasonable not to do any tracking in virtio.
> >
> What I don't agree with is the claim that shadow vq is great, made without sharing any performance numbers.

It's upstream in QEMU. Test it yourself.

> And it fundamentally does not fit the generic stack where virtio is to be used.
> 
> We have accelerated some of the shadow vq path for non-virtio devices, and those optimizations are not elegant enough for me to want to bring them to the virtio spec.
> A different discussion.

Let's just say, it's more elegant than what I saw so far.

> > > > Maybe IOMMU based and shadow VQ based tracking are the way to go
> > > > initially, and if there's a problem then we should add this later, on top.
> > > >
> > > CPUs that do not support IOMMU cannot shift to shadow VQ either.
> > 
> > I don't know what this means (no IOMMU at all?) but it looks like shadow vq
> > and similar approaches are in production with vdpa and have been
> > demonstrated for a while. All we are doing is supporting them in virtio proper.
> > 
> The IOMMU is present but does not support the D (dirty) bit.

Yes, there are systems like this. It would be interesting to see some
info on how widespread this is.  Sometimes it is easier to just tell
customers "so buy a better IOMMU" instead of investing in work-arounds.

> > > > I really want us to finally make progress merging features and
> > > > anything that reduces scope initially is good for that.
> > > >
> > > Yes, if you prefer to split the last three patches, I am fine.
> > > Please let me know.
> > 
> > As there have not been any comments on 1-5, I don't think there's a need to
> > repost this just yet. I'll review 1-5 next week.
> > I think in the next version it might be wise to split this and post as two series,
> > yes.
> Ok.
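
The 8MB figure quoted above for a 1TB VM depends on how recorded writes are stored and at what granularity, which the thread does not spell out. As a rough illustration only, assuming a flat bitmap with one bit per tracked page (an assumption, not something the patch series states), the arithmetic can be checked with a short Python sketch; `bitmap_bytes` is a hypothetical helper.

```python
KIB, MIB, TIB = 1 << 10, 1 << 20, 1 << 40


def bitmap_bytes(tracked_bytes, page_size):
    """Bytes needed for a hypothetical flat dirty bitmap, one bit per page."""
    pages = -(-tracked_bytes // page_size)  # round up to whole pages
    return -(-pages // 8)                   # round up to whole bytes


# Compare a few tracking granularities for a 1 TiB guest.
for page_size in (4 * KIB, 16 * KIB, 2 * MIB):
    size = bitmap_bytes(1 * TIB, page_size)
    print(f"1 TiB at {page_size // KIB} KiB granularity: {size / MIB:g} MiB")
```

Under that assumption, 1 TiB tracked at 4 KiB granularity needs 32 MiB, 8 MiB corresponds to 16 KiB granularity, and 2 MiB granularity would need only 64 KiB. In every case the memory grows linearly with the tracked range, which is why the maximum VM size matters for device provisioning.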

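The flow described earlier in the thread, where the hypervisor supplies the tracking range when migration starts and the device allocates memory only on that instruction, can be sketched as a small device-side model. Everything here is hypothetical: the `WriteRecorder` class, its method names, the flat bitmap and the 4 KiB default page size are illustrative assumptions, not the admin commands defined in the patch series.

```python
class WriteRecorder:
    """Hypothetical device-side model of write recording for one range."""

    def __init__(self, start, length, page_size=4096):
        # Called when migration starts: the range comes from the hypervisor,
        # and the bitmap is allocated only now, sized to that range.
        self.start = start
        self.length = length
        self.page_size = page_size
        npages = -(-length // page_size)          # round up to whole pages
        self.bitmap = bytearray(-(-npages // 8))  # one bit per page

    def record_write(self, addr, size):
        """Mark every tracked page touched by a device write."""
        lo = max(addr, self.start)
        hi = min(addr + size, self.start + self.length)
        if lo >= hi:
            return  # empty write, or entirely outside the tracked range
        first = (lo - self.start) // self.page_size
        last = (hi - 1 - self.start) // self.page_size
        for page in range(first, last + 1):
            self.bitmap[page // 8] |= 1 << (page % 8)

    def read_and_clear(self):
        """Return addresses of dirty pages and reset the bitmap."""
        dirty = []
        for i, byte in enumerate(self.bitmap):
            for bit in range(8):
                if byte & (1 << bit):
                    dirty.append(self.start + (i * 8 + bit) * self.page_size)
        self.bitmap = bytearray(len(self.bitmap))
        return dirty


# Usage: track 64 pages starting at 0x100000.
rec = WriteRecorder(start=0x100000, length=64 * 4096)
rec.record_write(0x100FFF, 2)  # straddles pages 0 and 1
rec.record_write(0x0, 16)      # outside the range, ignored
print([hex(a) for a in rec.read_and_clear()])  # ['0x100000', '0x101000']
```

In this model the allocation happens in the constructor, so the memory needed is proportional to the range the hypervisor supplies; whether a device can satisfy that at migration time is the provisioning question debated above.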

