virtio-comment message

Subject: RE: [PATCH v3 6/8] admin: Add theory of operation for write recording commands

From: Parav Pandit <parav@nvidia.com>
To: Jason Wang <jasowang@redhat.com>
Date: Wed, 22 Nov 2023 04:19:35 +0000


> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, November 22, 2023 9:45 AM
> 
> On Wed, Nov 22, 2023 at 12:26 AM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Tuesday, November 21, 2023 9:55 AM
> > >
> > > On Fri, Nov 17, 2023 at 11:02 AM Parav Pandit <parav@nvidia.com> wrote:
> > > >
> > > >
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Thursday, November 16, 2023 11:51 PM
> > > > >
> > > > > On Thu, Nov 16, 2023 at 05:29:49PM +0000, Parav Pandit wrote:
> > > > > >
> > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > Sent: Thursday, November 16, 2023 10:56 PM
> > > > > > >
> > > > > > > On Thu, Nov 16, 2023 at 04:26:53PM +0000, Parav Pandit wrote:
> > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > Sent: Thursday, November 16, 2023 5:18 PM
> > > > > > > > >
> > > > > > > > > > On Thu, Nov 16, 2023 at 07:40:57AM +0000, Parav Pandit wrote:
> > > > > > > > > >
> > > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > > > Sent: Thursday, November 16, 2023 1:06 PM
> > > > > > > > > > >
> > > > > > > > > > > > On Thu, Nov 16, 2023 at 12:51:40AM -0500, Michael S. Tsirkin wrote:
> > > > > > > > > > > > > On Thu, Nov 16, 2023 at 05:29:54AM +0000, Parav Pandit wrote:
> > > > > > > > > > > > > > We should expose a limit of the device in the proposed
> > > > > > > > > > > > > > WRITE_RECORD_CAP_QUERY command, that how much range it can track.
> > > > > > > > > > > > > So that future provisioning framework can use it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I will cover this in v5 early next week.
> > > > > > > > > > > >
> > > > > > > > > > > > > I do worry about how this can even work though. If
> > > > > > > > > > > > > you want a generic device you do not get to
> > > > > > > > > > > > > dictate how much memory VM has.
> > > > > > > > > > > >
> > > > > > > > > > > > Aren't we talking bit per page? With 1TByte of
> > > > > > > > > > > > memory to track
> > > > > > > > > > > > -> 256Gbit -> 32Gbit -> 8Gbyte per VF?
> > > > > > > > > > >
> > > > > > > > > > > Ugh. Actually of course:
> > > > > > > > > > > With 1TByte of memory to track -> 256Mbit -> 32Mbit
> > > > > > > > > > > -> 8Mbyte per VF
> > > > > > > > > > >
> > > > > > > > > > > 8Gbyte per *PF* with 1K VFs.
> > > > > > > > > > >
> > > > > > > > > > Device may not maintain as a bitmap.
> > > > > > > > >
> > > > > > > > > However you maintain it, there's 256Mega bit of information.
> > > > > > > > > There may be other data structures that device may deploy
> > > > > > > > > as for example hash or tree or something else.
> > > > > > >
> > > > > > > Point being?
> > > > > > > The device may have some hashing accelerator or other
> > > > > > > improvements that may perform better than bitmap as many
> > > > > > > queues in parallel attempt to update the shared database.
> > > > >
> > > > > Maybe, I didn't give this thought.
> > > > >
> > > > > My point was that to be able to keep all combinations of
> > > > > dirty/non dirty page for each 4k page in a 1TByte guest device
> > > > > needs 8MBytes of on-device memory per VF. As designed the query
> > > > > > also has to report it for each VF accurately even if multiple VFs are
> > > > > > accessing same guest.
> > > > Yes.
> > > >
> > > > >
> > > > > > >
> > > > > > > > > And this is runtime memory only during the short live
> > > > > > > > > migration period of 400msec or less.
> > > > > > > > It is not some _always_ resident memory.
> > >
> > > When developing the spec, we should not have any assumption for the
> > > implementation. For example, you can't just assume virtio is always
> > > emulated in the software in the DPU.
> > >
> > There is no such assumption.
> > It is supported on non DPU devices too.
> 
> You meant e.g a 8MB on-chip resource per VF is good to go?
>
It is a device implementation detail. Maybe it uses 8MB, maybe not.
And if you are going to compare it again with slow register memory, that is not an apples-to-apples comparison anyway.

A non-DPU device may have such memory for data path acceleration.
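
For reference, the flat-bitmap sizing discussed earlier in this thread can be checked with a short sketch. It assumes one bit per tracked page and nothing else; the function name and the 1K-VF multiplier are only illustrative, and a real device may well use a different structure (hash, tree, and so on). Note that with one bit per 4 KiB page, a 1 TiB guest works out to 32 MiB per VF; the 8 MB figure quoted above corresponds to a 16 KiB tracking granularity.

```python
def dirty_bitmap_bytes(guest_bytes: int, page_bytes: int = 4096) -> int:
    """Bytes needed for a flat bitmap holding one bit per tracked page.

    Hypothetical helper for illustration; not part of the proposed commands.
    """
    pages = -(-guest_bytes // page_bytes)  # ceiling division: number of pages
    return -(-pages // 8)                  # ceiling division: bits to bytes


TIB = 1 << 40
MIB = 1 << 20

per_vf_4k = dirty_bitmap_bytes(TIB)          # one bit per 4 KiB page
per_vf_16k = dirty_bitmap_bytes(TIB, 16384)  # one bit per 16 KiB range
per_pf_4k = per_vf_4k * 1024                 # assumed 1K VFs behind one PF

print(per_vf_4k // MIB, "MiB per VF at 4 KiB granularity")
print(per_vf_16k // MIB, "MiB per VF at 16 KiB granularity")
print(per_pf_4k // (1 << 30), "GiB per PF with 1K VFs at 4 KiB granularity")
```

Whatever structure the device picks, this is only the worst-case amount of information when every page may be dirty; it says nothing about where that memory lives.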
 
> >
> > > How can you make sure you can converge in 400ms without having an
> > > interface for the driver to set the correct parameter like dirty rates?
> >
> > 400msec is also not written anywhere as a requirement, if this is what you
> > want to argue about.
> 
> No, the downtime needs to coordinate with the hypervisor, that is what I
> want to say. Unfortunately, I don't see any interface in this series.
> 
What do you mean by coordinated?
This series has a mechanism to eliminate the downtime on the source and destination sides during device migration in the pre-copy phase.

> > Nothing prevents extending the interface to define the SLA through
> > additional commands in the future to improve the solution.
> >
> > There is no need to boil the ocean now. Once the base infrastructure is
> > built, we will improve it further.
> > And proposed patches are reasonably well covered to our knowledge.
> 
> Well, it is not me but you that claims it can be done in 400ms. I'm wondering
> how and you told me it could be done in the future?
>
In our tests it is close to this number.
The discussion is about programming the SLA, and that can be added as an extension.
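
To make the convergence question concrete, here is a rough sketch of the usual simplified pre-copy model: each round copies what is currently dirty, and new pages get dirtied while that copy runs. This is a textbook approximation, not something defined by this series; the function name, parameters, and the numbers in the usage lines are all hypothetical.

```python
def precopy_rounds(guest_bytes, dirty_rate, bandwidth, downtime_target,
                   max_rounds=30):
    """Estimate pre-copy rounds needed before the final stop-and-copy.

    dirty_rate and bandwidth are in bytes per second, downtime_target in
    seconds. Returns (rounds, estimated_downtime_seconds), or None when
    the model does not reach the target within max_rounds.
    """
    if dirty_rate >= bandwidth:
        return None  # pages are dirtied at least as fast as they are copied
    remaining = float(guest_bytes)
    for rounds in range(max_rounds + 1):
        downtime = remaining / bandwidth  # time to copy what is left
        if downtime <= downtime_target:
            return rounds, downtime
        # While this round is being copied, the guest dirties more memory.
        remaining = dirty_rate * downtime
    return None


GIB = 1 << 30
# Assumed example: 8 GiB guest, 1 GiB/s link, 256 MiB/s dirty rate, 400 ms target.
print(precopy_rounds(8 * GIB, GIB // 4, GIB, 0.4))
# A dirty rate equal to the link bandwidth never converges in this model.
print(precopy_rounds(8 * GIB, GIB, GIB, 0.4))
```

In this model the achievable downtime depends on the ratio of dirty rate to bandwidth, which is why any downtime figure is tied to the workload it was measured with.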
