Subject: RE: [virtio-comment] Re: [PATCH v3 6/8] admin: Add theory of operation for write recording commands



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Friday, November 17, 2023 5:35 PM
> To: Parav Pandit <parav@nvidia.com>
> 
> On Fri, Nov 17, 2023 at 11:45:20AM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Friday, November 17, 2023 5:04 PM
> > >
> > > On Fri, Nov 17, 2023 at 11:05:16AM +0000, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Friday, November 17, 2023 4:30 PM
> > > > >
> > > > > On Fri, Nov 17, 2023 at 10:03:47AM +0000, Parav Pandit wrote:
> > > > > >
> > > > > >
> > > > > > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > > > > > Sent: Friday, November 17, 2023 3:30 PM
> > > > > > >
> > > > > > > On 11/16/2023 7:59 PM, Michael S. Tsirkin wrote:
> > > > > > > > On Thu, Nov 16, 2023 at 06:28:07PM +0800, Zhu, Lingshan wrote:
> > > > > > > >>
> > > > > > > >> On 11/16/2023 1:51 PM, Michael S. Tsirkin wrote:
> > > > > > > >>> On Thu, Nov 16, 2023 at 05:29:54AM +0000, Parav Pandit wrote:
> > > > > > > >>>> We should expose a limit of the device in the proposed
> > > > > > > >>>> WRITE_RECORD_CAP_QUERY command, indicating how much range
> > > > > > > >>>> it can track, so that a future provisioning framework can
> > > > > > > >>>> use it.
> > > > > > > >>>>
> > > > > > > >>>> I will cover this in v5 early next week.
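For illustration, a minimal sketch of one way such a limit could be
exposed in the command result. The command name is from this thread; the
struct layout and field names are assumptions for discussion only, not
the actual proposal:

#include <stdint.h>

/* Illustrative sketch only: a possible result layout for the proposed
 * WRITE_RECORD_CAP_QUERY admin command. Field names and layout are
 * assumptions, not spec text; fields would be little-endian on the wire. */
struct virtio_admin_cmd_write_record_cap_result {
	uint64_t max_track_range;   /* assumed: max bytes of guest memory
	                             * the device can record writes for */
	uint32_t track_granularity; /* assumed: bytes per tracked unit,
	                             * e.g. 4096 for one bit per 4K page */
	uint32_t reserved;
};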
> > > > > > > >>> I do worry about how this can even work, though. If you
> > > > > > > >>> want a generic device, you do not get to dictate how much
> > > > > > > >>> memory the VM has.
> > > > > > > >>>
> > > > > > > >>> Aren't we talking one bit per page? With 1 TByte of memory
> > > > > > > >>> to track and 4 KByte pages -> 256M pages -> 256 Mbit ->
> > > > > > > >>> 32 MByte per VF?
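To make the arithmetic explicit, a back-of-the-envelope sketch, assuming
one dirty bit per 4 KByte page (purely illustrative):

#include <stdint.h>
#include <stdio.h>

/* Bitmap bytes needed for one dirty bit per page; the 4 KByte page
 * size used below is an assumption. */
static uint64_t dirty_bitmap_bytes(uint64_t mem_bytes, uint64_t page_size)
{
	uint64_t pages = mem_bytes / page_size;

	return pages / 8; /* one bit per page */
}

int main(void)
{
	/* 1 TByte guest -> 256M pages -> 256 Mbit -> 32 MByte per VF */
	printf("%llu bytes\n",
	       (unsigned long long)dirty_bitmap_bytes(1ULL << 40, 4096));
	return 0;
}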
> > > > > > > >>>
> > > > > > > >>> And you happily say "we'll address this in the future"
> > > > > > > >>> while at the same time fighting tooth and nail against
> > > > > > > >>> adding single-bit status registers because of scalability?
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> I have a feeling that doing this in a completely
> > > > > > > >>> theoretical way like this is problematic.
> > > > > > > >>> Maybe you have it all laid out neatly in your head, but I
> > > > > > > >>> suspect not all of the TC can picture it clearly enough
> > > > > > > >>> based just on the spec text.
> > > > > > > >>>
> > > > > > > >>> We do sometimes ask for a PoC implementation in Linux /
> > > > > > > >>> QEMU to demonstrate how things work before merging code.
> > > > > > > >>> We have skipped this for admin things so far, but I think
> > > > > > > >>> it's a good idea to start doing it here.
> > > > > > > >>>
> > > > > > > >>> What makes me pause a bit before saying "please do a PoC"
> > > > > > > >>> is all the opposition that seems to exist to even using
> > > > > > > >>> admin commands in the first place. I think once we finally
> > > > > > > >>> stop arguing about whether to use admin commands at all,
> > > > > > > >>> then a PoC will be needed before merging.
> > > > > > > >> We have POR products that implement the approach in my
> > > > > > > >> series. There are multiple generations of products in the
> > > > > > > >> market, running in customers' data centers for years.
> > > > > > > >>
> > > > > > > >> Back in 2019, when we started working on vDPA, we sent out
> > > > > > > >> some production samples (e.g., Cascade Glacier) and the
> > > > > > > >> datasheet; you can find live migration facilities there,
> > > > > > > >> including suspend, vq state and other features.
> > > > > > > >>
> > > > > > > >> And there is a reference for DPDK live migration; I have
> > > > > > > >> provided this page before:
> > > > > > > >> https://doc.dpdk.org/guides-21.11/vdpadevs/ifc.html. It has
> > > > > > > >> been working for a long, long time.
> > > > > > > >>
> > > > > > > >> So if we let the facts speak, if we want to see whether the
> > > > > > > >> proposal is proven to work, I would say: they have been POR
> > > > > > > >> for years, and customers have already deployed them for
> > > > > > > >> years.
> > > > > > > > And I guess what you are trying to say is that this
> > > > > > > > patchset we are reviewing here should be held to the same
> > > > > > > > standard and there should be a PoC? Sounds reasonable.
> > > > > > > Yes, and the in-market products are POR; the series just
> > > > > > > improves the design. For example, our series also uses
> > > > > > > registers to track vq state, but with improvements over CG or
> > > > > > > BSC. So I think they are proven to work.
> > > > > >
> > > > > > If you prefer to go the route of POR and production and proven
> > > > > > documents etc., there is a ton of it, across multiple types of
> > > > > > products, that I can dump here with open-source code,
> > > > > > documentation and more.
> > > > > > Let me know what you would like to see.
> > > > > >
> > > > > > Michael has requested some performance comparisons; not all are
> > > > > > ready to share yet.
> > > > > > Some are ready, and I will share them in the coming weeks.
> > > > > >
> > > > > > And the vdpa DPDK code you published did not have basic CVQ
> > > > > > support when I last looked at it.
> > > > > > Do you know when it was added?
> > > > >
> > > > > It's good enough for a PoC, I think, CVQ or not.
> > > > > The problem with CVQ generally is that vDPA wants to shadow CVQ
> > > > > at all times because it wants to decode and cache the content.
> > > > > But this problem has nothing to do with dirty tracking, even
> > > > > though it also mentions "shadow": if the device can report its
> > > > > state, then there is no need to shadow CVQ.
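For context, "shadowing" the CVQ means the vDPA layer intercepts each
control command and decodes it to keep its own copy of the state. A rough
sketch of that decode step, using the standard virtio-net control header;
the cache structure here is made up for illustration:

#include <stdint.h>

#define VIRTIO_NET_CTRL_MQ              4
#define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET 0

/* Control command header as defined by the virtio-net spec. */
struct virtio_net_ctrl_hdr {
	uint8_t class;
	uint8_t cmd;
};

/* Hypothetical cache a shadowing layer would maintain. */
struct shadow_state {
	uint16_t cur_queue_pairs;
};

/* Shadow path: every CVQ command is decoded to update the cache. If the
 * device can report its state directly, this interception is unneeded. */
static void shadow_cvq_cmd(struct shadow_state *s,
			   const struct virtio_net_ctrl_hdr *h,
			   const uint16_t *data)
{
	if (h->class == VIRTIO_NET_CTRL_MQ &&
	    h->cmd == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET)
		s->cur_queue_pairs = *data; /* little-endian on the wire */
}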
> > > >
> > > > For the performance numbers with the pre-copy and device context
> > > > support of the posted patches 1 to 5, the VM downtime reduction is
> > > > 3.71x with active traffic on 8 RQs at 100 Gbps port speed.
> > >
> > > Sounds good. Can you please post a bit more detail?
> > > Which configs are you comparing, and what was the result for each of
> > > them?
> >
> > Common config: 8+8 tx and rx queues.
> > Port speed: 100 Gbps
> > QEMU 8.1
> > Libvirt 7.0
> > GVM: CentOS 7.4
> > Device: virtio VF hardware device
> >
> > Config_1: virtio suspend/resume similar to what Lingshan has, largely
> > the vdpa stack
> > Config_2: Device context method of admin commands
> 
> OK, that sounds good. The weird thing here is that you measure "downtime".
> What exactly do you mean by that?
> I am guessing it's the time to retrieve the state on the source and
> re-program the device state on the destination? And this is 3.71x out of
> how long?
Yes. Downtime is the time during which the VM is not responding or
receiving packets, which involves reprogramming the device.
3.71x is a relative number for the purposes of this discussion.
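To make the measurement concrete: one way to estimate downtime is to
send a steady UDP packet stream at the guest during migration and record
the longest inter-packet gap at a receiver. The sketch below illustrates
that method; it is an assumption about the approach, not the exact test
harness used here:

#include <stdio.h>
#include <time.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port = htons(9000), /* arbitrary test port */
		.sin_addr.s_addr = INADDR_ANY,
	};
	char buf[64];
	struct timespec prev = { 0 }, now;
	double max_gap = 0.0;

	if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		return 1;

	/* Report the longest gap between consecutive packets; during a
	 * live migration this approximates the downtime window. */
	while (recv(fd, buf, sizeof(buf), 0) > 0) {
		clock_gettime(CLOCK_MONOTONIC, &now);
		if (prev.tv_sec || prev.tv_nsec) {
			double gap = (now.tv_sec - prev.tv_sec) +
				     (now.tv_nsec - prev.tv_nsec) / 1e9;
			if (gap > max_gap) {
				max_gap = gap;
				printf("max gap so far: %.3f s\n", max_gap);
			}
		}
		prev = now;
	}
	return 0;
}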

