Subject: Re: [virtio-comment] Re: [PATCH v3 6/8] admin: Add theory of operation for write recording commands


On Fri, Nov 17, 2023 at 12:11:15PM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Friday, November 17, 2023 5:35 PM
> > To: Parav Pandit <parav@nvidia.com>
> > 
> > On Fri, Nov 17, 2023 at 11:45:20AM +0000, Parav Pandit wrote:
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Friday, November 17, 2023 5:04 PM
> > > >
> > > > On Fri, Nov 17, 2023 at 11:05:16AM +0000, Parav Pandit wrote:
> > > > >
> > > > >
> > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > Sent: Friday, November 17, 2023 4:30 PM
> > > > > >
> > > > > > On Fri, Nov 17, 2023 at 10:03:47AM +0000, Parav Pandit wrote:
> > > > > > >
> > > > > > >
> > > > > > > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > > > > > > Sent: Friday, November 17, 2023 3:30 PM
> > > > > > > >
> > > > > > > > On 11/16/2023 7:59 PM, Michael S. Tsirkin wrote:
> > > > > > > > > On Thu, Nov 16, 2023 at 06:28:07PM +0800, Zhu, Lingshan wrote:
> > > > > > > > >>
> > > > > > > > >> On 11/16/2023 1:51 PM, Michael S. Tsirkin wrote:
> > > > > > > > >>> On Thu, Nov 16, 2023 at 05:29:54AM +0000, Parav Pandit wrote:
> > > > > > > > >>>> We should expose a limit of the device in the proposed
> > > > > > > > >>>> WRITE_RECORD_CAP_QUERY command, i.e., how much range it
> > > > > > > > >>>> can track, so that a future provisioning framework can
> > > > > > > > >>>> use it.
> > > > > > > > >>>>
> > > > > > > > >>>> I will cover this in v5 early next week.
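
(A minimal sketch of what such a capability result could look like. The
thread only names WRITE_RECORD_CAP_QUERY; every field below is an
assumption for illustration, not the proposed layout.)

#include <stdint.h>

/* Hypothetical result layout for a WRITE_RECORD_CAP_QUERY admin
 * command; field names and sizes are invented for illustration. */
struct virtio_admin_write_record_cap_result {
        /* assumed: maximum guest-physical range (in bytes) the device
         * can track for write recording, per managed VF */
        uint64_t max_track_range_bytes;
        /* assumed: tracking granularity in bytes (e.g. page size) */
        uint32_t track_granularity_bytes;
        uint32_t reserved;
};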
> > > > > > > > >>> I do worry about how this can even work though. If you
> > > > > > > > >>> want a generic device you do not get to dictate how much
> > > > > > > > >>> memory the VM has.
> > > > > > > > >>>
> > > > > > > > >>> Aren't we talking a bit per page? With 1TByte of memory
> > > > > > > > >>> to track at a bit per 4K page -> 256M pages -> 256Mbit
> > > > > > > > >>> -> 32MByte per VF?
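
(To make the arithmetic concrete, a small self-contained check assuming
bit-per-page tracking with 4 KiB pages; the numbers are not from the
spec, just this calculation.)

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint64_t mem_bytes    = 1ULL << 40;             /* 1 TiB to track */
        uint64_t page_bytes   = 4096;                   /* 4 KiB pages    */
        uint64_t pages        = mem_bytes / page_bytes; /* 256M pages     */
        uint64_t bitmap_bytes = pages / 8;              /* 1 bit per page */

        printf("pages=%llu, bitmap=%llu MiB per VF\n",
               (unsigned long long)pages,
               (unsigned long long)(bitmap_bytes >> 20)); /* 32 MiB */
        return 0;
}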
> > > > > > > > >>>
> > > > > > > > >>> And you happily say "we'll address this in the future"
> > > > > > > > >>> while at the same time fighting tooth and nail against
> > > > > > > > >>> adding single-bit status registers because of scalability?
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> I have a feeling doing this completely theoretically
> > > > > > > > >>> like this is problematic.
> > > > > > > > >>> Maybe you have it all laid out neatly in your head but I
> > > > > > > > >>> suspect not all of the TC can picture it clearly enough
> > > > > > > > >>> based just on spec text.
> > > > > > > > >>>
> > > > > > > > >>> We do sometimes ask for a POC implementation in Linux /
> > > > > > > > >>> QEMU to demonstrate how things work before merging code.
> > > > > > > > >>> We skipped this for admin things so far but I think it's
> > > > > > > > >>> a good idea to start doing it here.
> > > > > > > > >>>
> > > > > > > > >>> What makes me pause a bit before saying please do a PoC
> > > > > > > > >>> is all the opposition that seems to exist to even using
> > > > > > > > >>> admin commands in the 1st place. I think once we finally
> > > > > > > > >>> stop arguing about whether to use admin commands at all
> > > > > > > > >>> then a PoC will be needed before merging.
> > > > > > > > >> We have POR products that implement the approach in my
> > > > > > > > >> series. There are multiple generations of products in
> > > > > > > > >> the market, running in customers' data centers for years.
> > > > > > > > >>
> > > > > > > > >> Back in 2019, when we started working on vDPA, we sent
> > > > > > > > >> some production samples (e.g., Cascade Glacier) and the
> > > > > > > > >> datasheet; you can find live migration facilities there,
> > > > > > > > >> including suspend, vq state and other features.
> > > > > > > > >>
> > > > > > > > >> And there is a reference in DPDK live migration; I have
> > > > > > > > >> provided this page before:
> > > > > > > > >> https://doc.dpdk.org/guides-21.11/vdpadevs/ifc.html. It
> > > > > > > > >> has been working for a long, long time.
> > > > > > > > >>
> > > > > > > > >> So if we let the facts speak, if we want to see whether
> > > > > > > > >> the proposal is proven to work, I would say: they have
> > > > > > > > >> been POR for years, and customers have already deployed
> > > > > > > > >> them for years.
> > > > > > > > > And I guess what you are trying to say is that this
> > > > > > > > > patchset we are reviewing here should be held to the same
> > > > > > > > > standard and there should be a PoC? Sounds reasonable.
> > > > > > > > Yes, and the in-market products are POR; the series just
> > > > > > > > improves the design. For example, our series also uses
> > > > > > > > registers to track vq state, but with improvements over CG
> > > > > > > > or BSC. So I think they are proven to work.
> > > > > > >
> > > > > > > If you prefer to go the route of POR, production, and proof
> > > > > > > documents etc., there is a ton of it, for multiple types of
> > > > > > > products, that I can dump here with open-source code,
> > > > > > > documentation and more.
> > > > > > > Let me know what you would like to see.
> > > > > > >
> > > > > > > Michael has requested some performance comparisons; not all
> > > > > > > are ready to share yet.
> > > > > > > Some are available, and I will share them in the coming weeks.
> > > > > > >
> > > > > > > And the vdpa DPDK code you published did not have basic CVQ
> > > > > > > support when I last looked at it.
> > > > > > > Do you know when it was added?
> > > > > >
> > > > > > It's good enough for a PoC I think, CVQ or not.
> > > > > > The problem with CVQ generally is that vDPA wants to shadow the
> > > > > > CVQ at all times because it wants to decode and cache the
> > > > > > content. But this problem has nothing to do with dirty tracking,
> > > > > > even though it also mentions "shadow": if the device can report
> > > > > > its state then there's no need to shadow the CVQ.
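
(A rough, self-contained sketch of the two approaches contrasted above;
every type and helper here is invented for illustration, none of it is
the vDPA API.)

#include <stddef.h>
#include <string.h>

struct vdpa_dev {
        unsigned char cached_mac[6];    /* hypervisor-side cache */
        int (*get_state)(struct vdpa_dev *dev, void *buf, size_t len);
};

/* Shadow-CVQ style: the hypervisor intercepts every control command,
 * decoding it so it can keep its own copy of what the guest set. */
void cvq_intercept(struct vdpa_dev *dev,
                   const unsigned char *cmd, size_t len)
{
        if (len >= 8 && cmd[0] == 1)            /* assumed: MAC class */
                memcpy(dev->cached_mac, cmd + 2, 6);
        /* ...then forward the command to the real device... */
}

/* State-report style: nothing is intercepted; at migration time the
 * device reports its state via a (hypothetical) get_state op. */
int save_device_state(struct vdpa_dev *dev, void *buf, size_t len)
{
        return dev->get_state(dev, buf, len);
}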
> > > > >
> > > > > For the performance numbers with the pre-copy and device context
> > > > > patches posted (1 to 5), the downtime reduction of the VM is
> > > > > 3.71x with active traffic on 8 RQs at 100Gbps port speed.
> > > >
> > > > Sounds good. Can you please post a bit more detail?
> > > > Which configs are you comparing, and what was the result for each
> > > > of them?
> > >
> > > Common config: 8+8 tx and rx queues.
> > > Port speed: 100Gbps
> > > QEMU 8.1
> > > Libvirt 7.0
> > > GVM: CentOS 7.4
> > > Device: virtio VF hardware device
> > >
> > > Config_1: virtio suspend/resume similar to what Lingshan has, largely
> > > vdpa stack
> > > Config_2: Device context method of admin commands
> > 
> > OK, that sounds good. The weird thing here is that you measure
> > "downtime". What exactly do you mean here?
> > I am guessing it's the time to retrieve the device state on the source
> > and re-program it on the destination? And this is 3.71x out of how long?
> Yes. Downtime is the time during which the VM is not responding or
> receiving packets, which involves reprogramming the device.
> 3.71x is a relative number for this discussion.
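
(In other words, if T_cfg1 and T_cfg2 are the absolute downtimes of
Config_1 and Config_2, the claim is T_cfg1 / T_cfg2 = 3.71; the
absolute values are not given in the thread.)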

Oh interesting. So VM state movement, including reprogramming the CPU, is
dominated by reprogramming this single NIC, by a factor of almost 4?
Can we get some absolute numbers too, please?

-- 
MST


