Subject: Re: [virtio-comment] Re: [PATCH v3 6/8] admin: Add theory of operation for write recording commands
On Wed, Nov 22, 2023 at 12:28 PM Parav Pandit <parav@nvidia.com> wrote:
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, November 22, 2023 9:50 AM
> >
> > On Wed, Nov 22, 2023 at 12:30 AM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Tuesday, November 21, 2023 12:25 PM
> > > >
> > > > On Fri, Nov 17, 2023 at 10:48 PM Parav Pandit <parav@nvidia.com> wrote:
> > > > >
> > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > Sent: Friday, November 17, 2023 7:31 PM
> > > > > > To: Parav Pandit <parav@nvidia.com>
> > > > > >
> > > > > > On Fri, Nov 17, 2023 at 01:03:03PM +0000, Parav Pandit wrote:
> > > > > > >
> > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > Sent: Friday, November 17, 2023 6:02 PM
> > > > > > > >
> > > > > > > > On Fri, Nov 17, 2023 at 12:11:15PM +0000, Parav Pandit wrote:
> > > > > > > > >
> > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > > Sent: Friday, November 17, 2023 5:35 PM
> > > > > > > > > > To: Parav Pandit <parav@nvidia.com>
> > > > > > > > > >
> > > > > > > > > > On Fri, Nov 17, 2023 at 11:45:20AM +0000, Parav Pandit wrote:
> > > > > > > > > > >
> > > > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > > > > Sent: Friday, November 17, 2023 5:04 PM
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Nov 17, 2023 at 11:05:16AM +0000, Parav Pandit wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > > > > > > Sent: Friday, November 17, 2023 4:30 PM
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Nov 17, 2023 at 10:03:47AM +0000, Parav Pandit wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > > > > > > > > > > > > > > Sent: Friday, November 17, 2023 3:30 PM
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On 11/16/2023 7:59 PM, Michael S. Tsirkin wrote:
> > > > > > > > > > > > > > > > > On Thu, Nov 16, 2023 at 06:28:07PM +0800, Zhu, Lingshan wrote:
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> On 11/16/2023 1:51 PM, Michael S. Tsirkin wrote:
> > > > > > > > > > > > > > > > >>> On Thu, Nov 16, 2023 at 05:29:54AM +0000, Parav Pandit wrote:
> > > > > > > > > > > > > > > > >>>> We should expose a limit of the device in the proposed
> > > > > > > > > > > > > > > > >>>> WRITE_RECORD_CAP_QUERY command, that how much range it can track.
> > > > > > > > > > > > > > > > >>>> So that future provisioning framework can use it.
> > > > > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > > > >>>> I will cover this in v5 early next week.
> > > > > > > > > > > > > > > > >>> I do worry about how this can even work though.
> > > > > > > > > > > > > > > > >>> If you want a generic device you do not get to dictate how much
> > > > > > > > > > > > > > > > >>> memory VM has.
> > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > >>> Aren't we talking bit per page? With 1TByte of memory to track
> > > > > > > > > > > > > > > > >>> -> 256Gbit -> 32Gbit -> 8Gbyte per VF?
> > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > >>> And you happily say "we'll address this in the future"
> > > > > > > > > > > > > > > > >>> while at the same time fighting tooth and nail against adding
> > > > > > > > > > > > > > > > >>> single bit status registers because scalability?
> > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > >>> I have a feeling doing this completely theoretical like this is
> > > > > > > > > > > > > > > > >>> problematic. Maybe you have it all laid out neatly in your head
> > > > > > > > > > > > > > > > >>> but I suspect not all of TC can picture it clearly enough based
> > > > > > > > > > > > > > > > >>> just on spec text.
> > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > >>> We do sometimes ask for POC implementation in linux / qemu to
> > > > > > > > > > > > > > > > >>> demonstrate how things work before merging code.
> > > > > > > > > > > > > > > > >>> We skipped this for admin things so far but I think it's a good
> > > > > > > > > > > > > > > > >>> idea to start doing it here.
> > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > >>> What makes me pause a bit before saying please do a PoC is all
> > > > > > > > > > > > > > > > >>> the opposition that seems to exist to even using admin commands
> > > > > > > > > > > > > > > > >>> in the 1st place. I think once we finally stop arguing about
> > > > > > > > > > > > > > > > >>> whether to use admin commands at all then a PoC will be needed
> > > > > > > > > > > > > > > > >>> before merging.
> > > > > > > > > > > > > > > > >> We have POR productions that implemented the approach in my series.
> > > > > > > > > > > > > > > > >> They are multiple generations of productions in market and running
> > > > > > > > > > > > > > > > >> in customers data centers for years.
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> Back to 2019 when we start working on vDPA, we have sent some
> > > > > > > > > > > > > > > > >> samples of production(e.g., Cascade Glacier) and the datasheet,
> > > > > > > > > > > > > > > > >> you can find live migration facilities there, includes suspend,
> > > > > > > > > > > > > > > > >> vq state and other features.
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> And there is an reference in DPDK live migration, I have provided
> > > > > > > > > > > > > > > > >> this page before:
> > > > > > > > > > > > > > > > >> https://doc.dpdk.org/guides-21.11/vdpadevs/ifc.html, it has been
> > > > > > > > > > > > > > > > >> working for long long time.
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> So if we let the facts speak, if we want to see if the proposal is
> > > > > > > > > > > > > > > > >> proven to work, I would say: They are POR for years, customers
> > > > > > > > > > > > > > > > >> already deployed them for years.
> > > > > > > > > > > > > > > > > And I guess what you are trying to say is that this patchset we
> > > > > > > > > > > > > > > > > are reviewing here should be help to the same standard and there
> > > > > > > > > > > > > > > > > should be a PoC? Sounds reasonable.
> > > > > > > > > > > > > > > > Yes and the in-marketing productions are POR, the series just
> > > > > > > > > > > > > > > > improves the design, for example, our series also use registers to
> > > > > > > > > > > > > > > > track vq state, but improvements than CG or BSC. So I think they are
> > > > > > > > > > > > > > > > proven to work.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If you prefer to go the route of POR and production and proven
> > > > > > > > > > > > > > > documents etc, there is ton of it of multiple types of products I can
> > > > > > > > > > > > > > > dump here with open-source code and documentation and more.
> > > > > > > > > > > > > > > Let me know what you would like to see.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Michael has requested some performance comparisons, not all are ready
> > > > > > > > > > > > > > > to share yet.
> > > > > > > > > > > > > > > Some are present that I will share in coming weeks.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > And all the vdpa dpdk you published does not have basic CVQ support
> > > > > > > > > > > > > > > when I last looked at it.
> > > > > > > > > > > > > > > Do you know when was it added?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It's good enough for PoC I think, CVQ or not.
> > > > > > > > > > > > > > The problem with CVQ generally, is that VDPA wants to shadow CVQ it at
> > > > > > > > > > > > > > all times because it wants to decode and cache the content. But this
> > > > > > > > > > > > > > problem has nothing to do with dirty tracking even though it also
> > > > > > > > > > > > > > mentions "shadow": if device can report it's state then there's no need
> > > > > > > > > > > > > > to shadow CVQ.
> > > > > > > > > > > > >
> > > > > > > > > > > > > For the performance numbers with the pre-copy and device context of
> > > > > > > > > > > > > patches posted 1 to 5, the downtime reduction of the VM is 3.71x with
> > > > > > > > > > > > > active traffic on 8 RQs at 100Gbps port speed.
> > > > > > > > > > > >
> > > > > > > > > > > > Sounds good can you please post a bit more detail?
> > > > > > > > > > > > which configs are you comparing what was the result on each of them.
> > > > > > > > > > >
> > > > > > > > > > > Common config: 8+8 tx and rx queues.
> > > > > > > > > > > Port speed: 100Gbps
> > > > > > > > > > > QEMU 8.1
> > > > > > > > > > > Libvirt 7.0
> > > > > > > > > > > GVM: Centos 7.4
> > > > > > > > > > > Device: virtio VF hardware device
> > > > > > > > > > >
> > > > > > > > > > > Config_1: virtio suspend/resume similar to what Lingshan has, largely vdpa stack
> > > > > > > > > > > Config_2: Device context method of admin commands
> > > > > > > > > >
> > > > > > > > > > OK that sounds good. The weird thing here is that you measure "downtime".
> > > > > > > > > > What exactly do you mean here?
> > > > > > > > > > I am guessing it's the time to retrieve on source and re-program device
> > > > > > > > > > state on destination? And this is 3.71x out of how long?
> > > > > > > > > Yes. Downtime is the time during which the VM is not responding or receiving
> > > > > > > > > packets, which involves reprogramming the device.
> > > > > > > > > 3.71x is relative time for this discussion.
> > > > > > > >
> > > > > > > > Oh interesting. So VM state movement including reprogramming the CPU is
> > > > > > > > dominated by reprogramming this single NIC, by a factor of almost 4?
> > > > > > > Yes.
> > > > > >
> > > > > > Could you post some numbers too then? I want to know whether that would
> > > > > > imply that VM boot is slowed down significantly too.
> > > > > > If yes that's another motivation for pci transport 2.0.
> > > > > It was 1.8 sec down to 480msec.
> > > >
> > > > Well, there's work ongoing to reduce the downtime of the shadow virtqueue.
> > > >
> > > > Eugenio or Si-wei may share an exact number, but it should be several
> > > > hundreds of ms.
> > > >
> > > Shadow vq is not applicable at all as comparison point because there is no
> > > virtio specific qemu etc software involved here.
> >
> > I don't get the point.
> >
> > Shadow virtqueue is virtio specific for sure and the core logic is decoupled
> > of the vDPA logic. If not, it's bug and we need to fix.
>
> The base requirement is that the software is not mediating any virtio
> interfaces (config, cvq, data vqs).

I think we agree that any proposal should work in both passthrough and
non-passthrough. No?

Otherwise we circle back.

Thanks
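
A rough sanity check on the "bit per page" estimate quoted earlier in this thread, as a minimal sketch only. It assumes 4 KiB pages and a flat one-bit-per-page bitmap, neither of which is stated in the thread, and the function name is hypothetical. Under those assumptions, 1 TiB of tracked guest memory comes to 2^28 bits, i.e. 32 MiB of bitmap per tracked device. A real device may use a different structure or granularity.

```python
def dirty_bitmap_bytes(guest_memory_bytes: int, page_size: int = 4096) -> int:
    """Bytes needed for a flat dirty bitmap with one bit per page.

    Assumption (not from the thread): 4 KiB pages by default and a
    plain bitmap with no compression or hierarchy.
    """
    if guest_memory_bytes < 0 or page_size <= 0:
        raise ValueError("sizes must be positive")
    pages = -(-guest_memory_bytes // page_size)  # ceiling division
    return -(-pages // 8)                        # 8 page bits per byte


TIB = 1 << 40
MIB = 1 << 20

if __name__ == "__main__":
    size = dirty_bitmap_bytes(TIB)
    print(f"1 TiB at 4 KiB pages: {size // MIB} MiB of bitmap")
```

Changing page_size shows how strongly the result depends on tracking granularity, for example dirty_bitmap_bytes(TIB, page_size=2 * MIB) for 2 MiB pages.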