Subject: Re: [virtio-comment] Re: [PATCH v3 6/8] admin: Add theory of operation for write recording commands
On 11/20/2023 10:55 PM, Jason Wang wrote:
On Thu, Nov 16, 2023 at 05:29:54AM +0000, Parav Pandit wrote:
We should expose a limit of the device in the proposed WRITE_RECORD_CAP_QUERY command, i.e. how much range it can track, so that a future provisioning framework can use it. I will cover this in v5 early next week.

On 11/16/2023 1:51 PM, Michael S. Tsirkin wrote:
I do worry about how this can even work though. If you want a generic device you do not get to dictate how much memory the VM has. Aren't we talking bit per page? With 1TByte of memory to track -> 256Gbit -> 32Gbit -> 8Gbyte per VF? And you happily say "we'll address this in the future" while at the same time fighting tooth and nail against adding single-bit status registers because of scalability? I have a feeling doing this completely theoretically like this is problematic. Maybe you have it all laid out neatly in your head, but I suspect not all of the TC can picture it clearly enough based just on spec text. We do sometimes ask for a PoC implementation in Linux / QEMU to demonstrate how things work before merging code. We skipped this for admin things so far, but I think it's a good idea to start doing it here. What makes me pause a bit before saying "please do a PoC" is all the opposition that seems to exist to even using admin commands in the 1st place. I think once we finally stop arguing about whether to use admin commands at all, then a PoC will be needed before merging.

On Thu, Nov 16, 2023 at 06:28:07PM +0800, Zhu, Lingshan wrote:
We have POR products that implement the approach in my series. They are multiple generations of products in the market, running in customers' data centers for years. Back in 2019, when we started working on vDPA, we sent out samples of a product (e.g., Cascade Glacier) and the datasheet; you can find the live migration facilities there, including suspend, vq state and other features. And there is a reference in DPDK live migration, I have provided this page before: https://doc.dpdk.org/guides-21.11/vdpadevs/ifc.html; it has been working for a long, long time. So if we let the facts speak, if we want to see whether the proposal is proven to work, I would say: they have been POR for years, and customers have already deployed them for years.

On 11/16/2023 7:59 PM, Michael S. Tsirkin wrote:
And I guess what you are trying to say is that this patchset we are reviewing here should be held to the same standard and there should be a PoC? Sounds reasonable.

From: Zhu, Lingshan <lingshan.zhu@intel.com> Sent: Friday, November 17, 2023 3:30 PM
Yes, and the in-market products are POR; the series just improves the design. For example, our series also uses registers to track vq state, but with improvements over CG or BSC. So I think they are proven to work.

On Fri, Nov 17, 2023 at 10:03:47AM +0000, Parav Pandit wrote:
If you prefer to go the route of POR, production and proven documents etc., there is a ton of it, for multiple types of products, that I can dump here with open-source code, documentation and more. Let me know what you would like to see. Michael has requested some performance comparisons; not all are ready to share yet. Some are present that I will share in the coming weeks. And the vdpa DPDK driver you published did not have basic CVQ support when I last looked at it. Do you know when it was added?

From: Michael S. Tsirkin <mst@redhat.com> Sent: Friday, November 17, 2023 4:30 PM
It's good enough for a PoC I think, CVQ or not. The problem with CVQ generally is that vDPA wants to shadow CVQ at all times because it wants to decode and cache the content. But this problem has nothing to do with dirty tracking, even though it also mentions "shadow": if the device can report its state then there's no need to shadow CVQ.

On Fri, Nov 17, 2023 at 11:05:16AM +0000, Parav Pandit wrote:
For the performance numbers with the pre-copy and device context of the posted patches 1 to 5, the downtime reduction of the VM is 3.71x with active traffic on 8 RQs at 100 Gbps port speed.

From: Michael S. Tsirkin <mst@redhat.com> Sent: Friday, November 17, 2023 5:04 PM
Sounds good, can you please post a bit more detail? Which configs are you comparing, and what was the result on each of them?

On Fri, Nov 17, 2023 at 11:45:20AM +0000, Parav Pandit wrote:
Common config: 8+8 tx and rx queues. Port speed: 100 Gbps. QEMU 8.1, libvirt 7.0. GVM: CentOS 7.4. Device: virtio VF hardware device.
Config_1: virtio suspend/resume similar to what Lingshan has, largely the vdpa stack.
Config_2: device context method of admin commands.

From: Michael S. Tsirkin <mst@redhat.com> Sent: Friday, November 17, 2023 5:35 PM To: Parav Pandit <parav@nvidia.com>
OK, that sounds good. The weird thing here is that you measure "downtime". What exactly do you mean here? I am guessing it's the time to retrieve state on the source and re-program the device state on the destination? And this is 3.71x out of how long?

On Fri, Nov 17, 2023 at 12:11:15PM +0000, Parav Pandit wrote:
Yes. Downtime is the time during which the VM is not responding or receiving packets, which involves reprogramming the device. 3.71x is the relative time for this discussion.

From: Michael S. Tsirkin <mst@redhat.com> Sent: Friday, November 17, 2023 6:02 PM
Oh, interesting. So VM state movement, including reprogramming the CPU, is dominated by reprogramming this single NIC, by a factor of almost 4?

On Fri, Nov 17, 2023 at 01:03:03PM +0000, Parav Pandit wrote:
Yes.

From: Michael S. Tsirkin <mst@redhat.com> Sent: Friday, November 17, 2023 7:31 PM To: Parav Pandit <parav@nvidia.com>
Could you post some numbers too then? I want to know whether that would imply that VM boot is slowed down significantly too. If yes, that's another motivation for pci transport 2.0.

On Fri, Nov 17, 2023 at 10:48 PM Parav Pandit <parav@nvidia.com> wrote:
It was 1.8 sec, down to 480 msec.

Well, there's work ongoing to reduce the downtime of the shadow virtqueue. Eugenio or Si-wei may share an exact number, but it should be several hundreds of ms.

That was mostly for the device teardown time at the source, but there is also setup cost at the destination that needs to be counted. Several hundred milliseconds would be the ultimate goal, I would say (right now the numbers from Parav more or less reflect the status quo, but there is ongoing work to bring them further down), and I don't doubt several hundreds of ms is possible. But to be fair, on the other hand, a shadow vq on a real vDPA hardware device would need a lot of dedicated optimization work across all layers (including hardware or firmware), all over the place, to achieve what a simple suspend-resume (save/load) interface can easily do with VFIO migration.
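For a rough sense of the bit-per-page bookkeeping questioned in the quoted thread above: the per-VF cost depends entirely on the tracking granularity. Below is a minimal sketch, assuming one dirty bit per 4 KiB page, that computes the bitmap size for a given amount of guest memory, together with a purely hypothetical layout for the result of the proposed WRITE_RECORD_CAP_QUERY command. The field names and the 4 KiB granularity are illustrative assumptions, not spec text.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical result layout for the proposed WRITE_RECORD_CAP_QUERY
 * admin command -- field names invented for illustration only. */
struct write_record_cap {
        uint64_t max_range_bytes;      /* guest memory range the device can track */
        uint32_t page_bit_granularity; /* bytes covered by one dirty bit, e.g. 4096 */
        uint32_t padding;
};

/* Bytes of bitmap needed to track mem_bytes at one bit per page_size bytes. */
static uint64_t dirty_bitmap_bytes(uint64_t mem_bytes, uint64_t page_size)
{
        uint64_t pages = (mem_bytes + page_size - 1) / page_size;
        return (pages + 7) / 8;
}

int main(void)
{
        /* 1 TiB of guest memory at one bit per 4 KiB page:
         * 2^28 pages -> 2^28 bits -> 32 MiB of bitmap. */
        uint64_t bytes = dirty_bitmap_bytes(1ULL << 40, 4096);
        printf("bitmap for 1 TiB at 4 KiB per bit: %llu MiB\n",
               (unsigned long long)(bytes >> 20));
        return 0;
}

A coarser or finer tracking granularity scales this figure directly, which is why exposing the device's limit (and granularity) through the capability query matters for provisioning.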
But it seems the shadow virtqueue itself is not the major factor; rather, it is the time spent on programming vendor-specific mappings, for example.

Yep. The slowness on the mapping part is mostly an artifact of the software-based implementation. IMHO, from a live migration point of view, it is better not to involve any mapping operation in the downtime path at all.
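To make that ordering point concrete, here is a minimal, purely hypothetical sketch of a destination-side flow that keeps the slow, vendor-specific mapping work in the preparation phase rather than in the downtime window. Every function here is an invented stub for illustration, not an existing API.

#include <stdio.h>

/* Stand-in stubs; no real driver or VMM API is implied. */
static void program_vendor_mappings(void) { puts("program DMA mappings (slow, vendor specific)"); }
static void load_device_context(void)     { puts("load virtqueue and device state"); }
static void resume_datapath(void)         { puts("start the device"); }

int main(void)
{
        /* Preparation phase: runs while the source is still in pre-copy,
         * so its cost does not add to guest downtime. */
        program_vendor_mappings();

        /* Downtime window: only the cheap steps remain. */
        load_device_context();
        resume_datapath();
        return 0;
}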
-Siwei
Thanks

The time didn't come from the pci side or the boot side. For the pci side of things, you would want to compare the pci vs non-pci device based VM boot time.