Subject: Re: [PATCH v2] Add device reset timeout field
On Fri, Oct 08, 2021 at 12:12:35PM +0000, Parav Pandit wrote:
>
>
> > From: Cornelia Huck <cohuck@redhat.com>
> > Sent: Friday, October 8, 2021 5:18 PM
> >
> > On Fri, Oct 08 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> > > On Fri, Oct 08, 2021 at 10:59:02AM +0000, Parav Pandit wrote:
> > >> It may be even a pre-boot environment where 100msec or 10msec may be
> > >> too short interval as other extreme of VM boot time example.
> > >
> > > I don't really know what this means. We are talking about how long it
> > > takes device to calm down and stop poking at the host after it's told
> > > to reset. 10ms worst case not enough for this?
> >
> > To me, that sounded more like a physical device that needs to do something
> > like boot its firmware before it can perform an actual virtio operation (and
> > reset simply happens to be the first one.)
> >
> > So, I'm getting more confused about the scope of this timeout. If it's more a
> > "device might not be ready yet" issue, I don't think we need a timeout for reset
> > specifically. Same for races with hotplugging. If it's about "reset may take some
> > time, because it will take some time before all operations have quiesced", I
> > don't see how the device can come up with a value that isn't anything other
> > than a wild guess, and the driver could do wild guessing equally well.
>
> Device implementation has good knowledge of how a given virtio device is implemented to not do wild guess.
> I will take real world examples.
> 1. A physical virtio device backed by a firmware will take more than 10msec of boot time to respond to the reset operation.
>
> 2. A sriov VF virtio device for our case takes a lot lesser than this, but may take anywhere between 10 msec to 250msec.
> This can happen on a firmware where user enabled 500 SR-IOV VFs.
> Pci spec indicates that all VFs to initialize within 100msec. This translates to 0.2msec for one VF.
> In some scenario this can be a hard to initialize a VF in 0.2 msec depending on what else a firmware is doing at that time.

That's separate from virtio reset though. virtio reset is much lighter
weight than a VF reset, all it needs to do is return config space to
original values and stop DMA.

> 3. A system has one or more virtio boot devices.
> One of them happens to be faulty after a firmware upgrade.
> Pre-boot env is infinitely waiting. Michael suggest to do disable such PCI slot by means of abstract Ctrl+C.
> If PCI slot is disabled, that device must be physically taken out for recovery.
> In an alternative, if device advertised a finite timeout, that device didn't boot, system gave up after finite timeout and server picked second boot option, and booted.
> Now a system admin can repair the faulty device without physically taking it out.
> Will infinite timeout help here? Or a device advertising finite timeout and recovering the system more useful?
>
> 4. device was hotplug in system and before it is fully probed, a hot unplug is triggered.

I don't get this one. Are you talking about surprise removal here?
The way to handle that is surely not a timeout, we should be able to
test for device presence.

> Device cannot respond to reset, because its hot unplugged.
> OS waits infinitely for reset to complete.
> And system component is stuck just because of one device.
> Would a finite timeout help to abort this operation?

Yes. Except if it takes minutes it is not agile enough for many workloads.

>
> So is wild guess of 10msec for all devices or an infinite time most efficient way to handle above scenarios?

Donnu, but as I hope you begin to see, as we start digging into
actual requirements, neither does a huge reset promise by the device.

How about some "keepalive" signal then?
E.g. a register where each read needs to respond with a different value,
if it's the same then device is stuck ...

--
MST
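
The "keepalive" idea at the end of the message can be sketched as a driver-side polling loop. This is only an illustration under stated assumptions, not anything defined by the virtio spec or the patch: the keepalive register does not exist today, and `wait_for_reset`, `FakeDevice`, `read_status`, `read_keepalive`, and the return strings are hypothetical names. The sketch assumes that reset is complete once the status register reads back 0, that a live device returns a different keepalive value on every read, and that the driver keeps its own finite deadline as a backstop.

```python
import time
from typing import Callable


def wait_for_reset(
    read_status: Callable[[], int],
    read_keepalive: Callable[[], int],
    timeout_s: float,
    poll_interval_s: float = 0.001,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a device after reset was requested.

    Returns "done" when the status register reads 0, "stuck" when two
    consecutive keepalive reads return the same value, and "timeout" when
    the driver's own deadline expires first.
    """
    deadline = clock() + timeout_s
    last = read_keepalive()
    while True:
        if read_status() == 0:
            return "done"
        if clock() >= deadline:
            return "timeout"
        sleep(poll_interval_s)
        current = read_keepalive()
        if current == last:
            # Hypothetical rule from the mail: same value twice means stuck.
            return "stuck"
        last = current


class FakeDevice:
    """Toy stand-in for a device, used only to exercise the loop above."""

    def __init__(self, polls_until_reset: int, alive: bool = True) -> None:
        self.remaining = polls_until_reset
        self.alive = alive
        self.counter = 0

    def read_status(self) -> int:
        # A live device makes progress on each poll; a dead one never does.
        if self.alive and self.remaining > 0:
            self.remaining -= 1
        return 0 if (self.alive and self.remaining == 0) else 1

    def read_keepalive(self) -> int:
        # A live device changes the value on every read; a dead one repeats it.
        if self.alive:
            self.counter += 1
        return self.counter
```

With this shape the driver can tell a slow but live device (keep waiting until the deadline) from a dead one (give up on the first repeated keepalive value) without the device having to promise a worst-case reset time up front. Whether one repeated value is enough evidence of a stuck device is a design choice the thread leaves open.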