Subject: RE: [PATCH v2] Add device reset timeout field
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Friday, October 8, 2021 6:27 PM
>
> > 2. A sriov VF virtio device for our case takes a lot lesser than this, but may
> > take anywhere between 10 msec to 250msec.
> > This can happen on a firmware where user enabled 500 SR-IOV VFs.
> > Pci spec indicates that all VFs to initialize within 100msec. This translates to
> > 0.2msec for one VF.
> > In some scenario this can be a hard to initialize a VF in 0.2 msec depending
> > on what else a firmware is doing at that time.
>
> That's separate from virtio reset though. virtio reset is much lighter weight
> than a VF reset, all it needs to do is return config space to original values and
> stop DMA.

Again, you took the valid example of stopping DMA on an already initialized device, while the case above is about the first init. :-)
The virtio device goes through its first reset during initialization. It should be able to tell the driver how long to wait.
Device firmware may take more than 0.2 msec to finish the initialization needed to serve a virtio device.
Today's infinite wait works here. The question was about a wild guess by the driver: 100 msec vs 10 msec vs 0.2 msec. Is that enough?

> > 3. A system has one or more virtio boot devices.
> > One of them happens to be faulty after a firmware upgrade.
> > Pre-boot env is infinitely waiting. Michael suggest to do disable such PCI slot
> > by means of abstract Ctrl+C.
> > If PCI slot is disabled, that device must be physically taken out for recovery.
> > In an alternative, if device advertised a finite timeout, that device didn't boot,
> > system gave up after finite timeout and server picked second boot option, and
> > booted.
> > Now a system admin can repair the faulty device without physically taking it
> > out.
> > Will infinite timeout help here? Or a device advertising finite timeout and
> > recovering the system more useful?
> >
> > 4. device was hotplug in system and before it is fully probed, a hot unplug is
> > triggered.
>
> I don't get this one.
> Are you talking about surprise removal here?

Yes.

> The way to handle that is surely not a timeout, we should be able to test for
> device presence.

Yes, it should be possible to update the presence state of a device under probe when it is surprise removed. I will look into it more.
However, this is not the only place where a timeout is used.

> > Device cannot respond to reset, because its hot unplugged.
> > OS waits infinitely for reset to complete.
> > And system component is stuck just because of one device.
> > Would a finite timeout help to abort this operation? Yes.
>
> Except if it takes minutes it is not agile enough for many workloads.
>
> > So is wild guess of 10msec for all devices or an infinite time most efficient
> > way to handle above scenarios?
>
> Donnu, but as I hope you begin to see, as we start digging into actual
> requirements, neither does a huge reset promise by the device.

A finite reset timeout helps make virtio devices more predictable to use.

> How about some "keepalive" signal then? E.g. a register where each read
> needs to respond with a different value, if it's the same then device is stuck ...

A device has to be out of reset, with the keep alive feature negotiated, before it can respond to keep alive requests from the host driver.
Keep alive is useful after the reset + init stage.
(NVMe devices also use keep alive, and they have a device ready timeout with 500 msec granularity that is similar to the virtio device reset timeout.)
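
To make the difference between an infinite wait and a device-advertised finite timeout concrete, here is a minimal simulation sketch. It is not taken from the patch or from any driver: `wait_for_reset`, `FakeDevice`, and the status value 0x40 are hypothetical names and values chosen for illustration. The only behavior assumed is that the driver polls the device status until it reads 0 after requesting a reset.

```python
from typing import Callable, Optional


def wait_for_reset(read_status: Callable[[], int],
                   timeout_ms: int,
                   now_ms: Callable[[], int],
                   sleep_ms: Callable[[int], None],
                   poll_ms: int = 1) -> bool:
    """Poll until the status reads 0; give up once timeout_ms has elapsed.

    timeout_ms stands in for a value the device would advertise
    (hypothetical here), instead of a driver-side guess or an infinite wait.
    """
    deadline = now_ms() + timeout_ms
    while True:
        if read_status() == 0:
            return True          # reset completed
        if now_ms() >= deadline:
            return False         # device did not come out of reset in time
        sleep_ms(poll_ms)


class FakeDevice:
    """Simulated device with its own clock; not a real virtio transport."""

    def __init__(self, ready_at_ms: Optional[int]):
        self.ready_at_ms = ready_at_ms   # None means the reset never completes
        self.t = 0                       # simulated time in msec

    def now_ms(self) -> int:
        return self.t

    def sleep_ms(self, ms: int) -> None:
        self.t += ms

    def read_status(self) -> int:
        if self.ready_at_ms is not None and self.t >= self.ready_at_ms:
            return 0
        return 0x40                      # arbitrary nonzero "still busy" value


# A VF that needs 250 msec succeeds with a 300 msec advertised timeout.
slow_vf = FakeDevice(ready_at_ms=250)
slow_vf_ok = wait_for_reset(slow_vf.read_status, 300,
                            slow_vf.now_ms, slow_vf.sleep_ms)

# The same VF fails if the driver guesses 10 msec on its own.
guessed = FakeDevice(ready_at_ms=250)
guessed_ok = wait_for_reset(guessed.read_status, 10,
                            guessed.now_ms, guessed.sleep_ms)

# A faulty or unplugged device is abandoned after 100 msec instead of forever.
dead = FakeDevice(ready_at_ms=None)
dead_ok = wait_for_reset(dead.read_status, 100, dead.now_ms, dead.sleep_ms)
```

In this sketch the caller stays in control when the wait fails, so a pre-boot environment could move on to the next boot option, and a hotplug path could abort the probe, rather than blocking on one device.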