virtio-dev message


Subject: RE: [PATCH v2] Add device reset timeout field



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Friday, October 8, 2021 6:27 PM

> > 2. An SR-IOV VF virtio device in our case takes a lot less than this, but may take anywhere between 10 msec and 250 msec.
> > This can happen on firmware where the user enabled 500 SR-IOV VFs.
> > The PCI spec indicates that all VFs are to initialize within 100 msec. This translates to 0.2 msec for one VF.
> > In some scenarios it can be hard to initialize a VF in 0.2 msec, depending on what else the firmware is doing at that time.
> 
> That's separate from virtio reset though. virtio reset is much lighter weight
> than a VF reset, all it needs to do is return config space to original values and
> stop DMA.
Again, you picked the valid example of stopping DMA on an already initialized device, while the case above is about the first init. :-)
A virtio device goes through its first reset during initialization. It should be able to tell the driver how long to wait.
Device firmware may take more than 0.2 msec to finish the initialization needed to serve a virtio device.
Today's infinite wait works here. The question was about a wild guess by the driver: 100 msec vs. 10 msec vs. 0.2 msec.
Is that enough?
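
To make the comparison concrete, here is a rough sketch (Python, just for illustration) of a driver polling for reset completion against a deadline. All names here (wait_for_reset, FakeDevice, PER_VF_BUDGET_S) are hypothetical and not taken from the patch or the spec; the only assumption borrowed from virtio is that the driver considers reset complete once device status reads back as 0.

```python
import time

# 100 msec for 500 VFs, as in the example above: 0.2 msec per VF.
PER_VF_BUDGET_S = 0.100 / 500


def wait_for_reset(read_status, timeout_s, poll_interval_s=0.001,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until read_status() returns 0 or the deadline passes.

    timeout_s=None models today's infinite wait.
    Returns True if the reset completed, False if the wait timed out.
    """
    deadline = None if timeout_s is None else clock() + timeout_s
    while True:
        if read_status() == 0:
            return True
        if deadline is not None and clock() >= deadline:
            return False
        sleep(poll_interval_s)


class FakeDevice:
    """Simulated device (hypothetical): status reads 0 only after ready_after_s."""

    def __init__(self, ready_after_s):
        self.now = 0.0  # simulated clock, advanced only by sleep()
        self.ready_after_s = ready_after_s

    def clock(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds

    def read_status(self):
        # Any non-zero value stands in for "reset still in progress".
        return 0 if self.now >= self.ready_after_s else 1
```

With a simulated device that needs 10 msec, a driver guessing PER_VF_BUDGET_S gives up too early, while a 250 msec timeout advertised by the device lets the same loop succeed and still bounds the wait for a device that never comes back.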

> 
> > 3. A system has one or more virtio boot devices.
> > One of them happens to be faulty after a firmware upgrade.
> > The pre-boot environment waits infinitely. Michael suggests disabling such a PCI slot by means of an abstract Ctrl+C.
> > If the PCI slot is disabled, that device must be physically taken out for recovery.
> > Alternatively, if the device advertised a finite timeout and did not boot, the system gave up after the finite timeout, and the server picked the second boot option and booted.
> > Now a system admin can repair the faulty device without physically taking it out.
> > Will an infinite timeout help here? Or is a device advertising a finite timeout and recovering the system more useful?
> >
> > 4. The device was hotplugged into the system and, before it is fully probed, a hot unplug is triggered.
> 
> 
> I don't get this one. Are you talking about surprise removal here?
Yes.
> The way to handle that is surely not a timeout, we should be able to test for
> device presence.
Yes, it should be possible to update the presence state of a device under probe when it is surprise removed.
I will look into it more.
However, this is not the only place a timeout is used.
> 
> > The device cannot respond to reset, because it is hot unplugged.
> > OS waits infinitely for reset to complete.
> > And system component is stuck just because of one device.
> > Would a finite timeout help to abort this operation? Yes.
> 
> Except if it takes minutes it is not agile enough for many workloads.
> 
> >
> > So is a wild guess of 10 msec for all devices, or an infinite wait, the most efficient way to handle the above scenarios?
> 
> Donnu, but as I hope you begin to see, as we start digging into actual
> requirements, neither does a huge reset promise by the device.

A finite reset timeout helps make virtio devices more predictable to use.

> How about some "keepalive" signal then? E.g. a register where each read
> needs to respond with a different value, if it's the same then device is stuck ...

A device must be out of reset, with the keepalive feature negotiated, before it can respond to keepalive requests from the host driver.
Keepalive is useful after the reset + init stage.
(NVMe devices also use keepalive, and they have a device ready timeout with a granularity of 500 msec, which is similar to the virtio device reset timeout.)
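
A rough sketch of the ordering I mean, again in Python and purely illustrative. The register semantics follow the suggestion above (each read returns a different value, and an unchanged value means the device is stuck); the function and parameter names are hypothetical, and no such register or feature bit exists in the spec today.

```python
import itertools


def device_is_alive(read_keepalive, reset_done, keepalive_negotiated):
    """Return True if two consecutive keepalive reads differ.

    The check is refused before reset has completed and the (hypothetical)
    keepalive feature has been negotiated, because the device is not yet
    in a state where it can answer.
    """
    if not (reset_done and keepalive_negotiated):
        raise RuntimeError("keepalive needs reset done and feature negotiated")
    first = read_keepalive()
    second = read_keepalive()
    return first != second


def make_counter_register():
    """Simulated healthy device: every read returns the next counter value."""
    counter = itertools.count()
    return lambda: next(counter)


def make_stuck_register(value=7):
    """Simulated stuck device: every read returns the same value."""
    return lambda: value
```

The point of the guard is that keepalive cannot replace a reset timeout: during the first reset neither condition holds, so the driver still needs some bound on how long to wait.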

