OASIS Mailing List Archives

 



virtio-dev message



Subject: RE: [virtio-dev] RE: [PATCH v2] Add device reset timeout field



> From: Jason Wang <jasowang@redhat.com>
> Sent: Friday, October 15, 2021 2:12 PM
> 
> On Fri, Oct 15, 2021 at 4:21 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Friday, October 15, 2021 12:33 PM
> > > > >
> > > > Do you have a pointer to it? I do not understand the transport-level reset.
> > >
> > > https://lore.kernel.org/virtualization/7ebb9ba0-69a0-2279-9b9e-60c50db06e94@redhat.com/
> > >
> > I read it. A PCI-level reset to recover a faulty device is useful.
> > But we are not talking about a PCI-level reset here.
> >
> > It is very straightforward. The virtio layer resets the device and waits
> > indefinitely for it to come up.
> > Instead, the device provides a hint telling the driver how long it should
> > wait.
> 
> See the question below. But what I want to say is something like:
> 
> 1) transport driver may have a transport specific timeout
Yes. In this proposal, the virtio PCI transport has a transport-specific timeout provided by the PCI device.

> 2) if the transport driver hits that timeout during virtio reset, perform the
> transport-specific reset
>
What to do when the device-specified timeout expires is an implementation choice.
Should the system abort initializing the device further?
Should the QEMU user press Ctrl+C?
Should it retry device init at the virtio level?
Should it call the transport (in this case, PCI FLR) to reset the device?
This depends on where the virtio device is used.
This proposal doesn't define what a system should do when the reset timer expires. That is out of scope and an implementation choice based on the environment.

> It looks to me like that would guarantee the reset succeeds?

You are describing the timeout expiration handling.
Yes, an implementation can choose to perform a transport-level reset when the timeout expires.

> 
> >
> > > > If you are asking whether a PCI reset can be used to reset the device: yes.
> > >
> > > Yes, I meant PCI reset.
> > >
> > > > But what we are talking about here is: when the virtio layer issues a
> > > > device reset, how long should it wait for that reset to complete?
> > >
> > > The question is: what do we expect the driver to do if the device
> > > reset times out?
> > Oh ok. I understand your question now.
> >
> > As described in this proposal, when the timeout expires during
> > initialization (probe time), the device cannot be operated by the virtio
> > layer, so the driver simply aborts the rest of the initialization sequence
> > in spec section 3.1.1 after step 1.
> >
> > Device probe has failed, so the HV has several options to act on it,
> > for example device-specific debugging, a transport-specific reset, or
> > recreating the device.
> 
> But what about device removal?
> 
> >
> > >
> > > > A typical example is a virtio driver, agnostic of the transport,
> > > > loading on a physical virtio device and waiting for the device to
> > > > come out of reset.
> > >
> > > I'm not sure how well a device/transport-agnostic timeout can work.
> > > E.g. how can we know the timeout satisfies the requirements of all
> > > transports?
> > >
> > > >
> > > > > > If src is hw and dst is sw, sw will likely have capabilities
> > > > > > similar to hw, not just this particular one but many others,
> > > > > > won't it?
> > > > >
> > > > > The problem is when src is a software backend without this
> > > > > capability.
> > > >
> > > > It isn't any different from any other RW field of the device.
> > > > For example, sw did virtio PCI device emulation with 30 MSI-X
> > > > vectors and hw has only 2.
> > >
> > > In the case of MSI-X, the migration can't be done directly from src
> > > to dest. The only choice is some kind of software mediation.
> > >
> > > In the case of the timeout, mediation won't even help, since we
> > > don't present the timeout to the guest, even if the hypervisor can
> > > see the timeout interface.
> > Since sw didn't present the timeout to the guest, the HV will find a HW
> > device with similar capability (without timeout).
> > Or maybe it can find a device that has already passed the device reset
> > state and attach that device.
> > HV has its options.
> >
> > Still not any different from the rest of the RW capabilities/registers.
> 
> So the timeout doesn't work if we can enforce that. 
I didn't follow your comment. Enforce what?

> It seems suboptimal compared to doing a transport reset, if possible.

A transport-level reset is not helpful, for two main reasons.
1. A device under reset has nothing wrong at the PCI level.
The device is simply still booting/resetting before it can service virtio-level commands.
So doing a PCI-level reset actually affects it adversely, because a PCI-level reset triggers more layers of unwanted resets.

2. A PCI-level reset must complete within 100 msec.

I still don't understand mixing transport-level reset with virtio-level reset in the context of LM (live migration).

If I read your original concern correctly, a sw device was exposed on HV-1 without the device reset timeout.
Now the user wants to migrate this VM to HV-2, which has a hw device with a device reset timeout.
This device cannot be passed through on HV-2 because the older sw device was attached to the VM.
Frankly, I am not sure this is really a case to consider.
It is still similar to any other migration case where two devices have different capabilities.
For example, the SW device supported 16 RSS queues but the HW has 4 RSS queues....
So do we now emulate that as well? :)
In the other MSI-X example you provided, an emulated MSI-X alternative was proposed. Again, this does not map the full hw device.

So, on the other hand, it is still a sw device. The user might as well just continue in sw mode.

Let's assume for a moment that this (rare) case should be considered.

One way to overcome this is as follows.
(a) Do not place the new device_reset_timeout field in struct virtio_pci_common_cfg.
This is because this structure may be in a memory-mapped BAR directly accessible to the guest, and future versions of the spec will have more fields after the device reset timeout.
Such fields may well be implemented by the sw device on HV-1.

(b) Instead, define a new cfg_type, VIRTIO_PCI_CAP_DEV_TIMEOUT_CFG.
This points to the struct containing the device reset timeout.
Exposing this capability can be skipped when mapping the hw virtio device to the VM on HV-2, to match the sw virtio device of HV-1.

