
virtio-dev message



Subject: Re: [virtio-dev] RE: [PATCH v2] Add device reset timeout field


On Fri, Oct 15, 2021 at 4:21 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Friday, October 15, 2021 12:33 PM
> > > >
> > > Do you have pointer to it? I do not understand transport level reset.
> >
> > https://lore.kernel.org/virtualization/7ebb9ba0-69a0-2279-9b9e-60c50db06e94@redhat.com/
> >
> I read it. PCI level reset to recover a faulty device is useful.
> But we are not talking about a PCI-level reset here.
>
> It is very straightforward. The virtio layer resets the device and waits indefinitely for it to come back up.
> Instead of doing so, the device provides a hint for how long the driver should wait.

See the question below. But what I want to say is something like:

1) the transport driver may have a transport-specific timeout
2) if the transport driver hits that timeout during virtio reset,
it performs the transport-specific reset

It looks to me like this would guarantee the success of the reset?
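
To make the two steps above concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration: the `transport_ops` hooks, the `fake_dev` simulated device and the millisecond units are hypothetical names and choices, not taken from the spec, the patch, or any existing driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical transport hooks; illustrative names only. */
struct transport_ops {
    void (*virtio_reset)(void *dev);     /* e.g. write 0 to device status */
    bool (*reset_done)(void *dev);       /* e.g. device status reads back 0 */
    bool (*transport_reset)(void *dev);  /* e.g. a PCI-level reset */
    void (*sleep_ms)(void *dev, uint32_t ms);
    uint32_t timeout_ms;                 /* 1) transport-specific timeout */
    uint32_t poll_ms;                    /* polling interval */
};

enum reset_result { RESET_OK, RESET_OK_VIA_TRANSPORT, RESET_FAILED };

static enum reset_result reset_with_fallback(const struct transport_ops *ops,
                                             void *dev)
{
    uint32_t step = ops->poll_ms ? ops->poll_ms : 1; /* avoid a zero step */
    uint64_t waited = 0;

    ops->virtio_reset(dev);
    while (!ops->reset_done(dev)) {
        if (waited >= ops->timeout_ms) {
            /* 2) timeout hit during virtio reset: escalate to the transport */
            if (ops->transport_reset(dev) && ops->reset_done(dev))
                return RESET_OK_VIA_TRANSPORT;
            return RESET_FAILED;
        }
        ops->sleep_ms(dev, step);
        waited += step;
    }
    return RESET_OK;
}

/* Simulated device, only so the sketch can be exercised without hardware. */
struct fake_dev {
    int polls_needed;            /* failed polls before reset completes; <0: never */
    bool transport_reset_works;  /* whether the transport reset recovers it */
    int polls_seen;
    bool ready;
    uint32_t slept_ms;
};

static void fake_virtio_reset(void *p)
{
    struct fake_dev *d = p;
    d->polls_seen = 0;
    d->ready = false;
}

static bool fake_reset_done(void *p)
{
    struct fake_dev *d = p;
    if (d->ready)
        return true;
    if (d->polls_needed >= 0 && d->polls_seen >= d->polls_needed) {
        d->ready = true;
        return true;
    }
    d->polls_seen++;
    return false;
}

static bool fake_transport_reset(void *p)
{
    struct fake_dev *d = p;
    d->ready = d->transport_reset_works;
    return d->ready;
}

static void fake_sleep(void *p, uint32_t ms)
{
    struct fake_dev *d = p;
    d->slept_ms += ms;  /* record the wait instead of really sleeping */
}

static const struct transport_ops fake_ops = {
    fake_virtio_reset, fake_reset_done, fake_transport_reset, fake_sleep,
    100,  /* timeout_ms */
    10,   /* poll_ms */
};
```

Note that the sketch still has a RESET_FAILED path: it is only reached if the transport-specific reset itself does not bring the device back, which is the case my question above is about.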

>
> > > If you are asking whether PCI reset can be used to reset the device: yes.
> >
> > Yes, I meant PCI reset.
> >
> > > But what we are talking about here is: when the virtio layer issues the device reset, how long should it wait for that reset to complete?
> >
> > The question is: what do we expect the driver to do if the device reset
> > times out?
> Oh ok. I understand your question now.
>
> As described in this proposal, when the timeout expires at probe time during initialization, the device cannot be operated by the virtio layer,
> so the driver simply aborts the rest of the initialization sequence in section 3.1.1 of the spec after step 1.
>
> Device probe has failed, so the HV has several options to act on it.
> For example: do device-specific debugging, do a transport-specific reset, or recreate the device.

But what about device removal?

>
> >
> > > A typical example is a virtio driver loading on a physical virtio device, agnostic of the transport, and waiting for the device to come out of reset.
> >
> > I'm not sure how hard it would be to meet a device/transport-agnostic timeout.
> > E.g. how can we know that the timeout satisfies the requirements of all the
> > transports?
> >
> > >
> > > > > If src is hw and dst is sw, sw will likely have capabilities similar to hw, not just this particular one but many others. Isn't it?
> > > >
> > > > The problem is when src is a software backend without this capability.
> > >
> > > It isn't any different from any other RW field of the device.
> > > For example, sw did virtio PCI device emulation with 30 MSI-X vectors and hw has only 2.
> >
> > In the case of MSI-X, the migration can't be done directly from src to dest. The
> > only choice is some kind of software mediation.
> >
> > In the case of the timeout, the mediation won't even help, since we don't even
> > present the timeout to the guest, even if the hypervisor can see the timeout
> > interface.
> Since sw didn't present the timeout to the guest, it will find a HW device with similar capability (without timeout).
> Or maybe it can find a device that has already passed the device reset state and attach such a device.
> HV has its options.
>
> Still not any different from the rest of the RW capabilities/registers.

So the timeout doesn't work if we can enforce that. It seems suboptimal
compared to doing a transport reset where possible.

Thanks


