virtio-dev message

Subject: RE: [virtio-dev] RE: [PATCH v2] Add device reset timeout field

From: Parav Pandit <parav@nvidia.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Date: Thu, 28 Oct 2021 04:01:52 +0000


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, October 27, 2021 1:56 PM
> 
> On Wed, Oct 27, 2021 at 08:04:10AM +0000, Parav Pandit wrote:
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Tuesday, October 26, 2021 9:34 AM
> > >
> > > On Mon, Oct 25, 2021 at 2:11 PM Parav Pandit <parav@nvidia.com> wrote:
> > > >
> > > >
> > > >
> > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > Sent: Monday, October 25, 2021 11:12 AM
> > > > [..]
> > > > > So what I meant is, if PCI gives us an upper limit of the reset
> > > > > (e.g 100ms as you said), can virtio device perform reset more than that?
> > > > > If not, can we simply use 100ms for virtio-pci or can we gain
> > > > > from having a
> > > smaller per device timeout?
> > > > > If yes, it looks like a violation for PCI.
> > > > No. Not at all a violation of PCI spec.
> > > > PCI spec is transport layer.
> > > > Virtio io is layer above transport.
> > > >
> > > > PCI device can be reset, but not necessary, it is ready to service
> > > > upper layer
> > > protocols who may need more time to finish the reset.
> > >
> > > Isn't transport level reset implies a virtio reset? I think this is
> > > the question we need to answer or clarify it in the spec. All my
> > > suggestions were based on the answer "yes".
> > Transport level resets are often too short. It ensures that reset occurred,
> hence, notifications and DMA activities won't occur after transport reset.
> > However, it may not have finished the upper layer reset to re-service the
> device after transport reset.
> >
> > >
> > > Another question is that we don't define reset actually,
> > It has reasonably well definition in the spec in commit: a306bf467850e
> > ("clarify device reset")
> >
> > > do we need to do that
> > > before trying to define reset timeout? To me the reset means the
> > > device can be used by the driver which means all the basic device facilities
> are ready.
> > Reset is well in spec 1.0 compare to 0.95 indicating that driver need to wait
> reset to finish.
> >
> > i.e. that driver should wait for device status to turn zero during reset and
> must do reset.
> >
> > Unlike 0.95 that explicitly stated " Reset the device. This is not required on
> initial startup." In section 2.2.1.
> 
> 
> Right. So again, I'm trying to help you here. I think the problem with your
> approach so far is that it conflates the two things, physical device startup
> might take a lot of time and it's ok since physical host boot takes very long.
> 
It is the preboot env driver that may need to use the physical virtio device before the OS driver uses it.
Physical host booting before OS takes lot less than complete physical server booting.

> This does not seem to be related to a driver init at all which is what the spec
> describes.

> 
> 
> A reset timeout which can run into seconds will break usecases such as kata
> containers. This is why there's so much resistance to adding such a feature to
> the spec. reset should be short, in fact it seems reasonable for spec to give
> guidance that it should take around ~10ms tops.
It doesnât make sense to put such short timing requirements on the device implementation.
One size doesn't fit all.
Those use cases that requires extremely short device reset timing, will not use physical devices or device with higher reset timeout anyway.
So there is no reason to limit device timing by just one use case, which is even shorter than PCI reset timing!
 
> 
> > > It
> > > means the device is ready from the perspective of a "driver", if the
> > > device requires a connection to a remote subsystem, whether or not
> > > such connection is established is out of the scope of virtio reset.
> > Sure such implementation detail about remote system connection is outside
> the scope of spec.
> >
> > > Devices can choose not to process the requests from the virtqueue.
> > Yes, that is right, but it is not limited to only vq processing.
> > As mentioned in the spec section 3.1.1 before VQs are created feature bits,
> virtio config space values can be read only after reset step_1 is completed.
> >
> > >
> > > Another question is that, can we present a PCI device if the
> > > underlayer device logic is not ready?
> > Yes, this is very common for hw based devices, unlike synthetic devices for
> guest VMs.
> > This is also discussed in this thread with sriov example.
> >
> > > If the question is yes, do we need a new facility to tell the driver
> > > that the device is being initialized?
> > >
> > May be yes, a different register than the reset can be helpful to indicate that.
> > Though it shares lot of functionalities common to the reseting the device.
> >
> > Given that this is too early in device initialization sequence, Device
> > doesnât know if should return fake reset_completed status and if driver is
> going to wait for initializing register or not.
> >
> > So let reset register indicate if the reset is completed or not seems more
> appropriate and less complex solution.
> 
> It's not reset though. What is not completed is startup. virtio reset can't take
> seconds.
> 
One use case as lowest common denominator doesn't drive the spec. it should be able to cover wide range of use cases.
And this proposal covers it. Kata container use case can even use the device that doesn't have timeout.

> 
> And just to make things clear this is not a reset timeout.
> It's a readiness timeout unaffected by reset.
>
Yes.
 
> Here's one way I see it working that will likely cause much less contention than
> your current proposals:
> 
> - new capability that allows reporting device readiness.

Yes. I will explore this option.

>   how it looks like, whether there is a timeout is
>   a separate question.

> E.g. a read counter sounds
>   more generic, if device is stuck it can just stop
>   incrementing it.
This is not scalable solution for hundreds of devices and it breaks the layering too.
A virtio layer itself is not in up_state to perform counter update.
A non virtio layer is expected to implement this counter. This is not right.

But this timer is not useful for publishing device readiness. Having device readiness register is enough to indicate that.
For health monitoring such counting register is useful. But its out of scope to address this problem.
So will skip this.

I will explore the device readiness register and update the proposal.
Thanks.

Follow-Ups:
- Re: [virtio-dev] RE: [PATCH v2] Add device reset timeout field
  - From: "Michael S. Tsirkin" <mst@redhat.com>

References:
- Re: [virtio-dev] RE: [PATCH v2] Add device reset timeout field
  - From: Jason Wang <jasowang@redhat.com>
- RE: [virtio-dev] RE: [PATCH v2] Add device reset timeout field
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-dev] RE: [PATCH v2] Add device reset timeout field
  - From: Jason Wang <jasowang@redhat.com>
- RE: [virtio-dev] RE: [PATCH v2] Add device reset timeout field
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-dev] RE: [PATCH v2] Add device reset timeout field
  - From: Jason Wang <jasowang@redhat.com>
- RE: [virtio-dev] RE: [PATCH v2] Add device reset timeout field
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-dev] RE: [PATCH v2] Add device reset timeout field
  - From: Jason Wang <jasowang@redhat.com>
- RE: [virtio-dev] RE: [PATCH v2] Add device reset timeout field
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-dev] RE: [PATCH v2] Add device reset timeout field
  - From: Jason Wang <jasowang@redhat.com>
- RE: [virtio-dev] RE: [PATCH v2] Add device reset timeout field
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-dev] RE: [PATCH v2] Add device reset timeout field
  - From: "Michael S. Tsirkin" <mst@redhat.com>