
virtio-dev message



Subject: Re: [virtio-dev] RE: [PATCH v2] Add device reset timeout field



On 2021/10/15 1:20, Parav Pandit wrote:

From: Jason Wang <jasowang@redhat.com>
Sent: Friday, October 15, 2021 10:46 AM


On 2021/10/15 12:36, Parav Pandit wrote:
From: Michael S. Tsirkin <mst@redhat.com>
Sent: Friday, October 15, 2021 3:59 AM

On Thu, Oct 14, 2021 at 05:35:37PM +0000, Parav Pandit wrote:
Hi Michael, Cornelia,

From: Parav Pandit
Sent: Tuesday, October 12, 2021 2:42 PM

From: Michael S. Tsirkin <mst@redhat.com>
Sent: Tuesday, October 12, 2021 2:32 PM

On Tue, Oct 12, 2021 at 08:51:34AM +0000, Parav Pandit wrote:
From: Michael S. Tsirkin <mst@redhat.com>
Sent: Monday, October 11, 2021 9:30 PM

On Mon, Oct 11, 2021 at 03:44:14PM +0000, Parav Pandit wrote:
This is unlikely to work until the reset is completed, because a real
device implementing this would prefer to do it in fw for the 1000
virtio devices sitting on the physical card.
It is very much driven by such an implementation at the device level,
so it cannot update the counter value if reset is not completed for
the device.
I think a read-only device reset timeout is the most elegant option
during the device initialization phase, since it eliminates today's
infinite loop.

Why can't a driver just go ahead and do a timeout regardless?
O.k., let's consider this thought exercise. What timeout value will
the driver choose if the device doesn't specify one?
I explained in the previous thread, and you acked, that an actual
fw-based device may take longer to initialize than a pure sw backend
implementation.
As a second example, a pre-boot device can take even longer to
initialize.
An SR-IOV VF device may initialize a lot faster.
Instead of the driver having such transport- and device-specific
checks (or some very short or very long timeout), we propose to let
the device mention such a timeout value.

Parav, I think you are conflating reset with initialization time.
Initialization is just for host boot, which takes seconds anyway
- but no, minutes is not reasonable there, either.
Reset affects guest boot. This needs to complete in milliseconds.

I cannot promise, but functionality usually improves with newer
generation devices.
Enforcing milliseconds doesn't look practical for every type of
device.
Some of the block devices may need to establish TCP connections in the
backend.
It is more useful to wait a few more seconds for the device to
initialize after powering on the system than to give up booting the
server completely.
For example, an NVMe block device starts with a minimum timeout of
500 msec.
Yes, I agree with your point that a device given to a guest VM will
likely have a very short reset time that should complete in
milliseconds.

This conflation is IMHO one of the problems with this proposal.

Device initialization consists of device reset, per spec section 3.1.1.
It does. But maybe we need to create a way for the driver to
distinguish between the two. When under reset, use a driver-supplied
timeout.

This makes sense because, as we discussed, when a device undergoes a
reset with active DMA, the driver still cannot clean up after the
timeout expires. So this can be a short, driver-decided value, as a
longer timeout is not useful.

When powering up, use a longer device-supplied one.

In v0 and v1 I initially considered only the power-up case of device
initialization. There was text around that.
In v2 I removed the initialization text, and I totally missed the
above case with active DMA.
This should work.
We should word this part of the spec accordingly.
Are the changes below good for v3?
1. The driver should use the device reset timeout during the
initialization stage.

How does the driver identify this, though?

The presence of the device_reset_timeout field in struct
virtio_pci_common_cfg indicates that it is available.
If the device supports it, it will place a nonzero value there, and
the driver knows that this field should be used.
2. Remove the feature bit, as feature bits are only readable after
reset is completed.
3. A device reset timeout field of zero indicates that the device
doesn't support it.

I'm not sure about 3. I think each transport will need its own way to do it.

For PCI, a value of zero indicates it isn't supported.
For MMIO, DeviceResetTimeout at offset 0x04c indicates the same.
Currently only these two transports have a use for it.
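
To make items 1 to 3 above concrete, here is a minimal sketch of the
driver-side logic discussed in this thread, written in Python purely
for illustration. It assumes the timeout is expressed in milliseconds
(the thread does not state a unit) and that reset is complete once
device_status reads back as 0. The names pick_reset_timeout_ms,
wait_for_reset, read_status and DRIVER_DEFAULT_TIMEOUT_MS are
hypothetical and are not taken from the patch or from any driver.

```python
import time

# Hypothetical driver-chosen fallback; the thread does not fix a value.
DRIVER_DEFAULT_TIMEOUT_MS = 100


def pick_reset_timeout_ms(device_reset_timeout_ms, powering_up):
    """Choose how long the driver waits for a reset to complete.

    A value of zero in the device field is treated as "not supported",
    as proposed for PCI in item 3. Outside of power-up (for example a
    reset with active DMA), the short driver-decided value is used.
    """
    if powering_up and device_reset_timeout_ms != 0:
        return device_reset_timeout_ms
    return DRIVER_DEFAULT_TIMEOUT_MS


def wait_for_reset(read_status, timeout_ms, poll_interval_ms=1):
    """Poll until read_status() returns 0 or the timeout expires.

    read_status is a caller-supplied callable standing in for a read of
    the transport's device_status register. Returns True when the reset
    completed, False when the driver gave up.
    """
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        if read_status() == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_ms / 1000.0)
```

With a bounded wait like this, a device that never leaves reset makes
wait_for_reset return False instead of looping forever, which is the
infinite loop the proposal is meant to eliminate.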

So I propose: maybe a capability like this, with a timeout field?
Do you mean a new capability, say VIRTIO_PCI_DEVICE_TIMEOUT, similar to
VIRTIO_PCI_CAP_COMMON_CFG?
Will this contain one or more timeouts? For example, with this proposal
it contains only the device reset timeout.
Will the same capability later be extended to contain a command
timeout too?

And within VMs, we can just do without, since if it got out of reset
once it will surely get out of reset again...

Yes, a VM might not need it. It is really the HV's choice to implement
and not part of the virtio spec.


Well, this will break the migration between HW virtio and SW virtio.

How does it break? Can you please explain?
SW virtio will emulate what HW virtio does. This field only exposes a read-only value to the driver so that it does not wait indefinitely.


As discussed, can we do transport layer reset in this case?


If src is hw and dst is sw, sw will likely have capabilities similar to hw, not just this particular one but many others, won't it?


But how about the case when src is software without those capabilities?

Thanks



