OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: RE: [virtio-comment] [RFC PATCH v2] virtio-rtc: Add device specification.


> From: Peter Hilber <peter.hilber@opensynergy.com>
> Sent: Thursday, July 27, 2023 3:51 PM


> >> In the spec, clock ids are 64 bit wide to have unique ids when
> >> supporting clock
> > It is unclear the namespace of uniqueness.
> > Do you mean unique id across multiple virtio devices?
> 
> The idea was not to have a unique id across multiple virtio devices.
> Rather, the idea was to simplify lifetime management for clocks in a single
> virtio device (if such lifetimes would be introduced in the
> future) by not having to worry about race conditions due to clock id reuse.
> 
If the id not unique, 16-bit is likely more than enough.
There should not be any race condition with respect to clock id as its per device.

> > If so, how is it proposed, we probably need UUID?
> >
> 
> So it would be up to the device implementation to assign unique ids for each
> clock lifetime; incrementing a counter should do.
> 
Ok. uniqueness is per device hence any scheme by device is fine as you explained.

> >> hot-plugging in the future.
> >>
> >> Any feedback is appreciated very much.
> >>
> > This is a good feature.
> >
> > A network device has a PHC which is very similar to what is being proposed
> here which is super set of this proposal.
> >
> > A network device will support packet timestamping as phase-1, followed by
> PTP synchronized UTC/TAI format in phase-2 followed by PTP timestamping
> offload in phase-3.
> > Phase-1 is already being proposed using monotonic clock method at [1].
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202306/msg00287.h
> > tml
> >
> > We would like to have a single virtio network device to expose the optional
> RTC without creating a full blown new virtio device.
> >
> > A quick grep in the industry indicates that at least 15+ nics in industry with
> dual PTP clock + nic functionality.
> > And we would like virtio network device also to have it.
> > At minimum such solution avoids creating an additional full blown virtio
> device.
> >
> > As proposed here, RTC device as stand alone itself has its merit and use.
> >
> > So lets build the RTC device interface that can be of dual use.
> > 1. as part of stand alone RTC device
> > 2. as part of virtio network device
> >
> 
> To me it seems preferable to first try to keep the two devices separated. 
Devices can be separate, the driver to device interface between them should be common to reuse it.

> AFAIU
> PTP clocks are also usually represented as a device separate from network
> devices.
> 
Both the modes are in use, hence we want to support #1 and #2 both without duplicating the interfaces.

> The extra effort to implement another simple virtio device (rather than putting
> everything in a single device) seems to be small when compared to the
> complexity of creating an extensible specification which should work both
> standalone and as part of an already existing, but still growing, device.
It may sound as small effort with only a focused view on RTC only device.
But both types are needed.
From actual implementation point of view and use case wise, yet another PCI device at scale of virtual machines is not desired, especially when it has tight usage completing on network device.


> E.g., the
> RTC device should also get an alarm, with an associated eventq, in the future.
Sure, it is ok. it will be optional when it is RTC device.

> Also, I wonder if the driver implementation should then work both as a
> standalone Virtio driver and as part of the net driver.
>
Driver layer will be fine. Linux stack evolved over last 3 years to support multi-functionality device.
A RTC/PTP driver will be a stand alone entity that works on underlying layered way to issue read cmd, config etc.
 
> Maybe the above use case could also be addressed by a standalone RTC device
> as follows:
> 
> 1) The net device exports a UUID object to which the RTC device can refer to (so
> the drivers can link the net device to the RTC clock).
> 
> 2) If needed, the RTC device aligns the lifetime of the clock to the net device.
> Then, clock lifetimes are no longer static, and can be aligned with the net
> device. Through an eventq (to be added), the RTC device announces clock
> creation/destruction, which can be synchronized to the net device state.
> 
All of the infrastructure, cross device synchronization, its device side management will come with equal complexity.
And we will end up with sub-optimal way to using multiple PCI devices.

Net already has cvq required by RTC. So cvq commands can be common.
Readq is additional, we need to define common between the two and way to discover it.

> > Such work will enable to re-invent the wheel 2nd time in virtio net.
> >
> > What I can think of is following definitions being common between two device
> types.
> > This will be able to leverage your rtc work done as part of this patchset.
> >
> > 1. _F feature bit that can be common between the rtc and net device 2.
> > control vq command opcode and format be common 3. discovery of the
> > readq id possibly using cvq and it must depend on _F_RESET bit.
> >
> 
> Not sure what _F_RESET refers to, is it VIRTIO_F_RING_RESET? And why?
Yes, because one needs the ability to create the readq dynamically only when/if such virtio_rtc driver is loaded when a device is multi-functionality device.

> 
> >> --- /dev/null
> >> +++ b/device-types/rtc/description.tex


> If this sounds better, I could rephrase to "The driver enqueues clock reading
> requests to the readq."
> 
Looks good.

> > \footnote{The readq is a dedicated virtqueue to facilitate
> >> +reducing latency for clock readings, which may be more
> >> +performance-sensitive than other requests.} The driver uses the
> >> +controlq for other requests to the device.
> >> +
> >> +\subsection{Feature bits}\label{sec:Device Types / RTC Device /
> >> +Feature bits}
> >> +
> >> +\begin{description}
> >> +\item[VIRTIO_RTC_F_READ_CROSS (0)] Device supports cross-
> timestamping.
> >> +\end{description}
> >> +
> >> +VIRTIO_RTC_F_READ_CROSS determines whether the device supports clock
> >> +reading with cross-timestamping. Without this feature, the device
> >> +does not support cross-timestamping. Even when this feature was
> >> +negotiated, the device may still not support cross-timestamping for
> >> +some clocks, or for particular requests.
> >> +
> > Can you please explain why even after feature negotiation, device may still
> not support?
> > Is it because the cross timestamping is per clock feature?
> 
> Actually, the cross-timestamping support can be specific to the combination of
> clock and hardware counter. This is explained in more detail in the context of
> VIRTIO_RTC_M_READ_CROSS. I will mention this here as well.
> 
Hardware counter is a misleading term here. It is really platform (system) specific identifier.

Maybe I am wrong, but I think cross timestamping functionality can be achieved at the driver level, reading counter value from two sources?
Unlikely? because the variance is lot?

If so, why do we need to annotate arm/x86, can it be done without it as just system clock/cycles?

> > If so, I guess no need of a feature bit as it is not serving any meaningful
> purpose.
> 
> The only purpose is to potentially save the device some initialization work or
> resources in case the feature is not negotiated.
> 
Ok. In that case we should move it to per clock enablement instead of feature bit.

> > Each clock information can advertise what is supports, right?
> >
> 
> The driver can use VIRTIO_RTC_M_CROSS_CAP to inquire about this (or the
> driver could also just try if VIRTIO_RTC_M_READ_CROSS works).
> 
VIRTIO_RTC_M_CROSS_CAP is reliable way to discover the device support.
Direct command if it results in error can throw unwanted kernel errors resulting in false alarms.

> > If this clock device is a PCI device, on an empty system, a PCI device has to do
> at least couple of DMA.
> > 1. read the descriptor (250 to 300nsec) 2. Read the request (250 to
> > 300nsec) 3. Write the response (50 nsec+) 4. raise interrupt.
> >
> > How is this time delta accounted in clock reading?
> > Is driver expected to record system time and subtract it on receiving
> response?
> > Or?
> 
> As a disclaimer, I am no expert for clock reading, so any additional insights are
> appreciated very much.
> 
> I think how the driver accounts for the time delta should remain outside of the
> scope of the specification. In the current spec the interpretation of clock reading
> is behavior internal to the driver.
Well, directly reading the clock as device register can be more precise.
But this can be limited to small number of clocks.

> Also, AFAIU interpretation of clock readings is a scientific and technical topic of
> its own, which has been extensively discussed in the past decades. The choice of
> a clock reading algorithm implies trade-offs between precision, implementation
> complexity, and resource usage/clock reading speed (when re-reading the
> clock).
> 
> The Linux kernel has ioctl PTP_SYS_OFFSET_EXTENDED, which will read the
> system time immediately before and after each PTP clock reading, leaving it up
> to userspace how to interpret the clock reading. This ioctl can also be used with
> the RFC Linux kernel driver implementing this spec [2], which exposes a virtual
> PTP clock. So in this case, it would hardly help to stipulate Virtio driver behavior,
> when the actual clock reading interpretation happens in userspace daemons
> using PTP_SYS_OFFSET_EXTENDED, such as chrony.
> 
> I see the above discussed VIRTIO_RTC_M_READ method mainly as a fallback in
> cases where the VIRTIO_RTC_M_READ_CROSS method would not be available.
> The VIRTIO_RTC_M_READ_CROSS method is more or less independent of the
> time delta (which can be in the 100 ms range, as discussed in the driver patch
> series cover letter [2]).
> 
Do you mean that the system clock and device clock reading happening in single call, hence the _CROSS gives more precise result?

> The above question is addressed in the introductory paragraph for "readq
> Operation" as follows:
> 
> > The driver may enhance and interpret the clock readings through
> > methods which are beyond the scope of this specification.



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]