Subject: Re: [virtio-comment] [PATCH v4 1/7] transport-fabrics: introduce Virtio Over Fabrics overview


Hey Zhenwei,

Interesting proposal! 

I have taken a pass through the series and have some thoughts/feedback. I see potential value in a way to disaggregate device processing from the hypervisor while re-using a lot of the existing code and well-defined abstractions in the virtio ecosystem.

For storage in particular, cutting virtio processing out of the hot path may yield some performance benefit, but IMO there are better ways to achieve that which work out of the box today. Your "eager buffers" are a nice touch for the write path, but they come with their own costs, which I'll describe later.

That said, I do see some limitations.

By design, this protocol gives the guest kernel direct access to the target. That in turn becomes a potential attack surface if a VM is compromised, whereas a virtio-over-{PCI/CCW/MMIO} abstraction shields the target from direct access.

While migration should work cleanly for a TCP fabric, have you thought about how migration will work with RDMA? In theory, as VFIO migration matures, it's possible the guest kernel and user space stacks could be hardened to deal with hardware being ripped out from under them, but that may be a bit futuristic.

Also, could you ever effectively migrate a VM from a TCP host to an RDMA-accelerated one? The guest could have some smart logic to figure out "now my NIC doesn't support RDMA, let me switch to TCP". Today, if you use RDMA acceleration behind any of the existing transports, the guest does not have to worry about this and can safely migrate anywhere.

Similarly, what happens if you need to change the target IP, or if one target goes down? Is there any way to handle that without making changes inside the guest? I think you should describe "redirection" behavior like iSCSI has, but I'll say more in the relevant patch.

For the storage use case in general, I'm struggling to understand the advantages of virtio-over-fabrics over either the kernel NVMe-oF or iSER initiator. It seems like it fundamentally boils down to a similar paradigm: the kernel submits IOVs to the target directly and it DMAs back and forth as required. In my experience, those who want to squeeze every last op out of their VMs skip virtio and do that anyway. For NVMe-oF, the only limitation is that you need to use NVMe, but given how virtualization-friendly and ubiquitous it's becoming for storage, I don't think that's bad. iSER has some challenges, but they are definitely surmountable.

Again, I like the "eager buffers", but if you're thinking about storage performance, why not try to introduce that into NVMe-oF or iSER instead?

Other than storage, what devices will benefit from fabrics acceleration enough to compromise the ergonomics of the existing virtio transports?

In summary, I'd like to see:

[1] More discussion of the kinds of devices which will benefit from Virtio Over Fabrics.

[2] A comprehensive description of what the dataflow should look like. From having read the spec I get the gist, but an example flow would be helpful for reference. Maybe just in the cover letter? I've put a rough sketch of what I have in mind after this list.

[3] If the goal is storage performance, a discussion of how this protocol provides performance benefits over and above the alternatives.
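
For reference, here is the rough initiator-side flow I pieced together from reading the spec, written as a hedged C sketch. To be clear: every identifier below (viof_connect(), viof_ctrl_cmd(), VIOF_GET_DEVICE_ID, and so on) is something I made up for illustration; none of it comes from the spec or any implementation.

#include <stddef.h>
#include <stdint.h>

/* All identifiers here are illustrative assumptions, not from the spec. */
#define VIOF_CTRL_QUEUE    0xffff  /* assumed selector for the control queue */
#define VIOF_GET_DEVICE_ID 0x01    /* assumed control-command opcode */

struct viof_conn;                  /* opaque per-queue reliable connection */

struct viof_conn *viof_connect(const char *addr, uint16_t queue_idx);
int viof_ctrl_cmd(struct viof_conn *c, int opcode, void *out);
int viof_send(struct viof_conn *c, const void *buf, size_t len);
int viof_recv(struct viof_conn *c, void *buf, size_t len);

static void example_flow(const char *target_addr)
{
    uint32_t device_id;
    char cmd[64], comp[64];        /* stand-ins for the real wire formats */

    /* One reliable connection backs the single control queue. */
    struct viof_conn *ctrl = viof_connect(target_addr, VIOF_CTRL_QUEUE);

    /* Control commands configure the device instance, e.g. fetching
     * the Virtio Device ID as the spec mentions. */
    viof_ctrl_cmd(ctrl, VIOF_GET_DEVICE_ID, &device_id);

    /* Each virtqueue maps one-to-one onto its own connection. */
    struct viof_conn *vq0 = viof_connect(target_addr, 0);

    /* Bulk data: the initiator sends a command plus payload... */
    viof_send(vq0, cmd, sizeof(cmd));

    /* ...and the arrival of the completion doubles as the notification
     * (per the last bullet: data arriving on the queue is the event). */
    viof_recv(vq0, comp, sizeof(comp));
}

If something like this (corrected wherever I've misread the spec) went into the cover letter, the intended dataflow would be much easier to review.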

> On Jun 26, 2023, at 3:25 AM, zhenwei pi <pizhenwei@bytedance.com> wrote:
> 
> In the past years, virtio supports lots of device specifications by
> PCI/MMIO/CCW. These devices work fine in the virtualization environment.
> 
> Introduce Virtio Over Fabrics transport to support "network attached
> peripheral devices". With this transport, Many Virtio based devices
> work over fabrics. Note that the balloon device may not make sense,

Is it just balloon/shared memory devices that don't make sense? Does virtio-net make sense?

Virtio-oF as described has fundamental drawbacks for SCSI: a single virtio-scsi controller VQ may service many different iSCSI targets, while your proposed protocol has one connection per VQ. Connections are likely to be made to different hosts for a single controller, so you'd have to create some abstraction to allow multiple connections per VQ, or (more likely) rework the virtio-scsi device definition specifically for Virtio-oF.
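
To make that concrete, here is a hedged sketch of the kind of abstraction I mean. Nothing like this exists in the proposal, and all names are invented for illustration:

#include <stddef.h>
#include <stdint.h>

struct viof_conn;                  /* opaque reliable connection (illustrative) */

/* Hypothetical demux: one virtio-scsi request VQ backed by several
 * connections, one per SCSI target, instead of the 1:1 VQ<->connection
 * mapping in the current draft. */
struct viof_vq_mux {
    struct viof_conn **conns;      /* conns[i] serves SCSI target i */
    size_t nr_conns;
};

/* Route a request to the connection for its SCSI target. */
static struct viof_conn *
viof_route(struct viof_vq_mux *mux, uint16_t target_id)
{
    if (target_id >= mux->nr_conns)
        return NULL;               /* unknown target: fail the request */
    return mux->conns[target_id];
}

The point is that the routing decision has to live somewhere: either the transport grows a demux like this, or virtio-scsi over fabrics becomes a different device definition.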


> shared memory regions won't work.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
> content.tex           |  1 +
> transport-fabrics.tex | 32 ++++++++++++++++++++++++++++++++
> 2 files changed, 33 insertions(+)
> create mode 100644 transport-fabrics.tex
> 
> diff --git a/content.tex b/content.tex
> index d2ab9eb..bbbd79e 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -637,6 +637,7 @@ \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options}
> \input{transport-pci.tex}
> \input{transport-mmio.tex}
> \input{transport-ccw.tex}
> +\input{transport-fabrics.tex}
> 
> \chapter{Device Types}\label{sec:Device Types}
> 
> diff --git a/transport-fabrics.tex b/transport-fabrics.tex
> new file mode 100644
> index 0000000..d10be2a
> --- /dev/null
> +++ b/transport-fabrics.tex
> @@ -0,0 +1,32 @@
> +\section{Virtio Over Fabrics}\label{sec:Virtio Transport Options / Virtio Over Fabrics}
> +
> +Virtio Over Fabrics (Virtio-oF) enables operations over fabrics that rely
> +primarily on message passing.
> +
> +Virtio-oF uses a reliable connection to transmit data. The reliable
> +connection facilitates communication between entities playing the following roles:
> +
> +\begin{itemize}
> +\item A Virtio-oF initiator functions as a Virtio-oF client.
> +The Virtio-oF initiator sends commands and associated data from the driver
> +to the Virtio-oF target.
> +\item A Virtio-oF target functions as a Virtio-oF server.
> +The Virtio-oF target forwards commands to the device and sends completions
> +and associated data back to the Virtio-oF initiator.
> +\end{itemize}
> +
> +Virtio-oF has the following features:
> +
> +\begin{itemize}
> +\item A Virtio-oF target is allowed to be connected by 0 or more Virtio-oF initiators.
> +\item A Virtio-oF initiator is allowed to connect to a single Virtio-oF target only.
> +A Virtio-oF device instance is a virtio device that the Virtio-oF initiator is
> +accessing through the Virtio-oF target.
> +\item There is a one-to-one mapping between the Virtio-oF queue and the reliable connection.

Again, there are drawbacks for some device types (e.g., SCSI). Maybe soften the language to suggest that it could be extended in the future?

> +\item There is one, and only one, Virtio-oF control queue for a Virtio-oF device instance.

Any particular reason for that?

> +The Virtio-oF control queue is used to execute control commands,
> +for example, to get the Virtio Device ID.
> +\item There is a one-to-one mapping between virtqueue and Virtio-oF virtqueue
> +which executes the bulk data transport on virtio devices.
> +\item The arrival of data on the Virtio-oF queue indicates that a notification has arrived.
> +\end{itemize}
> -- 
> 2.25.1