

Subject: Re: Re: [virtio-comment] [PATCH v4 1/7] transport-fabrics: introduce Virtio Over Fabrics overview




On 7/30/23 07:23, Raphael Norwitz wrote:
Hey Zhenwei,

Interesting proposal!

I have taken a pass through the series and have some thoughts/feedback. I see potential value in a way to disaggregate device processing from the hypervisor, while re-using a lot of the existing code and well-defined abstractions in the virtio ecosystem.


Hi Raphael,

From my point of view, Virtio-oF will support several scenarios (a rough sketch of the scenario-[1] connection setup follows this list):
[1] host OS/containers. The host kernel connects to the target directly, then the host OS emulates a virtio device.
[2] hypervisor. A hypervisor handles requests from the guest, translates them into Virtio-oF, and forwards them to the target.
[3] vhost-user. Like the hypervisor case, but with higher performance.
[4] virtio hardware acceleration on a DPU/SmartNIC.
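
For scenario [1], a minimal connection-setup sketch might look like the code below. The address, the port and the placeholder "connect" payload are invented for illustration only; they are not the Virtio-oF wire format defined in this series:

/* Illustrative only: a host-kernel-style initiator opening one reliable
 * TCP connection to a Virtio-oF target (scenario [1]).  Address, port and
 * the "hello" payload are hypothetical, not from the spec. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_in tgt = {
        .sin_family = AF_INET,
        .sin_port   = htons(15771),           /* hypothetical target port */
    };
    inet_pton(AF_INET, "192.0.2.10", &tgt.sin_addr);

    int fd = socket(AF_INET, SOCK_STREAM, 0); /* one reliable connection */
    if (fd < 0 || connect(fd, (struct sockaddr *)&tgt, sizeof(tgt)) < 0) {
        perror("connect to Virtio-oF target");
        return 1;
    }

    /* Once the transport connection is up, the initiator would send the
     * Virtio-oF connect/feature-negotiation commands defined in the later
     * patches; a plain string stands in for them here. */
    const char hello[] = "virtio-of connect (placeholder)";
    write(fd, hello, sizeof(hello));
    close(fd);
    return 0;
}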

For storage in particular it may have some performance benefit by cutting virtio processing out of the hot path, but IMO there are better ways to achieve that which work out of the box today. Your "eager buffers" are a nice touch for the write path, but they come with their own costs I'll describe later.

With that, I do see some limitations.

By design, this protocol gives the guest kernel direct access to the target. That in turn becomes a potential attack surface if a VM is compromised, whereas a virtio-over-{PCI/CCW/MMIO} abstraction shields the target from direct access.


The eager buffer is a WRITE-only MR to the initiator; would you please give me more hints about the 'potential attack'?

While migration should work cleanly for a TCP fabric, have you thought about how migration will work with RDMA? In theory, as VFIO migration matures, it's possible the guest kernel and user space stacks could be hardened to deal with hardware being ripped out from under them, but that may be a bit futuristic.


Do you mean VM migration in scenarios [2] and [3]? As far as I can see, draining the in-flight requests would be fine on migration.

Also, could you ever effectively migrate a VM from a TCP host to an RDMA-accelerated one? The guest could have some smart logic to figure out "now my NIC doesn't support RDMA, let me switch to TCP". Today, if you use RDMA acceleration behind any of the existing transports, the guest does not have to worry about this and it can safely migrate anywhere.


Frankly, I don't think this should be part of the Virtio-oF specification. From my point of view, the specification defines the transport binding only; the upper layer plays the 'smart-select' game.
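
To make that concrete, the kind of 'smart-select' I have in mind lives entirely above the transport binding, roughly like the sketch below. The enum and the probe function are hypothetical; nothing here is from the spec:

/* Illustrative sketch of transport selection done above the binding:
 * prefer RDMA when the local NIC supports it, otherwise fall back to TCP. */
#include <stdbool.h>
#include <stdio.h>

enum viof_transport { VIOF_TCP, VIOF_RDMA };

static bool nic_supports_rdma(void)
{
    /* Placeholder: a real implementation would enumerate RDMA-capable
     * devices via the RDMA stack instead of returning a constant. */
    return false;
}

static enum viof_transport viof_select_transport(void)
{
    return nic_supports_rdma() ? VIOF_RDMA : VIOF_TCP;
}

int main(void)
{
    printf("selected transport: %s\n",
           viof_select_transport() == VIOF_RDMA ? "RDMA" : "TCP");
    return 0;
}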

Similarly, what happens if you need to change the target IP or if one target goes down? Is there any way to handle that without making changes inside the guest? I think you should describe "redirection" behavior like iSCSI has, but I'll describe more in the relevant patch.


Let's continue this discussion in
'[PATCH v4 4/7] transport-fabrics: introduce command set'

For the storage use-case in general, I'm struggling to understand the advantages of virtio-over-fabrics over either the kernel NVMe-oF or iSER initiator. It seems like it fundamentally boils down to a similar paradigm: the kernel submits IOVs to the target directly and it DMAs back and forth as required. In my experience, those who want to squeeze every last op out of their VMs skip virtio and do that anyway. For NVMe-oF, the only limitation is that you need to use NVMe, but given how virtualization-friendly and ubiquitous it's becoming for storage I don't think that's bad. iSER has some challenges but they are definitely surmountable.
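
(For reference, the paradigm being compared is roughly the one sketched below: describe the request as an IOV, carry small write payloads "eagerly" inside the capsule, and let the target fetch large ones itself. The structures and the 4 KiB threshold are invented for illustration and are not the Virtio-oF command layout from this series.)

/* Rough sketch of the submission paradigm: small writes travel in-capsule,
 * large ones are only described and pulled by the target later. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>

#define EAGER_LIMIT 4096u               /* hypothetical in-capsule limit */

struct viof_capsule {                   /* hypothetical, not the wire format */
    uint16_t opcode;
    uint32_t inline_len;                /* bytes carried inside the capsule */
    uint8_t  inline_data[EAGER_LIMIT];
    uint64_t remote_addr;               /* where the target fetches the rest */
    uint32_t remote_len;
};

static void build_write(struct viof_capsule *c, const struct iovec *iov)
{
    memset(c, 0, sizeof(*c));
    c->opcode = 1;                      /* pretend 1 == WRITE */
    if (iov->iov_len <= EAGER_LIMIT) {
        /* Eager path: copy the payload into the capsule itself. */
        c->inline_len = (uint32_t)iov->iov_len;
        memcpy(c->inline_data, iov->iov_base, iov->iov_len);
    } else {
        /* Lazy path: only describe the buffer; the target pulls it. */
        c->remote_addr = (uint64_t)(uintptr_t)iov->iov_base;
        c->remote_len  = (uint32_t)iov->iov_len;
    }
}

int main(void)
{
    char payload[512] = "hello";
    struct iovec iov = { .iov_base = payload, .iov_len = sizeof(payload) };
    struct viof_capsule c;

    build_write(&c, &iov);
    printf("inline=%u remote=%u\n", c.inline_len, c.remote_len);
    return 0;
}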

Again, I like the "eager buffers", but if you're thinking about storage performance, why not rather try to introduce that into NVMe-oF or iSER?

Other than storage, what devices will benefit from fabrics acceleration enough to compromise the ergonomics of the existing virtio transports?

In summary, I'd like to see:

[1] More discussion of the kinds of devices which will benefit from Virtio Over Fabrics.

Around 2022 Q1, I developed the virtio-crypto akcipher service (currently only RSA offload is supported); this improves HTTPS performance (Nginx + OpenSSL) by up to ~300% in a guest.

In our production environment, we use a lot of QAT cards as an akcipher service pool, and Crypto Over Fabrics is used between QEMU and the server.
The VM then has no dependency on the host server, and VM migration is supported.

On the other hand, I plan to develop GPU Over Fabrics for the virtual desktop, so I would have to develop a GPU Over Fabrics protocol...

Once Virtio-Camera or Virtio-Video gets ready, Camera Over Fabrics would be needed for camera redirection...

Rather than inventing an XXX Over Fabrics protocol again and again, I'd like to re-use Virtio devices; a single Virtio-oF is enough. This is the background and motivation for Virtio-oF.

[2] A comprehensive description of what the dataflow should look like. From having read the spec I get the gist but an example flow would be helpful for reference. Maybe just in the cover letter?

[3] If the goal is storage performance, a discussion of how this protocol provides performance benefits over and above the alternatives.


As described above, the goal is *NOT* storage performance.
Frankly, I don't think there is an essential difference between NVMe-oF (multiple queues supported) and Virtio-oF blk (multiple queues supported), but both differ from iSCSI/iSER (single queue supported). The command format (SCSI/NVMe/Virtio) hardly affects storage performance.
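
To illustrate the multi-queue point: with one reliable connection per queue, submissions can be spread per-CPU, while a single-queue transport funnels everything through one connection. A trivial sketch of the usual mapping (the queue count and the scheme are only an example, not mandated by the spec):

/* Per-CPU queue selection: each queue is backed by its own connection. */
#include <stdio.h>

static unsigned int queue_for_cpu(unsigned int cpu, unsigned int nr_queues)
{
    return cpu % nr_queues;             /* one connection per queue */
}

int main(void)
{
    unsigned int nr_queues = 4;         /* e.g. negotiated with the target */

    for (unsigned int cpu = 0; cpu < 8; cpu++)
        printf("cpu %u -> queue %u\n", cpu, queue_for_cpu(cpu, nr_queues));
    return 0;
}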

On Jun 26, 2023, at 3:25 AM, zhenwei pi <pizhenwei@bytedance.com> wrote:

In the past years, virtio supports lots of device specifications by
PCI/MMIO/CCW. These devices work fine in the virtualization environment.

Introduce Virtio Over Fabrics transport to support "network attached
peripheral devices". With this transport, many Virtio based devices
work over fabrics. Note that the balloon device may not make sense,

Is it just balloon/shared memory devices that don't make sense? Does virtio-net make sense?

Virtio-oF as described has fundamental drawbacks for SCSI in that a single virtio-scsi controller VQ may service many different iSCSI targets, and your proposed protocol has one connection per VQ. Connections are likely to be made to different hosts for a single controller, so you'd have to create some abstraction to allow for multiple connections per VQ, or (more likely) rework the virtio-scsi device definition specifically for Virtio-oF.


shared memory regions won't work.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
content.tex           |  1 +
transport-fabrics.tex | 32 ++++++++++++++++++++++++++++++++
2 files changed, 33 insertions(+)
create mode 100644 transport-fabrics.tex

diff --git a/content.tex b/content.tex
index d2ab9eb..bbbd79e 100644
--- a/content.tex
+++ b/content.tex
@@ -637,6 +637,7 @@ \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options}
\input{transport-pci.tex}
\input{transport-mmio.tex}
\input{transport-ccw.tex}
+\input{transport-fabrics.tex}

\chapter{Device Types}\label{sec:Device Types}

diff --git a/transport-fabrics.tex b/transport-fabrics.tex
new file mode 100644
index 0000000..d10be2a
--- /dev/null
+++ b/transport-fabrics.tex
@@ -0,0 +1,32 @@
+\section{Virtio Over Fabrics}\label{sec:Virtio Transport Options / Virtio Over Fabrics}
+
+Virtio Over Fabrics (Virtio-oF) enables operations over fabrics that rely
+primarily on message passing.
+
+Virtio-oF uses a reliable connection to transmit data. The reliable
+connection facilitates communication between entities playing the following roles:
+
+\begin{itemize}
+\item A Virtio-oF initiator functions as a Virtio-oF client.
+The Virtio-oF initiator sends commands and associated data from the driver
+to the Virtio-oF target.
+\item A Virtio-oF target functions as a Virtio-oF server.
+The Virtio-oF target forwards commands to the device and sends completions
+and associated data back to the Virtio-oF initiator.
+\end{itemize}
+
+Virtio-oF has the following features:
+
+\begin{itemize}
+\item A Virtio-oF target is allowed to be connected by 0 or more Virtio-oF initiators.
+\item A Virtio-oF initiator is allowed to connect to a single Virtio-oF target only.
+A Virtio-oF device instance is a virtio device that the Virtio-oF initiator is
+accessing through the Virtio-oF target.
+\item There is a one-to-one mapping between the Virtio-oF queue and the reliable connection.

Again - there are drawbacks for some device types (e.g. SCSI). Maybe soften the language to suggest that it could be extended in the future?

+\item There is one, and only one, Virtio-oF control queue for a Virtio-oF device instance.

Any particular reason for that?
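
For what it's worth, those two rules imply an initiator-side layout roughly like the sketch below: one control queue plus one reliable connection per virtqueue for each device instance. The names and fields are illustrative only, not from the spec:

/* Hypothetical initiator-side data layout: exactly one control queue per
 * Virtio-oF device instance, and a one-to-one mapping between each queue
 * and a reliable connection. */
#include <stdint.h>
#include <stdlib.h>

struct viof_queue {
    int      conn_fd;        /* the reliable connection backing this queue */
    uint16_t queue_id;
};

struct viof_device_instance {
    struct viof_queue  ctrl_queue;   /* one, and only one, control queue */
    struct viof_queue *vqs;          /* one connection per virtqueue */
    uint16_t           num_vqs;
};

static struct viof_device_instance *viof_instance_alloc(uint16_t num_vqs)
{
    struct viof_device_instance *inst = calloc(1, sizeof(*inst));

    if (!inst)
        return NULL;
    inst->vqs = calloc(num_vqs, sizeof(*inst->vqs));
    if (!inst->vqs) {
        free(inst);
        return NULL;
    }
    inst->num_vqs = num_vqs;
    return inst;
}

int main(void)
{
    struct viof_device_instance *inst = viof_instance_alloc(4);

    if (!inst)
        return 1;
    free(inst->vqs);
    free(inst);
    return 0;
}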



--
zhenwei pi

