
Subject: Re: [virtio-comment] [PATCH v4 1/7] transport-fabrics: introduce Virtio Over Fabrics overview

Some quick responses. I'll get back to you on patches 3 and 5 in a while.

> On Jul 31, 2023, at 12:44 AM, zhenwei pi <pizhenwei@bytedance.com> wrote:
> On 7/30/23 07:23, Raphael Norwitz wrote:
>> Hey Zhenwei,
>> Interesting proposal!
>> I have taken a pass through the series and have some thoughts/feedback. I see potential value in a way to disaggregate device processing from the hypervisor, while re-using a lot of the existing code and well-defined abstractions in the virtio ecosystem.
> Hi Raphael,
> From my point of view, Virtio-oF will support several scenarios:
> [1] host OS/containers. The host kernel connects to the target directly, then the host OS emulates a virtio device.
> [2] hypervisor. A hypervisor handles requests from the guest, translates them into Virtio-oF, and forwards them to the target.
> [3] vhost-user. Like the hypervisor case, but with higher performance.
> [4] virtio hardware acceleration on DPU/smartNIC.

Is the only difference between [2] and [3] emulation in user space vs. the kernel?

>> For storage in particular it may have some performance impact by cutting out virtio processing from the hot path, but IMO there are better ways to achieve that working out of the box today. Your "eager buffers" are a nice touch for the write path, but come with their own costs I'll describe later.
>> With that, I do see some limitations.
>> By design, this protocol gives the guest kernel direct access to the target. That in turn becomes a potential attack surface if a VM is compromised, whereas a virtio-over-{PCI/CCW/MMIO} abstraction shields the target from direct access.
> The eager buffer is a WRITE-only MR to the initiator. Would you please give me more hints about the 'potential attack'?

I was imagining something like scenario [1] where an untrusted guest was given access to a Virtio-oF target, like in-guest iSCSI or NVMe-oF.

I'm not saying it necessarily creates vulnerabilities, just that if you use it for storage by giving the guest direct access to the target, it has security implications which virtio over PCI doesn't have. It is certainly no worse than in-guest iSCSI or NVMe-oF.

>> While migration should work cleanly for a TCP fabric, have you thought about how migration will work with RDMA? In theory, as VFIO migration matures, it's possible the guest kernel and user space stacks could be hardened to deal with hardware being ripped out from under it, but that may be a bit futuristic.
> Do you mean VM migration in scenario [2] and scenario [3]? As far as I can see, draining the inflight requests would be fine on migration.

I was talking about scenario [1]-like cases. Correct - migration concerns go away with your scenarios [2] and [3]. Probably [4] too.

>> Also could you ever effectively migrate a VM from a TCP host to an RDMA accelerated one? The guest could have some smart logic to figure out "now my NIC doesn't support RDMA, let me switch to TCP". Today if you use RDMA acceleration behind any of the existing transports the guest does not have to worry about this and it can safely migrate anywhere.
> Frankly, I don't think this should be part of the 'Virtio-oF specification'. From my point of view, the specification defines the transport binding only; the upper layer plays the 'smart-select' game.

Agreed - it should not be a part of the spec. 

I was just trying to point out tradeoffs and complications I imagined with an eye towards storage. 

>> Similarly, what happens if you need to change the target IP or if one target goes down? Is there any way to handle that without making changes inside the guest? I think you should describe "redirection" behavior like iSCSI has, but I'll describe more in the relevant patch.
> Let's continue this discussion in
> '[PATCH v4 4/7] transport-fabrics: introduce command set'
>> For the storage use-case in general, I'm struggling to understand the advantages of virtio-over-fabrics over either the kernel NVMe-oF or iSER initiator. It seems like it fundamentally boils down to a similar paradigm: the kernel submits IOVs to the target directly and it DMAs back and forth as required. In my experience, those who want to squeeze every last op out of their VMs skip virtio and do that anyways. For NVMe-oF, the only limitation is that you need to use NVMe, but given how virtualization-friendly and ubiquitous it's becoming for storage I don't think that's bad. iSER has some challenges but they are definitely surmountable.
>> Again, I like the "eager buffers", but if you're thinking about storage performance, why not try to introduce that into NVMe-oF or iSER instead?
>> Other than storage, what devices will benefit from fabrics acceleration enough to compromise the ergonomics of the existing virtio transports?
>> In summary, I'd like to see:
>> [1] More discussion of the kinds of devices which will benefit from Virtio Over Fabrics. 
> Around 2022 Q1, I developed the virtio-crypto akcipher service (currently only RSA offload is supported); this allows HTTPS performance (Nginx + OpenSSL) of up to ~300% in a guest.
> In our production environment, we use a lot of QAT cards as an akcipher service pool, and Crypto Over Fabrics is used between QEMU and the server.
> The VM then has no dependence on the host server and supports VM migration.
> On the other hand, I have a plan to develop GPU Over Fabrics for the virtual desktop. So I have to develop a GPU Over Fabrics protocol...
> Once Virtio-Camera or Virtio-Video is ready, Camera Over Fabrics will be needed for camera redirection ...
> Rather than XXX Over Fabrics again and again, I'd like to re-use Virtio devices; a single Virtio-oF is enough. This is the background and motivation for Virtio-oF.

All sound like very compelling use cases. That clears up my main concern.

>> [2] A comprehensive description of what the dataflow should look like. From having read the spec I get the gist but an example flow would be helpful for reference. Maybe just in the cover letter?
>> [3] If the goal is storage performance, a discussion of how this protocol provides performance benefits over and above the alternatives.
> As described above, the goal is *NOT* storage performance.

Thanks for making that clear. I had thought this was intended to be a storage protocol.

> Frankly, I don't think there is an essential difference between NVMe-oF (multiple queues supported) and Virtio-oF blk (multiple queues supported), but both differ from iSCSI/iSER (single queue supported). The format of the command (SCSI/NVMe/Virtio) hardly affects storage performance.

Great - we're on the same page as far as storage is concerned.

>>> On Jun 26, 2023, at 3:25 AM, zhenwei pi <pizhenwei@bytedance.com> wrote:
>>> In past years, virtio has supported many device specifications over
>>> PCI/MMIO/CCW. These devices work fine in the virtualization environment.
>>> Introduce Virtio Over Fabrics transport to support "network attached
>>> peripheral devices". With this transport, many Virtio-based devices
>>> work over fabrics. Note that the balloon device may not make sense,
>> Is it just balloon/shared memory devices that don't make sense? Does virtio-net make sense?
>> Virtio-oF as described has fundamental drawbacks for SCSI in that a single virtio-scsi controller VQ may service many different iSCSI targets and your proposed protocol has one connection per VQ. Connections are likely to be made to different hosts for a single controller, so you'd have to create some abstraction to allow for multiple connections per VQ, or (more likely) rework the virtio-scsi device definition specifically for Virtio-oF.
>>> shared memory regions won't work.
>>> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
>>> ---
>>> content.tex           |  1 +
>>> transport-fabrics.tex | 32 ++++++++++++++++++++++++++++++++
>>> 2 files changed, 33 insertions(+)
>>> create mode 100644 transport-fabrics.tex
>>> diff --git a/content.tex b/content.tex
>>> index d2ab9eb..bbbd79e 100644
>>> --- a/content.tex
>>> +++ b/content.tex
>>> @@ -637,6 +637,7 @@ \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options}
>>> \input{transport-pci.tex}
>>> \input{transport-mmio.tex}
>>> \input{transport-ccw.tex}
>>> +\input{transport-fabrics.tex}
>>> \chapter{Device Types}\label{sec:Device Types}
>>> diff --git a/transport-fabrics.tex b/transport-fabrics.tex
>>> new file mode 100644
>>> index 0000000..d10be2a
>>> --- /dev/null
>>> +++ b/transport-fabrics.tex
>>> @@ -0,0 +1,32 @@
>>> +\section{Virtio Over Fabrics}\label{sec:Virtio Transport Options / Virtio Over Fabrics}
>>> +
>>> +Virtio Over Fabrics (Virtio-oF) enables operations over fabrics that rely
>>> +primarily on message passing.
>>> +
>>> +Virtio-oF uses a reliable connection to transmit data. The reliable
>>> +connection facilitates communication between entities playing the following roles:
>>> +
>>> +\begin{itemize}
>>> +\item A Virtio-oF initiator functions as a Virtio-oF client.
>>> +The Virtio-oF initiator sends commands and associated data from the driver
>>> +to the Virtio-oF target.
>>> +\item A Virtio-oF target functions as a Virtio-oF server.
>>> +The Virtio-oF target forwards commands to the device and sends completions
>>> +and associated data back to the Virtio-oF initiator.
>>> +\end{itemize}
>>> +
>>> +Virtio-oF has the following features:
>>> +
>>> +\begin{itemize}
>>> +\item A Virtio-oF target is allowed to be connected by 0 or more Virtio-oF initiators.
>>> +\item A Virtio-oF initiator is allowed to connect to a single Virtio-oF target only.
>>> +A Virtio-oF device instance is a virtio device that the Virtio-oF initiator is
>>> +accessing through the Virtio-oF target.
>>> +\item There is a one-to-one mapping between the Virtio-oF queue and the reliable connection.
>> Again - there are drawbacks for some device types (e.g. SCSI). Maybe soften the language to suggest that it could be extended in the future?
>>> +\item There is one, and only one, Virtio-oF control queue for a Virtio-oF device instance.
>> Any particular reason for that?
> -- 
> zhenwei pi
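
For reference, the structural rules in the quoted overview (a target accepts zero or more initiators, an initiator connects to a single target, each queue maps to exactly one reliable connection, and there is exactly one control queue per device instance) can be modeled roughly as below. This is only a sketch to make the constraints concrete: the class names, method names, and string connection ids are all hypothetical, nothing here comes from the patch, and no real networking is involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Target:
    """Hypothetical stand-in for a Virtio-oF target (server role)."""
    name: str
    # A target may be connected by zero or more initiators.
    initiators: List["Initiator"] = field(default_factory=list)


@dataclass
class Initiator:
    """Hypothetical stand-in for a Virtio-oF initiator (client role)."""
    name: str
    target: Optional[Target] = None
    # queue name -> connection id: one reliable connection per queue.
    connections: Dict[str, str] = field(default_factory=dict)

    def connect(self, target: Target) -> None:
        # An initiator is allowed to connect to a single target only.
        if self.target is not None:
            raise RuntimeError(f"{self.name} is already connected to {self.target.name}")
        self.target = target
        target.initiators.append(self)
        # One, and only one, control queue per device instance.
        self.connections["control"] = f"{self.name}->{target.name}/control"

    def add_queue(self, queue: str) -> str:
        if self.target is None:
            raise RuntimeError("connect() must be called before adding queues")
        # One-to-one mapping: a queue never gets a second connection.
        if queue in self.connections:
            raise ValueError(f"queue {queue!r} already has a connection")
        conn = f"{self.name}->{self.target.name}/{queue}"
        self.connections[queue] = conn
        return conn
```

In this model the virtio-scsi concern raised above shows up directly: a single queue has exactly one connection, so it cannot reach LUNs that live on different hosts without either relaxing `add_queue` or reworking the device definition.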
