
virtio-dev message



Subject: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)


On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi <pizhenwei@bytedance.com> wrote:
>
> Hi,
>
> Over the past years, virtio has come to support many device
> specifications over PCI/MMIO/CCW. These devices work well in
> virtualization environments, and we now have a chance to bring the
> virtio device family to container/host scenarios.

PCI can certainly work for containers (or does it run into issues such
as scalability?). It would be better to describe the problems you met
and why you chose this approach to solve them.

It would also help to compare this with:

1) hiding the fabrics details via DPU
2) vDPA

>
> - Theory
> "Virtio Over Fabrics" aims to "reuse virtio device specifications" and
> provides network-defined peripheral devices.
> This protocol could also be used in a virtualization environment:
> typically the hypervisor (or a vhost-user process) handles requests from
> virtio PCI/MMIO/CCW, remaps them, and forwards them to the target over
> the fabric.

This requires mediation in the datapath, doesn't it?

>
> - Protocol
> For the detailed protocol definition, see:
> https://github.com/pizhenwei/linux/blob/virtio-of-github/include/uapi/linux/virtio_of.h

I'd say an RFC patch against the virtio spec is more suitable than code.

>
> Examples of virtio-blk read/write over TCP/RDMA:
> 1. Virtio Over TCP
> 1.1 An example of virtio-blk write(8K) command:
> The initiator side sends a stream buffer (command + 4 * desc + 8208 bytes):
>   COMMAND            +------+
>                      |opcode|  ->  virtio_of_op_vring
>                      +------+
>                      |cmd id|  ->  10
>                      +------+
>                      |length|  ->  8208
>                      +------+
>                      |ndesc |  ->  4
>                      +------+
>                      |rsvd  |
>                      +------+
>
>   DESC0              +------+
>                +-----|addr  |  -> 0
>                |     +------+
>                |     |length|  -> 16 (virtio blk write command)
>                |     +------+
>                |     |id    |  -> 10
>                |     +------+
>                |     |flags |  -> VRING_DESC_F_NEXT
>                |     +------+
>                |
>   DESC1        |     +------+
>                | +---|addr  |  -> 16
>                | |   +------+
>                | |   |length|  -> 4096
>                | |   +------+
>                | |   |id    |  -> 11
>                | |   +------+
>                | |   |flags |  -> VRING_DESC_F_NEXT
>                | |   +------+
>                | |
>   DESC2        | |   +------+
>                | |   |addr  |  -> 4112
>                | |   +------+
>                | | +-|length|  -> 4096
>                | | | +------+
>                | | | |id    |  -> 12
>                | | | +------+
>                | | | |flags |  -> VRING_DESC_F_NEXT
>                | | | +------+
>                | | |
>   DESC3        | | | +------+
>                | | | |addr  |  -> 0
>                | | | +------+
>                | | | |length|  -> 1
>                | | | +------+
>                | | | |id    |  -> 13
>                | | | +------+
>                | | | |flags |  -> VRING_DESC_F_WRITE
>                | | | +------+
>                | | |
>   DATA         +-+-+>+------+  -> 0
>                  | | |......|
>                  +-+>+------+  -> 16
>                    | |......|
>                    +>+------+  -> 4112
>                      |......|
>                      +------+  -> 8208
>
> The target side sends a stream buffer (completion + 1 * desc + 1 byte):
>   COMPLETION         +------+
>                      |status|  ->  VIRTIO_OF_SUCCESS
>                      +------+
>                      |cmd id|  ->  10
>                      +------+
>                      |ndesc |  ->  1
>                      +------+
>                      |rsvd  |
>                      +------+
>                      |value |  -> 1 (value.u32)
>                      +------+
>
>   DESC0              +------+
>                    +-|addr  |  -> 0
>                    | +------+
>                    | |length|  -> 1
>                    | +------+
>                    | |id    |  -> 13
>                    | +------+
>                    | |flags |  -> VRING_DESC_F_WRITE
>                    | +------+
>                    |
>   DATA             |>+------+  -> 0
>                      |......|
>                      +------+  -> 1
>
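The TCP command layout above can be sketched in C as follows. This is a minimal sketch, assuming that only device-readable descriptors carry inline data and that this data is packed back to back in chain order, as in the write(8K) example. `struct hyp_of_desc` and `hyp_of_layout_cmd()` are hypothetical names, not the wire layout or API from virtio_of.h; only the `VRING_DESC_F_*` values come from the virtio spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined by the virtio spec (split virtqueue). */
#define VRING_DESC_F_NEXT  1
#define VRING_DESC_F_WRITE 2

/* Hypothetical in-memory descriptor; NOT the virtio_of.h wire layout. */
struct hyp_of_desc {
    uint64_t addr;   /* offset into the DATA area that follows the descs */
    uint32_t length;
    uint16_t id;
    uint16_t flags;
};

/*
 * Assign stream offsets to device-readable descriptors, packing their
 * data back to back in chain order. Device-writable descriptors carry
 * no inline data in the command, so they are left untouched.
 * Returns the value for the COMMAND "length" field (total inline bytes).
 */
uint32_t hyp_of_layout_cmd(struct hyp_of_desc *desc, size_t ndesc)
{
    uint32_t off = 0;

    assert(desc != NULL || ndesc == 0);
    for (size_t i = 0; i < ndesc; i++) {
        if (desc[i].flags & VRING_DESC_F_WRITE)
            continue;               /* data flows target -> initiator */
        desc[i].addr = off;
        off += desc[i].length;
    }
    return off;
}
```

For the write(8K) chain this yields offsets 0, 16 and 4112 and a COMMAND length of 8208; for the read(8K) chain only DESC0 is device-readable, so the length is 16.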
> 1.2 An example of virtio-blk read(8K) command:
> The initiator side sends a stream buffer (command + 4 * desc + 16 bytes):
>   COMMAND            +------+
>                      |opcode|  ->  virtio_of_op_vring
>                      +------+
>                      |cmd id|  ->  14
>                      +------+
>                      |length|  ->  16 (virtio blk read command)
>                      +------+
>                      |ndesc |  ->  4
>                      +------+
>                      |rsvd  |
>                      +------+
>
>   DESC0              +------+
>                    +-|addr  |  -> 0
>                    | +------+
>                    | |length|  -> 16
>                    | +------+
>                    | |id    |  -> 14
>                    | +------+
>                    | |flags |  -> VRING_DESC_F_NEXT
>                    | +------+
>                    |
>   DESC1            | +------+
>                    | |addr  |  -> 16
>                    | +------+
>                    | |length|  -> 4096
>                    | +------+
>                    | |id    |  -> 15
>                    | +------+
>                    | |flags |  -> VRING_DESC_F_NEXT | VRING_DESC_F_WRITE
>                    | +------+
>                    |
>   DESC2            | +------+
>                    | |addr  |  -> 4112
>                    | +------+
>                    | |length|  -> 4096
>                    | +------+
>                    | |id    |  -> 16
>                    | +------+
>                    | |flags |  -> VRING_DESC_F_NEXT | VRING_DESC_F_WRITE
>                    | +------+
>                    |
>   DESC3            | +------+
>                    | |addr  |  -> 0
>                    | +------+
>                    | |length|  -> 1
>                    | +------+
>                    | |id    |  -> 17
>                    | +------+
>                    | |flags |  -> VRING_DESC_F_WRITE
>                    | +------+
>                    |
>   DATA             +>+------+  -> 0
>                      |......|
>                      +------+  -> 16
>
> The target side sends a stream buffer (completion + 3 * desc + 8193 bytes):
>   COMPLETION         +------+
>                      |status|  ->  VIRTIO_OF_SUCCESS
>                      +------+
>                      |cmd id|  ->  14
>                      +------+
>                      |ndesc |  ->  3
>                      +------+
>                      |rsvd  |
>                      +------+
>                      |value |  -> 8193 (value.u32)
>                      +------+
>
>   DESC0              +------+
>                +-----|addr  |  -> 0
>                |     +------+
>                |     |length|  -> 4096
>                |     +------+
>                |     |id    |  -> 15
>                |     +------+
>                |     |flags |  -> VRING_DESC_F_NEXT | VRING_DESC_F_WRITE
>                |     +------+
>                |
>   DESC1        |     +------+
>                | +---|addr  |  -> 4096
>                | |   +------+
>                | |   |length|  -> 4096
>                | |   +------+
>                | |   |id    |  -> 16
>                | |   +------+
>                | |   |flags |  -> VRING_DESC_F_NEXT | VRING_DESC_F_WRITE
>                | |   +------+
>                | |
>   DESC2        | |   +------+
>                | |   |addr  |  -> 8192
>                | |   +------+
>                | | +-|length|  -> 1
>                | | | +------+
>                | | | |id    |  -> 17
>                | | | +------+
>                | | | |flags |  -> VRING_DESC_F_WRITE
>                | | | +------+
>                | | |
>   DATA         +-+-+>+------+  -> 0
>                  | | |......|
>                  +-+>+------+  -> 4096
>                    | |......|
>                    +>+------+  -> 8192
>                      |......|
>                      +------+  -> 8193
>
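On the completion side, the target appears to echo back only the device-writable descriptors, re-addressed as offsets into the data that follows the completion. A sketch, assuming the target fills every writable buffer completely and that `value.u32` carries the total number of bytes written (this matches both examples, 1 and 8193). `struct hyp_of_comp` and `hyp_of_layout_comp()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT  1
#define VRING_DESC_F_WRITE 2

/* Hypothetical in-memory descriptor; NOT the virtio_of.h wire layout. */
struct hyp_of_desc {
    uint64_t addr;
    uint32_t length;
    uint16_t id;
    uint16_t flags;
};

/* Hypothetical completion summary; field names mirror the diagrams only. */
struct hyp_of_comp {
    uint16_t ndesc;   /* number of descriptors echoed back */
    uint32_t value;   /* assumed: total bytes written by the target */
};

/*
 * Copy the device-writable descriptors of a request into 'out' (which
 * must have room for 'ndesc' entries), giving each a stream offset so
 * their data is packed back to back after the completion header.
 * Device-readable descriptors are skipped; id and flags are preserved.
 */
struct hyp_of_comp hyp_of_layout_comp(const struct hyp_of_desc *in,
                                      size_t ndesc, struct hyp_of_desc *out)
{
    struct hyp_of_comp comp = {0, 0};

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < ndesc; i++) {
        if (!(in[i].flags & VRING_DESC_F_WRITE))
            continue;
        out[comp.ndesc] = in[i];
        out[comp.ndesc].addr = comp.value;  /* offset in completion DATA */
        comp.value += in[i].length;
        comp.ndesc++;
    }
    return comp;
}
```

Fed the read(8K) chain, this produces 3 descriptors at offsets 0, 4096 and 8192 with a value of 8193; fed the write(8K) chain, it produces only the 1-byte status descriptor with a value of 1.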
> 2. Virtio Over RDMA
> 2.1 An example of virtio-blk write(8K) command:
> The initiator side sends a message (command + 4 * desc) via RDMA POST SEND:
>   COMMAND            +------+
>                      |opcode|  ->  virtio_of_op_vring
>                      +------+
>                      |cmd id|  ->  10
>                      +------+
>                      |length|  ->  0
>                      +------+
>                      |ndesc |  ->  4
>                      +------+
>                      |rsvd  |
>                      +------+
>
>   DESC0              +------+
>                      |addr  |  -> 0xffff012345670000
>                      +------+
>                      |length|  -> 16 (virtio blk write command)
>                      +------+
>                      |id    |  -> 10
>                      +------+
>                      |flags |  -> VRING_DESC_F_NEXT
>                      +------+
>                      |key   |  -> 0x1234
>                      +------+
>
>   DESC1              +------+
>                      |addr  |  -> 0xffff012345671000
>                      +------+
>                      |length|  -> 4096
>                      +------+
>                      |id    |  -> 11
>                      +------+
>                      |flags |  -> VRING_DESC_F_NEXT
>                      +------+
>                      |key   |  -> 0x1236
>                      +------+
>
>   DESC2              +------+
>                      |addr  |  -> 0xffff012345673000
>                      +------+
>                      |length|  -> 4096
>                      +------+
>                      |id    |  -> 12
>                      +------+
>                      |flags |  -> VRING_DESC_F_NEXT
>                      +------+
>                      |key   |  -> 0x1238
>                      +------+
>
>   DESC3              +------+
>                      |addr  |  -> 0xffff012345677000
>                      +------+
>                      |length|  -> 1
>                      +------+
>                      |id    |  -> 13
>                      +------+
>                      |flags |  -> VRING_DESC_F_WRITE
>                      +------+
>                      |key   |  -> 0x1239
>                      +------+
>
> The target side reads from the remote addresses of DESC0/DESC1/DESC2 via
> RDMA POST READ, writes to the remote address of DESC3 via RDMA POST WRITE,
> and then sends a completion via POST SEND:
>   COMPLETION         +------+
>                      |status|  ->  VIRTIO_OF_SUCCESS
>                      +------+
>                      |cmd id|  ->  10
>                      +------+
>                      |ndesc |  ->  0
>                      +------+
>                      |rsvd  |
>                      +------+
>                      |value |  -> 1 (value.u32)
>                      +------+
>
> 2.2 An example of virtio-blk read(8K) command:
> This is quite similar to 2.1, except for the flags in DESC1/DESC2: the
> target side reads from the remote address of DESC0 via RDMA POST READ,
> writes to the remote addresses of DESC1/DESC2/DESC3 via RDMA POST WRITE,
> and then sends a completion via POST SEND.
>
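For RDMA, the difference between 2.1 and 2.2 comes down to the `VRING_DESC_F_WRITE` flag: device-readable buffers are pulled with RDMA READ and device-writable ones are pushed with RDMA WRITE. The sketch below only plans those one-sided operations and makes no libibverbs calls; `struct hyp_of_rdma_desc`, the 32-bit `key` field and `hyp_of_plan_rdma()` are assumptions for illustration, not the virtio_of.h layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT  1
#define VRING_DESC_F_WRITE 2

/* Hypothetical RDMA descriptor: remote address plus remote key. */
struct hyp_of_rdma_desc {
    uint64_t addr;
    uint32_t length;
    uint16_t id;
    uint16_t flags;
    uint32_t key;
};

enum hyp_rdma_op { HYP_RDMA_READ, HYP_RDMA_WRITE };

/* One planned one-sided operation against initiator memory. */
struct hyp_rdma_work {
    enum hyp_rdma_op op;
    uint64_t remote_addr;
    uint32_t rkey;
    uint32_t length;
};

/*
 * Map each descriptor to an RDMA operation: device-readable buffers are
 * fetched with RDMA READ, device-writable buffers are filled with RDMA
 * WRITE. 'work' receives one entry per descriptor, in chain order.
 * Returns the number of READ operations planned.
 */
size_t hyp_of_plan_rdma(const struct hyp_of_rdma_desc *desc, size_t ndesc,
                        struct hyp_rdma_work *work)
{
    size_t nread = 0;

    assert(desc != NULL && work != NULL);
    for (size_t i = 0; i < ndesc; i++) {
        work[i].op = (desc[i].flags & VRING_DESC_F_WRITE) ? HYP_RDMA_WRITE
                                                          : HYP_RDMA_READ;
        work[i].remote_addr = desc[i].addr;
        work[i].rkey = desc[i].key;
        work[i].length = desc[i].length;
        if (work[i].op == HYP_RDMA_READ)
            nread++;
    }
    return nread;
}
```

With the 2.1 chain it plans three READs and one WRITE (DESC3, key 0x1239); with the 2.2 flags it plans one READ and three WRITEs. A real target would turn each entry into a posted work request and send the completion after them.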
> - Example
> I have developed a kernel initiator (unstable WIP version; TCP and RDMA
> are currently supported):
> https://github.com/pizhenwei/linux/tree/virtio-of-github

A quick glance at the code tells me it's a mediation layer that converts
descriptors in the vring to fabric-specific packets. This is the
vDPA way.

If we agree that virtio over fabrics is useful, we need to invent
facilities that allow building packets directly without involving the
virtqueue (the API is layout-independent anyhow).

Thanks

>
> And a target (unstable WIP version; blk/crypto/rng are currently supported):
> https://github.com/pizhenwei/virtio-target/tree/WIP
>
> First, run the target: ~# ./vtgt vtgt.conf
> Then load the kernel modules on the initiator side:
>   ~# insmod ./virtio_fabrics.ko
>   ~# insmod ./virtio_tcp.ko
>   ~# insmod ./virtio_rdma.ko
>
> Create a virtio-blk device over TCP with the command:
>   ~# echo command=create,transport=tcp,taddr=192.168.122.1,tport=15771,tvqn=virtio-target/block/block0.service,iaddr=192.168.122.1,iport=0,ivqn=vqn.uuid:42761df9-4c3f-4b27-843d-c88d1dcdce32 > /dev/virtio-fabrics
>
> Or create a virtio-crypto device over RDMA with the command:
>   ~# echo command=create,transport=rdma,taddr=192.168.122.1,tport=15771,tvqn=virtio-target/crypto/crypto0.service,iaddr=192.168.122.1,iport=0,ivqn=vqn.uuid:42761df9-4c3f-4b27-843d-c88d1dcdce32 > /dev/virtio-fabrics
>
> Or destroy a virtio-of device with the command:
>   ~# echo command=destroy,transport=tcp,taddr=192.168.122.1,tport=15771,tvqn=vqn.uuid:2d5130d8-36d5-4fe8-ae55-48ea51e0391a,iaddr=192.168.122.1,ivqn=vqn.uuid:42761df9-4c3f-4b27-843d-c88d1dcdce32 > /dev/virtio-fabrics
>
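The `echo` commands above just write one comma-separated `key=value` line to `/dev/virtio-fabrics`. A C equivalent might look like the sketch below; the keys are copied from the examples, while `struct hyp_of_create`, `hyp_of_format_create()` and `hyp_of_send_ctrl()` are hypothetical. `echo` also appends a trailing newline, which this sketch does not send; whether the kernel parser needs it is an assumption to check against the initiator code.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical parameter bundle mirroring the keys in the examples. */
struct hyp_of_create {
    const char *transport;  /* "tcp" or "rdma" */
    const char *taddr;
    int tport;
    const char *tvqn;
    const char *iaddr;
    int iport;
    const char *ivqn;
};

/* Format the "command=create,..." line; returns snprintf's result. */
int hyp_of_format_create(char *buf, size_t len, const struct hyp_of_create *p)
{
    assert(buf != NULL && p != NULL);
    return snprintf(buf, len,
                    "command=create,transport=%s,taddr=%s,tport=%d,"
                    "tvqn=%s,iaddr=%s,iport=%d,ivqn=%s",
                    p->transport, p->taddr, p->tport, p->tvqn,
                    p->iaddr, p->iport, p->ivqn);
}

/* Write one control line to the device node; 0 on success, -1 on error. */
int hyp_of_send_ctrl(const char *path, const char *line)
{
    size_t len = strlen(line);
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;
    ssize_t n = write(fd, line, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}
```

Formatting the virtio-blk/TCP parameters reproduces the first `echo` argument exactly; passing the result to `hyp_of_send_ctrl("/dev/virtio-fabrics", buf)` would then play the role of the shell redirect.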
> --
> zhenwei pi
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.


