OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: Re: Re: [PATCH v2 06/11] transport-fabrics: introduce command set


On 6/6/23 00:30, Stefan Hajnoczi wrote:
On Fri, Jun 02, 2023 at 01:15:00PM +0800, zhenwei pi wrote:


On 6/1/23 01:10, Stefan Hajnoczi wrote:
On Thu, May 04, 2023 at 04:19:05PM +0800, zhenwei pi wrote:
Introduce command structures for Virtio-oF.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
   transport-fabrics.tex | 209 ++++++++++++++++++++++++++++++++++++++++++
   1 file changed, 209 insertions(+)

diff --git a/transport-fabrics.tex b/transport-fabrics.tex
index 7711321..37f57c6 100644
--- a/transport-fabrics.tex
+++ b/transport-fabrics.tex
@@ -495,3 +495,212 @@ \subsubsection{Buffer Mapping Definition}\label{sec:Virtio Transport Options / V
                       |value |  -> 8193 (value.u32)
                       +------+
   \end{lstlisting}
+
+\subsubsection{Commands Definition}\label{sec:Virtio Transport Options / Virtio Over Fabrics / Transmission Protocol / Commands Definition}
+This section defines command structures for Virtio Over Fabrics.
+
+A common structure virtio_of_value is fixed to 8 bytes and MUST be used as one
+of the following format:
+
+\begin{itemize}
+\item u8
+\item le16
+\item le32
+\item le64
+\end{itemize}

The way it's written does not document where the u8, u16, u32 bytes are
located and that the unused bytes are 0. I think I understand what you
mean though:

    le64 value = cpu_to_le64((u64)v); /* v is u8, u16, u32, or u64 */

Please clarify.


I want to describe an union structure of 8 bytes:
union virtio_of_value {
     u8;
     u16;
     u32;
     u64;
};

Depending on the opcode, use the right one.

I was trying to point out that the memory layout of C unions is not
portable. Your example does not define the exact in-memory layout of
union virtio_of_value. Here is the first web search result I found about
this topic:

   "Q: And a related question: if you dump unions in binary form to a file,
   and then reload them from the file on a different platform, or with a
   program compiled by a different compiler, are you guaranteed to get
   back what you stored? (I think not, but I'm not sure)

   A: You're right; you're not."

   https://bytes.com/topic/c/answers/220372-unions-storage-abis

In the cpu_to_le64() code example that I gave, the exact in-memory
layout is well-defined. There is no ambiguity.


OK, thanks.

+\hline
+0xff00 - 0xfffd & Reserved \\
+\hline
+\end{tabular}
+
+\paragraph{Connect Command}\label{sec:Virtio Transport Options / Virtio Over Fabrics / Transmission Protocol / Commands Definition / Connect Command}
+The Connect Command is used to establish Virtio Over Fabrics queue. The control
+queue MUST be established firstly, then the Connect command establishes an
+association between the initiator and the target.

Is a "Virtio Over Fabrics queue" different from a virtqueue?

If I understand correctly, the control queue must be established by the
initiator first and then the Connect command is sent to begin
communication between the initiator and the target?


The queue mapping is missing in the '[PATCH v2 01/11] transport-fabrics:
introduce Virtio Over Fabrics overview', like:
A "Virtio Over Fabrics queue" is a reliable connection between initiator and
target. There are 2 types of Virtio Over Fabrics queue:
+\begin{itemize}
+\item A single Control queue is required to execute control operations.
+\item 0 or more Virtio Over Fabrics queues map the virtqueues.
+\end{itemize}

That helps, thanks!


+
+The Target ID of 0xffff is reserved, then:

Please move this after the fields have been shown and the purpose of the
Target ID field has been explained.

+\begin{itemize}
+\item The Target ID of 0xffff MUST be specified as the Target ID in a Connect
+Command for the control queue.
+\item The target SHOULD allocate any available Target ID to the initiator,
+and return the allocated Target ID in the Completion.
+\item The returned Target ID MUST be specified as the Target ID, and the Queue ID
+MUST be specified in a Connect Command for the virtqueue.
+\end{itemize}

What is the purpose of the Target ID? Is it to allow a server to provide
access to multiple targets over the same connection?


A target listens on a port, and provides access to 0 or more targets. An
initiator connect the specific target by TVQN of connect command.
An initiator could connect a single target, multiple initiators could
connect the same target(typically, shared disk/fs).

Why is the target ID separate from the TVQN? If the Target ID is a
separate parameter then users will have to learn additional
syntax/command-line options to specify the TVQN + Target ID and that
syntax may vary between software.


+
+The Connect Command has following structure:
+
+\begin{lstlisting}
+struct virtio_of_command_connect {
+        le16 opcode;
+        le16 command_id;
+        le16 target_id;
+        le16 queue_id;
+        le16 ndesc;

Where is this field documented?


OK. Will add.

Why does the initiator send ndesc to the target? Normally a VIRTIO Transport reports the device's max descriptors and then the driver can tell the device to reduce the number of descriptors, if desired.


A target supports at lease 1 descriptor. The 'ndesc' of struct
virtio_of_command_connect indicates the full PDU contains: struct
virtio_of_command_connect + 1 * virtio_of_vq_desc + data.

+#define VIRTIO_OF_CONNECTION_TCP     1
+#define VIRTIO_OF_CONNECTION_RDMA    2

What does RDMA mean? I thought RDMA is a general concept that several
fabrics implement (with different details like how addressing works).


I guest your concern is the difference of IB/RoCE/iWarp ...
We are trying to define the payload protocol here, so I think we can ignore
the difference of the HCA.

I see, maybe this could be called STREAM vs KEYED instead of TCP vs RDMA?


I'd like to define two PDU mapping rules(in '[PATCH v2 04/11] transport-fabrics: introduce Stream Transmission' and '[PATCH v2 05/11] transport-fabrics: introduce Keyed Transmission'): STREAM and KEYED. A transport protocols need to use one.

Then we can define protocols:
#define VIRTIO_OF_CONNECTION_TCP     1		-> use STREAM
#define VIRTIO_OF_CONNECTION_RDMA    2		-> use KEYED
#define VIRTIO_OF_CONNECTION_TLS     3(in the future)	-> use STREAM
#define VIRTIO_OF_CONNECTION_XXX


+        u8 oftype;
+        u8 padding[5];
+};
+\end{lstlisting}
+
+The Connect commands MUST contains one Segment Descriptor and one structure
+virtio_of_command_connect to specify Initiator VQN and Target VNQ,
+virtio_of_command_connect has following structure:

I'm confsued. virtio_of_command_connect was defined above. The struct
defined below is virtio_of_connect. Does this paragraph need to be
updated (virtio_of_command_connect -> virtio_of_connect)?

Why is virtio_of_connect a separate struct and not part of
virtio_of_command_connect?


Because I'd like to define all the commands with a fixed length.

I don't understand. virtio_of_connect and virtio_of_command_connect are
both fixed-length. Why can't they be unified into 1 fixed-length struct?


For stream protocol, it always work fine.
For keyed protocol, for example RDMA, the target side needs to use ibv_post_recv to receive a large size(sizeof virtio_of_command_connect + sizeof virtio_of_connect). If the target uses ibv_post_recv to receive sizeof(CMD) + sizeof(DESC) * 1, the initiator fails in RDMA SEND.

It's not possible to review this patch because these structs aren't used
yet and the opcodes are undefined.

Defining structs that are shared by multiple opcodes might make
implementations cleaner, but I think it makes the spec less clear. I
would rather have a list of all opcodes and each one shows the full
command layout (even if it is duplicated). That way it's very easy to
look up an opcode you are implementing or debugging and check what's
needed. If the command layout is not documented in a single place, then
it takes more effort to figure out how an opcode works.

Stefan

OK, I'll merge the structure definition into the opcode definition.

Thank you!

Stefan

--
zhenwei pi


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]