
virtio-comment message



Subject: Re: [virtio-comment] RE: RE: Re: Re: Re: [PATCH v2 06/11] transport-fabrics: introduce command set


On 6/9/23 10:06, Parav Pandit wrote:

From: zhenwei pi <pizhenwei@bytedance.com>
Sent: Thursday, June 8, 2023 9:39 PM


We should start with first establishing the data transfer model covering 512B
to 1M context and take up the optimizations as extensions.



Hi, Parav

What do you think about another RDMA inline proposal in '[PATCH v2 11/11]
transport-fabrics: support inline data for keyed transmission'?

1, use a feature command to get the target's max recv buffer size, for example 16k
2, use a feature command to set the initiator's max recv buffer size, for example 16k

If the payload (plus the command header) is smaller than the max recv buffer size, a single RDMA SEND is enough. For example, virtio-blk writes 8k: 16 + 8192 < 16384, so a single RDMA SEND is fine.
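
The size check in the proposal above could be sketched in C roughly as follows. The function name `fits_inline` and the constant `CMD_SIZE` are hypothetical; the 16-byte command size is taken from the "16 + 8192" arithmetic in the example, and the strict `<` simply mirrors that example rather than anything stated in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed command header size, taken from the "16 + 8192" example. */
#define CMD_SIZE 16u

/*
 * Returns true if the command plus payload fit in one receive buffer,
 * i.e. a single RDMA SEND would be enough.
 * Checks CMD_SIZE + payload_len < max_recv_buf without overflowing.
 */
static bool fits_inline(size_t payload_len, size_t max_recv_buf)
{
    if (max_recv_buf <= CMD_SIZE)
        return false; /* no room for any payload */
    return payload_len < max_recv_buf - CMD_SIZE;
}
```

With a 16k receive buffer, `fits_inline(8192, 16384)` is true for the virtio-blk 8k write, while a payload of 16k itself would not fit and would need the non-inline path.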

Let me read it.
From the short description above, it appears that every receive buffer posted must be 16K in size.
And if the sender chooses not to send inline, most of that buffer is wasted.

For a read-only or read-heavy workload, the majority of the target's buffer space is wasted: close to 98% or so, assuming a 64B command size.

And when the buffer is full, the sender is stalled for a full round trip before it can enqueue the command.

Yes, this wastes memory; it's not good enough.

I tried to understand your proposal; please correct me if I misunderstand...

Define data structure like:

struct virtio_of_keyed_desc {
        le64 addr;
        le32 length;
        le32 key;
};

struct virtio_of_command_vq {
        le16 opcode;
        le16 command_id;
        le32 out_length;
        le32 in_length;
        union {
                struct virtio_of_keyed {
                        le32 out_offset;
                };

                struct virtio_of_stream {
                        u8 rsvd[4];
                };
        };
};

struct virtio_of_completion {
        le16 status;
        le16 command_id;
        u8 rsvd[4];
        union {
                le64 value;
                struct virtio_of_vq_completion {
                        le32 in_length;
                        le32 len;
                };
        };
};
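
As a sanity check on the sizes, here is a compilable C11 sketch of the same layout. It makes a few assumptions that are not in the proposal: `le16`/`le32`/`le64`/`u8` are mapped to plain fixed-width integers (no endianness handling), and the tagged inner structs are turned into named members (`keyed`, `stream`, `vq`, all hypothetical names) so the fields are actually reachable in standard C.

```c
#include <stdint.h>

/* Assumption: little-endian wire types modeled as plain integers. */
typedef uint16_t le16;
typedef uint32_t le32;
typedef uint64_t le64;
typedef uint8_t  u8;

struct virtio_of_keyed_desc {
    le64 addr;
    le32 length;
    le32 key;
};

struct virtio_of_command_vq {
    le16 opcode;
    le16 command_id;
    le32 out_length;
    le32 in_length;
    union {
        struct { le32 out_offset; } keyed;  /* keyed transports */
        struct { u8 rsvd[4]; } stream;      /* stream transports */
    };
};

struct virtio_of_completion {
    le16 status;
    le16 command_id;
    u8 rsvd[4];
    union {
        le64 value;
        struct { le32 in_length; le32 len; } vq;
    };
};

/* Each structure is 16 bytes, so command + keyed descriptor is 32 bytes. */
_Static_assert(sizeof(struct virtio_of_keyed_desc) == 16, "desc is 16B");
_Static_assert(sizeof(struct virtio_of_command_vq) == 16, "command is 16B");
_Static_assert(sizeof(struct virtio_of_completion) == 16, "completion is 16B");
_Static_assert(sizeof(struct virtio_of_command_vq) +
               sizeof(struct virtio_of_keyed_desc) == 32,
               "keyed request PDU is 32B");
```

The static assertions show that the keyed request (command plus one descriptor) comes to 32 bytes under this layout.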


For stream transports (e.g. TCP/IP), the request PDU consists of [struct virtio_of_command_vq + data], and the response PDU consists of [struct virtio_of_completion + data].

For keyed transports (e.g. RDMA), the request PDU consists of [struct virtio_of_command_vq + struct virtio_of_keyed_desc]. There are 2 opcodes for keyed transmission:
1, opcode virtio_of_op_vq: (basic and required command)
The initiator prepares a buffer of [out_length + in_length]. The target receives a 32B command, reads the remote memory [addr, addr+out_length) by RDMA READ, then writes the remote memory [addr+out_length, addr+out_length+in_length) by RDMA WRITE, and finally sends the completion by RDMA SEND.

2, opcode virtio_of_op_vq_write_inline: (optional command)
The initiator gets a remote buffer on the target (e.g. 128K) after feature negotiation.

The initiator selects a region of the target's remote memory (e.g. 4K - 12K), writes the payload there by RDMA WRITE, then sends a 32B command by RDMA SEND (out_offset is 4K). The target handles the command, writes the remote memory [addr, addr+in_length), and finally sends the completion by RDMA SEND.
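
The address arithmetic for the two keyed opcodes above could be sketched like this. All names here (`mem_range`, `vq_ranges`, `vq_ranges_for`, `inline_region_ok`) are hypothetical helpers for illustration, and the bounds check for the inline region is an assumption about what a target would reasonably validate, not something the proposal spells out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open byte range [start, end) in the peer's registered memory. */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/* Initiator memory the target touches for virtio_of_op_vq. */
struct vq_ranges {
    struct mem_range rdma_read;   /* out data, fetched by RDMA READ */
    struct mem_range rdma_write;  /* in data, stored by RDMA WRITE */
};

/* The out data comes first; the in data follows immediately after it. */
static struct vq_ranges vq_ranges_for(uint64_t addr, uint32_t out_length,
                                      uint32_t in_length)
{
    struct vq_ranges r;

    r.rdma_read.start = addr;
    r.rdma_read.end = addr + out_length;
    r.rdma_write.start = r.rdma_read.end;
    r.rdma_write.end = r.rdma_write.start + in_length;
    return r;
}

/*
 * virtio_of_op_vq_write_inline: check that the region the initiator
 * picked, [out_offset, out_offset + out_length), lies inside the buffer
 * the target exposed after feature negotiation.
 */
static bool inline_region_ok(uint32_t out_offset, uint32_t out_length,
                             uint64_t target_buf_size)
{
    /* Widen to 64 bits so the sum cannot wrap. */
    return (uint64_t)out_offset + out_length <= target_buf_size;
}
```

For the 4K - 12K example, `inline_region_ok(4096, 8192, 131072)` holds for a 128K target buffer, and the command would carry out_offset = 4096.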

--
zhenwei pi

