virtio-dev message

Subject: RE: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification


Hi,

> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Friday, September 02, 2016 10:05 PM
> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
> 
> 
> 
> On 02.09.16 14:16, Ola Liljedahl wrote:
> >
> >
> > On 02/09/2016, 12:26, "Gonglei (Arei)" <arei.gonglei@huawei.com> wrote:
> >
> >>
> >>> -----Original Message-----
> >>> From: Alexander Graf [mailto:agraf@suse.de]
> >>> Sent: Friday, September 02, 2016 4:07 PM
> >>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> >>> specification
> >>>
> >>>
> >>>
> >>> On 02.09.16 05:08, Gonglei (Arei) wrote:
> >>>> Hi Alex,
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Alexander Graf [mailto:agraf@suse.de]
> >>>>> Sent: Thursday, September 01, 2016 9:37 PM
> >>>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> >>> specification
> >>>>>
> >>>>> On 08/30/2016 02:12 PM, Gonglei wrote:
> >>>>>> The virtio crypto device is a virtual crypto device (ie. hardware
> >>>>>> crypto accelerator card). The virtio crypto device can provide
> >>>>>> seven crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM,
> >>>>>> PRIMITIVE.
> >>>>>>
> >>>>>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> >>>>>
> >>>>> I have mostly a few high level comments.
> >>>>>
> >>>>> For starters, a lot of the structs rely on the compiler to pad them
> >>>>> to natural alignment. That may get us into trouble when trying to
> >>>>> emulate a virtio device on a host with different guest architecture
> >>>>> (like arm on x86). It'd be a good idea to manually pad everything to
> >>>>> be 64bit aligned - then all fields are always at the same spot.
> >>>>>
> >>>> Good point! I'll do this in the next version. Thanks!
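
(For illustration only, explicit padding would look roughly like the sketch
below; the field names are placeholders, not the actual spec layout.)

#include <stdint.h>

/* Sketch only: pad every structure explicitly so its layout does not depend
 * on compiler padding rules on different architectures. The names below are
 * placeholders, not the spec definitions. */
typedef uint32_t le32;   /* little-endian fields, as in the spec */
typedef uint64_t le64;

struct example_crypto_req_header {
        le32 opcode;
        le32 algo;
        le64 session_id;
        le32 flag;
        le32 padding;    /* explicit pad: keeps the header a multiple of 8 bytes */
};
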
> >>>>
> >>>>> I also have a hard time getting my head around the final flow of
> >>>>> everything. Do I always have to create a session to be able to emit a
> >>>>> command? In that case, doesn't that slow down everything, since a
> >>>>> request would then need to wait for the host reply to receive its
> >>>>> session id? There should be some way to fire off a simple non-iv
> >>>>> operation without any session set up imho.
> >>>>>
> >>>> For symmetric algorithms, we'd better create a session before
> >>>> executing encryption or decryption, because the session is usually
> >>>> kept for a specific algorithm with a specific key in the production
> >>>> environment. And if we only change the iv, we don't need to re-create
> >>>> the session.
> >>>
> >>> I think we have a slight misunderstanding here :)
> >>>
> >>> The current workflow is
> >>>
> >>>   -> create session
> >>>   <- session key
> >>>   -> data in
> >>>   <- data out
> >>>   ...
> >>>   <- close session
> >>>   -> ack
> >>>
> >>> That means that at least for the first packet you have at least one full
> >>> round-trip cost from guest to host to guest to be able to send any data.
> >>>
> >> Yes, that's true...
> >>
> >>> That sounds pretty expensive to me on the latency side. There are
> >>> multiple ways to mitigate that. One idea was to have a separate path in
> >>> parallel to the create session + data + close session dance that would
> >>> combine them all into a single command. You would still have the session
> >>> based version, but accelerate the one-blob case.
> >>>
> >>> Another idea would be to make the guest be the session id janitor. Then
> >>> you could just do
> >>>
> >>>   -> create session with key X
> >>>   -> data in
> >>>   <- data out
> >>>   ...
> >>>
> >>> so you save the round trip, if you combine command and data queues, as
> >>> then the create and data bits are serialized by their position in the
> >>> queue.
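
(If I understand the idea correctly, from the guest side it would look
roughly like the sketch below; every name in it is a placeholder, not a spec
or driver definition.)

#include <stdint.h>
#include <stddef.h>

/* Hypothetical helpers, declared only so the sketch is self-contained: */
struct example_vq;
struct example_req;
struct example_req *make_create_session_req(uint64_t sid,
                                            const uint8_t *key, size_t keylen);
struct example_req *make_cipher_req(uint64_t sid, const uint8_t *iv,
                                    const uint8_t *src, uint8_t *dst,
                                    size_t len);
void vq_add(struct example_vq *vq, struct example_req *req);
void vq_kick(struct example_vq *vq);

/* The guest owns the session id space, so no host round trip is needed
 * before the first data request. */
static uint64_t next_session_id;

static void submit_encrypt(struct example_vq *vq,
                           const uint8_t *key, size_t keylen,
                           const uint8_t *iv,
                           const uint8_t *src, uint8_t *dst, size_t len)
{
        uint64_t sid = next_session_id++;

        /* Both requests go onto the same queue, so the device processes the
         * create request strictly before the data request that uses it. */
        vq_add(vq, make_create_session_req(sid, key, keylen));
        vq_add(vq, make_cipher_req(sid, iv, src, dst, len));
        vq_kick(vq);
}
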
> >>>
> >> ... But for lots of crypto requests, the overhead is low:
> >>
> >> -> create session with key X
> >> <- session id
> >>    //do something in the guest if you like
> >> -> data in with session_id
> >> -> data in with session_id
> >> -> data in with session_id
> >>    ... ...    (fill out data virtqueue)
> >> <- data out
> >> <- data out
> >> <- data out
> >>
> >> And this is what the production environment does currently.
> >>
> >>>>
> >>>> For the asymmetric algorithms, we don't need to create a session IIRC.
> >>>>
> >>>> So, I don't think this is a performance degradation, but a
> >>>> performance enhancement.
> >>>>
> >>>>> Also, I don't fully understand the split between control and data
> >>>>> queues. As far as I read things, the control queue is used to create
> >>>>> session ids and the data queues can then be used to push data. Is
> >>>>> there any particular reason for the split? One thing that seems
> >>>>> natural to me would be to have sessions be per-queue, so you would
> >>>>> create a session on a particular queue and only have it ever be
> >>>>> available there. That way you get rid of any locking for sessions.
> >>>>>
> >>>> We want to keep a unified request type (structure) for the data
> >>>> queue, so we can keep the session operations in the control queue.
> >>>> Of course the control queue is only used to create sessions
> >>>> currently, but we can extend its functions if needed in the future.
> >>>
> >>> I still don't understand. With separate control+data queues you just get
> >>> yourself into synchronization troubles. Both struct
> >>> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
> >>> have an opcode as first le32 field. You can easily use that to determine
> >>> both length of the payload as well as command (control vs data).
> >>>
> >>
> >> There is a big problem: the control handling logic is synchronous, but
> >> the data queue handling logic is asynchronous. We can't combine them
> >> into one queue. It would decrease performance, because you would need
> >> to identify each packet if we forced them together.
> > Are you saying that control and data operations are handled by separate
> > "blocks"? If you combined control and data queues, there would have to
> > be a (SW) demultiplexer that would add overhead (and potentially
> > decrease throughput), especially for the data operations?
> 
> Uh, the multiplexer is as simple as a switch() statement on the opcode,
> no? It might stall that one particular queue, but that sounds like
> something you can improve by increasing the number of queues (and invent
> something smart to ease polling of activity on more queues).
> 
Actually, the capacity of virtio multi-queue depends on the backend crypto
device, just as multi-queue tap devices work with virtio-net.

> I think the main point Gonglei is bringing up here is that session
> creation needs to finish before we can start firing off packets to that
> session. That's another point that I dislike about the approach. I can't
> think of any reason why session creation should be synchronous if you
> allow the guest to handle host<->guest session tokens.
> 
> Data ordering needs to be consistent either way if you handle IVs in
> hardware, so you naturally order session creation and data transmission
> for a particular session.
> 
> But again, I really might just be missing something basic.
> 
Assuming we combine the control and data queues into one queue, let's imagine the scenarios below:

The synchronous scenario is OK:

1. Guest sends packet A with session and data information to Host, opcode=create session
2. Host creates the session, then executes the crypto operation with the created session
3. Host returns the output data to Guest (with the session id for the next crypto operation?)
4. Guest sends packet B with the session id and data information to Host, opcode=crypto operation
5. Host searches the global transfer table with the session id to get the hardware session.
6. Host executes the crypto operation.
7. Host returns the output data to Guest.
... ...

How about the asynchronous scenario? (assuming that the packets below have the same algo, key, etc.)

1. Guest sends packet A with session and data information to Host, opcode=create session
2. Guest sends packet B with session and data information to Host, opcode=create session or crypto operation?
3. Guest sends packet C with session and data information to Host, opcode=create session or crypto operation?
... ...

Each packet needs to create a session (even though they are all the same), so we can't get the best performance.
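
(Just to make the demultiplexing concrete: with a combined queue the host
would have to dispatch every element on its opcode, roughly like the sketch
below, where the names and opcode values are placeholders. Note how the
synchronous control cases and the asynchronous data cases end up interleaved
in the same handler.)

#include <stdint.h>

/* Hypothetical request layout for this sketch only; the real headers are
 * defined in the spec. */
struct example_req {
        uint32_t opcode;   /* first le32 field in both header types */
        /* control or data payload follows */
};

/* Placeholder opcodes, not the spec values. */
enum { OP_CREATE_SESSION, OP_DESTROY_SESSION, OP_CIPHER_ENCRYPT };

static void handle_combined_queue_elem(struct example_req *req)
{
        switch (req->opcode) {
        case OP_CREATE_SESSION:
        case OP_DESTROY_SESSION:
                /* synchronous control path: must complete before any data
                 * request that references the session */
                break;
        case OP_CIPHER_ENCRYPT:
                /* asynchronous data path: handed to the backend and
                 * completed later */
                break;
        default:
                /* unknown opcode: fail this element */
                break;
        }
}
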

> > And in order to truly benefit from (HW) crypto acceleration (especially
> > for smaller data sizes), you want to minimise the amount of SW
> > processing in the data path.
> 
> I think we're in full agreement on that one :). That's the whole point.
> If you have 256 cores, you don't want all of your crypto sessions to go
> over a single bottleneck control queue. Creating sessions is a semi-hot
> path.
> 
Please refer to my other reply; it's indeed a problem.

> 
> Alex

Regards,
-Gonglei

