OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-dev message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: RE: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification


Hi Alex,

> -----Original Message-----
> From: Alexander Graf [mailto:agraf@suse.de]
> Sent: Friday, September 02, 2016 9:53 PM
> Euler)
> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device specification
> 
> 
> 
> On 02.09.16 12:26, Gonglei (Arei) wrote:
> >
> >> -----Original Message-----
> >> From: Alexander Graf [mailto:agraf@suse.de]
> >> Sent: Friday, September 02, 2016 4:07 PM
> >> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> specification
> >>
> >>
> >>
> >> On 02.09.16 05:08, Gonglei (Arei) wrote:
> >>> Hi Alex,
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Alexander Graf [mailto:agraf@suse.de]
> >>>> Sent: Thursday, September 01, 2016 9:37 PM
> >>>> Subject: Re: [PATCH v8 1/2] virtio-crypto: Add virtio crypto device
> >> specification
> >>>>
> >>>> On 08/30/2016 02:12 PM, Gonglei wrote:
> >>>>> The virtio crypto device is a virtual crypto device (ie. hardware
> >>>>> crypto accelerator card). The virtio crypto device can provide
> >>>>> five crypto services: CIPHER, MAC, HASH, AEAD, KDF, ASYM, PRIMITIVE.
> >>>>>
> >>>>> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> >>>>
> >>>> I have mostly a few high level comments.
> >>>>
> >>>> For starters, a lot of the structs rely on the compiler to pad them to
> >>>> natural alignment. That may get us into trouble when trying to emulate a
> >>>> virtio device on a host with different guest architecture (like arm on
> >>>> x86). It'd be a good idea to manually pad everything to be 64bit aligned
> >>>> - then all fields are always at the same spot.
> >>>>
> >>> Good point! I'll do this in the next version. Thanks!
> >>>
> >>>> I also have a hard time getting my head around the final flow of
> >>>> everything. Do I always have to create a session to be able to emit a
> >>>> command? In that case, doesn't that slow down everything, since a
> >>>> request would then need to wait for the host reply to receive its
> >>>> session id? There should be some way to fire off a simple non-iv
> >>>> operation without any session set up imho.
> >>>>
> >>> For symmetric algorithms, we'd better create a session before executing
> >> encryption
> >>> Or decryption, because the session usually be kept for a specific
> >>> algorithm with specific key in the production environment. And if we only
> >> change the iv,
> >>> we don't need to re-create the session.
> >>
> >> I think we have a slight misunderstanding here :)
> >>
> >> The current workflow is
> >>
> >>   -> create session
> >>   <- session key
> >>   -> data in
> >>   <- data out
> >>   ...
> >>   <- close session
> >>   -> ack
> >>
> >> That means that at least for the first packet you have at least one full
> >> round-trip cost from guest to host to guest to be able to send any data.
> >>
> > Yes, that's true...
> >
> >> That sounds pretty expensive to me on the latency side. There are
> >> multiple ways to mitigate that. One idea was to have a separate path in
> >> parallel to the create session + data + close session dance that would
> >> combine them all into a single command. You would still have the session
> >> based version, but accelerate the one-blob case.
> >>
> >> Another idea would be to make the guest be the session id janitor. Then
> >> you could just do
> >>
> >>   -> create session with key X
> >>   -> data in
> >>   <- data out
> >>   ...
> >>
> >> so you save the round trip, if you combine command and data queues, as
> >> then the create and data bits are serialized by their position in the queue.
> >>
> > ... But for lots of crypto requests, the exhaust is low:
> >
> > 	-> create session with key X
> > 	<- session id
> >     //do something in the guest if you like
> >  	-> data in with session_id
> > 	-> data in with session_id
> > 	-> data in with session_id
> >     ... ...    (fill out data virtqueue)
> > 	<-data out
> > 	<-data out
> >     <-data out
> >
> > And this is what the production environment do currently.
> 
> Does this also apply to a web server handling lots of small requests?
> Does crypto acceleration even make sense in such an environment or would
> it be cheaper to simply to the crypto on the CPU for such small payloads?
> 
> I'm seriously asking :).
> 
I think I get your points now. You mean for one-blob request (different packets have different algorithms or key), 
then the exhaust is too higher under current approach? That's true, because for each packet
we do have the below workflow:

 Guest -> Host

 -> create session
 <- session id
 -> data in
 <- data out
 <- close session

This is a big problem indeed.

> >
> >>>
> >>> For the asymmetric algorithms, we don't need create a session IIRC.
> >>>
> >>> So, I don't think this is a performance degradation, but a performance
> >> enhancement.
> >>>
> >>>> Also, I don't fully understand the split between control and data
> >>>> queues. As far as I read things, the control queue is used to create
> >>>> session ids and the data queues can then be used to push data. Is there
> >>>> any particular reason for the split? One thing that seems natural to me
> >>>> would be to have sessions be per-queue, so you would create a session on
> >>>> a particular queue and only have it ever be available there. That way
> >>>> you get rid of any locking for sessions.
> >>>>
> >>> We want to keep a unify request type (structure) for data queue, so we can
> >>> keep the session operation in the control queue. Of course the control
> queue
> >>> only used to create sessions currently, but we can extend its functions if
> >> needed
> >>> in the future.
> >>
> >> I still don't understand. With separate control+data queues you just get
> >> yourself into synchronization troubles. Both struct
> >> virtio_crypto_ctrl_header and struct virtio_crypto_op_header already
> >> have an opcode as first le32 field. You can easily use that to determine
> >> both length of the payload as well as command (control vs data).
> >>
> >
> > There is a big problem that the control handle logic is synchronization, but the
> data queue
> > handling logic is asynchronization. We can't combine them into one queue.
> > It will decrease the performance because you need indentify each packet
> > if we do this forcedly.
> 
> I might again just not grasp the full picture here. Is the session key
> something global you get from hardware?
> 
The specific algorithm and key, key length etc. can exclusively determine a session,
which is a software concept IMO, in order to improve the performance for lots of
packets with the same algo and key while executing crypto operation. A session is a
handle which describes the cryptographic parameters to be applied to a number of
buffers.

> In that case, it might be a good idea to take a look at all existing
> crypto accelerator implementations we know of today and see which model
> works best across the board.
> 
Mihai, what about your crypto accelerators for SOC?

> I also think it's probably a good idea to not pass hardware session ids
> into the guest, so you'd want to create a separate lookup table that
> converts from virtio session to hardware session. That way hardware that
> does not have global session tokens can actually leverage multiple CPUs
> hammering it.
> 
The retuned session_id is an index based on software realization, not hardware session ids.
We truly need to create a separate lookup table.

> 
> Alex

Regards,
-Gonglei


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]