Re: [pkcs11-comment] Behavior of C

I see your point. I want to see this clarified one way or the other so that all implementations act consistently. Arguably it's already clear enough as it is, but the allowance to require more bytes than needed when the output buffer is NULL is probably what opened up the possibility for the mistake that I am seeing in practice.

On Mon, Aug 19, 2019, 10:06 PM Jason King <jason.king@joyent.com> wrote:

That is correct, though to do the "just decrypt the last block" approach, if your implementation has any sort of state associated with the decrypt operation (e.g. to buffer a partial block, to store the previous block, to cache the expanded key schedule), you'll probably need to duplicate that state and track it (outside of the normal state) to make sure it's properly deallocated, so that you can do the decryption without interfering with the main decryption operation (since a caller can retry C_Decrypt as many times as they want as long as it still returns CKR_BUFFER_TOO_SMALL).

For other modes where you can't just grab blocks n-1 and n, you'd still need to treat the padding mechanism as a special case, and segment the input internally if you only want to allocate a single block (and again worry about the decryption state management).

It's all possible to do (it is just software after all), but it seems like an awful lot of complexity, all to save a handful of bytes of output for a corner case that only applies to decryption, with behavior that seems hard to get correct given the reports of HSMs not implementing it properly.

From: Amit K <klg.amit@gmail.com>
Reply: Amit K <klg.amit@gmail.com>
Date: August 18, 2019 at 3:05:12 PM
To: pkcs11-comment@lists.oasis-open.org <pkcs11-comment@lists.oasis-open.org>
Subject: Re: [pkcs11-comment] Behavior of C_Decrypt in pkcs#11

Hi Jason,

I agree that relaxing the requirement as you described will also resolve the issue, but I'm not sure that it's really a feasible change to make because it would break a lot of existing code. For instance, I've seen a lot of cryptographic wrappers that encapsulate pkcs11 in such a way that a decryption may be directed to an HSM when using one configuration, but will use a regular software implementation in another configuration in which the key is not on the HSM. In such a case, the software implementation (for example: OpenSSL) will have no problem doing this "one off" decryption with the exact plaintext length, but the pkcs11 implementation will fail that same call.

I also want to add on a technical note: for CKM_AES_CBC_PAD specifically, the implementation doesn't need to decrypt the entire ciphertext in order to compute the real length of the plaintext. All that's needed is to decrypt the last block, which contains the padding. In CBC mode it's possible to recover any block via the key and the preceding ciphertext block only. Once the last block is obtained, the number of padding bytes can be determined and subtracted from the length of the ciphertext, which gives the plaintext length.
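
As a rough illustration of the last-block idea, here is a minimal Python sketch. It is not taken from any PKCS#11 implementation: `toy_encrypt_block` and `toy_decrypt_block` are made-up, insecure stand-ins for the AES block operations, all other names are hypothetical as well, and PKCS#7-style padding with a 16-byte block is assumed. Note that for a ciphertext of exactly one block, the IV plays the role of the preceding ciphertext block.

```python
BLOCK = 16  # AES block size in bytes

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # Stand-in for AES block encryption: NOT secure, only invertible.
    return bytes(b ^ k for b, k in zip(block, key))[::-1]

def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    # Inverse of toy_encrypt_block.
    return bytes(b ^ k for b, k in zip(block[::-1], key))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_pad_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # CBC with PKCS#7-style padding: always adds 1..BLOCK bytes.
    pad = BLOCK - len(plaintext) % BLOCK
    data = plaintext + bytes([pad]) * pad
    prev, out = iv, b""
    for i in range(0, len(data), BLOCK):
        prev = toy_encrypt_block(key, xor(data[i:i + BLOCK], prev))
        out += prev
    return out

def plaintext_length(key: bytes, iv: bytes, ciphertext: bytes) -> int:
    """Decrypt only the final block to learn the unpadded length."""
    if len(ciphertext) == 0 or len(ciphertext) % BLOCK != 0:
        raise ValueError("ciphertext must be a positive multiple of the block size")
    last = ciphertext[-BLOCK:]
    # The block before the last one, or the IV for a single-block ciphertext.
    prev = ciphertext[-2 * BLOCK:-BLOCK] if len(ciphertext) > BLOCK else iv
    final_plain = xor(toy_decrypt_block(key, last), prev)
    pad = final_plain[-1]
    if not 1 <= pad <= BLOCK or final_plain[-pad:] != bytes([pad]) * pad:
        raise ValueError("invalid padding")
    return len(ciphertext) - pad

key, iv = bytes(range(16)), bytes(16)
ct = cbc_pad_encrypt(key, iv, b"A" * 256)
print(len(ct), plaintext_length(key, iv, ct))  # 272 256
```

Only one block-sized decryption is performed regardless of the ciphertext length, which is the point being made above; a real implementation would substitute its AES primitive for the toy functions.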

Note that even in modes of operation that do require decrypting the entire ciphertext, there's no need to actually allocate a buffer for the entire plaintext (I know you didn't say that, I'm just clarifying the issue in case someone misses that point), because you can decrypt block by block, always into the same block-sized buffer, and eventually recover the final padding block and do the same thing.

Regards,

Amit

On Sat, Aug 17, 2019, 3:06 AM Jason King <jason.king@joyent.com> wrote:

I think your understanding is largely correct. The current requirement is that when the output buffer is not NULL and CKR_BUFFER_TOO_SMALL is returned, the _exact_ size of the output plaintext must be returned (and the C_Decrypt call can be tried again without having to call C_DecryptInit). I would rather see this requirement relaxed so that an implementation can report a less precise value when returning CKR_BUFFER_TOO_SMALL regardless of whether the output buffer is NULL or not. The output buffer won't be completely filled, but that already has to happen when calling C_DecryptUpdate.

Using your example, with a 272 byte ciphertext and 256 byte plaintext, even if you call C_Decrypt with a one byte buffer, it still must decrypt the whole ciphertext, strip the padding, set 256 as the plaintext size, and return CKR_BUFFER_TOO_SMALL, all while not terminating the decryption operation so it can be tried again until it succeeds or a different error occurs. Implementing the correct semantics here is a royal pain (I speak from _very_ recent experience). I'm guessing the HSMs returning CKR_BUFFER_TOO_SMALL are expecting an output buffer at least as large as the input ciphertext (it makes the implementation significantly simpler).
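
The retry semantics described above can be modeled with a small Python sketch. This is a toy model under stated assumptions, not a real PKCS#11 binding: `ToySession` and its methods are hypothetical, the "cipher" is the identity so that only padding is stripped, and the behavior encoded is the reading of the standard given in this message. The three CKR_* values are the constants from the PKCS#11 headers.

```python
CKR_OK = 0x000
CKR_OPERATION_NOT_INITIALIZED = 0x091
CKR_BUFFER_TOO_SMALL = 0x150

class ToySession:
    """Hypothetical model of one session's decrypt operation (no real crypto)."""

    def __init__(self):
        self.active = False

    def decrypt_init(self):
        self.active = True
        return CKR_OK

    def decrypt(self, ciphertext, out_size):
        """out_size=None models a NULL output buffer. Returns (rv, length, data)."""
        if not self.active:
            return CKR_OPERATION_NOT_INITIALIZED, 0, None
        # Stand-in for decrypting everything: the "cipher" is the identity,
        # so only the padding has to be stripped.
        pad = ciphertext[-1]
        plaintext = ciphertext[:-pad]
        if out_size is None:
            # Length query: an upper bound is allowed; the operation stays active.
            return CKR_OK, len(ciphertext), None
        if out_size < len(plaintext):
            # Exact length is reported and the operation stays active for a retry.
            return CKR_BUFFER_TOO_SMALL, len(plaintext), None
        self.active = False  # success ends the operation
        return CKR_OK, len(plaintext), plaintext

session = ToySession()
session.decrypt_init()
ct = b"A" * 256 + bytes([16]) * 16        # 272 bytes: 256 data + 16 padding
rv, size, _ = session.decrypt(ct, 1)      # one-byte buffer
print(hex(rv), size)                      # 0x150 256
rv, size, data = session.decrypt(ct, size)  # retry without a new decrypt_init
print(hex(rv), size)                      # 0x0 256
```

Even in this toy, the too-small path has to do the full padding work and leave the operation alive, which hints at why a real implementation with buffered state finds it awkward.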

From: Amit K <klg.amit@gmail.com>
Reply: Amit K <klg.amit@gmail.com>
Date: August 16, 2019 at 5:56:25 PM
To: pkcs11-comment@lists.oasis-open.org <pkcs11-comment@lists.oasis-open.org>
Subject: [pkcs11-comment] Behavior of C_Decrypt in pkcs#11

Hi all,

I want to suggest that the behavior of the C_Decrypt function should be more clearly defined in the pkcs#11 standard. My observation arises from testing with several HSM brands, where I noticed an inconsistency when using symmetric key mechanisms that return variable length output, such as the CKM_AES_CBC_PAD mechanism.

To illustrate my point I shall give an example: suppose that I have a 272-byte ciphertext, which I know should actually decrypt to 256 bytes via the CKM_AES_CBC_PAD mechanism, which means a full block of padding was added. (I know this because I originally encrypted 256 bytes via the same mechanism.)
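
The arithmetic behind this example can be checked with a short Python sketch. It assumes PKCS#7-style padding with the 16-byte AES block size; the function name `padded_length` is made up for illustration.

```python
AES_BLOCK_SIZE = 16  # bytes

def padded_length(plaintext_len: int, block_size: int = AES_BLOCK_SIZE) -> int:
    """Ciphertext length when 1..block_size padding bytes are always appended."""
    pad = block_size - (plaintext_len % block_size)
    return plaintext_len + pad

# 256 is already a multiple of 16, so a full block of padding is appended.
print(padded_length(256))  # 272
print(padded_length(255))  # 256
```

Since padding is always between 1 and 16 bytes, a 272-byte ciphertext can correspond to any plaintext length from 256 to 271, which is why the exact length is not known without looking at the final block.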

I know that according to the standard, when invoking C_Decrypt with a NULL output buffer the function may return an output length which is somewhat longer than the actual required length. In particular when padding is used this is understandable, as the function can't know how many padding bytes are in the final block without carrying out the actual decryption. But nothing in the standard states that a longer buffer is necessary when I *do* know what output length to expect, which is the use case I am describing.

So what I noticed is that some HSMs will successfully return exactly 256 bytes back as a response to a C_Decrypt for such a ciphertext, while others will return a CKR_BUFFER_TOO_SMALL error as a result. I guess that the implementations which return this error just don't allocate an internal buffer for the final block which contains the padding, to actually check its contents and how many padding bytes there are. Without doing that it would be natural to require more bytes in the output buffer than they may actually need.

So I think that the issue with the standard is that there's no clear requirement to make a call with a NULL output buffer to C_Decrypt before invoking it with a real buffer - so that some implementations seem to assume you have done both while others are more conservative, resulting in the inconsistency I described.

If this issue can be either clarified (I may be misunderstanding something about the standard) or perhaps addressed in a future version of pkcs#11, that would be of great help.

Best Regards,

Amit

P.S. I posted a question with similar phrasing a while ago on the SO site: https://stackoverflow.com/questions/56818371/what-is-the-correct-behavior-of-c-decrypt-in-pkcs11 but so far got no satisfactory answer.
