
In Tink, it is possible to load and write cleartext keysets as JSON. A non-working example is shown below:

{
  "primaryKeyId": 2800579,
  "key": [
    {
      "keyData": {
        "typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
        "value": "ODA9eJX9wcAGwZocL0Jym==",
        "keyMaterialType": "SYMMETRIC"
      },
      "status": "ENABLED",
      "keyId": 2800579,
      "outputPrefixType": "TINK"
    }
  ]
}

My question is: is it possible to insert your own values into the various key/value pairs to get another valid keyset? I have experimented with this without much success, mainly because of the "value" field, which fails with:

INVALID_ARGUMENT: Could not parse key_data.value as key type 'type.googleapis.com/google.crypto.tink.AesGcmKey'

Any idea what a valid "value" would be?

Topaco
user3776598

1 Answer


First of all, the Base64 string of the value field in the posted code snippet is invalid, possibly a copy/paste error.
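A quick way to confirm this is to run the string through a strict Base64 decoder. The sketch below uses only the Python standard library; the helper name is_valid_base64 is made up for illustration. The posted value has 23 characters, which is not a multiple of 4, so strict decoding rejects it. Note that passing this check only means the string is well-formed Base64, not that it is a valid Tink key.

```python
import base64
import binascii


def is_valid_base64(text: str) -> bool:
    """Return True if text decodes as strict, padded standard Base64."""
    try:
        # validate=True rejects characters outside the Base64 alphabet
        base64.b64decode(text, validate=True)
        return True
    except binascii.Error:
        return False


posted_value = "ODA9eJX9wcAGwZocL0Jym=="  # value from the question
generated_value = "GiD5ojApaIM2MRpPhGf5sVMhxeA6NE5KjdzUxsJ0ChH/JA=="  # value written by Tink

print(len(posted_value), is_valid_base64(posted_value))
print(len(generated_value), is_valid_base64(generated_value))
```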

The following Python code uses Tink version 1.5.0 and creates and displays a keyset for AES-256/GCM as JSON:

import io
from tink import aead
from tink import tink_config
from tink import JsonKeysetWriter
from tink import new_keyset_handle
from tink import cleartext_keyset_handle

tink_config.register()

key_template = aead.aead_key_templates.AES256_GCM
keyset_handle = new_keyset_handle(key_template)

string_out = io.StringIO()
writer = JsonKeysetWriter(string_out)
cleartext_keyset_handle.write(writer, keyset_handle)

serialized_keyset = string_out.getvalue()
print(serialized_keyset)

The result resembles the keyset you posted, for example:

{
  "primaryKeyId": 1794775293,
  "key": [
    {
      "keyData": {
        "typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
        "value": "GiD5ojApaIM2MRpPhGf5sVMhxeA6NE5KjdzUxsJ0ChH/JA==",
        "keyMaterialType": "SYMMETRIC"
      },
      "status": "ENABLED",
      "keyId": 1794775293,
      "outputPrefixType": "TINK"
    }
  ]
}   

I haven't found documentation that describes the structure in general or the value field in particular, but comparing the keysets generated for different algorithms allows some conclusions. Base64 decoding the value and hex encoding the result gives:

1a20f9a23029688336311a4f8467f9b15321c5e03a344e4a8ddcd4c6c2740a11ff24

For AES-256/GCM the decoded value has 34 bytes, of which the last 32 bytes are the actual key. The 2-byte prefix is characteristic of the algorithm and key size: the first byte depends on the algorithm and the second byte indicates the key size in bytes, e.g. 0x1a10 for AES-128/GCM, 0x1a20 for AES-256/GCM or 0x1220 for ChaCha20Poly1305 (the prefix can be more complex for other algorithms).
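The decomposition can be reproduced with a few lines of standard-library Python. This is a minimal sketch that assumes the simple layout observed here (one algorithm byte, one length byte, then the raw key); split_key_value is a hypothetical helper, not part of Tink. The layout looks like a protobuf length-delimited field (a tag byte followed by a length byte), but that reading is an assumption, since the prefix was deduced by comparison only.

```python
import base64


def split_key_value(value_b64: str):
    """Split a keyset "value" into (first byte, length byte, raw key).

    Assumes the two-byte prefix layout seen for AES-GCM keys.
    """
    raw = base64.b64decode(value_b64, validate=True)
    if len(raw) < 2:
        raise ValueError("value too short to contain a prefix")
    first_byte, length, key = raw[0], raw[1], raw[2:]
    if length != len(key):
        raise ValueError("length byte does not match the key size")
    return first_byte, length, key


first_byte, length, key = split_key_value(
    "GiD5ojApaIM2MRpPhGf5sVMhxeA6NE5KjdzUxsJ0ChH/JA=="
)
print(hex(first_byte), length, key.hex())
```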

To use a self-defined key for AES-256/GCM, e.g.

000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f 

prepend 0x1a20 and Base64 encode the result:

GiAAAQIDBAUGBwgJCgsMDQ4PEBESExQVFhcYGRobHB0eHw==

and apply this value instead of the old value in the above KeySet.
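The prepend-and-encode step can be sketched as follows, using only the standard library. The constant AES256_GCM_PREFIX and the function make_aes256_gcm_value are illustrative names, and the sketch covers AES-256/GCM only; other algorithms would need a different prefix.

```python
import base64

# Prefix observed for AES-256/GCM keys (0x1a, then key size 0x20 = 32 bytes)
AES256_GCM_PREFIX = bytes.fromhex("1a20")


def make_aes256_gcm_value(key: bytes) -> str:
    """Build the Base64 "value" field for a raw 32-byte AES-256/GCM key."""
    if len(key) != 32:
        raise ValueError("AES-256/GCM requires a 32-byte key")
    return base64.b64encode(AES256_GCM_PREFIX + key).decode("ascii")


example_key = bytes.fromhex(
    "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f"
)
value = make_aes256_gcm_value(example_key)
print(value)
```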

The modified KeySet can be loaded and used for encryption as follows:

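from tink import aead
from tink import tink_config
from tink import JsonKeysetReader
from tink import cleartext_keyset_handle

# Register the Tink primitives (needed if this snippet is run on its own)
tink_config.register()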

serialized_keyset = '''
{
  "primaryKeyId": 1794775293,
  "key": [
    {
      "keyData": {
        "typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
        "value": "GiAAAQIDBAUGBwgJCgsMDQ4PEBESExQVFhcYGRobHB0eHw==",
        "keyMaterialType": "SYMMETRIC"
      },
      "status": "ENABLED",
      "keyId": 1794775293,
      "outputPrefixType": "TINK"
    }
  ]
}   
'''
reader = JsonKeysetReader(serialized_keyset)
keyset_handle = cleartext_keyset_handle.read(reader)

plaintext = b'The quick brown fox jumps over the lazy dog'
aead_primitive = keyset_handle.primitive(aead.Aead)
tink_ciphertext = aead_primitive.encrypt(plaintext, b'')

The relationship between KeySet and the example key 0001...1e1f can be verified by decrypting the generated ciphertext using the example key without Tink, e.g. with PyCryptodome.

The format of the Tink ciphertext is described in Tink Wire Format, Crypto Formats. The first byte specifies the version, the next 4 bytes the key ID, followed by the actual data.
For GCM the actual data has the format nonce (12 bytes) || ciphertext || tag (16 bytes). Decryption is then possible with (using PyCryptodome):

from Crypto.Cipher import AES

key = bytes.fromhex('000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f')

prefix = tink_ciphertext[:5]
nonce = tink_ciphertext[5:5 + 12]
ciphertext = tink_ciphertext[5 + 12:-16]
tag = tink_ciphertext[-16:]

cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
cipher.update(b'')
decryptedText = cipher.decrypt_and_verify(ciphertext, tag)

print(decryptedText.decode('utf-8')) # The quick brown fox jumps over the lazy dog

which proves that the example key 0001...1e1f was correctly integrated into the KeySet.
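As an additional check, the 5-byte prefix can be parsed to confirm that the key ID in the ciphertext matches the keyId of the keyset. The sketch below assumes that the key ID is stored big-endian after the version byte and that the version byte is 0x01 for the TINK output prefix type; split_tink_aes_gcm is a hypothetical helper, and the sample input is synthetic filler, not real Tink output.

```python
import struct


def split_tink_aes_gcm(tink_ciphertext: bytes) -> dict:
    """Split version || key ID || nonce || ciphertext || tag into parts."""
    if len(tink_ciphertext) < 5 + 12 + 16:
        raise ValueError("ciphertext too short")
    # Assumption: 1 version byte followed by a 4-byte big-endian key ID
    version, key_id = struct.unpack(">BI", tink_ciphertext[:5])
    return {
        "version": version,
        "key_id": key_id,
        "nonce": tink_ciphertext[5:5 + 12],
        "ciphertext": tink_ciphertext[5 + 12:-16],
        "tag": tink_ciphertext[-16:],
    }


# Synthetic stand-in for a Tink ciphertext: prefix, then dummy nonce, data and tag
sample = struct.pack(">BI", 1, 1794775293) + bytes(12) + b"\xaa" * 43 + bytes(16)
parts = split_tink_aes_gcm(sample)
print(parts["version"], parts["key_id"], len(parts["ciphertext"]))
```

With a real ciphertext from the keyset above, parts["key_id"] would be expected to equal 1794775293.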

Topaco
  • Very impressive detective work! So if I take any 32 byte key and prepend 0x1a20 for an AES-256/GCM it should work? – user3776598 May 02 '21 at 13:28
  • @user3776598 - Yes, exactly. The key _0001...1e1f_ is representative for an _arbitrary_ key. After adding the prefix, the result must still be Base64 encoded. – Topaco May 02 '21 at 19:57
  • It works beautifully. Thank you so much! If I may ask- how were you able to see that the first 2 bytes were fixed and were the length of the string from the base64 encoded string examples? – user3776598 May 03 '21 at 00:27
  • @user3776598 - As I said in my answer, unfortunately I haven't found any specification. I've deduced the meaning of the prefix by comparing different algorithms: For AES-128/GCM and AES-256/GCM the 1st byte is 0x1a, for ChaCha20Poly1305 it's 0x12. For AES-128/GCM the 2nd byte is 0x10, for AES-256/GCM and ChaCha20Poly1305 it's 0x20. Hence the conclusion that the 1st byte depends on the algorithm and that the 2nd byte indicates the key size. But the prefix can be more complex depending on the algorithm, i.e. the described procedure is valid for AES/GCM and must be adapted for other algorithms. – Topaco May 03 '21 at 09:10
  • @user3776598 - I had somehow missed the Python tag. So I replaced the Java code snippets with Python code snippets. Apart from that, nothing has changed in content. – Topaco May 03 '21 at 14:23