
I am trying to encrypt and decrypt data using AES/GCM/NoPadding. I installed the JCE Unlimited Strength Policy Files and ran the (simple-minded) benchmark below. I did the same using OpenSSL and achieved more than 1 GB/s for encryption and decryption on my PC.

With the benchmark below I'm only able to get 3 MB/s encryption and decryption using Java 8 on the same PC. Any idea what I am doing wrong?

public static void main(String[] args) throws Exception {
    final byte[] data = new byte[64 * 1024];
    final byte[] encrypted = new byte[64 * 1024];
    final byte[] key = new byte[32];
    final byte[] iv = new byte[12];
    final Random random = new Random(1);
    random.nextBytes(data);
    random.nextBytes(key);
    random.nextBytes(iv);

    System.out.println("Benchmarking AES-256 GCM encryption for 10 seconds");
    long javaEncryptInputBytes = 0;
    long javaEncryptStartTime = System.currentTimeMillis();
    final Cipher javaAES256 = Cipher.getInstance("AES/GCM/NoPadding");
    byte[] tag = new byte[16];
    long encryptInitTime = 0L;
    long encryptUpdate1Time = 0L;
    long encryptDoFinalTime = 0L;
    while (System.currentTimeMillis() - javaEncryptStartTime < 10000) {
        random.nextBytes(iv);
        long n1 = System.nanoTime();
        javaAES256.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(16 * Byte.SIZE, iv));
        long n2 = System.nanoTime();
        javaAES256.update(data, 0, data.length, encrypted, 0);
        long n3 = System.nanoTime();
        javaAES256.doFinal(tag, 0);
        long n4 = System.nanoTime();
        javaEncryptInputBytes += data.length;

        // Accumulate across iterations, as the decryption loop below does
        encryptInitTime += n2 - n1;
        encryptUpdate1Time += n3 - n2;
        encryptDoFinalTime += n4 - n3;
    }
    long javaEncryptEndTime = System.currentTimeMillis();
    System.out.println("Time init (ns): "     + encryptInitTime);
    System.out.println("Time update (ns): "   + encryptUpdate1Time);
    System.out.println("Time do final (ns): " + encryptDoFinalTime);
    System.out.println("Java calculated at " + (javaEncryptInputBytes / 1024 / 1024 / ((javaEncryptEndTime - javaEncryptStartTime) / 1000)) + " MB/s");

    System.out.println("Benchmarking AES-256 GCM decryption for 10 seconds");
    long javaDecryptInputBytes = 0;
    long javaDecryptStartTime = System.currentTimeMillis();
    final GCMParameterSpec gcmParameterSpec = new GCMParameterSpec(16 * Byte.SIZE, iv);
    final SecretKeySpec keySpec = new SecretKeySpec(key, "AES");
    long decryptInitTime = 0L;
    long decryptUpdate1Time = 0L;
    long decryptUpdate2Time = 0L;
    long decryptDoFinalTime = 0L;
    while (System.currentTimeMillis() - javaDecryptStartTime < 10000) {
        long n1 = System.nanoTime();
        javaAES256.init(Cipher.DECRYPT_MODE, keySpec, gcmParameterSpec);
        long n2 = System.nanoTime();
        int offset = javaAES256.update(encrypted, 0, encrypted.length, data, 0);
        long n3 = System.nanoTime();
        javaAES256.update(tag, 0, tag.length, data, offset);
        long n4 = System.nanoTime();
        javaAES256.doFinal(data, offset);
        long n5 = System.nanoTime();
        javaDecryptInputBytes += data.length;

        decryptInitTime += n2 - n1;
        decryptUpdate1Time += n3 - n2;
        decryptUpdate2Time += n4 - n3;
        decryptDoFinalTime += n5 - n4;
    }
    long javaDecryptEndTime = System.currentTimeMillis();
    System.out.println("Time init (ns): " + decryptInitTime);
    System.out.println("Time update 1 (ns): " + decryptUpdate1Time);
    System.out.println("Time update 2 (ns): " + decryptUpdate2Time);
    System.out.println("Time do final (ns): " + decryptDoFinalTime);
    System.out.println("Total bytes processed: " + javaDecryptInputBytes);
    System.out.println("Java calculated at " + (javaDecryptInputBytes / 1024 / 1024 / ((javaDecryptEndTime - javaDecryptStartTime) / 1000)) + " MB/s");
}

EDIT: I leave it as a fun exercise to improve this simple-minded benchmark.

I've tested some more using the Server VM, removed the nanoTime calls and introduced a warmup, but as I expected none of this improved the benchmark results. Throughput stays flat at 3 MB/s.

Christo
  • First, the benchmarking is wrong: no warmup, single iteration, excessive nanoTime calls. Hotspot's intrinsics for AES-NI are only used with an optimizing JIT compiler, you have to reach there before assessing performance. Second, try AES/CBC. Do you actually measure aes-gcm with OpenSSL, and it gave you 1 GB/s? – Aleksey Shipilev Sep 23 '14 at 14:27
  • Also note that to use AES-NI intrinsics it is required to use the Server VM, a modern Intel CPU with support, *and* to have a warmup sequence. Note that OpenSSL is one of the fastest libs out there, byte code may be relatively fast for business logic, but for cryptography you *will* see differences with well implemented C/C++ libraries. – Maarten Bodewes Sep 23 '14 at 15:52
  • Yes I know this isn't the most robust benchmark, but 3 MB/s vs 1 GB/s is still very significant and I feel this simpleminded benchmark is good enough to bring the point across. I've tried AES/CBC and I'm able to get more than 400 MB/s for encryption and more than 1 GB/s for decryption using Java's cipher. – Christo Sep 25 '14 at 06:42
  • I am writing my own JavaFX/Java EE test application with an awesome GUI that go through the entire process of authenticating a user using SRP and then send encrypted files over WebSocket using AES/GCM. I will return with a link when the app is done. But for now, all I wanted to say is that compared with unencrypted file transfer, using AES/GCM is about 10 times slower for me (96 bit authentication tag, 128 bit key & IV). – Martin Andersson Nov 10 '14 at 21:10
  • @atrioom you don't have the [Unlimited Strength Jurisdiction Policy Files](http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html) installed. – Boris the Spider Apr 21 '15 at 10:44

3 Answers


Micro-benchmarking aside, the performance of the GCM implementation in JDK 8 (at least up to 1.8.0_25) is crippled.

I can consistently reproduce the 3MB/s (on a Haswell i7 laptop) with a more mature micro-benchmark.

From a code dive, this appears to be due to a naive multiplier implementation and no hardware acceleration for the GCM calculations.
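To give a sense of what a straightforward software multiplier costs, here is a sketch of the textbook bit-at-a-time multiplication in GF(2^128) that GHASH is built on (Algorithm 1 in NIST SP 800-38D). It is only an illustration under that assumption, not the JDK's actual code; the class and method names are hypothetical. Each 16-byte block needs 128 loop iterations, which is the kind of work a carry-less multiply instruction such as pclmulqdq replaces.

```java
public class Gf128NaiveMultiply {

    // Reduction constant R = 11100001 || 0^120, stored in the high word.
    private static final long R_HI = 0xE100000000000000L;

    /**
     * Multiplies X by Y in GF(2^128) using GCM's bit ordering.
     * Each 128-bit value is passed as (hi, lo); bit 0 is the most
     * significant bit of hi. Returns {zHi, zLo}.
     */
    static long[] multiply(long xHi, long xLo, long yHi, long yLo) {
        long zHi = 0L, zLo = 0L;
        long vHi = yHi, vLo = yLo;
        for (int i = 0; i < 128; i++) {
            // Select bit i of X, counting from the leftmost bit.
            long word = (i < 64) ? xHi : xLo;
            boolean xBit = ((word >>> (63 - (i & 63))) & 1L) != 0;
            if (xBit) {
                zHi ^= vHi;
                zLo ^= vLo;
            }
            // V = V >> 1, then reduce if a bit fell off the right end.
            boolean carry = (vLo & 1L) != 0;
            vLo = (vLo >>> 1) | (vHi << 63);
            vHi >>>= 1;
            if (carry) {
                vHi ^= R_HI;
            }
        }
        return new long[] { zHi, zLo };
    }

    public static void main(String[] args) {
        // The multiplicative identity has only bit 0 set.
        long[] z = multiply(0x8000000000000000L, 0L,
                            0x0123456789abcdefL, 0xfedcba9876543210L);
        System.out.println(Long.toHexString(z[0]) + " " + Long.toHexString(z[1]));
    }
}
```

Faster software implementations typically precompute tables from the hash key instead of looping over bits, at the cost of memory and, in some variants, constant-time behaviour.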

By comparison AES (in ECB or CBC mode) in JDK 8 uses an AES-NI accelerated intrinsic and is (for Java at least) very quick (in the order of 1GB/s on the same hardware), but the overall AES/GCM performance is completely dominated by the broken GCM performance.

There are plans to implement hardware acceleration, and there have been third-party submissions to improve performance, but these haven't made it into a release yet.

Something else to be aware of is that the JDK GCM implementation also buffers the entire plaintext on decryption until the authentication tag at the end of the ciphertext is verified, which cripples it for use with large messages.
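A small sketch for observing this buffering on your own JDK: it encrypts a buffer, then decrypts it with one update() call followed by doFinal(), and reports how many plaintext bytes each call released. The class and method names are hypothetical, and the exact split may vary between JDK versions and providers, so the sketch only reports the counts rather than assuming them.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;
import java.util.Arrays;

public class GcmDecryptBuffering {

    /** Returns {bytes released by update(), bytes released by doFinal()}. */
    static int[] decryptCounts(int plaintextLength) throws Exception {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16];
        byte[] iv = new byte[12];
        byte[] plain = new byte[plaintextLength];
        random.nextBytes(key);
        random.nextBytes(iv);
        random.nextBytes(plain);

        SecretKeySpec keySpec = new SecretKeySpec(key, "AES");
        GCMParameterSpec gcmSpec = new GCMParameterSpec(128, iv);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, keySpec, gcmSpec);
        byte[] cipherText = cipher.doFinal(plain); // ciphertext followed by the 16-byte tag

        cipher.init(Cipher.DECRYPT_MODE, keySpec, gcmSpec);
        byte[] out = new byte[cipherText.length]; // large enough for the whole plaintext
        int fromUpdate = cipher.update(cipherText, 0, cipherText.length, out, 0);
        int fromFinal = cipher.doFinal(out, fromUpdate);

        // Sanity check: the round trip must reproduce the plaintext.
        if (!Arrays.equals(plain, Arrays.copyOf(out, fromUpdate + fromFinal))) {
            throw new IllegalStateException("round trip failed");
        }
        return new int[] { fromUpdate, fromFinal };
    }

    public static void main(String[] args) throws Exception {
        int[] counts = decryptCounts(1024 * 1024);
        System.out.println("released by update():  " + counts[0]);
        System.out.println("released by doFinal(): " + counts[1]);
    }
}
```

If update() reports no output and doFinal() releases everything, the implementation is holding the whole message in memory until the tag has been checked.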

Bouncy Castle has (at the time of writing) faster GCM implementations (and OCB, if you're writing open source software or are not encumbered by software patent laws).


Updated July 2015 - 1.8.0_45 and JDK 9

JDK 8+ will get an improved (and constant-time) Java implementation, contributed by Florian Weimer of Red Hat. It has landed in JDK 9 EA builds, but apparently not yet in 1.8.0_45. JDK 9 (since EA b72 at least) also has GCM intrinsics: AES/GCM speed on b72 is 18MB/s without intrinsics enabled and 25MB/s with intrinsics enabled, both of which are disappointing. For comparison, the fastest (not constant-time) BC implementation is ~60MB/s and the slowest (constant-time, not fully optimised) is ~26MB/s.


Updated Jan 2016 - 1.8.0_72:

Some performance fixes landed in JDK 1.8.0_60 and performance on the same benchmark now is 18MB/s - a 6x improvement from the original, but still much slower than the BC implementations.

archie
  • GCM on bouncy just caches the tag size during decryption (as it is used as part of the ciphertext), although I'm trying out new code to get that out as well (and request the tag separately, as it should be). Rewriting the exponentiation required to allow the AAD to be added later is quite a pain though. – Maarten Bodewes Nov 21 '14 at 00:37
  • I've done some tests to encrypt/decrypt 84 files. JDK implementation: crypt = 15 seconds / decrypt = 111 seconds. BC: crypt = 14 seconds / decrypt = 23 seconds!!! How is it possible that decryption is so slow with the native JDK compared to BC? BC is just a jar file containing only class files (so plain Java). Oracle (Sun) have access to the native OS with the JRE/JDK, so they can implement all this stuff in a lower-level, faster language (like C) and optimize their code! It's incomprehensible to me :/ – Alexxx Jan 25 '17 at 22:54
  • The JDK implementation of GCM decryption is particularly inefficient - it buffers all plaintext before returning any data, and on top of that does the buffering very inefficiently causing many memory copies, which totally overwhelms the base benefits of having hardware accelerated AES and GCM. – archie Jan 30 '17 at 19:59

This has now been partially addressed in Java 8u60 with JDK-8069072. Without this fix I get 2.5M/s. With this fix I get 25M/s. Disabling GCM completely gives me 60M/s.

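To disable GCM completely, create a file named java.security containing the following line (jdk.tls.disabledAlgorithms only affects the cipher suites negotiated for TLS connections, not direct use of Cipher):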

jdk.tls.disabledAlgorithms=SSLv3,GCM

Then start your Java process with:

java -Djava.security.properties=/path/to/my/java.security ...

If this doesn't work, you may need to enable overriding security properties by editing /usr/java/default/jre/lib/security/java.security (actual path may be different depending on OS) and adding:

security.overridePropertiesFile=true
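To check whether the override actually took effect, one option is to print the security property from inside the running JVM. The sketch below is a minimal example, assuming the property value is a plain comma-separated list; the class and helper names are hypothetical, while `Security.getProperty` is the standard API.

```java
import java.security.Security;
import java.util.Arrays;

public class DisabledAlgorithmsCheck {

    /** Returns true if the comma-separated property value lists the given algorithm. */
    static boolean listsAlgorithm(String propertyValue, String algorithm) {
        if (propertyValue == null) {
            return false;
        }
        return Arrays.stream(propertyValue.split(","))
                .map(String::trim)
                .anyMatch(algorithm::equalsIgnoreCase);
    }

    public static void main(String[] args) {
        // Reads the effective value after any -Djava.security.properties override.
        String value = Security.getProperty("jdk.tls.disabledAlgorithms");
        System.out.println("jdk.tls.disabledAlgorithms = " + value);
        System.out.println("GCM listed: " + listsAlgorithm(value, "GCM"));
    }
}
```

Run it with the same `-Djava.security.properties=...` argument as the real process; if GCM is not listed, the override file was not picked up.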
kichik
  • I opened main java.security and changed securerandom.source=file:/dev/random -> securerandom.source=file:/dev/urandom . And it helped for me – demaksee Nov 17 '16 at 22:49
  • See here for some details about random/urandom http://stackoverflow.com/questions/21757653/cipher-getinstance-is-too-slow#comment32942702_21757854 – demaksee Nov 17 '16 at 22:56
  • Alternative way to use `/dev/urandom`: start with `java -Djava.security.egd=file:/dev/urandom …` (from the comments in `/jre/lib/security/java.security`). – ᴠɪɴᴄᴇɴᴛ Aug 07 '18 at 15:52
  • The correct property name is `java.security.policy` per the [oracle docs](https://docs.oracle.com/javase/8/docs/technotes/guides/security/PolicyFiles.html) – MicGer Jul 27 '20 at 23:42

The OpenSSL implementation is optimized with an assembly routine that uses the pclmulqdq instruction (on x86). It is very fast because the algorithm is parallelized.

The Java implementation is slow, but it has also been optimized in HotSpot with an assembly routine (not parallelized). You have to warm up the JVM before the HotSpot intrinsic is used. The default value of -XX:CompileThreshold is 10000.

// pseudocode

warmUp_GCM_cipher_loop10000_times();

do_benchmark();
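
The pseudocode above could be fleshed out roughly as follows. This is a sketch under stated assumptions, not a rigorous benchmark: the class and method names are hypothetical, the all-zero key is for demonstration only, and a 128-bit key is used so the code runs without the unlimited-strength policy files (use 32 bytes for AES-256). A counter-based IV keeps every init call unique, which GCM encryption requires.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;

public class GcmWarmupBenchmark {

    static final byte[] KEY = new byte[16];        // demo key only, never use for real data
    static final byte[] DATA = new byte[64 * 1024];
    static final int TAG_BYTES = 16;

    /** Encrypts DATA the given number of times and returns the plaintext bytes processed. */
    static long encryptLoop(int iterations, long ivStart) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        SecretKeySpec keySpec = new SecretKeySpec(KEY, "AES");
        byte[] out = new byte[DATA.length + TAG_BYTES];
        long bytes = 0;
        for (int i = 0; i < iterations; i++) {
            // 12-byte IV: 4 zero bytes followed by an 8-byte counter.
            byte[] iv = ByteBuffer.allocate(12).putLong(4, ivStart + i).array();
            cipher.init(Cipher.ENCRYPT_MODE, keySpec, new GCMParameterSpec(TAG_BYTES * 8, iv));
            bytes += cipher.doFinal(DATA, 0, DATA.length, out, 0) - TAG_BYTES;
        }
        return bytes;
    }

    public static void main(String[] args) throws Exception {
        int warmupIterations = 10_000;   // matches the default CompileThreshold mentioned above
        int measuredIterations = 1_000;

        encryptLoop(warmupIterations, 0);                 // warm-up, result discarded

        long start = System.nanoTime();
        long bytes = encryptLoop(measuredIterations, warmupIterations);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("AES/GCM encryption: %.1f MB/s%n", bytes / (1024.0 * 1024.0) / seconds);
    }
}
```

On a JDK with the slow GCM implementation the warm-up alone can take several minutes, so a smaller buffer or fewer iterations may be more practical there. For more reliable numbers, a harness such as JMH handles warm-up and measurement for you.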