Support for Compressed Strings being Dropped in HotSpot JVM?

Question

The Oracle page "Java HotSpot VM Options" lists -XX:+UseCompressedStrings as available and on by default. However, in Java 6 update 29 it is off by default, and in Java 7 update 2 it produces a warning:

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseCompressedStrings; support was removed in 7.0

Does anyone know the thinking behind removing this option?

Consider the example from "sorting lines of an enormous file.txt in java".

With -mx2g, this example took 4.541 seconds with the option on and 5.206 seconds with it off in Java 6 update 29. It is hard to see how the option hurts performance.

Note: Java 7 update 2 requires 2.0 GB, whereas Java 6 update 29 requires 1.8 GB without compressed strings and only 1.0 GB with them.

Not exactly related, but for future reference: `-XX:+PrintFlagsFinal` lists all the flags available and their values. – bestsss, Apr 27 '12 at 09:10
Looking forward to this feature making a comeback under [JEP 254](http://openjdk.java.net/jeps/254) in [JDK 9](http://openjdk.java.net/projects/jdk9/). I still keep JDK6-32 around for a small but string-heavy app (100 MB total RAM, vs. 150 MB on JDK8-32, vs. 250 MB on JDK8-64) and 30% faster regex searches. – Luke Usherwood, Dec 02 '15 at 04:32

Nathan · Accepted Answer · 2016-03-10T18:16:09.527

41

Originally, this option was added to improve SPECjBB performance. The gains are due to reduced memory bandwidth requirements between the processor and DRAM. Loading and storing bytes in the byte[] consumes 1/2 the bandwidth versus chars in the char[].

However, this comes at a price. The code has to determine if the internal array is a byte[] or char[]. This takes CPU time and if the workload is not memory bandwidth constrained, it can cause a performance regression. There is also a code maintenance price due to the added complexity.
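To make the trade-off concrete, here is a minimal sketch of a string-like value that stores one byte per character when it can and falls back to `char[]` otherwise. The class `DualString`, its method names, and the 7-bit ASCII cutoff are assumptions made for illustration; this is not HotSpot's actual implementation.

```java
/** Illustrative only: stores text as byte[] when every char is 7-bit ASCII, else as char[]. */
final class DualString {
    private final byte[] bytes; // non-null when the compact form is used
    private final char[] chars; // non-null otherwise

    DualString(String s) {
        boolean compressible = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) { // assumed cutoff: 7-bit ASCII
                compressible = false;
                break;
            }
        }
        if (compressible) {
            byte[] b = new byte[s.length()];
            for (int i = 0; i < b.length; i++) {
                b[i] = (byte) s.charAt(i);
            }
            bytes = b;
            chars = null;
        } else {
            bytes = null;
            chars = s.toCharArray();
        }
    }

    int length() {
        return bytes != null ? bytes.length : chars.length;
    }

    char charAt(int i) {
        // This branch runs on every access: the CPU cost mentioned above.
        return bytes != null ? (char) (bytes[i] & 0xFF) : chars[i];
    }

    /** Size of the character payload only, ignoring object headers. */
    int storageBytes() {
        return bytes != null ? bytes.length : 2 * chars.length;
    }
}
```

The compact form halves the payload that has to move between CPU and DRAM, but every accessor now has to check which array is in use.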

Because there weren't enough production-like workloads that showed significant gains (except perhaps SPECjBB), the option was removed.

There is another angle to this. The option reduces heap usage. For applicable Strings, it reduces the memory usage of those Strings by 1/2. This angle wasn't considered at the time of option removal. For workloads that are memory capacity constrained (i.e. have to run with limited heap space and GC takes a lot of time), this option can prove useful.

If enough memory capacity constrained production-like workloads can be found to justify the option's inclusion, then maybe the option will be brought back.

Edit 3/20/2013: An average server heap dump uses 25% of the space on Strings. Most Strings are compressible. If the option is reintroduced, it could save half of this space (e.g. ~12%)!

Edit 3/10/2016: A feature similar to compressed strings is coming back in JDK 9 under JEP 254.

edited Mar 10 '16 at 18:16

answered Apr 24 '12 at 00:01

Nathan

I assume that large JEE-based systems store most of their data in a database; JSE systems do this too, but to a lesser degree. Being able to store less data in memory reduces the size of the cache you can have, but that is less critical (i.e. you won't get a failure as such). I am assuming that SPECjBB doesn't take into account the cost of being able to cache less data. For my applications, I store most of my data in memory-mapped files with byte-based encoding for strings and use reusable StringBuilders rather than Strings to limit GC impact, so it may not help me as much as it did. – Peter Lawrey Apr 24 '12 at 07:10
It shouldn't have to come at a price. Java should be able to provide apis that could be used to build a string from a source that is known to only contain bytes. Instead it chooses to distrust the programmer and verify everything itself. Similarly, java could provide an api allowing a String to be instantiated out of an existing array; instead it completely distrusts all programmers and forces always copying the array. – srparish Jul 11 '12 at 13:08
I hope this is added back in. It actually is very useful for speeding up text parsing applications that handle minimal character sets and definitely reduces heap usage if you're keeping your dataset in memory. – Jonathan S. Fisher Jul 25 '12 at 16:28
@srparish: I'm pretty sure that allowing what you described would undermine JVM security. Accepting your array as a String component makes the String mutable, and with String used for class names you could probably do whatever you want in the JVM. So a non-copying public constructor `String(char[])` would have to be guarded by a `SecurityManager`, which would probably make it slower than the copying version. – maaartinus Jul 27 '12 at 21:43
@maaartinus It is not a problem if we can make immutable arrays. By the way, will escape analysis optimize away the copy? – ntysdd Sep 15 '17 at 02:53
@ntysdd But how can we make immutable arrays? I'm afraid that escape analysis is much weaker than it could be, and I doubt it works with arrays. But I might be completely wrong. To avoid the copy, you'd need to determine that the `char[]` is no longer used and pass this information to the string constructor. This sounds a bit complicated and probably isn't common enough (more often you use a `StringBuilder`, where it could apply as well, but the allocated array is rarely of the exact size). – maaartinus Sep 15 '17 at 04:16
@ntysdd A funny related feature would be returning the internal array from `String.toCharArray` whenever it can be proven that it won't get modified. – maaartinus Sep 15 '17 at 04:30
@maaartinus How are compressed strings benefited? – ntysdd Sep 15 '17 at 04:59
@maaartinus I heard somewhere that escape analysis can allocate an array on the stack (if it is less than 64 bytes), so maybe it is not so much of a problem. JEP 169 has something about immutable arrays. – ntysdd Sep 15 '17 at 05:05
@ntysdd Compressed strings save half the array in the LATIN-1 case. There's a lot to read if you want: https://bugs.openjdk.java.net/browse/JDK-8054307. I know only a little about escape analysis. JEP 169 is cool, but it may take a long time until it gets implemented. – maaartinus Sep 15 '17 at 05:09
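
For readers following the copying discussion in these comments, a small sketch of why the defensive copy in `new String(char[])` matters: the caller still holds the array and can change it afterwards. The class name `CopyDemo` and its helper method are made up for this illustration.

```java
/** Demonstrates that new String(char[]) copies its argument. */
final class CopyDemo {
    /** Builds a String from a buffer, then mutates the buffer. */
    static String buildThenMutate() {
        char[] buf = {'a', 'b', 'c'};
        String s = new String(buf); // copies buf into the String
        buf[0] = 'X';               // the caller still owns buf and may change it
        return s;                   // unaffected by the write above
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // prints "abc"
    }
}
```

Without the copy, the write to `buf[0]` would be visible through `s`, which is the mutability problem described above.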

score 14 · Answer 2 · edited Jul 26 '17 at 23:20

14

Just to add, for those interested...

The java.lang.CharSequence interface (which java.lang.String implements) allows more compact representations of strings than UTF-16.

Apps which manipulate a lot of strings should probably be written to accept CharSequence, so that they work with java.lang.String as well as with more compact representations.

8-bit (UTF-8), or even 5, 6, or 7-bit encoded, or even compressed strings can be represented as CharSequence.

CharSequences can also be a lot more efficient to manipulate - subsequences can be defined as views (pointers) onto the original content for example, instead of copying.
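
As a rough sketch of both points (a one-byte-per-character representation, and subsequences as views rather than copies), something like the following would work. `Latin1Sequence` is a hypothetical class written for this answer, not part of the JDK or of concurrent-trees.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical Latin-1 CharSequence: one byte per char, subSequence shares the array. */
final class Latin1Sequence implements CharSequence {
    private final byte[] data; // shared backing array
    private final int offset;
    private final int length;

    private Latin1Sequence(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    /** Encodes s as Latin-1; rejects characters above U+00FF. */
    static Latin1Sequence of(String s) {
        byte[] bytes = new byte[s.length()];
        for (int i = 0; i < bytes.length; i++) {
            char c = s.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException("not Latin-1 at index " + i);
            }
            bytes[i] = (byte) c;
        }
        return new Latin1Sequence(bytes, 0, bytes.length);
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return (char) (data[offset + index] & 0xFF);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("start: " + start + ", end: " + end);
        }
        // A view onto the same array: no characters are copied.
        return new Latin1Sequence(data, offset + start, end - start);
    }

    @Override
    public String toString() {
        return new String(data, offset, length, StandardCharsets.ISO_8859_1);
    }
}
```

Because the views share the backing array, a short subsequence keeps the whole array reachable, which is the usual trade-off of this design.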

For example, in concurrent-trees, a suffix tree of ten of Shakespeare's plays requires 2 GB of RAM using CharSequence-based nodes and would require 249 GB of RAM using char[]- or String-based nodes.

edited Jul 26 '17 at 23:20

Nathan

answered Apr 05 '13 at 17:50

npgall

`CharSequence` seems interesting, but I see no means by which an implementation can indicate whether it should be considered immutable (i.e. whether persisting a reference is sufficient to persist the sequence of characters therein). Obviously it's possible for any interface to be implemented in broken fashion, but the interface would seem most useful if it had `IsImmutable` and `AsImmutable` methods. – supercat Sep 24 '13 at 22:33
Yes, the immutability of a `CharSequence` depends on the immutability of all `CharSequence`s it references transitively. I suppose if you implement an `ImmutableCharSequence` which can only reference other `ImmutableCharSequence`s, then you could do `instanceof` checks to detect immutability transitively. – npgall Sep 25 '13 at 15:31
While it would be helpful to have an interface `ImmutableCharSequence` which inherited `CharSequence` but didn't add any new members--just an expectation that `IsImmutable` would return `true` and `AsImmutable` would return `this`, and methods which need immutable strings could accept that type without having to call `IsImmutable` or `AsImmutable`, there's no way one could restrict what types of objects could be encapsulated by an `ImmutableCharSequence`, since what would matter would not be whether any encapsulated instance was of a mutable type, but rather whether it would ever be... – supercat Sep 25 '13 at 16:21
...exposed to anything that would mutate it. The vast majority of immutable objects encapsulate instances of mutable classes either directly or indirectly, but are immutable despite that because those instances are never freely exposed to the outside world. – supercat Sep 25 '13 at 16:23

score 13 · Answer 3 · answered Jan 12 '12 at 11:17

13

Since there were upvotes, I figure I wasn't missing something obvious, so I have logged it as a bug (at the very least, an omission in the documentation):

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7129417

(Should be visible in a couple of days)

answered Jan 12 '12 at 11:17

Peter Lawrey

Filing the bug was the right thing to do, yet overall SO has no known JVM engineers among its participants. – bestsss Apr 27 '12 at 09:13
True, but SO is much more responsive. ;) I wanted to check I wasn't missing something obvious. @Nathan's explanation is as good as any. – Peter Lawrey Apr 27 '12 at 09:16
my take on the story would be: JIT compilation of String methods attempts to use SSE assembly but it'd be extra hard to generate proper code w/ both char[] and byte[]. Not impossible but pretty hard and they dropped the support. – bestsss Apr 27 '12 at 09:45
as for being responsive, JSR-166 mailing list is usually quite responsive but that feature would be difficult to relate to JSR-166. – bestsss Apr 27 '12 at 09:46
@bestsss: I wonder what difficulties there would have been in having, instead of a separate array type for each numerical primitive, a unified primitive-array instance type which could be cast to any other, along with a JVM-defined static final variable indicating whether such arrays behave as big-endian or little-endian. There are many kinds of operation which could be accelerated by grabbing two or four things at once. – supercat Feb 27 '14 at 18:37
@supercat, you do need atomic access to each element of that array, and then you have the JNI (probably doable as well). Actually there is similar stuff (the kind you describe) in the form of direct ByteBuffers (esp. with non-aligned CPU support); you can treat each ByteBuffer as any other primitive type. The main issue is that length (or limit in the case of ByteBuffers) is not constant, hence the compiler would have a much harder time optimizing. – bestsss May 05 '14 at 14:20
@bestsss: Is Java used on any real-world platforms where it would be harder for the CPU to atomically write a single byte within an `int[]` than a single byte within a `byte[]`? I can imagine an embedded system where only part of the memory allowed writes of individual bytes, but I can't imagine a JVM implementation allocating `byte[]` instances in that part of the memory space and `int[]` instances elsewhere. – supercat May 05 '14 at 15:29
@supercat, It was more about writing an unaligned int/long in a `byte[]`; if you just need `int[].asByteArrayView()`, that won't be much of an issue (just like ByteBuffers now). Also, `new byte[5]` cannot be represented as int[] or char[] either; the memory has to be allocated and padded, and the behavior when reaching the upper bounds defined. Either way, that will not help the `String` case in question. The hardship comes from handling the content of the array (`byte[]` or `char[]`) differently, with possible cloning of the code paths (by the JVM + intrinsics). – bestsss May 05 '14 at 18:14
@bestsss: The performance advantages of compressed strings would be enhanced considerably if methods could process them in chunks larger than one byte. Given a `long l` which holds 8 ASCII characters little-endian, code on a 64-bit processor could add those characters into a hash value conformant with the present one via `l-=225*((l & 0xFF00FF00FF00FF00) >>> 8); l-=64575*((l & 0xFFFF0000FFFF0000) >>> 16); l-=4294043775*(l >>> 32); hash = (int)l - hash*1807454495;` One `long` fetch, two masks, and four multiplies, rather than eight byte fetches and eight multiplies. – supercat May 05 '14 at 18:33
@bestsss: The greater the performance advantages of compressed strings, the more likely those advantages would be to overcome any overhead imposed by having to conditionally or virtually dispatch members. If I were designing a `String`, I'd probably have three main storage formats: an array of bytes for ASCII strings, an array of characters for non-ASCII strings, and an array of `Object` which would hold a list of `String` along with a list of offsets (concatenating two list-style strings should produce a new string with a combined list whose items, other than the first and last, were... – supercat May 05 '14 at 18:44
...between 256 and 512 characters long). Having the list items hold references to `String` rather than the backing arrays would mean that if two strings were compared and found equal, the array reference stored in the newer one could be replaced with that in the older one, thus expediting future comparisons. – supercat May 05 '14 at 18:46
@supercat, Keep in mind no one stops the JIT from optimizing the generated code and reducing loads (even though they are very cheap when hitting L1). Even now the JIT can use intrinsics (and SSE on x86, which is more effective than a 64-bit long) to work with the String; it's just not visible at the Java level. Admittedly I have not been following the JIT for quite some time, though. Actually, as of Java 1.7, hashCode primarily uses murmur32 as the hash function and the original hashCode is not much used. The code uses 32-bit ops everywhere and the JIT should be able to optimize the fetch too. – bestsss May 06 '14 at 05:12
...Having virtual calls would kill performance (no inlining of length() and charAt(), hence bounds checks not removed) unless the strings are really large and everything can be contained in the invoked methods. Historically, `String` has been changed to hold just a char[] and no offset or len; back in the day it was even possible to share the `char[]` of a `StringBuffer`, be a view of another string, and more. It probably turned out that all that code actually slowed down most use cases (and created leaks). – bestsss May 06 '14 at 05:13
...I still have `new String(str)` in some places to deal with `str` being a few chars long out of a substring of several KB. It's just hard to cover everything and still win "That" benchmark; you have to follow suit with another implementation. – bestsss May 06 '14 at 05:13
@bestsss: I would think `if (charData != null && index < charData.length) return charData[index] else return source.vCharAt(index);` could fare as well as the present implementation in the common case. Store short strings as `char[]` unconditionally, but use something else when concatenating large strings. If `source` is `final` but not `charData`, and having "plain" strings set `source` to `this` should yield proper memory semantics (the first read of `charData` in a thread might theoretically observe `null`, but a read performed from within `source.vCharAt` would see proper data... – supercat May 06 '14 at 12:57
...and would almost certainly cause any future reads on the thread to see it without having to invoke `vCharAt`. To really get optimal string performance would require a little GC assist (somewhat like the GC does with `WeakReference`). Have `string` lose field `hash` but derive from a class `TelescopingIdentityObject` with a `protected TelescopingIdentityObject identityInfo;`, and GC semantics that if neither `identityInfo` nor `identityInfo.identityInfo` is null, the field would be replaced with the latter. – supercat May 06 '14 at 13:03
Exact implementation would depend upon exactly how the GC support worked, but such a design would mean that strings which are compared and found to be equal could be consolidated into equivalence groups, eliminating the need to compare them again or redundantly store their contents. BTW, I thought `string` was stuck with the original hash code since the compiler calls `string.hashCode()` when using strings in switch statements. Is that no longer true? – supercat May 06 '14 at 13:11
The `hashCode()` does work as usual, `for (char c:value) h=31*h+c;`, but it's rarely used, as the main hashCode consumers, `(Concurrent)HashMap` (but not Hashtable), call `hash32()` instead. Try iterating a HashMap containing string keys across 2 invocations/instances/processes. Personally I don't use `case "Str":`; it's superfluous (and bad style) by design, as constants are hard to track, unlike enums. On a side note, `WeakReference` is not exactly easy on the JIT. – bestsss May 06 '14 at 15:36
@bestsss: I would expect that telescoping references shouldn't be too hard for the GC if it works like what I understand of the .NET GC. When an object is found which hasn't yet been copied, the object is copied to a new location, a bit is set in the old copy's sync/flags word, and the first 4/8 bytes of the old object data are replaced with a pointer to the new location. If the object is re-encountered, any references will be updated to the new location. If the first 4/8 bytes of data happened to naturally *be* a reference to a known-identical object, ... – supercat May 09 '14 at 01:26
last comment: what you explain is a read barrier. They come with some hefty price esp. if the check cannot be reliably predicted by the hardware (and they are needed on every load, and loads are plenty). OTOH read barriers are very good if you want a true concurrent&copy GC. *That should be my last message here the derail is out of whack now.* – bestsss May 10 '14 at 19:53
@supercat It [looks](http://stackoverflow.com/questions/21745619) like `String.hashCode` could be 3.8 times faster with some manual unrolling or a corresponding JIT improvement. I guess it could be combined with your `long`-based optimization. – maaartinus May 20 '14 at 17:01
@bestsss I can't see any `hash32()` in Java 8. It seems to have been dropped in favor of using `TreeNode`s to resolve collisions. – maaartinus Jul 24 '15 at 02:50

score 6 · Answer 4 · edited Jul 26 '17 at 23:22

6

Java 9 runs the "sorting lines of an enormous file.txt in java" example twice as fast on my machine as Java 6 and needs only 1 GB of memory, since -XX:+CompactStrings is enabled by default. Also, in Java 6 compressed strings only worked for 7-bit ASCII characters, whereas Java 9 supports Latin-1 (ISO-8859-1). Some operations like charAt(idx) might be slightly slower, though. With the new design, other encodings could also be supported in future.
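
If you want to check from inside a running program whether the flag is known and what it is set to, one option is the HotSpot-specific diagnostic MXBean. This is a minimal sketch that assumes a HotSpot-based JVM; the class name `FlagProbe` and the method `describe` are my own.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Queries a VM option by name on a HotSpot-based JVM. */
final class FlagProbe {
    /** Returns "name=value", or "name: not available" if this JVM does not know the flag. */
    static String describe(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return name + ": not available";
            }
            return name + "=" + bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // getVMOption throws this when no option of that name exists
            return name + ": not available";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("CompactStrings"));
        System.out.println(describe("UseCompressedStrings"));
    }
}
```

The output depends on the JVM version you run it on, so no expected values are shown here.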

I wrote a newsletter about this on The Java Specialists' Newsletter.

edited Jul 26 '17 at 23:22

Nathan

answered May 02 '16 at 07:06

Heinz Kabutz

Welcome to Stack Overflow, Heinz. – Peter Lawrey May 02 '16 at 07:09

score 4 · Answer 5 · edited Jul 26 '17 at 23:22

4

In OpenJDK 7 (1.7.0_147-icedtea, Ubuntu 11.10), the JVM simply fails with an

Unrecognized VM option 'UseCompressedStrings'

when JAVA_OPTS (or command line) contains -XX:+UseCompressedStrings.

It seems Oracle really removed the option.

edited Jul 26 '17 at 23:22

Nathan

answered Apr 09 '12 at 02:27

Rodrigo Coacci

Well that sucks. I just learned about this option, and wanted to try it on our testing environments. We handle a lot of Strings, and this could have potentially reduced our memory usage. – mjuarez Feb 10 '13 at 09:05
