113

As written in JEP 280: Indify String Concatenation:

Change the static String-concatenation bytecode sequence generated by javac to use invokedynamic calls to JDK library functions. This will enable future optimizations of String concatenation without requiring further changes to the bytecode emitted by javac.

I want to understand what the invokedynamic calls are used for here, and how the previous bytecode-based concatenation differs from the invokedynamic approach.

Basil Bourque
Mohit Tyagi
  • 11
    I [wrote about that](https://www.sitepoint.com/inside-java-9-part-ii/#indifiedstringconcatenation) a while back - if that helps, I will condense it into an answer. – Nicolai Parlog Oct 01 '17 at 13:44
  • 10
    Also, have a look at this video which nicely explains the point of new string concatenation mechanism: https://youtu.be/wIyeOaitmWM?t=37m58s – ZhekaKozlov Oct 01 '17 at 14:13
  • 3
    @ZhekaKozlov I wish I could up-vote your comment twice, links that come from people actually implementing all this are the best. – Eugene Oct 01 '17 at 19:53
  • 2
    @Nicolai: That would be great, and would be a better answer than any other here (including mine). Any parts of my answer you want to incorporate when you do, feel free -- if you include (basically) the whole thing as part of the broader answer, I'll just delete mine. Alternately, if you want to just add to my answer as it's quite visible, I've made it a community wiki. – T.J. Crowder Oct 02 '17 at 14:57

3 Answers

100

The "old" way output a bunch of StringBuilder-oriented operations. Consider this program:

public class Example {
    public static void main(String[] args)
    {
        String result = args[0] + "-" + args[1] + "-" + args[2];
        System.out.println(result);
    }
}

If we compile that with JDK 8 or earlier and then use javap -c Example to see the bytecode, we see something like this:

public class Example {
  public Example();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: new           #2                  // class java/lang/StringBuilder
       3: dup
       4: invokespecial #3                  // Method java/lang/StringBuilder."<init>":()V
       7: aload_0
       8: iconst_0
       9: aaload
      10: invokevirtual #4                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      13: ldc           #5                  // String -
      15: invokevirtual #4                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      18: aload_0
      19: iconst_1
      20: aaload
      21: invokevirtual #4                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      24: ldc           #5                  // String -
      26: invokevirtual #4                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      29: aload_0
      30: iconst_2
      31: aaload
      32: invokevirtual #4                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      35: invokevirtual #6                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      38: astore_1
      39: getstatic     #7                  // Field java/lang/System.out:Ljava/io/PrintStream;
      42: aload_1
      43: invokevirtual #8                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      46: return
}

As you can see, it creates a StringBuilder and uses append. This is famously fairly inefficient, as the default capacity of the built-in buffer in StringBuilder is only 16 chars, and there's no way for the compiler to know to allocate more in advance, so it ends up having to reallocate. It's also a bunch of method calls. (Note that the JVM can sometimes detect and rewrite these patterns of calls to make them more efficient, though.)
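To make the bytecode above easier to read, here is a rough source-level equivalent of what the pre-Java 9 compiler generates for the concatenation. This is only a sketch: the class name OldStyleConcat and the helper concat are made up for illustration, and javac emits bytecode directly rather than this source.

```java
public class OldStyleConcat {
    // Hypothetical helper: roughly what javac (JDK 8 and earlier) emits
    // for the expression a + "-" + b + "-" + c.
    static String concat(String a, String b, String c) {
        return new StringBuilder()   // no-arg constructor: default capacity of 16 chars
                .append(a)
                .append("-")
                .append(b)
                .append("-")
                .append(c)
                .toString();         // copies the buffer into the final String
    }

    public static void main(String[] args) {
        System.out.println(concat("alpha", "beta", "gamma"));
    }
}
```

Each append is a separate method call, and the buffer has to grow whenever the accumulated text no longer fits in its current capacity.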

Let's look at what Java 9 generates:

public class Example {
  public Example();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: aload_0
       1: iconst_0
       2: aaload
       3: aload_0
       4: iconst_1
       5: aaload
       6: aload_0
       7: iconst_2
       8: aaload
       9: invokedynamic #2,  0              // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
      14: astore_1
      15: getstatic     #3                  // Field java/lang/System.out:Ljava/io/PrintStream;
      18: aload_1
      19: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      22: return
}

Oh my but that's shorter. :-) It makes a single call to makeConcatWithConstants from StringConcatFactory, which says this in its Javadoc:

Methods to facilitate the creation of String concatenation methods, that can be used to efficiently concatenate a known number of arguments of known types, possibly after type adaptation and partial evaluation of arguments. These methods are typically used as bootstrap methods for invokedynamic call sites, to support the string concatenation feature of the Java Programming Language.
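To get a feel for what that bootstrap method does, you can call it by hand from ordinary Java code (Java 9 or later). This is a minimal sketch, not what javac or the JVM literally does: the class name ManualIndyConcat is made up, and the recipe string is my assumption of what the compiler would pass for this example, using \u0001 as the argument placeholder described in the StringConcatFactory Javadoc.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

public class ManualIndyConcat {
    // Hypothetical helper: performs by hand the linkage that the JVM does
    // the first time it executes the invokedynamic instruction.
    static String concat(String a, String b, String c) {
        try {
            // Call-site type from the javap output: (String, String, String) -> String
            MethodType type = MethodType.methodType(
                    String.class, String.class, String.class, String.class);

            // \u0001 marks an argument slot; every other character is a literal.
            // Assumed recipe for args[0] + "-" + args[1] + "-" + args[2].
            String recipe = "\u0001-\u0001-\u0001";

            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(), "makeConcatWithConstants", type, recipe);

            // The call site's target is the method handle that actually concatenates.
            MethodHandle target = site.dynamicInvoker();
            return (String) target.invokeExact(a, b, c);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(concat("alpha", "beta", "gamma"));
    }
}
```

Note that the "-" separators never appear as arguments: they travel inside the recipe, which is why the Java 9 bytecode only pushes the three array elements.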

T.J. Crowder
  • 41
    This reminds me of an answer I wrote almost 6 year ago to the day: https://stackoverflow.com/a/7586780/330057 - Someone asked if they should make a StringBuilder or just use plain old `+=` in their for loop. I told them it depends, but let's not forget that they might find a better way to string concat sometime down the road. The key line is really the penultimate line: `So by being smart, you have caused a performance hit when Java got smarter than you. ` – corsiKa Oct 01 '17 at 17:52
  • 3
    @corsiKa: LOL! But wow, it took a long time to get there (I don't mean six years, I mean 22 or so... :-) ) – T.J. Crowder Oct 01 '17 at 18:39
  • Is there any reason the best approach in cases where each argument is already a string wouldn't be to simply offer a static `String.concat` method with overloads which take various numbers of `String` objects and concatentate them, or take a `String[]` and concatenate the members thereof? Nice and simple. – supercat Oct 02 '17 at 13:16
  • 1
    @supercat: As I understand it, there are a couple of reasons, not least that creating a varargs array to pass to a method on a performance-critical path isn't ideal. Also, using `invokedynamic` allows different concatenation strategies to be chosen at runtime and bound on the first invocation, without the overhead of a method call and dispatch table on each invocation; more in [nicolai's](https://stackoverflow.com/users/2525313/nicolai) article [here](https://www.sitepoint.com/inside-java-9-part-ii/#indifiedstringconcatenation) and in [the JEP](http://openjdk.java.net/jeps/280). – T.J. Crowder Oct 02 '17 at 14:13
  • @T.J.Crowder: If overloads exist for up to four strings, there would be no need to create an array with fewer than five arguments, and when concatenating that many the creation of the array would represent a small portion of the total work. – supercat Oct 02 '17 at 14:41
  • 1
    @supercat: And then there's the fact it wouldn't play well with non-Strings, as they'd have to be pre-converted to String rather than being converted into the final result; more inefficiency. Could make it `Object`, but then you'd have to box all the primitives... (Which Nicolai covers in his excellent article, btw.) – T.J. Crowder Oct 02 '17 at 14:47
  • @T.J.Crowder: Object creation is sufficiently efficient in Java that I would expect the cost of boxing an object to be a small fraction of the overall work required to produce a properly-formatted string. – supercat Oct 02 '17 at 15:08
  • @supercat: Well, apparently with all the other stuff thrown in, the people behind the JEP disagreed, as did their "endorser," the committee approving the change (I assume there is one, I'm not au fait with the process), etc. – T.J. Crowder Oct 02 '17 at 15:14
  • "the default capacity of the built-in buffer in `StringBuilder` is only 16 chars, and there's no way to allocate more in advance" What about [the constructor that lets you specify capacity](https://docs.oracle.com/javase/1.5.0/docs/api/java/lang/StringBuilder.html#StringBuilder(int)), which existed at least since Java 5.0? – svick Oct 02 '17 at 16:14
  • 1
    @svick: We're talking about when the *compiler* adds the `StringBuilder` when faced with code written as string concatenation, not when the coder does. The compiler can't know in advance to do better than the default. (The JVM, on the other hand, *sometimes* can.) – T.J. Crowder Oct 02 '17 at 16:19
  • @T.J.Crowder: Looking through Java library source some years back, I was taken aback by the extent to which they optimized some things, but missed some major opportunities. Given `String foo=bar+boz+bam;` I don't think there are any cases where using `StringBuilder` would be more efficient than `(bar==null?"null":bar).concat(boz==null?"null":boz).concat(bam==null?"null":bam)`. – supercat Oct 02 '17 at 17:14
  • 1
    @supercat perhaps, if all of them are already `String`s. But `StringBuilder` can avoid creating intermediate Strings if some of the concatenated Objects are of other types. – Hulk Oct 03 '17 at 08:49
  • 1
    @Hulk: The compiler should be able to figure out whether they're strings or not. Further, in the string-plus-something-else case, the cost of producing and discarding the `StringBuilder` would be greater than the cost of building another string. – supercat Oct 03 '17 at 14:24
  • But how is String concatenation _actually_ implemented? I mean, what exactly `makeConcatWithConstants` return? How is returned CallSite implemented by VM? – turbanoff Oct 19 '17 at 12:48
  • @turbanoff: Worth reading [this article](https://www.sitepoint.com/inside-java-9-part-ii/#indifiedstringconcatenation) from [Nicolai](https://stackoverflow.com/users/2525313/nicolai) (which I encouraged him to turn into an answer, but no luck). – T.J. Crowder Oct 19 '17 at 13:05
  • 1
    @supercat there are a lot of baffling things, Java compiler do (or refuse to do). At least when concatenating exactly two provenly non-`null` strings, just calling `String.concat` is more efficient than dealing with `StringBuilder`, without any doubts, but the compilers do not check for that case. Also, compilers may not be able to calculate the exact required capacity, but of course, they may provide a better estimate than the default of `16`. The good thing about using `invokedynamic` for string concatenation is that now the responsibility is not at the compiler (developers) anymore. – Holger Mar 22 '18 at 09:29
  • @Holger: Static methods `String.concat(String[])` and `String.concatObjects(Object[])` could easily have been more efficient than anything using `StringBuilder`, and would have also been simpler to handle in the compiler. I'm not sure what obstacles there are to including such functions, but I'd think the benefits would be obvious. – supercat Mar 22 '18 at 14:48
  • 2
    @supercat I was referring to the already existing `String.concat(String)` method whose implementation is creating the resulting string’s array in-place. The advantage becomes moot when we have to invoke `toString()` on arbitrary objects. Likewise, when calling a method accepting an array, the caller has to create and fill the array which reduces the overall benefit. But now, it’s irrelevant, as the new solution basically is what you were considering, except that it has no boxing overhead, needs no array creation, and the backend may generate optimized handlers for particular scenarios. – Holger Mar 22 '18 at 15:07
  • 1
    @Holger: Using a `StringBuilder` to construct a string that's N bytes long will require creating and abandoning an array holding N bytes. The total length of strings produced by `toString()` would be N. Static Concat overloads for 2-4 strings would be helpful for small numbers of strings, but when using more strings the cost of a `String[]` would be less than the cost of a `StringBuilder` and its associated `char[]`. – supercat Mar 22 '18 at 15:32
  • 1
    @supercat it seems, you got me wrong. I’m not objecting the idea that such a method could have a benefit, but it would require a change of the specification and a new runtime adapted to it. In contrast, just using the `concat` method for two strings were feasible would require, say ten lines of code in the compiler and work with all Java versions since 1.0. If compiler vendors do not even that, you shouldn’t expect too much from them (look at how `switch` over strings has been implemented or how `javac` does try-with-resource). The `invokedynamic` approach solves all `concat` issues at once… – Holger Mar 22 '18 at 15:52
  • @Holger: Yeah, it is curious how many places Java goes out of its way to make performance "improvements" in cases that aren't, while ignoring easy chances to make genuine improvements. Many programs produced by newer compilers aren't going to run on ancient runtime versions anyway, so any time a breaking change was introduced would have been a great time to fix up holes in the standard library design. – supercat Mar 22 '18 at 16:37
22

Before going into the details of the invokedynamic implementation used to optimise String concatenation, in my opinion, one should get some background from What's invokedynamic and how do I use it?

The invokedynamic instruction simplifies and potentially improves implementations of compilers and runtime systems for dynamic languages on the JVM. It does this by allowing the language implementer to define custom linkage behavior with the invokedynamic instruction, which involves the steps below.


I will try to take you through these steps along with the changes that were introduced to implement the String concatenation optimisation.

  • Defining the Bootstrap Method: With Java 9, the bootstrap methods for invokedynamic call sites that support string concatenation, primarily makeConcat and makeConcatWithConstants, were introduced with the StringConcatFactory implementation.

    Using invokedynamic makes it possible to defer the choice of translation strategy until runtime. The translation strategy used in StringConcatFactory is similar to that of LambdaMetafactory, introduced in the previous Java version. Additionally, one of the goals of the JEP mentioned in the question is to stretch these strategies further.

  • Specifying Constant Pool Entries: These are the additional static arguments to the invokedynamic instruction, other than (1) the MethodHandles.Lookup object, which is a factory for creating method handles in the context of the invokedynamic instruction, (2) a String object, the method name mentioned in the dynamic call site, and (3) the MethodType object, the resolved type signature of the dynamic call site.

    These are already linked during the linkage of the code. At runtime, the bootstrap method runs and links in the actual code doing the concatenation. It effectively replaces the invokedynamic call with an appropriate invokestatic call. The constant strings are loaded from the constant pool, and the bootstrap method's static arguments are leveraged to pass these and other constants straight to the bootstrap method call.

  • Using the invokedynamic Instruction: This offers the facilities for lazy linkage, by providing the means to bootstrap the call target once, during the initial invocation. The concrete idea for optimisation here is to replace the entire StringBuilder.append dance with a simple invokedynamic call to java.lang.invoke.StringConcatFactory, which accepts the values that need to be concatenated.

The Indify String Concatenation proposal illustrates this with an example: it benchmarks an application on Java 9 in which a method similar to the one shared by @T.J. Crowder is compiled, and the difference in the bytecode between the two implementations is clearly visible.
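The three steps above can be sketched with a toy bootstrap method. Everything named here (ToyBootstrap, joinWithDash, bootstrap, demo) is hypothetical and only illustrates the shape; it is not how StringConcatFactory is implemented. Since javac will not emit an invokedynamic against a custom bootstrap method from plain Java source, the demo method simulates the first execution of the call site by invoking the bootstrap method directly.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ToyBootstrap {
    // Hypothetical "strategy" that the bootstrap method links the call site to.
    static String joinWithDash(String a, String b) {
        return a + "-" + b;
    }

    // Bootstrap method shape: the JVM supplies (1) the lookup, (2) the name and
    // (3) the type of the dynamic call site; extra static arguments would follow.
    public static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle target = lookup.findStatic(ToyBootstrap.class, "joinWithDash", type);
        return new ConstantCallSite(target); // linked once, never relinked
    }

    // Simulates what happens at an invokedynamic instruction.
    static String demo(String a, String b) {
        try {
            MethodType type = MethodType.methodType(String.class, String.class, String.class);
            // First execution: the bootstrap method runs once and yields the call site.
            CallSite site = bootstrap(MethodHandles.lookup(), "concat", type);
            MethodHandle linked = site.dynamicInvoker();
            // Later executions would go straight to the linked target.
            return (String) linked.invokeExact(a, b);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("left", "right"));
    }
}
```

The point of the indirection is that the body of bootstrap lives in the runtime library, so it can pick a different target in a later JDK without the calling class being recompiled.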

Farzad Karimi
Naman
19

I'll add a few details here. The main point is that how string concatenation is done is now a runtime decision, not a compile-time one. Thus it can change: once you have compiled your code against Java 9, the runtime can change the underlying implementation however it pleases, without any need to recompile.

And the second point is that at the moment there are 6 possible strategies for concatenation of String:

 private enum Strategy {
    /**
     * Bytecode generator, calling into {@link java.lang.StringBuilder}.
     */
    BC_SB,

    /**
     * Bytecode generator, calling into {@link java.lang.StringBuilder};
     * but trying to estimate the required storage.
     */
    BC_SB_SIZED,

    /**
     * Bytecode generator, calling into {@link java.lang.StringBuilder};
     * but computing the required storage exactly.
     */
    BC_SB_SIZED_EXACT,

    /**
     * MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
     * This strategy also tries to estimate the required storage.
     */
    MH_SB_SIZED,

    /**
     * MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
     * This strategy also estimate the required storage exactly.
     */
    MH_SB_SIZED_EXACT,

    /**
     * MethodHandle-based generator, that constructs its own byte[] array from
     * the arguments. It computes the required storage exactly.
     */
    MH_INLINE_SIZED_EXACT
}

You can choose any of them via a system property: -Djava.lang.invoke.stringConcat. Notice that StringBuilder is still an option.

Eugene