149

What is the concept of erasure in generics in Java?

Peter Mortensen
Tushu

7 Answers

211

It's basically the way that generics are implemented in Java via compiler trickery. The compiled generic code actually just uses java.lang.Object wherever you talk about T (or some other type parameter) - and there's some metadata to tell the compiler that it really is a generic type.

When you compile some code against a generic type or method, the compiler works out what you really mean (i.e. what the type argument for T is) and verifies at compile time that you're doing the right thing, but the emitted code again just talks in terms of java.lang.Object - the compiler generates extra casts where necessary. At execution time, a List<String> and a List<Date> are exactly the same; the extra type information has been erased by the compiler.
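As a minimal sketch of that last point (the class name `ErasureDemo` and its helper method are made up for illustration), comparing the runtime classes of two differently parameterized lists shows that the type arguments are gone at execution time:

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical demo class, not part of any library.
class ErasureDemo {
    // Returns true when both lists report the same runtime class.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Date> dates = new ArrayList<Date>();
        // Both objects are plain java.util.ArrayList instances at run time.
        return strings.getClass() == dates.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // prints "true"
    }
}
```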

Compare this with, say, C#, where the information is retained at execution time, allowing code to contain expressions such as typeof(T) which is the equivalent of T.class - except that the latter is invalid. (There are further differences between .NET generics and Java generics, mind you.) Type erasure is the source of many of the "odd" warning/error messages when dealing with Java generics.
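Because `T.class`, `new T()` and `instanceof T` are all rejected by the compiler, a common workaround is to pass a `Class<T>` object around explicitly. The following is only a sketch of that idea; `TypedFactory` and its methods are hypothetical names invented here, and it assumes the target type has an accessible no-argument constructor:

```java
// Hypothetical helper: carries an explicit Class<T> token because T is erased.
class TypedFactory<T> {
    private final Class<T> type;

    TypedFactory(Class<T> type) {
        this.type = type;
    }

    // Stands in for "new T()", which does not compile.
    T create() {
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }

    // Stands in for "o instanceof T", which does not compile either.
    boolean isInstance(Object o) {
        return type.isInstance(o);
    }

    public static void main(String[] args) {
        TypedFactory<StringBuilder> factory =
                new TypedFactory<StringBuilder>(StringBuilder.class);
        StringBuilder sb = factory.create();
        System.out.println(factory.isInstance(sb));     // prints "true"
        System.out.println(factory.isInstance("text")); // prints "false"
    }
}
```

The cost is visible at the call site: the caller has to state the type twice, once as a type argument and once as a regular argument.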

Aniket Sahrawat
Jon Skeet
  • This answer is factually incorrect: for example, if I have a class with two fields of types List<String> and List<Date>, then at execution time they will have different generic types. For details, see http://java.sun.com/javase/6/docs/api/java/lang/reflect/ParameterizedType.html – Rogério Dec 28 '09 at 00:05
  • 6
    @Rogerio: No, the *objects* won't have different generic types. The *fields* know the types, but the objects don't. – Jon Skeet Dec 28 '09 at 08:55
  • 3
    Yes, but you didn't quite specify that. A statement like "At execution time, a List<String> and a List<Date> are exactly the same" can easily be interpreted to be more general than it actually is. Why did you never mention that much type info IS actually available at runtime? Seems biased against Java to me... – Rogério Dec 29 '09 at 02:29
  • 2
    @Rogerio: The statement seems pretty unambiguous to me. A `List<String>` is an object, and it *is* the same as a `List<Date>`. If I'd meant a `List<String>` *field* I'd have said so. I program extensively in both Java and C#, and type erasure *is* a significant pain. You seem to be the one trying to imply that it's never an issue. – Jon Skeet Dec 29 '09 at 07:24
  • 1
    Of course it is an issue in certain situations. You seem to imply, though, that it's always or almost always an issue; my experience says otherwise. BTW, can you actually give a concrete and realistic example of a generics-related problem which can be solved at runtime in C# but not in Java? (I am sure such problems exist, but I admit I am not familiar with them...) – Rogério Dec 29 '09 at 13:03
  • 8
    @Rogerio: Absolutely - it's extremely easy to find out at execution time whether something which is only provided as `Object` (in a weakly typed scenario) is actually a `List`) for example. In Java that's just not feasible - you can find out that it's an `ArrayList`, but not what the original generic type was. This sort of thing can come up in serialization/deserialization situations, for example. Another example is where a container has to be able to construct instances of its generic type - you have to pass that type in separately in Java (as `Class`). – Jon Skeet Dec 29 '09 at 13:15
  • 6
    I never claimed it was always or almost always an issue - but it's at least *reasonably* frequently an issue in my experience. There are various places where I'm forced to add a `Class` parameter to a constructor (or generic method) simply because Java doesn't retain that information. Look at `EnumSet.allOf` for example - the generic type argument to the method should be enough; why do I need to specify a "normal" argument as well? Answer: type erasure. This sort of thing pollutes an API. Out of interest, have you used .NET generics much? (continued) – Jon Skeet Dec 29 '09 at 13:18
  • 5
    Before I used .NET generics, I found Java generics awkward in various ways (and wildcarding is still a headache, although the "caller-specified" form of variance definitely has advantages) - but it was only after I'd used .NET generics for a while that I saw how many patterns became awkward or impossible with Java generics. It's the Blub paradox again. I'm not saying that .NET generics doesn't have downsides either, btw - there are various type relationships which can't be expressed, unfortunately - but I far prefer it to Java generics. – Jon Skeet Dec 29 '09 at 13:21
  • 1
    True, the actual "E" in `List` is not discoverable at runtime from an instance alone. Java serialization provides full control over the process, though (the readObject/writeObject methods). But yes, I can see that `EnumSet.allOf()` is better than `EnumSet.allOf(MyEnum.class)`. I only used .NET 1.1, back in 2002 (before generics were added in C# 2.0). Java generics is awkward at times, no doubt. Maybe it will be improved still, in Java 7 or 8 (if those actually get released); after all, even C# 4.0 introduces improvements for generics. – Rogério Dec 30 '09 at 00:18
  • 1
    My main point in this whole discussion is that there is A LOT a Java developer can do with generic type information at runtime, through the Reflection API. For a "real-world" example, see this answer I posted some time ago: http://stackoverflow.com/questions/1170708/is-it-possible-to-create-a-mock-object-that-implements-multiple-interfaces-with-e/1185128#1185128. So, Java generics may have limitations, but it's far from just "syntactic sugar", as some people (not you, Jon) think. – Rogério Dec 30 '09 at 00:49
  • 5
    @Rogerio: There's a lot you *can* do with reflection - but I don't tend to find I *want* to do those things nearly as often as the things that I *can't* do with Java generics. I don't want to find out the type argument for a field *nearly* as often as I want to find out the type argument of an actual object. – Jon Skeet Dec 30 '09 at 07:54
  • 1
    I don't know... "to find out the type argument of an actual object" sounds like you may be favoring conditional logic over polymorphism. Wouldn't it be similar to using the `instanceof` operator, a known bad practice (not to say it doesn't have its place)? – Rogério Dec 30 '09 at 18:32
  • 2
    @Rogerio: If overused, it would clearly be a bad thing - but as you say, these things have their place... which means it's useful for them to actually be *possible*. However, I didn't actually say I was using conditional logic - things like creating a new instance of the type argument aren't conditional, but they do rely on the type argument being known at execution time. – Jon Skeet Dec 30 '09 at 18:39
  • I understand type erasure in Java to mean that a List becomes a "List of Object" at runtime. Nevertheless, the objects inside that list still keep their types, so why doesn't Java allow `instanceof E`? For example, if we use a raw list instead of a generic one and insert objects of any type, we can later get each object and use the `instanceof` operator. Why shouldn't this work for generics? Thanks. – Trần Kim Dự Feb 12 '17 at 07:29
  • "Nevertheless, object inside that list still keep their type, so why java doesn't allow instance of E." Because it doesn't know what `E` is at execution time... how can it check whether an object is an instance of a type it doesn't know? – Jon Skeet Feb 12 '17 at 07:41
42

Just as a side-note, it is an interesting exercise to actually see what the compiler is doing when it performs erasure -- makes the whole concept a little easier to grasp. There is a special flag you can pass the compiler to output java files that have had the generics erased and casts inserted. An example:

javac -XD-printflat -d output_dir SomeFile.java

The -printflat is the flag that gets handed off to the compiler that generates the files. (The -XD part is what tells javac to hand it to the executable jar that actually does the compiling rather than just javac, but I digress...) The -d output_dir is necessary because the compiler needs some place to put the new .java files.

This, of course, does more than just erasure; all of the automatic stuff the compiler does gets done here. For example, default constructors are also inserted, the new foreach-style for loops are expanded to regular for loops, etc. It is nice to see the little things that are happening automagically.

jigawot
31

Erasure literally means that the type information which is present in the source code is erased from the compiled bytecode. Let us understand this with some code.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class GenericsErasure {
    public static void main(String args[]) {
        List<String> list = new ArrayList<String>();
        list.add("Hello");
        Iterator<String> iter = list.iterator();
        while(iter.hasNext()) {
            String s = iter.next();
            System.out.println(s);
        }
    }
}

If you compile this code and then decompile it with a Java decompiler, you will get something like this. Notice that the decompiled code contains no trace of the type information present in the original source code.

import java.io.PrintStream;
import java.util.*;

public class GenericsErasure
{

    public GenericsErasure()
    {
    }

    public static void main(String args[])
    {
        List list = new ArrayList();
        list.add("Hello");
        String s;
        for(Iterator iter = list.iterator(); iter.hasNext(); System.out.println(s))
            s = (String)iter.next();

    }
} 
Blorgbeard
Parag
  • I tried using a Java decompiler to see the code after type erasure from the .class file, but the .class file still has type information. I tried what `jigawot` suggested, and it works. – frank Jun 24 '18 at 01:14
26

To complement Jon Skeet's already very complete answer, you have to realize that the concept of type erasure derives from a need for compatibility with previous versions of Java.

As initially presented at EclipseCon 2007 (no longer available), compatibility covered these points:

  • Source compatibility (Nice to have...)
  • Binary compatibility (Must have!)
  • Migration compatibility
    • Existing programs must continue to work
    • Existing libraries must be able to use generic types
    • Must have!

Original answer:

Hence:

new ArrayList<String>() => new ArrayList()

There are proposals for greater reification. To reify is to "regard an abstract concept as real"; the idea is that language constructs should be concepts, not just syntactic sugar.

I should also mention the Collections.checkedCollection method (in the JDK since Java 5), which returns a dynamically typesafe view of the specified collection. Any attempt to insert an element of the wrong type will result in an immediate ClassCastException.

The generics mechanism in the language provides compile-time (static) type checking, but it is possible to defeat this mechanism with unchecked casts.

Usually this is not a problem, as the compiler issues warnings on all such unchecked operations.

There are, however, times when static type checking alone is not sufficient, like:

  • when a collection is passed to a third-party library and it is imperative that the library code not corrupt the collection by inserting an element of the wrong type.
  • a program fails with a ClassCastException, indicating that an incorrectly typed element was put into a parameterized collection. Unfortunately, the exception can occur at any time after the erroneous element is inserted, so it typically provides little or no information as to the real source of the problem.
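The difference between the two situations can be sketched as follows. `Collections.checkedCollection` is the real JDK method; the class `CheckedCollectionDemo` and the `legacyAdd` helper are hypothetical and only simulate code that defeats static checking through a raw type:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of any library.
class CheckedCollectionDemo {
    // Simulates legacy or third-party code that inserts the wrong element type.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static void legacyAdd(Collection c) {
        c.add(Integer.valueOf(42));
    }

    // With a plain list the bad insert succeeds; the failure shows up on read.
    static boolean plainListFailsOnlyOnRead() {
        List<String> plain = new ArrayList<String>();
        legacyAdd(plain); // no exception here
        try {
            String s = plain.get(0); // compiler-inserted cast fails here
            return false;
        } catch (ClassCastException e) {
            return plain.size() == 1;
        }
    }

    // With a checked view the bad insert itself is rejected.
    static boolean checkedViewRejectsInsert() {
        Collection<String> checked =
                Collections.checkedCollection(new ArrayList<String>(), String.class);
        try {
            legacyAdd(checked);
            return false;
        } catch (ClassCastException e) {
            return checked.isEmpty();
        }
    }

    public static void main(String[] args) {
        System.out.println(plainListFailsOnlyOnRead()); // prints "true"
        System.out.println(checkedViewRejectsInsert()); // prints "true"
    }
}
```

Note that the checked view needs a `Class` argument (`String.class`) for the same reason discussed elsewhere on this page: the element type is not available from the collection itself.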

Update July 2012, almost four years later:

It is now (2012) detailed in "API Migration Compatibility Rules (Signature Test)"

The Java programming language implements generics using erasure, which ensures that legacy and generic versions usually generate identical class files, except for some auxiliary information about types. Binary compatibility is not broken because it is possible to replace a legacy class file with a generic class file without changing or recompiling any client code.

To facilitate interfacing with non-generic legacy code, it is also possible to use the erasure of a parameterized type as a type. Such a type is called a raw type (Java Language Specification 3/4.8). Allowing the raw type also ensures backward compatibility for source code.

According to this, the following versions of the java.util.Iterator class are both binary and source code backward compatible:

Class java.util.Iterator as it is defined in Java SE version 1.4:

public interface Iterator {
    boolean hasNext();
    Object next();
    void remove();
}

Class java.util.Iterator as it is defined in Java SE version 5.0:

public interface Iterator<E> {
    boolean hasNext();
    E next();
    void remove();
}
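To illustrate the source-compatibility side, here is a small sketch (the class `RawIteratorDemo` and its methods are invented for this example) in which 1.4-style code using the raw type and Java 5 style code using the parameterized type both work against the same generic `Iterator`:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class, not part of any library.
class RawIteratorDemo {
    // Pre-generics style: raw types and an explicit cast written by hand.
    @SuppressWarnings("rawtypes")
    static String firstLegacy(List list) {
        Iterator it = list.iterator();
        return (String) it.next();
    }

    // Java 5 style: the compiler inserts the equivalent cast for us.
    static String firstGeneric(List<String> list) {
        Iterator<String> it = list.iterator();
        return it.next();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Hello", "World");
        System.out.println(firstLegacy(words));  // prints "Hello"
        System.out.println(firstGeneric(words)); // prints "Hello"
    }
}
```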
Peter Mortensen
VonC
  • 3
    Note that backwards compatibility could have been achieved without type erasure, but not without Java programmers learning a new set of collections. That's exactly the route that .NET went. In other words, it's this third bullet which is the important one. (Continued.) – Jon Skeet Nov 24 '08 at 07:55
  • 16
    Personally I think this was a myopic mistake - it gave a short term advantage and a long term disadvantage. – Jon Skeet Nov 24 '08 at 07:55
9

Complementing the already-complemented Jon Skeet answer...

It has been mentioned that implementing generics through erasure leads to some annoying limitations (e.g. no new T[42]). It has also been mentioned that the primary reason for doing things this way was backwards compatibility in the bytecode. This is also (mostly) true. The bytecode generated with -target 1.5 is somewhat different from plain de-sugared casting with -target 1.4. Technically, it's even possible (through immense trickery) to gain access to generic type instantiations at runtime, proving that there really is something in the bytecode.
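One way to see that "something" is through the reflection API. The sketch below is not necessarily the trickery the answer has in mind; it only shows two places where generic signatures survive in the class file: a field declaration and the superclass of an anonymous subclass. `GenericMetadataDemo` and its members are hypothetical names:

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of any library.
class GenericMetadataDemo {
    // The declared type of this field is recorded in the class file.
    List<String> names = new ArrayList<String>();

    // Reads the type argument from the field declaration, not from the object.
    static Type fieldTypeArgument() {
        try {
            Type t = GenericMetadataDemo.class.getDeclaredField("names").getGenericType();
            return ((ParameterizedType) t).getActualTypeArguments()[0];
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // An anonymous subclass records its generic superclass in its own class file.
    static Type anonymousSubclassTypeArgument() {
        Object list = new ArrayList<Integer>() { };
        Type t = list.getClass().getGenericSuperclass();
        return ((ParameterizedType) t).getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        System.out.println(fieldTypeArgument());             // prints "class java.lang.String"
        System.out.println(anonymousSubclassTypeArgument()); // prints "class java.lang.Integer"
    }
}
```

In both cases the information comes from a declaration, which is consistent with the point made in the comments above: a plain `new ArrayList<String>()` object carries no such record.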

The more interesting point (which has not been raised) is that implementing generics using erasure offers quite a bit more flexibility in what the high-level type system can accomplish. A good example of this would be Scala's JVM implementation vs CLR. On the JVM, it is possible to implement higher-kinds directly due to the fact that the JVM itself imposes no restrictions on generic types (since these "types" are effectively absent). This contrasts with the CLR, which has runtime knowledge of parameter instantiations. Because of this, the CLR itself must have some concept of how generics should be used, nullifying attempts to extend the system with unanticipated rules. As a result, Scala's higher-kinds on the CLR are implemented using a weird form of erasure emulated within the compiler itself, making them not-entirely-compatible with plain-old .NET generics.

Erasure may be inconvenient when you want to do naughty things at runtime, but it does offer the most flexibility to the compiler writers. I'm guessing that's part of why it's not going away any time soon.

Daniel Spiewak
  • 7
    The inconvenience isn't when you want to do "naughty" things at execution time. It's when you want to do perfectly reasonable things at execution time. In fact, type erasure allows you to do far naughtier things - such as casting a List<String> to List and then to List<Date> with only warnings. – Jon Skeet Nov 24 '08 at 09:13
6

As I understand it (being a .NET guy) the JVM has no concept of generics, so the compiler replaces type parameters with Object and performs all the casting for you.

This means that Java generics are nothing but syntax sugar and don't offer any performance improvement for value types that require boxing/unboxing when passed by reference.
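A small sketch of the boxing point, with an invented class name `BoxingDemo`: since a type argument must be a reference type, numeric elements are stored as boxed `Integer` objects and converted back on access.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of any library.
class BoxingDemo {
    static boolean storesBoxedIntegers() {
        // List<int> would not compile; type arguments must be reference types.
        List<Integer> numbers = new ArrayList<Integer>();
        numbers.add(42);              // autoboxed to an Integer object
        Object element = numbers.get(0);
        int n = numbers.get(0);       // unboxed back to a primitive int
        return element instanceof Integer && n == 42;
    }

    public static void main(String[] args) {
        System.out.println(storesBoxedIntegers()); // prints "true"
    }
}
```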

Peter Mortensen
Andrew Kennan
  • 3
    Java generics can't represent value types anyway - there's no such thing as a List<int>. However, there's no pass-by-reference in Java at all - it's strictly pass by value (where that value may be a reference.) – Jon Skeet Nov 24 '08 at 07:31
2

There are good explanations already. I will only add an example showing how type erasure looks when viewed through a decompiler.

Original class:

import java.util.ArrayList;
import java.util.List;


public class S<T> {

    T obj; 

    S(T o) {
        obj = o;
    }

    T getob() {
        return obj;
    }

    public static void main(String args[]) {
        List<String> list = new ArrayList<>();
        list.add("Hello");

        // for-each
        for(String s : list) {
            String temp = s;
            System.out.println(temp);
        }

        // Iterable.forEach with a method reference (not a stream)
        list.forEach(System.out::println);
    }
}

Decompiled code from its bytecode:

import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.Objects;
import java.util.function.Consumer;

public class S {

   Object obj;


   S(Object var1) {
      this.obj = var1;
   }

   Object getob() {
      return this.obj;
   }

   public static void main(String[] var0) {

      ArrayList var1 = new ArrayList();
      var1.add("Hello");

      // for-each
      Iterator iterator = var1.iterator();

      while (iterator.hasNext()) {
         String string;
         String string2 = string = (String)iterator.next();
         System.out.println(string2);
      }

      // Iterable.forEach with a method reference
      PrintStream printStream = System.out;
      Objects.requireNonNull(printStream);
      var1.forEach(printStream::println);
   }
}
snr