I'm trying to solve a classic problem (find the top K elements in a random collection). I'm trying to do this with Generics, but I keep on getting strange "errors". I'm not sure if what I'm doing is the right way of doing these things, but I don't see another way to be honest.

Interface declaration

public interface TopK<T> {

    T[] getMostOccurrences(T[] items, int k);

}

Implementation

public class TopKHeap<T> implements TopK<T> {

    private T[] items;
    private Map<T, Integer> occurrences;

    @Override
    public T[] getMostOccurrences(T[] items, int k) {
        this.items = items;

        countOccurrences();

        PriorityQueue<T> minHeap = new PriorityQueue<>((n1, n2) -> occurrences.get(n1) - occurrences.get(n2));
        for (T t : occurrences.keySet()) {
            minHeap.add(t);
            if (minHeap.size() > k) {
                minHeap.poll();
            }
        }

        List<T> topItems = new ArrayList<>(k);
        for (int idx = k - 1; idx >= 0; idx--) {
            topItems.add(0, minHeap.poll());
        }

        return (T[]) topItems.toArray();
    }

    private void countOccurrences() {
        occurrences = new HashMap<>();
        for (T t : items) {
            occurrences.put(t, occurrences.getOrDefault(t, 0) + 1);
        }
    }
}

Test Case

    @Test
    public void testTopItems() {
        String[] input = {
                "John",
                "John",
                "John",
                "Jane",
                "Jane",
                "Jane",
                "Jane",
                "Michael",
                "Emily",
                "Emily"
        };

        TopK<String> top = new TopKHeap<>();
        assertThat(new String[]{"Jane", "John"}, Matchers.arrayContaining(top.getMostOccurrences(input, 2)));
    }

I constantly get the following error:

java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to class [Ljava.lang.String; ([Ljava.lang.Object; and [Ljava.lang.String; are in module java.base of loader 'bootstrap')

I've tried a couple of different approaches here. Initially I didn't use the ArrayList, but used T[] topItems = (T[]) new Object[k];, which yielded the same result. I replaced the array with an ArrayList, because Collections in general are better in handling Generics in Java, but apparently that didn't solve it.

The question is two-fold I guess. Why does it keep on throwing that error? And second question is, what is a more elegant way to solve this problem? By problem I mean, creating a generic class that uses Arrays.

edit> The method itself runs fine, but the moment I create a reference to the result of the output, it throws the ClassCastException. So whenever I do this, it falls over:

String[] result = top.getMostOccurrences(input, 2); 
Bjorn121
  • The fact that arrays are covariant and retained while generics are invariant and erased is a setup for problems when mixing both. --- Can you highlight the line of code throwing the exception? – Turing85 Aug 09 '20 at 09:47
  • Thanks for pointing that out. I've updated my question with more info on where it falls over. – Bjorn121 Aug 09 '20 at 10:15

3 Answers


You asked two questions:

  • Why is the exception occurring?
  • What would be a more elegant solution to the problem of creating generic classes using arrays?

For future posts, please read: Can I ask only one question per post?



Why is the exception occurring?

There are two parts to this answer. The first is why the exception occurs at all. The second is why it occurs where it does, and why it does not occur when the caller ignores the returned value.


Why does the exception occur?

Looking at the implementation of method getMostOccurrences(...) in TopKHeap, we see that it returns (T[]) topItems.toArray(). topItems is a List, and List::toArray() returns an Object[]. Because generics are erased, the cast to T[] is unchecked and does nothing at run time, so the method really returns an Object[]. Any type check that expects the value returned by TopKHeap::getMostOccurrences to be something other than Object[] must therefore fail.

In the test, the cast inserted by the compiler at the call site tries to treat that Object[] as a String[], so we see a ClassCastException.
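
To make the difference visible, here is a minimal sketch that prints the runtime type of the array returned by the two toArray overloads. The class and method names (ToArrayDemo, noArgRuntimeType, typedRuntimeType) are made up for illustration and are not part of the question's code.

```java
import java.util.ArrayList;
import java.util.List;

class ToArrayDemo {
    // Runtime type of the array produced by the no-arg List.toArray().
    static String noArgRuntimeType(List<String> list) {
        return list.toArray().getClass().getSimpleName();
    }

    // Runtime type of the array produced by List.toArray(T[]).
    static String typedRuntimeType(List<String> list) {
        return list.toArray(new String[0]).getClass().getSimpleName();
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(List.of("Jane", "John"));
        System.out.println(noArgRuntimeType(names)); // prints "Object[]"
        System.out.println(typedRuntimeType(names)); // prints "String[]"
    }
}
```

The element type of the list plays no role in the first call; only the array handed to the second overload carries the component type.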


Why does the exception occur when and where it occurs?

Instead of looking at the original code, we are going to look at a simplified version of the same problem:

class Test {
    public static void main (String... args) {
        Test.<String>foo();
        Object bar = Test.<String>foo();
        String baz = Test.<String>foo(); // <- ClassCastException thrown here
    }
    
    static <T> T foo() {
        return (T) new Object();
    }
}

Ideone demo

This code will throw a ClassCastException on line 5. The obvious questions are:

  • Why do the two calls to Test.<String>foo() on lines 3 and 4 not throw a ClassCastException?
  • Why is the exception thrown on line 5, not on the return ... in method foo()?

Both questions have the same answer: the JLS does not define where to place the type check, "and it is up to the compiler implementation to decide where to insert casts or not, as long as the erased code meets the type safety rules of non-generic code." (kudos to newacct).

In short: the compiler is free to place the typecast where it sees fit. In the implementation used in this answer, the compiler placed the type safety check at the assignment rather than at the method return.
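
As a rough illustration, the erased program can be written out by hand. This is only an approximation of what a compiler that behaves as described above might produce, not something the JLS prescribes; ErasureSketch, fooErased and describe are hypothetical names.

```java
class ErasureSketch {
    // After erasure, foo() just returns Object; the cast to T has vanished.
    static Object fooErased() {
        return new Object();
    }

    // Mirrors the three calls in main(...) of the simplified example.
    static String describe() {
        fooErased();                 // result discarded: no cast needed
        Object bar = fooErased();    // target type is Object: no cast needed
        try {
            // Target type is String: this is where the cast ends up.
            String baz = (String) fooErased();
            return "no exception: " + baz;
        } catch (ClassCastException e) {
            return "ClassCastException at the assignment";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Only the third call needs a String, so only there does a cast appear, and that is where the exception surfaces.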



What would be a more elegant solution to the problem of creating generic classes using arrays?

First and foremost, try to avoid mixing arrays and generics. Since arrays are covariant and retained, while generics are invariant and erased, using them in combination is a recipe for trouble.
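
One way to follow this advice, assuming callers can work with a List instead of an array, is to keep arrays out of the signature entirely. The sketch below is not the asker's interface; TopKList and mostFrequent are hypothetical names, and the heap logic is adapted from the question.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopKList {
    // Returns up to k distinct items, most frequent first.
    static <T> List<T> mostFrequent(Collection<T> items, int k) {
        Map<T, Integer> counts = new HashMap<>();
        for (T item : items) {
            counts.merge(item, 1, Integer::sum);
        }

        // Min-heap on occurrence count, so the least frequent candidate is evicted first.
        PriorityQueue<T> minHeap =
                new PriorityQueue<>((a, b) -> Integer.compare(counts.get(a), counts.get(b)));
        for (T key : counts.keySet()) {
            minHeap.add(key);
            if (minHeap.size() > k) {
                minHeap.poll();
            }
        }

        // Polling yields ascending counts; reverse to get most frequent first.
        List<T> result = new ArrayList<>(minHeap.size());
        while (!minHeap.isEmpty()) {
            result.add(minHeap.poll());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("John", "John", "John", "Jane", "Jane",
                "Jane", "Jane", "Michael", "Emily", "Emily");
        System.out.println(mostFrequent(input, 2)); // prints "[Jane, John]"
    }
}
```

No cast and no reflection are needed, because a List<T> never has to know T at run time.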

Second, if you cannot avoid it, try to use the array as internal state. The implementation of ArrayList uses an Object[] internally as its backing data structure, but never leaks the array to the outside (ArrayList::toArray returns a copy of the internal array).

If you have to leak an internal array in a generic way, the implementation tends to get clumsy (expecting Class<T> instances as parameters for array creation) and to rely on reflection, as shown in this answer by Michael Queue.
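
For reference, the Class<T>-based pattern mentioned above could look roughly like this. It is a sketch only; GenericArrays and toTypedArray are hypothetical names, and it assumes the caller passes a reference type (a primitive class token such as int.class would make the cast fail).

```java
import java.lang.reflect.Array;
import java.util.List;

class GenericArrays {
    // Copies the list into a new array whose runtime component type is componentType.
    @SuppressWarnings("unchecked")
    static <T> T[] toTypedArray(List<? extends T> source, Class<T> componentType) {
        // Array.newInstance returns Object; the unchecked cast is safe here because
        // the array was created with exactly the requested (reference) component type.
        T[] result = (T[]) Array.newInstance(componentType, source.size());
        return source.toArray(result);
    }

    public static void main(String[] args) {
        String[] names = toTypedArray(List.of("Jane", "John"), String.class);
        System.out.println(names.getClass().getSimpleName()); // prints "String[]"
    }
}
```

The extra Class<T> parameter is what makes this clumsy: every caller has to pass a type token that the compiler already knows but the runtime does not.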

Turing85

Why does it keep on throwing that error?

A ClassCastException is thrown because (T[]) topItems.toArray() actually returns an Object[], which is then cast to String[] at the call site in the JUnit assertion: assertThat(new String[]{"Jane", "John"}, Matchers.arrayContaining(top.getMostOccurrences(input, 2)));

What is a more elegant way to solve this problem?

The <T> T[] toArray(T[] a) method of List takes an array of type T[] and returns a T[]. You can get the component type of the input array using reflection (an array's type is retained at runtime):

Class<?> itemsClass = items.getClass().getComponentType();

return topItems.toArray(((T[]) Array.newInstance(itemsClass, topItems.size())));

Here items.getClass().getComponentType() returns the class of the array's elements. Even though arrays are covariant, you can still use this safely, because the type is enforced by TopK<String> top = new TopKHeap<>();

Complete method -

@Override
public T[] getMostOccurrences(T[] items, int k) {
  this.items = items;

  countOccurrences();

  PriorityQueue<T> minHeap =
      new PriorityQueue<>((n1, n2) -> occurrences.get(n1) - occurrences.get(n2));
  for (T t : occurrences.keySet()) {
    minHeap.add(t);
    if (minHeap.size() > k) {
      minHeap.poll();
    }
  }

  List<T> topItems = new ArrayList<>(k);
  for (int idx = k - 1; idx >= 0; idx--) {
    topItems.add(0, minHeap.poll());
  }

  Class<?> itemsClass = items.getClass().getComponentType();

  return topItems.toArray(((T[]) Array.newInstance(itemsClass, topItems.size())));
}
Pankaj

My answers are a lot simpler (in my opinion) than the two that preceded mine:

Q1: „...Why does it keep on throwing that error?...“

A1: It throws that error because you're calling the no-arg List.toArray(), which is specified to return an Object[].

So you get a ClassCastException when you do String[] result = top.getMostOccurrences(input, 2), for the same reason you can't do String notAString = new Object(). In other words: A String IS AN Object but an Object IS NOT A String.

Going the other way around would be OK though: Object[] OK = new String[]{ "foo", "bar" }
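
A small sketch of what that covariance means in practice: the assignment compiles, but the array still remembers that it is a String[] and rejects other element types at run time. CovarianceDemo and storeFails are made-up names for this illustration.

```java
class CovarianceDemo {
    // Returns true if storing an Integer into a String[] seen as Object[] is rejected.
    static boolean storeFails() {
        Object[] objects = new String[] {"foo", "bar"}; // legal: arrays are covariant
        try {
            objects[0] = Integer.valueOf(42); // compiles, but the array is really a String[]
            return false;
        } catch (ArrayStoreException e) {
            return true; // the runtime type check on the store failed
        }
    }

    public static void main(String[] args) {
        System.out.println(storeFails()); // prints "true"
    }
}
```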

Q2: „...a more elegant way to solve this problem?...“

A2: That depends on what you consider to be „elegant“ :) In my opinion, the solution I implemented here meets your requirements…

public class TopKHeap<T extends String> implements TopK<T> { /* <-- An explicit bound on T*/

    ...

    @Override
    public T[] getMostOccurrences(int k, T...items) { /* <-- Not essential; just my personal preference */

        ...

        T[] typeEnforcer = copyOf( items, topItems.size( ) ); /* <-- This is what makes it type safe... */
        
        return topItems.toArray( typeEnforcer ); /* <-- ...no need to cast */
    }

}

Then you'd use that just like you originally did. In my demo though…

TopK<String> top = new TopKHeap<>();
    
String[] output = top.getMostOccurrences( 2, input);
    
for(String name : output )
    out.printf("%s ", name);

…I just print out what I get…

Jane John
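
To isolate the copyOf idea from the rest of the class, here is a self-contained sketch. It assumes copyOf above refers to java.util.Arrays.copyOf; CopyOfDemo and toArrayLike are hypothetical names, and T is left unbounded here.

```java
import java.util.Arrays;
import java.util.List;

class CopyOfDemo {
    // Builds an array with the same runtime component type as template,
    // sized to hold exactly the elements of values.
    static <T> T[] toArrayLike(List<T> values, T[] template) {
        // Arrays.copyOf keeps the runtime class of the original array,
        // so no cast and no explicit reflection is needed.
        T[] target = Arrays.copyOf(template, values.size());
        // target is exactly large enough, so toArray fills and returns it,
        // overwriting whatever was copied over from template.
        return values.toArray(target);
    }

    public static void main(String[] args) {
        String[] input = {"John", "Jane", "Emily"};
        String[] output = toArrayLike(List.of("Jane", "John"), input);
        System.out.println(Arrays.toString(output)); // prints "[Jane, John]"
    }
}
```

This works because the input array is available to borrow its runtime type from; it would not help a method that receives no array of the right type.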
deduper