Can a groupingBy operation on a stream produce a map where the values are arrays rather than lists or some other collection type?

For example: I have a class Thing. Things have owners, so Thing has a getOwnerId method. In a stream of things I want to group the things by owner ID so that things with the same owner ID end up in an array together. In other words I want a map like the following where the keys are owner IDs and the values are arrays of things belonging to that owner.

    Map<String, Thing[]> mapOfArrays;

In my case, since I need to pass the map values to a library method that requires an array, it would be most convenient to collect into a Map<String, Thing[]>.

Collecting the whole stream into one array is easy (it doesn’t even require an explicit collector):

    Thing[] arrayOfThings = Stream.of(new Thing("owner1"), new Thing("owner2"), new Thing("owner1"))
            .toArray(Thing[]::new);

[Belongs to owner1, Belongs to owner2, Belongs to owner1]

Grouping by owner ID is easy too. For example, to group into lists:

    Map<String, List<Thing>> mapOfLists = Stream.of(new Thing("owner1"), new Thing("owner2"), new Thing("owner1"))
            .collect(Collectors.groupingBy(Thing::getOwnerId));

{owner1=[Belongs to owner1, Belongs to owner1], owner2=[Belongs to owner2]}

This example only gives me a map of lists, though. There are two-arg and three-arg groupingBy methods that can give me a map of other collection types (like sets). I figured that if I could pass a collector that collects into an array (similar to the collection into an array in the first snippet above) to the two-arg Collectors.groupingBy(Function<? super T,? extends K>, Collector<? super T,A,D>), I’d be set. However, none of the predefined collectors in the Collectors class seems to do anything with arrays. Am I missing a not-too-complicated way through?

For the sake of a complete example, here’s the class I’ve used in the above snippets:

    public class Thing {

        private String ownerId;

        public Thing(String ownerId) {
            this.ownerId = ownerId;
        }

        public String getOwnerId() {
            return ownerId;
        }

        @Override
        public String toString() {
            return "Belongs to " + ownerId;
        }

    }

Ole V.V.
  • It is *never* more convenient to deal with arrays when dealing with generics. [There's already existing pain out there regarding generic arrays](https://stackoverflow.com/q/529085/1079354), so it would make some sense that the new APIs steer well clear of arrays and push towards collections instead. – Makoto Dec 09 '18 at 17:54

3 Answers

Using the collector from this answer by Thomas Pliakas:

    Map<String, Thing[]> mapOfArrays = Stream.of(new Thing("owner1"), new Thing("owner2"), new Thing("owner1"))
            .collect(Collectors.groupingBy(Thing::getOwnerId,
                    Collectors.collectingAndThen(Collectors.toList(),
                            tl -> tl.toArray(new Thing[0]))));

The idea is to collect into a list first (an obvious choice, since arrays have a fixed size) and then convert the list to an array before handing it back to the groupingBy collector. collectingAndThen does that through its so-called finisher.
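If the same conversion is needed in several places, the list-then-finisher idea can be wrapped in a small reusable collector. The following is only a sketch: `ArrayCollectors` and its `toArray` method are hypothetical names, not part of `java.util.stream.Collectors`, and strings grouped by length stand in for things grouped by owner ID.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.IntFunction;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ArrayCollectors {

    // Hypothetical helper (not a JDK method): accumulates elements into a List,
    // then the finisher copies that list into an array made by the generator.
    static <T> Collector<T, ?, T[]> toArray(IntFunction<T[]> generator) {
        return Collectors.collectingAndThen(
                Collectors.toList(),
                list -> list.toArray(generator.apply(0)));
    }

    public static void main(String[] args) {
        // Group strings by length; each map value is a String[] rather than a List.
        Map<Integer, String[]> byLength = Stream.of("ant", "wasp", "bee")
                .collect(Collectors.groupingBy(String::length, toArray(String[]::new)));
        byLength.forEach((k, v) -> System.out.println(k + "=" + Arrays.toString(v)));
    }
}
```

With the question's class, the call would presumably read `Collectors.groupingBy(Thing::getOwnerId, ArrayCollectors.toArray(Thing[]::new))`.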

To print the result for inspection:

    mapOfArrays.forEach((k, v) -> System.out.println(k + '=' + Arrays.toString(v)));

owner1=[Belongs to owner1, Belongs to owner1]
owner2=[Belongs to owner2]

Edit: With thanks to Aomine for the link: using new Thing[0] as the argument to toArray was inspired by Arrays of Wisdom of the Ancients. It seems that, at least on Intel CPUs, using new Thing[0] ends up being faster than using new Thing[tl.size()]. I was surprised.

Ole V.V.
  • minor but for the printing, you could just do `mapOfArrays.forEach((k, v) -> ...)`. the call to `entrySet()` can be avoided. also you might be better off doing `tl.toArray(new Thing[0])` instead of `tl.toArray(new Thing[tl.size()])` (see [Arrays of Wisdom of the Ancients](https://shipilev.net/blog/2016/arrays-wisdom-ancients/) for more details) otherwise good solution 1+. – Ousmane D. Dec 09 '18 at 18:07
  • @Aomine A very surprising article, at least to me. I have edited and changed to `tl.toArray(new Thing[0])`. Thanks again. – Ole V.V. Dec 10 '18 at 11:42
  • That’s not an Intel-specific thing. It’s rather that neither the temporary zero size array instance nor the reflective array creation has any performance impact and that the HotSpot JVM fails to eliminate the zero filling of the new array if the creation site and the overwriting code are too far away, like with a caller supplied array. Note that this knowledge also found its way into the JDK, see [JDK11’s `toArray(IntFunction)`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Collection.html#toArray(java.util.function.IntFunction)) method… – Holger Dec 10 '18 at 13:07
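For readers on Java 11 or later, the `toArray(IntFunction)` method mentioned in the comment above lets the finisher take an array constructor reference directly. A minimal sketch, assuming Java 11+; the class name `GroupIntoArraysJava11` and the method `groupByLength` are made up for illustration, and strings grouped by length stand in for things grouped by owner ID.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GroupIntoArraysJava11 {

    // Illustrative method name; groups words by length into String[] values.
    static Map<Integer, String[]> groupByLength(Stream<String> words) {
        return words.collect(Collectors.groupingBy(
                String::length,
                Collectors.collectingAndThen(
                        Collectors.toList(),
                        // Collection.toArray(IntFunction) is available since Java 11.
                        list -> list.toArray(String[]::new))));
    }

    public static void main(String[] args) {
        groupByLength(Stream.of("ant", "wasp", "bee"))
                .forEach((k, v) -> System.out.println(k + "=" + Arrays.toString(v)));
    }
}
```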

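You could group first and then apply a subsequent toMap:

    // Assumes static imports of Collectors.groupingBy and Collectors.toMap,
    // and that source is a Collection<Thing>.
    Map<String, Thing[]> result = source.stream()
            .collect(groupingBy(Thing::getOwnerId))
            .entrySet()
            .stream()
            .collect(toMap(Map.Entry::getKey,
                    e -> e.getValue().toArray(new Thing[0])));
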
Ousmane D.
  • Thanks. It’s probably easier to understand than my own answer. I also like my own idea of doing the whole thing in one terminal operation, though. – Ole V.V. Dec 09 '18 at 18:04
  • @OleV.V. yes, yours looks more compact so I can see where you're coming from. good to have options anyway. – Ousmane D. Dec 09 '18 at 18:10
  • Frankly I didn’t know whether to accept your answer or my own. I believe that each has its advantages. I am accepting yours also as an appreciation of the fact that we now have more options to consider and choose from. – Ole V.V. Dec 12 '18 at 11:55
  • @OleV.V. Hey Ole, I couldn't agree more, both certainly have their advantages in their own right, as mentioned before I like the compactness of your answer. as for which answer to accept, please choose the one you feel is most beneficial to the "reader". Thanks. – Ousmane D. Dec 12 '18 at 11:58

Probably obvious but you could have done it via:

    Map<String, Thing[]> mapOfArrays = Stream.of(new Thing("owner1"), new Thing("owner2"), new Thing("owner1"))
            .collect(Collectors.toMap(
                    Thing::getOwnerId,
                    x -> new Thing[]{x},                 // each thing starts as a one-element array
                    (left, right) -> {                   // on a key collision, concatenate both arrays
                        Thing[] newA = new Thing[left.length + right.length];
                        System.arraycopy(left, 0, newA, 0, left.length);
                        System.arraycopy(right, 0, newA, left.length, right.length);
                        return newA;
                    }
            ));

Eugene
  • The idea occurred to me, but it appeared wasteful of resources, so I didn’t check whether it was doable. Thanks for showing that it is. It’s great to have different proposals. For my case (a stream of typically 1 to 20 elements) efficiency probably is of no concern, so it would be an option. – Ole V.V. Dec 09 '18 at 20:40
  • if one wanted to condense this a little bit then i guess they could change the merge function to --> `(left, right) -> Stream.concat(Arrays.stream(left), Arrays.stream(right)).toArray(Thing[]::new)` – Ousmane D. Dec 10 '18 at 12:00
  • @Aomine I did not do that on purpose btw, it seems to me that two `System::arraycopy` calls would be faster than a stream... – Eugene Dec 10 '18 at 12:01
  • @Eugene Yes, I seem to have figured that out. Thought I'd leave a comment anyway, to show a different way... – Ousmane D. Dec 10 '18 at 12:04
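
For completeness, the condensed merge function from the comments can be put into a full toMap call roughly like this. It is a sketch only: `ToMapWithConcat` and `groupByLength` are illustrative names, strings grouped by length stand in for things grouped by owner ID, and no claim is made about how it performs next to the `System.arraycopy` version.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToMapWithConcat {

    // Illustrative method name; groups words by length into String[] values.
    static Map<Integer, String[]> groupByLength(Stream<String> words) {
        return words.collect(Collectors.toMap(
                String::length,
                w -> new String[] {w},   // each element starts as a one-element array
                // On a key collision, concatenate the two arrays through a stream.
                (left, right) -> Stream.concat(Arrays.stream(left), Arrays.stream(right))
                        .toArray(String[]::new)));
    }

    public static void main(String[] args) {
        groupByLength(Stream.of("ant", "wasp", "bee"))
                .forEach((k, v) -> System.out.println(k + "=" + Arrays.toString(v)));
    }
}
```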