I have been trying to understand and showcase how Java streams implement a type of loop fusion under the hood, so that several operations can be fused into a single pass.
This first example here:
    Stream.of("The", "cat", "sat", "on", "the", "mat")
        .filter(w -> {
            System.out.println("Filtering: " + w);
            return w.length() == 3;
        })
        .map(w -> {
            System.out.println("Mapping: " + w);
            return w.toUpperCase();
        })
        .forEach(w -> System.out.println("Printing: " + w));
Has the following output (with the fusion of a single pass for each element quite clear):
Filtering: The
Mapping: The
Printing: THE
Filtering: cat
Mapping: cat
Printing: CAT
Filtering: sat
Mapping: sat
Printing: SAT
Filtering: on
Filtering: the
Mapping: the
Printing: THE
Filtering: mat
Mapping: mat
Printing: MAT
The second example is the same but I use the sorted() operation between the filter and map:
    Stream.of("The", "cat", "sat", "on", "the", "mat")
        .filter(w -> {
            System.out.println("Filtering: " + w);
            return w.length() == 3;
        })
        .sorted()
        .map(w -> {
            System.out.println("Mapping: " + w);
            return w.toUpperCase();
        })
        .forEach(w -> System.out.println("Printing: " + w));
This has the following output:
Filtering: The
Filtering: cat
Filtering: sat
Filtering: on
Filtering: the
Filtering: mat
Mapping: The
Printing: THE
Mapping: cat
Printing: CAT
Mapping: mat
Printing: MAT
Mapping: sat
Printing: SAT
Mapping: the
Printing: THE
So my question is this: with the call to sorted(), am I correct in thinking that, because it is a "stateful" intermediate operation, elements can no longer be pushed one at a time through all of the operations in a single pass? Furthermore, because sorted() needs to see the entire input before it can emit its first result, the fusing technique cannot be applied across it, which is why all the filtering occurs first, and only the mapping and printing are fused together after the sort? Please correct me if any of my assumptions are incorrect, and feel free to elaborate on what I have already said.
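To pin down the ordering I am describing without relying on console output, here is a minimal sketch that records each lambda invocation in a list. The class name SortedBarrierDemo and the method trace are my own, purely for illustration; the pipeline is the same as in the two examples above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class SortedBarrierDemo {

    // Runs the filter/map/forEach pipeline and returns the order in which
    // the lambdas were invoked. If withSorted is true, sorted() is placed
    // between filter and map.
    static List<String> trace(boolean withSorted) {
        List<String> events = new ArrayList<>();

        Stream<String> s = Stream.of("The", "cat", "sat", "on", "the", "mat")
            .filter(w -> {
                events.add("filter:" + w);
                return w.length() == 3;
            });

        if (withSorted) {
            s = s.sorted();
        }

        s.map(w -> {
                events.add("map:" + w);
                return w.toUpperCase();
            })
         .forEach(w -> events.add("print:" + w));

        return events;
    }

    public static void main(String[] args) {
        System.out.println("without sorted(): " + trace(false));
        System.out.println("with sorted():    " + trace(true));
    }
}
```

Without sorted(), the trace interleaves filter, map and print per element. With sorted(), every filter event comes before the first map event, and map and print are still interleaved afterwards, which is the behaviour I am asking about.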
In addition, how does the implementation decide under the hood whether it can fuse operations together into a single pass or not? For example, when the sorted() operation is present, is there simply a flag that switches fusion off, so that it no longer happens the way it does when sorted() is not there?
A final query: the benefit of fusing operations into a single pass is sometimes obvious, for example when combined with short-circuiting. But what are the main benefits of fusing together operations such as filter-map-forEach, or even filter-map-sum?
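To make the filter-map-sum case concrete, here is a rough sketch of the two alternatives I have in mind. The class FusionBenefitDemo and both method names are hypothetical, and the "staged" version is only my guess at what an unfused evaluation would look like, not how any library actually implements it.

```java
import java.util.ArrayList;
import java.util.List;

public class FusionBenefitDemo {

    // Staged (unfused): each operation runs to completion over the whole
    // input and materialises an intermediate list before the next one starts.
    static int stagedSum(List<String> words) {
        List<String> filtered = new ArrayList<>();
        for (String w : words) {
            if (w.length() == 3) {
                filtered.add(w);
            }
        }
        List<Integer> mapped = new ArrayList<>();
        for (String w : filtered) {
            mapped.add(w.length());
        }
        int sum = 0;
        for (int n : mapped) {
            sum += n;
        }
        return sum;
    }

    // Fused: a single traversal of the source, with each element flowing
    // through filter, map and the sum before the next element is touched.
    static int fusedSum(List<String> words) {
        return words.stream()
                    .filter(w -> w.length() == 3)
                    .mapToInt(String::length)
                    .sum();
    }

    public static void main(String[] args) {
        List<String> words = List.of("The", "cat", "sat", "on", "the", "mat");
        System.out.println(stagedSum(words)); // 15
        System.out.println(fusedSum(words));  // 15
    }
}
```

Both give the same result, so my question is really about what the fused form buys beyond avoiding the two intermediate lists and the extra traversals.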