The order in which Java 8 streams process the elements depends on following factors.
Encounter order
This is the order in which the stream source makes its elements available. The source elements can be in a particular order or it might not have any meanings of order. If there's no encounter order from source then we should not care whether the stream result will be ordered. If we have an ordered source and perform intermediate and terminal operations which guarantee to maintain the order, it does not matter whether the stream is processed in parallel or sequential, the order will be maintained.
Spliterator#ORDERED characteristic:
If stream is created with Spliterator#ORDERED characteristic (please check out our Spliterator tutorial) then the implementation logic of operations like limit(), skip(), distinct() and findFirst() will be tied to encounter order regardless if it's sequential or parallel stream. We will discuss these operations in the next sections.
Collection sources:
Streams created via Collection#stream() or Collection#parallelStream() will be ordered depending on the underlying collection. For example List and arrays are ordered. HashSet is not ordered, whereas LinkedHashSet is ordered (Check out our Java collection cheat sheet).
Other sources:
All stream sources created from static methods of stream classes e.g. Stream#of(), Stream#iterate(), Stream#builder() (and other corresponding IntStream, LongStream, DoubleStream methods) are ordered. (It's good to see our stream operation cheat sheet whenever we are comparing stream operations)
Stream#concat methods create a new stream of the first stream followed by all the elements of the second stream. The resulting stream is ordered if both of the input streams are ordered.
Stream#generate(),IntStream#generate() and other Long, double generate functions create the streams without Spliterator#ORDERED characteristic, so it's unordered.
Intermediate operations and encounter order
Following are the intermediate methods which either effect the encounter order or effect the performance in the presence of encounter order.
Stream#sorted()
This method returns a stream whose elements are sorted in natural order, thus ignoring any initial encounter order and imposing a new sorted order.
We can also use overloaded sorted(Comparator<? super T> comparator) to sort elements per provided comparator.
Set<Integer> list = new HashSet<>(Arrays.asList(2, 1, 3));
Object[] objects = list.stream().sorted().toArray();
System.out.println(Arrays.toString(objects))
Output:
[1, 2, 3]
For ordered streams, the sort is stable. For unordered streams, no stability guarantees are made. A stable sort is the one where two objects with same compare keys appear in the same encounter order in sorted output as they appear in the input source.
BaseStream#unordered()
Returns an unordered stream if it initially has encounter order, no change otherwise. This method doesn't take any actions to explicitly unorder the stream, it just removes the ORDERED constraint on the stream which will optimize operations like skip(), limit(), distinct() and the terminal operation findFirst.
Stream#skip(long n)
This method returns a new stream which will discard the first n elements of the stream. For ordered parallel stream it's an comparatively expensive operation, that's because multiple threads have to wait for each other to figure out the result which should respect the encountered order. Using an unordered stream source (such as generate(Supplier)) or removing the ordering constraint by using BaseStream#unordered() may result in performance improvement for parallel pipelines, we should only do that if we don't care about the ordered result.
In following examples we are going to use our PerformanceTestUtil.runTest method to record execution time, this method also run the each test twice, first time silently to avoid cold-start (class loading etc) extra time. The complete code is included in the project browser below.
public class SkipExample {
public static void main (String[] args) {
PerformanceTestUtil.runTest("unordered parallel skip", () -> {
IntStream intStream = IntStream.range(1, 100000000);
intStream.unordered().parallel().skip(1000).toArray();
});
PerformanceTestUtil.runTest("ordered parallel skip", () -> {
IntStream intStream = IntStream.range(1, 100000000);
intStream.parallel().skip(1000).toArray();
});
}
}
Output on my intel i7-2.70GHz, 4 core,8 logical processor, 16Gb ram machine:
unordered parallel skip time taken: 123.6 milliseconds
ordered parallel skip time taken: 166.5 milliseconds
Note that the result might vary from machine to machine.
Stream#limit(long maxSize)
This method returns a new stream truncated to be no longer than maxSize. For ordered parallel stream it's an comparatively expensive operation, that's because at terminal operation multiple threads have to wait for each other to find out specified number of elements in encounter order rather than the ones which just finish first in time. Using an unordered stream source or removing the ordering constraint by using BaseStream#unordered() may result in performance improvements.
public class LimitExample {
public static void main (String[] args) {
PerformanceTestUtil.runTest("unordered parallel stream limit test", () -> {
IntStream stream = IntStream.range(0, 1000000000);
stream.unordered().parallel().filter(i -> i % 2 == 0)
.limit(100000000).count();
});
PerformanceTestUtil.runTest("ordered parallel stream limit test", () -> {
IntStream stream = IntStream.range(0, 1000000000);
stream.parallel().filter(i -> i % 2 == 0)
.limit(100000000).count();
});
}
}
Output:
unordered parallel skip time taken: 148.7 milliseconds
ordered parallel skip time taken: 160.5 milliseconds
Stream#distinct(), IntStream#distinct(), LongStream#distinct() and DoubleStream#distinct()
These methods return a new stream consisting of the distinct elements of this stream. The distinct elements are select based on equal and hashCode methods.
public class DistinctExample {
public static void main (String[] args) {
PerformanceTestUtil.runTest("unordered stream", () -> {
IntStream stream = IntStream.range(0, 1000000);
stream.unordered().parallel().distinct().count();
});
PerformanceTestUtil.runTest("ordered stream", () -> {
IntStream stream = IntStream.range(0, 1000000);
stream.parallel().distinct().count();
});
}
}
Output:
unordered stream time taken: 32.43 milliseconds
ordered stream time taken: 134.6 milliseconds
As seen in the output, distinct operation with ordered parallel stream is noticeably slower than the unordered parallel stream. The reason is: ordered parallel stream preserves the stability whereas, unordered parallel stream does not.
The distinct operation on ordered stream preserves the encounter order and the element appearing first in the order are selected for the final result. For unordered stream no stability is maintained, the duplicates are just ignored which come later in time rather than respecting encounter order.
distinct() is full barrier operation with a lot of memory utilization.
Generally full barrier operation means all operations appearing first in sequence must be performed completely before starting the next operation.
If order is not important we should always choose unordered data source or use the method unorder() to remove order constraint.
Following example shows what stability means in ordered parallel streams:
public class MyObject {
private static int c = 0;
private int id = ++c;
private String str;
public MyObject (String str) {
this.str = str;
}
@Override
public boolean equals (Object o) {
if (this == o)
return true;
if (o == null || getClass() != o.getClass())
return false;
MyObject myObject = (MyObject) o;
return str != null ? str.equals(myObject.str) : myObject.str == null;
}
@Override
public int hashCode () {
return str != null ? str.hashCode() : 0;
}
@Override
public String toString () {
return "MyObject{id=" + id + ", str='" + str + "\'}";
}
}
public class DistinctStabilityExample {
Object[] myObjects = createStream().parallel().distinct().toArray();
System.out.printf("ordered distinct result 1: %s%n",
Arrays.toString(myObjects));
MyObject.c = 0;
myObjects = createStream().parallel().distinct().toArray();
System.out.printf("ordered distinct result 2: %s%n",
Arrays.toString(myObjects));
MyObject.c = 0;
myObjects = createStream().unordered().parallel().distinct().toArray();
System.out.printf("unordered distinct result 1: %s%n",
Arrays.toString(myObjects));
MyObject.c = 0;
myObjects = createStream().unordered().parallel().distinct().toArray();
System.out.printf("unordered distinct result 2: %s%n",
Arrays.toString(myObjects));
}
ordered distinct result 1: [MyObject{id=1, str='a'}, MyObject{id=2, str='b'}, MyObject{id=3, str='c'}]
ordered distinct result 2: [MyObject{id=1, str='a'}, MyObject{id=2, str='b'}, MyObject{id=3, str='c'}]
unordered distinct result 1: [MyObject{id=7, str='a'}, MyObject{id=4, str='b'}, MyObject{id=5, str='c'}]
unordered distinct result 2: [MyObject{id=1, str='a'}, MyObject{id=4, str='b'}, MyObject{id=5, str='c'}]
Terminal operations and encounter order
Following are the terminal operations which either ignore the encounter order or effect the performance in the presence of ORDERED constraint.
Stream#forEach(Consumer<? super T> action)
For parallel stream pipelines, Stream#forEach() operation does not guarantee to respect the encounter order of the stream, as doing will undermine parallelism. For parallel streams, we should not use this method if encounter order is important.
Stream#forEachOrdered(Consumer<? super T> action)
This method guarantees the encounter order of the stream. In case of parallel processing, Stream#forEachOrdered() processes the elements one by one. Each consumer action establishes happens-before relation with the subsequent elements action. We may lose the parallel stream performance gain by using this method.
Here's the comparison between forEach() and forEachOrdered()
public class ForEachExample {
public static void main (String[] args) {
final int[] ints = IntStream.range(0, 5).toArray();
PerformanceTestUtil.runTest("forEach() method", () -> {
Arrays.stream(ints).parallel().forEach(i -> doSomething(i));
});
PerformanceTestUtil.runTest("forEachOrdered() method", () -> {
Arrays.stream(ints).parallel().forEachOrdered(i -> doSomething(i));
});
}
private static void doSomething (int i) {
try {
Thread.sleep(10);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.printf("%s, ", i);
}
}
Output:
3, 1, 4, 0, 2, forEach() method time taken: 10.08 milliseconds
0, 1, 2, 3, 4, forEachOrdered() method time taken: 50.18 milliseconds
It's clear from the output forEachOrdered maintains the encounter order but is slower.
Stream#peek()
This method and other peek methods defined in IntStream, LongStream and DoubleStream do not respect encounter order. Also these methods are for debugging purpose only and shouldn't be used to implement application logic.
public class PeekExample {
public static void main (String[] args) {
IntStream.range(0, 5).parallel().peek(System.out::println).
count();
}
}
Output:
4
1
0
2
3
Dependencies and Technologies Used:
|