Working with the Java Stream API

Monday Feb 6th 2017 by Manoj Debnath

Discover the qualities of the Java Stream API and how to use it in a simple manner.

The Java 8 Stream API was designed with lambda expressions in mind, and its power becomes apparent once the two are used together. The essence of the Stream API is its ability to perform sophisticated data manipulation such as searching, filtering, and mapping. Conceptually, these operations resemble database queries written in SQL. In many cases they can also be performed in parallel for additional efficiency. The Stream API is one of the more advanced features of Java, and a good grasp of generics and lambda expressions is a must to fully appreciate it. This article introduces some of the qualities of the API and shows how to use it in a simple manner.

Overview of Stream

A stream represents a flow of data; it is a conduit through which elements are fed from a source, such as an array or a collection, for computation. A stream does not provide storage; it is a means to move and process data. An operation performed on a stream produces a result without changing the source. For example, sorting a stream creates a new stream that represents the sorted data. The original data source remains unchanged; the change is reflected only in the new stream.
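The point about the source staying unchanged can be sketched as follows. This is a minimal illustration, not part of the original article; the class name SourceUnchangedDemo and the helper sortedCopy are hypothetical names chosen for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: sorting through a stream yields new data and leaves the source list untouched.
final class SourceUnchangedDemo {

    // Hypothetical helper: returns a new, sorted list built from the stream.
    static List<String> sortedCopy(List<String> names) {
        return names.stream()
                .sorted()                       // natural (alphabetical) order
                .collect(Collectors.toList());  // gather the sorted elements into a new list
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Vesta", "Ceres", "Pallas");
        List<String> sorted = sortedCopy(names);
        System.out.println("sorted: " + sorted); // sorted: [Ceres, Pallas, Vesta]
        System.out.println("source: " + names);  // source: [Vesta, Ceres, Pallas]
    }
}
```

The source list keeps its original order; only the list collected from the stream is sorted.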

It is worth mentioning that the stream referred to here is the one defined in java.util.stream. It is quite different from the IO streams supplied by the java.io package, though conceptually they may act in a similar fashion. The difference lies in how they are built. Java IO streams are abstractions that either produce or consume information, and they are ultimately linked to a physical device by the Java IO system. The idea is to make all such streams behave in the same way irrespective of the type of device involved in the IO operation. The streams of java.util.stream, on the other hand, are functional in nature. They work naturally with lambda expressions and are typically used to manipulate the data held in a data structure.

Operation Pipelining

Code that uses the stream API can be divided into three parts: the data source, the intermediate operations performed on the data, and the terminal operation. The data source is where the data comes from. For example, a List object is a valid data source. The intermediate operations are the actions taken on the data, such as sorting or filtering. The terminal operation is the final action that consumes the stream after the intermediate operations, such as collecting or persisting the results.

The following snippet illustrates the three parts just discussed.

If we want to create a new collection instance that would contain only the filtered items from the original collection, we may write the code in the following manner.

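import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Data source: a fixed-size list of asteroid names
List<String> asteroids = Arrays.asList("Ceres",
    "Vesta", "Europa", "Cybele",
    "Eunomia", "Pallas", "Patientia");

// Intermediate operation (filter), then terminal operation (collect)
List<String> filteredList = asteroids.stream()
    .filter(s -> s.startsWith("E"))
    .collect(Collectors.toList());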

Observe that the stream obtained from 'asteroids' represents the data source upon which one or more intermediate operations are applied. After the intermediate operations are done, the outcome is collected into a list object.

Here is another variation where we do not collect the items. Instead, we print the filtered items on screen.

Stream<String> mystream = asteroids.stream();
mystream.filter(s -> s.startsWith("E"))
    .map(String::toUpperCase)
    .sorted()
    .forEach(System.out::println);

Or, more simply, as:

asteroids.stream()
    .filter(s -> s.startsWith("E"))
    .map(String::toUpperCase)
    .sorted()
    .forEach(System.out::println);

With lambda expressions, the Java compiler infers a lot of information, especially in relation to data types; much of this power comes from generics. Note that the filter method takes an instance of the Predicate interface. According to the Java API documentation, Predicate is a functional interface and can therefore be used as the assignment target for a lambda expression or method reference. It declares several default methods, but the one invoked in our case is its single abstract method, test, which takes a single argument and returns a boolean value. The intricacies involved in the concept are beyond the scope of this article. Refer to the Java API documentation for Predicate, lambda expressions, and generics for further details.
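To make the role of Predicate more concrete, here is a small sketch in which the lambda is first assigned to a Predicate<String> variable and then passed to filter. The class name PredicateDemo, the constant STARTS_WITH_E, and the helper startingWithE are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class PredicateDemo {

    // The lambda becomes the implementation of Predicate's abstract method test(String).
    static final Predicate<String> STARTS_WITH_E = s -> s.startsWith("E");

    // Passing the named predicate to filter is equivalent to writing the lambda inline.
    static List<String> startingWithE(List<String> names) {
        return names.stream()
                .filter(STARTS_WITH_E)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> asteroids = Arrays.asList("Ceres", "Vesta", "Europa",
                "Cybele", "Eunomia", "Pallas", "Patientia");
        System.out.println(STARTS_WITH_E.test("Europa"));          // true
        System.out.println(STARTS_WITH_E.negate().test("Europa")); // false (negate is a default method)
        System.out.println(startingWithE(asteroids));              // [Europa, Eunomia]
    }
}
```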

The Stream Interfaces

The interface BaseStream is the foundation of all streams defined in the java.util.stream package and defines the core functionalities of its family. The header of this interface is defined as:

interface BaseStream <T, S extends BaseStream <T, S>>
extends AutoCloseable

Here, T implies the type of elements in the stream and S implies the type of stream that extends BaseStream. The methods declared in this interface are as follows.

Method | Description | Operation Type
void close() | Closes the stream and runs any registered close handlers. | -
boolean isParallel() | Returns true if the stream is parallel; returns false if the stream is sequential. | -
Iterator<T> iterator() | Returns an iterator over the elements of the stream. | Terminal Operation
S onClose(Runnable handler) | Returns a stream with an additional close handler. The handler is invoked when close() is called on the stream. | Intermediate Operation
S parallel() | Returns a parallel stream. | Intermediate Operation
S sequential() | Returns a sequential stream. | Intermediate Operation
Spliterator<T> spliterator() | Returns a Spliterator for the elements of the stream. | Terminal Operation
S unordered() | Returns an unordered stream. | Intermediate Operation

Because this interface extends the AutoCloseable interface, a stream can be managed in a try-with-resources statement. This, however, doesn't mean that every stream must be explicitly closed by invoking the close method of this interface. Generally, only those streams whose data source is connected to a resource, such as a file, need to be closed explicitly. In most cases, invoking the close operation explicitly is redundant.
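As an illustration of a stream that does need closing, the following sketch reads lines from a file with Files.lines inside a try-with-resources statement. The class name CloseableStreamDemo and its helper methods are hypothetical, and the temporary file exists only to make the example self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

final class CloseableStreamDemo {

    // The stream returned by Files.lines holds an open file, so it is closed
    // automatically at the end of the try-with-resources block.
    static long countNonBlankLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        }
    }

    // Hypothetical driver: writes three lines (one blank) to a temp file and counts them.
    static long runDemo() {
        try {
            Path tmp = Files.createTempFile("asteroids", ".txt");
            try {
                Files.write(tmp, Arrays.asList("Ceres", "", "Vesta"));
                return countNonBlankLines(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // 2
    }
}
```

A stream built from an in-memory collection, such as the asteroids list used earlier, holds no such resource and can simply be left to the garbage collector.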

The interfaces derived from the BaseStream are as follows:

Figure 1: The derived interfaces

Stream<T> is the general interface derived from BaseStream. The other stream interfaces, IntStream, DoubleStream, and LongStream, are each specialized for one primitive type, whereas Stream<T> can handle any reference type. The methods declared in Stream<T> supplement the methods inherited from the BaseStream interface. Some of the commonly used methods are as follows:

Method | Description | Operation Type
long count() | Returns the number of elements in the stream. | Terminal Operation
Stream<T> filter(Predicate<? super T> predicate) | Keeps only the elements of the stream that satisfy the predicate supplied as the parameter. | Intermediate Operation
<R> Stream<R> map(Function<? super T, ? extends R> mapFunc) | Applies the function to each element, creating a new stream that contains the mapped elements. This is the general map function; the more specific variants are mapToDouble, mapToInt, and mapToLong. | Intermediate Operation
Optional<T> max(Comparator<? super T> comp) | Finds and returns the maximum element in the stream according to the ordering specified by the Comparator. | Terminal Operation
Optional<T> min(Comparator<? super T> comp) | Finds and returns the minimum element in the stream according to the ordering specified by the Comparator. | Terminal Operation
Stream<T> sorted() | Sorts the elements of the stream in natural order. | Intermediate Operation
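The methods listed above might be exercised as in the following sketch, which reuses the asteroid names from earlier. The class StreamMethodsDemo and its helper methods are hypothetical names introduced for this example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

final class StreamMethodsDemo {

    static final List<String> ASTEROIDS = Arrays.asList("Ceres", "Vesta", "Europa",
            "Cybele", "Eunomia", "Pallas", "Patientia");

    // filter + count: how many names start with the given prefix
    static long countStartingWith(String prefix) {
        return ASTEROIDS.stream().filter(s -> s.startsWith(prefix)).count();
    }

    // max with a Comparator that orders names by length
    static Optional<String> longest() {
        return ASTEROIDS.stream().max(Comparator.comparingInt(String::length));
    }

    // min with the natural (alphabetical) ordering
    static Optional<String> firstAlphabetically() {
        return ASTEROIDS.stream().min(Comparator.naturalOrder());
    }

    // map + sorted: the name lengths in ascending order
    static List<Integer> sortedLengths() {
        return ASTEROIDS.stream()
                .map(String::length)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(countStartingWith("E"));       // 2
        System.out.println(longest().get());              // Patientia
        System.out.println(firstAlphabetically().get());  // Ceres
        System.out.println(sortedLengths());              // [5, 5, 6, 6, 6, 7, 9]
    }
}
```

Note that max and min return an Optional, because an empty stream has no maximum or minimum.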

Because Stream<T> works with object references, it cannot operate directly on primitive values; they would first have to be boxed into wrapper objects. Therefore, to handle the primitive types specifically, three stream interfaces are provided. They are:

  • DoubleStream
  • IntStream
  • LongStream

These stream interfaces offer much the same functionality as Stream, except that they are applied to primitive types.
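A brief sketch of the primitive streams follows. It assumes nothing beyond the standard IntStream API; the class PrimitiveStreamDemo and its helper methods are hypothetical names for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

final class PrimitiveStreamDemo {

    // mapToInt converts a Stream<String> into an IntStream, which offers sum() directly.
    static int totalLength(List<String> names) {
        return names.stream().mapToInt(String::length).sum();
    }

    // An IntStream can also be created without any collection, here from the range 1..n.
    static int sumOfSquares(int n) {
        return IntStream.rangeClosed(1, n).map(i -> i * i).sum();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ceres", "Vesta");
        System.out.println(totalLength(names)); // 10
        System.out.println(sumOfSquares(4));    // 30
    }
}
```

Working on int values directly avoids creating an Integer object for every element.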

Operation Type in Stream

Stream operations that are tagged as terminal consume the stream; this means that they are used to produce a result. Once the stream is consumed, it cannot be reused. An intermediate operation, on the other hand, produces another stream. Thus, a chain of intermediate operations can be applied in a pipeline, wherein each operation accepts a stream as its data source and creates another stream shaped by the operation applied to it. Intermediate operations, however, do not take place immediately. Instead, they are carried out only when a terminal operation is executed at the end of the pipeline. This lazy behavior contributes to the efficiency of the stream API.
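The lazy behavior can be observed with a small experiment, sketched here. The predicate records every element it examines in a list, which is a side effect used purely for demonstration on a sequential stream; the class LazyEvaluationDemo and its method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LazyEvaluationDemo {

    // Returns {elements examined before the terminal operation,
    //          elements examined after it, size of the result}.
    static int[] examinedBeforeAndAfter() {
        List<String> seen = new ArrayList<>();

        // Building the pipeline does not run the filter yet.
        Stream<String> pipeline = Arrays.asList("Ceres", "Europa", "Eunomia").stream()
                .filter(s -> {
                    seen.add(s); // demonstration-only side effect
                    return s.startsWith("E");
                });
        int before = seen.size();

        // The terminal operation triggers the actual processing.
        List<String> result = pipeline.collect(Collectors.toList());
        int after = seen.size();

        return new int[] {before, after, result.size()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(examinedBeforeAndAfter())); // [0, 3, 2]
    }
}
```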

Another significant aspect of intermediate operations is that some are stateful and some are stateless. A stateless operation processes each element independently of the others, so elements can usually be handled simultaneously. For example, a filter applied with a stateless predicate is stateless. A stateful operation, on the other hand, may need to take other elements into account while processing an element. For example, the sorted operation is stateful, because no element can be emitted until all elements have been seen. Whether an operation is stateful or stateless matters most in parallel processing: stateless operations parallelize readily, whereas stateful ones may need to buffer data or make several passes over it.
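One way to see the difference is to trace when elements enter and leave a sequential pipeline, with and without a sorted step. The sketch below is only an illustration; the class StatefulDemo and its trace method are hypothetical, and the logging through peek is for demonstration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class StatefulDemo {

    // Logs "in:x" when an element enters the pipeline and "out:x" when it reaches the end.
    static List<String> trace(boolean withSort) {
        List<String> log = new ArrayList<>();
        Stream<String> stream = Stream.of("b", "a", "c").peek(x -> log.add("in:" + x));
        if (withSort) {
            stream = stream.sorted(); // stateful: must see every element before emitting any
        }
        stream.forEach(x -> log.add("out:" + x));
        return log;
    }

    public static void main(String[] args) {
        // Stateless pipeline: each element flows through on its own.
        System.out.println(trace(false)); // [in:b, out:b, in:a, out:a, in:c, out:c]
        // With sorted(): all elements are read first, then emitted in order.
        System.out.println(trace(true));  // [in:b, in:a, in:c, out:a, out:b, out:c]
    }
}
```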

Parallel Streams

Parallel programming is undoubtedly complex and error prone for various reasons. But if we want to leverage the performance of multicore processors, we simply cannot do without it. The Java stream library addresses this issue and simplifies it to the extent that programmers can write reliable parallel processing code with streams. With streams, we request a parallel stream through methods provided by Collection or BaseStream. The method supplied by Collection is called parallelStream(). This method returns a parallel stream associated with the collection if possible; otherwise, it resorts to sequential processing. BaseStream supplies a method called parallel(). This method returns a parallel stream based on the sequential stream that invokes it. Note that the idea of parallelism is closely associated with the underlying system. If the system does not support it, parallelism is not possible.
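The two ways of requesting a parallel stream can be checked with isParallel(), as in this sketch. The class ParallelRequestDemo and its methods are hypothetical names; the results noted in the comments are what the default Collection implementation used by Arrays.asList is expected to report.

```java
import java.util.Arrays;
import java.util.List;

final class ParallelRequestDemo {

    // Requesting a parallel stream from the collection itself.
    static boolean viaCollection(List<String> names) {
        return names.parallelStream().isParallel();
    }

    // Switching an existing sequential stream to parallel mode.
    static boolean viaBaseStream(List<String> names) {
        return names.stream().parallel().isParallel();
    }

    // sequential() switches the stream back again.
    static boolean backToSequential(List<String> names) {
        return names.parallelStream().sequential().isParallel();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ceres", "Vesta", "Europa");
        System.out.println(viaCollection(names));    // true
        System.out.println(viaBaseStream(names));    // true
        System.out.println(backToSequential(names)); // false
    }
}
```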

The simplest way to achieve parallelism is to invoke one of two methods: parallel() of BaseStream or parallelStream() of Collection. Let's see how we can do it.

The following sequential code adds up the lengths of all the even-length strings in the list and prints the total.

// Keep only the even lengths, then add them up
asteroids.stream()
    .map(String::length)
    .filter(n -> n % 2 == 0)
    .reduce(Integer::sum)
    .ifPresent(System.out::println);

The operation can be parallelized by simply substituting parallelStream() for the call to stream():

asteroids.parallelStream()
    .map(String::length)
    .filter(n -> n % 2 == 0)
    .reduce(Integer::sum)
    .ifPresent(System.out::println);

The result will be the same, but now the addition can occur in multiple threads. Integer::sum is associative, so the partial sums can be combined in any grouping.

Generally, the operations passed to a parallel stream should be stateless and must not interfere with the data source or with one another, and the function passed to reduce should be associative. These properties ensure the reliability of the result obtained. Otherwise, the outcome of the parallel stream may not coincide with the outcome of the sequential stream for the same operation.
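The following sketch contrasts an associative reduction (addition) with a non-associative one (subtraction). The class AssociativityDemo and its methods are hypothetical names. No output is claimed for the parallel subtraction, because its result depends on how the runtime happens to split the work.

```java
import java.util.Arrays;
import java.util.List;

final class AssociativityDemo {

    static final List<Integer> NUMBERS = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

    // Addition is associative: (a + b) + c == a + (b + c),
    // so sequential and parallel runs agree.
    static int sequentialSum() {
        return NUMBERS.stream().reduce(0, Integer::sum);
    }

    static int parallelSum() {
        return NUMBERS.parallelStream().reduce(0, Integer::sum);
    }

    // Subtraction is NOT associative: (a - b) - c != a - (b - c) in general.
    static int sequentialDifference() {
        return NUMBERS.stream().reduce(0, (a, b) -> a - b);
    }

    // The parallel result is unreliable and may differ from the sequential one.
    static int parallelDifference() {
        return NUMBERS.parallelStream().reduce(0, (a, b) -> a - b);
    }

    public static void main(String[] args) {
        System.out.println(sequentialSum());        // 36
        System.out.println(parallelSum());          // 36
        System.out.println(sequentialDifference()); // -36
        System.out.println(parallelDifference());   // not guaranteed to be -36
    }
}
```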

Conclusion

This is a glimpse of the stream API in the java.util.stream package. There are many more features to explore in this library. Because it is a new feature introduced with Java 8, the API is likely to get many more enhancements. This library is definitely worth studying for every serious programmer.
