What is Spring Batch? | An Overview of Batch Processing in Java

The Spring Batch framework provides a simple, robust, and reliable way to write batch applications. Batch applications process large amounts of data in bulk; typical examples are file indexing, financial transactions, and statistical calculations. Such workloads follow a common batch pattern. This article provides a high-level overview of the Spring Batch framework and its use in Java.

Overview

Batch processing is the opposite of interactive processing, where an application waits for input from users or other systems while it works. A batch job processes its data from start to finish without interaction or interruption. Historically, batch processing was an improvement over processing one instruction at a time: a bulk of similar data was collected and submitted for processing with almost no intervention. Unlike interactive applications, which may suffer from unpredictable spikes in resource utilization, batch workloads are predictable and therefore easier to manage. At a glance, a batch system may seem straightforward and free of development challenges. The truth is quite the contrary: batch processing has its own set of challenges, and they can become real problems unless handled properly.

In a typical batch processing application, one system exports a pile of transactions as files while another system imports the data from those files and persists it to a database, as shown in Figure 1.

Figure 1: Typical batch processing
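
The import side of Figure 1 can be sketched in plain Java. This is only a minimal illustration, not Spring Batch code: the Transaction class, its three fields, and the comma-separated layout with a header row are assumptions made for the example, and an in-memory list stands in for the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the importing system in Figure 1; not Spring Batch API.
class TransactionImportSketch {

    // Hypothetical record layout: one transaction per exported line.
    static final class Transaction {
        final int id;
        final String account;
        final long amountInCents;

        Transaction(int id, String account, long amountInCents) {
            this.id = id;
            this.account = account;
            this.amountInCents = amountInCents;
        }
    }

    // Parses the exported lines into transactions, skipping the header row.
    static List<Transaction> importLines(List<String> lines) {
        List<Transaction> imported = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {   // index 0 is the header
            String[] fields = lines.get(i).split(",");
            imported.add(new Transaction(
                Integer.parseInt(fields[0].trim()),
                fields[1].trim(),
                Long.parseLong(fields[2].trim())));
        }
        return imported;
    }

    public static void main(String[] args) {
        // Stand-in for the file produced by the exporting system.
        List<String> exported = Arrays.asList(
            "id,account,amount_cents",
            "1,ACC-100,2500",
            "2,ACC-200,-1200");
        // An in-memory list stands in for the database table.
        List<Transaction> table = importLines(exported);
        System.out.println("Persisted " + table.size() + " transactions");
    }
}
```

A real import would also have to cope with malformed lines, partial files, and restarts, which is where the challenges discussed in this article come in.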

In an enterprise environment, a large volume of complex calculation takes place every day; for example, a huge document base may be indexed with sophisticated algorithms. Batch jobs can be scheduled for a suitable hour, so they fit easily alongside systems that work in real time. Alternatives such as message-based solutions exist, but the two approaches are not mutually exclusive; each has its own challenges and uses. Batch applications also have uses of their own that other solutions cannot match, even in today’s complex enterprise environment.

Batch Challenges

In short, the challenges with batch applications are maintainability, scalability, availability, and security.

  • Maintainability: If a batch job fails, the point and time of failure must be known so that the problem can be debugged quickly.
  • Scalability: Batch processing must be scalable. The volume handled by a job may grow from tens to several thousands of items over time, and the application must absorb that growth without considerable lag in processing.
  • Availability: Batch jobs do not run all the time; they are scheduled. Typically, enterprise batch jobs are queued to run at a time when hardware, data, and other resources are available. In a banking system, for example, transactions are scheduled to be logged when more resources are free.
  • Security: Finally, a batch process must keep its data secure. This involves data validation, encryption of sensitive data, secure access to external systems, and so on.
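
To make the maintainability point concrete, here is a minimal plain-Java sketch of a run that records where and when it failed, so that a later run can resume from that position. The RunReport class and the run method are hypothetical names invented for this illustration; parsing each item as a number merely stands in for real work, and none of this is Spring Batch API.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

// Illustrative only: tracks the point and time of failure of a simple run.
class FailureTrackingSketch {

    // Outcome of one run: how many items succeeded and, on failure, where and when it stopped.
    static final class RunReport {
        final int processedCount;
        final int failedIndex;    // -1 when the run completed
        final Instant failedAt;   // null when the run completed
        final String reason;      // null when the run completed

        RunReport(int processedCount, int failedIndex, Instant failedAt, String reason) {
            this.processedCount = processedCount;
            this.failedIndex = failedIndex;
            this.failedAt = failedAt;
            this.reason = reason;
        }
    }

    // Processes items from startIndex onward and stops at the first failure.
    static RunReport run(List<String> items, int startIndex) {
        int processed = 0;
        for (int i = startIndex; i < items.size(); i++) {
            try {
                Integer.parseInt(items.get(i));   // stand-in for the real work
                processed++;
            } catch (NumberFormatException e) {
                return new RunReport(processed, i, Instant.now(), e.getMessage());
            }
        }
        return new RunReport(processed, -1, null, null);
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("10", "20", "oops", "40");
        RunReport first = run(items, 0);
        System.out.println("Failed at index " + first.failedIndex + " at " + first.failedAt);

        items.set(2, "30");                       // repair the bad record
        RunReport second = run(items, first.failedIndex);
        System.out.println("Resumed run processed " + second.processedCount + " items");
    }
}
```

Because the report carries the failed index, the second run does not repeat the items that already succeeded.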

The Spring Batch Framework

Writing a batch application is not simple; many concerns beyond collecting a bulk job must be addressed at the outset. This is why the Spring Batch framework was created. The goal was to provide an open source, batch-oriented framework that addresses the issues that arise in developing a robust batch application. Spring Batch began in 2007 as a collaboration between Accenture and SpringSource.

At a very high level, Spring Batch can be viewed as a three-tier configuration: application, core, and infrastructure. These are shown in Figure 2.

Figure 2: Spring Batch’s three-tier configuration

The three tiers are as follows:

  • Application: The application tier comprises all the batch jobs and code written by the developer, such as business logic, service code, and the configuration of how the jobs are structured. Note that, in practice, the application is not a distinct entity but wraps the core and infrastructure tiers, because most development includes custom infrastructure code, such as readers and writers, along with core classes.
  • Core: The core contains the run-time classes that control and launch batch jobs. It includes the Job and Step interfaces along with others, such as JobLauncher and JobParameters.
  • Infrastructure: The infrastructure contains the readers, writers, and service templates used by both the developer and the core framework. It handles reading, writing, and error processing for files, databases, and the like.
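
How the core concepts relate can be modelled in a few lines of plain Java. The sketch below is an assumption-laden simplification: SketchStep, SketchJob, launch, and recordingStep are hypothetical names created for this illustration and are not the Spring Batch Job, Step, or JobLauncher types. It shows only the general idea of a job as an ordered list of steps that is launched with a set of parameters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the core concepts; these are NOT the Spring Batch types.
class JobModelSketch {

    // Hypothetical counterpart of a step: one named unit of work.
    interface SketchStep {
        String name();
        void execute(Map<String, String> parameters);
    }

    // Hypothetical counterpart of a job: a name plus an ordered list of steps.
    static final class SketchJob {
        final String name;
        final List<SketchStep> steps;

        SketchJob(String name, List<SketchStep> steps) {
            this.name = name;
            this.steps = steps;
        }
    }

    // Hypothetical counterpart of a launcher: runs every step, in order, with the same parameters.
    static List<String> launch(SketchJob job, Map<String, String> parameters) {
        List<String> executed = new ArrayList<>();
        for (SketchStep step : job.steps) {
            step.execute(parameters);
            executed.add(step.name());
        }
        return executed;
    }

    // Builds a step that only records which input it was asked to handle.
    static SketchStep recordingStep(final String name, final List<String> log) {
        return new SketchStep() {
            @Override public String name() { return name; }
            @Override public void execute(Map<String, String> parameters) {
                log.add(name + " <- " + parameters.get("input"));
            }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        SketchJob job = new SketchJob("etl", Arrays.asList(
            recordingStep("file-db", log),
            recordingStep("db-file", log)));
        Map<String, String> parameters = new HashMap<>();
        parameters.put("input", "addresses.csv");
        System.out.println(launch(job, parameters));
        System.out.println(log);
    }
}
```

The real framework adds much more around this skeleton, such as persisted execution state and restart handling, which the model deliberately leaves out.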

An advantage of Spring Batch is that it also benefits from the best practices of the Spring Framework. The Spring Framework provides many off-the-shelf components for popular technologies, such as JDBC, Hibernate, JPA, XML, iBATIS, and so forth. These components simplify complex development requirements.

A Quick Example

Spring Boot provides ready support for writing a Spring Batch application through its starters. Here is an excerpt from a batch application; it gives an idea of how the framework is used.

The pom.xml file contains the following dependencies.

<dependencies>
   <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-batch</artifactId>
   </dependency>
   <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-jdbc</artifactId>
   </dependency>
   <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <scope>runtime</scope>
   </dependency>
   <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-test</artifactId>
      <scope>test</scope>
   </dependency>
   <dependency>
      <groupId>org.springframework.batch</groupId>
      <artifactId>spring-batch-test</artifactId>
      <scope>test</scope>
   </dependency>
</dependencies>

The model class Address.java is as follows:

// ...import statements

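@XmlRootElement(name = "address-details")
// Field access avoids JAXB seeing each property twice (annotated field plus getter)
@XmlAccessorType(XmlAccessType.FIELD)
public class Address {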
   @XmlAttribute(name = "address_id")
   private int id;
   @XmlElement(name = "street")
   private String street;
   @XmlElement(name = "city")
   private String city;
   @XmlElement(name = "province")
   private String province;
   @XmlElement(name = "zip-code")
   private String zipCode;

   // ...getters and setters

   @Override
   public String toString() {
      return "Address{" +
         "id=" + id +
         ", street='" + street + '\'' +
         ", city='" + city + '\'' +
         ", province='" + province + '\'' +
         ", zipCode='" + zipCode + '\'' +
         '}';
   }
}

The configuration file Batch.java may be written as:

// ...import statements

@Configuration
@EnableBatchProcessing
public class Batch {
   @Bean
   Job job(JobBuilderFactory jobBuilderFactory,
         StepBuilderFactory stepBuilderFactory,
         FlatFileToDatabase step1,
         DatabaseToFlatFile step2) throws Exception {
      Step s1 = stepBuilderFactory.get("file-db")
         .<Address, Address>chunk(100)
         .reader(step1.fileReader(null))
         .writer(step1.jdbcWriter(null))
         .build();
      Step s2 = stepBuilderFactory.get("db-file")
         .<Address, Address>chunk(100)
         .reader(step2.jdbcReader(null))
         .writer(step2.fileWriter(null))
         .build();
      return jobBuilderFactory.get("etl")
         .incrementer(new RunIdIncrementer())
         .start(s1)
         .next(s2)
         .build();
   }
}
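
Each step above is configured with chunk(100): items are read one at a time and handed to the writer in groups of up to 100. The plain-Java sketch below imitates only that grouping. The processInChunks method is a hypothetical helper written for this illustration, not part of Spring Batch, and collecting each chunk into a list stands in for a writer call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative model of chunk-oriented processing; not the Spring Batch implementation.
class ChunkLoopSketch {

    // Reads items one at a time and groups them into chunks of at most chunkSize.
    static <T> List<List<T>> processInChunks(Iterator<T> reader, int chunkSize) {
        List<List<T>> writtenChunks = new ArrayList<>();
        List<T> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(reader.next());
            if (chunk.size() == chunkSize) {
                writtenChunks.add(chunk);      // stand-in for one writer call
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            writtenChunks.add(chunk);          // final, partially filled chunk
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            items.add(i);
        }
        List<List<Integer>> chunks = processInChunks(items.iterator(), 100);
        for (List<Integer> chunk : chunks) {
            System.out.println("Wrote chunk of " + chunk.size() + " items");
        }
    }
}
```

With 250 items and a chunk size of 100, the writer stand-in is called three times, with 100, 100, and 50 items.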

The code to load data from a flat file into the database may be written as follows:

@Configuration
public class FlatFileToDatabase {
   @Bean
   public FlatFileItemReader<Address>
         fileReader(@Value("${input}") Resource resource)
         throws Exception {
      return new FlatFileItemReaderBuilder<Address>()
         .name("file-reader")
         .resource(resource)
         .targetType(Address.class)
         .linesToSkip(1)
         .delimited().delimiter(",")
         .names(new String[]{"id","street","city",
            "province","zipCode"})
         .build();
   }
   @Bean
   public JdbcBatchItemWriter<Address>
         jdbcWriter(DataSource dataSource) {
      return new JdbcBatchItemWriterBuilder<Address>()
         .dataSource(dataSource)
         // Named parameters must match the Address bean properties (zipCode, not zip_code)
         .sql("INSERT INTO address(id, street, city, province, zip_code) "
            + "VALUES (:id, :street, :city, :province, :zipCode)")
         .beanMapped()
         .build();
   }
}

Meanwhile, the step from the database back to a flat file may be written as:

@Configuration
public class DatabaseToFlatFile {
   @Bean
   public ItemReader<Address>
         jdbcReader(DataSource dataSource) {
      // BeanPropertyRowMapper needs the target class to map columns to properties
      return new JdbcCursorItemReaderBuilder<Address>()
         .dataSource(dataSource)
         .name("jdbc-reader")
         .sql("SELECT id, street, city, province, "
            + "zip_code FROM address ORDER BY id")
         .rowMapper(new BeanPropertyRowMapper<>(Address.class))
         .build();
   }
   @Bean
   public ItemWriter<Address>
         fileWriter(@Value("${output}") Resource resource) {
      return new FlatFileItemWriterBuilder<Address>()
         .name("file-writer")
         .resource(resource)
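         .lineAggregator(new
               DelimitedLineAggregator<Address>() {
            {
               setDelimiter(",");
               setFieldExtractor(new
                  BeanWrapperFieldExtractor<Address>() {
                  {
                     // The extractor needs the property names, in column order
                     setNames(new String[]{"id", "street",
                        "city", "province", "zipCode"});
                  }
               });
            }
         })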
         .build();
   }

   // ...

}

Conclusion

This article provided a glimpse into batch processing in general, its relevance to present-day enterprise requirements, and a high-level overview of the Spring Batch framework, with a rudimentary code snippet showing how it is realized in Java. Refer to the Spring Batch Reference for more details, and stay tuned for subsequent articles.
