The Full Guide to Batch Processing with Spring Boot

Captions
In a previous video, I showed you how to upload a CSV using Spring Boot. But what if you need to deal with a large file that requires extensive data processing? That's where batch processing comes into play. Batch processing is a method that allows us to efficiently process large volumes of data in chunks, without the need for constant user interaction. It's a game changer when it comes to handling tasks like data cleaning, transformation, and loading.

But why does it matter? Imagine having a massive data set that needs to be imported, processed, and transformed. A traditional approach might struggle, leading to slow performance and potential system issues. That's where Spring Batch steps in. Spring Batch is a powerful framework built on top of the familiar Spring Boot, designed specifically for batch processing. It provides a structured and efficient way to handle large-scale data processing tasks. With Spring Batch you can process data in chunks, manage job executions, handle errors gracefully, and even restart failed jobs seamlessly. It's a robust solution for those scenarios where uploading large files requires more than just a simple file upload.

In the upcoming video series we'll dive deep into Spring Batch: I'll walk you through the concepts, the code, and real-world examples, and I will absolutely show you how you can optimize your batch processing to reduce execution time. If you're ready to level up your Spring Boot skills and tackle large-scale data processing, hit the subscribe button and let's embark on this journey together. I would also like to invite you to visit my website and connect with me on social media; I will leave all the links below in the description of this video. Now, with no further ado, let's get started.

First, let's go through this introduction to Spring Batch. Many applications within the enterprise domain require bulk processing to perform business operations in mission-critical environments. These business operations include the following.
First, the automated, complex processing of large volumes of information that is most efficiently processed without user interaction; these operations typically include time-based events such as month-end calculations, notices, or correspondence. Second, the periodic application of complex business rules, processed repetitively across very large data sets (for example, insurance benefit determination or rate adjustments). Finally, the integration of information received from internal and external systems, which typically requires formatting, validation, and processing in a transactional manner into the system of record. Batch processing is used to process billions of transactions every day for enterprises.

Now let's see what Spring Batch is. Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications that are vital to the daily operations of enterprise systems. Spring Batch builds upon the characteristics of the Spring Framework that people have come to expect (productivity, a POJO-based development approach, and general ease of use) while making it easy for developers to access and use more advanced enterprise services when necessary. Spring Batch is not a scheduling framework: there are many good enterprise schedulers, such as Quartz, Tivoli, Control-M, and others, available in both the commercial and open-source spaces. Spring Batch is intended to work in conjunction with a scheduler rather than replace the scheduler itself.

Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging and tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that enable extremely high-volume and high-performance batch jobs through optimization and partitioning techniques. You can use Spring Batch in both simple use cases (such as reading a file into a database or running stored procedures) and complex, high-volume use cases (such as moving high volumes of data between databases, transforming it, and so on). High-volume batch jobs can use the framework in a highly scalable manner to process significant volumes of information.

So what is the background of Spring Batch? While open-source software projects and their associated communities have focused greater attention on web-based and microservices-based architecture frameworks, there has been a notable lack of focus on reusable architecture frameworks to accommodate Java-based batch processing needs, despite continued needs to handle such processing within enterprise IT environments. The lack of a standard, reusable batch architecture has resulted in the proliferation of many one-off, in-house solutions developed within client enterprise IT functions. SpringSource (which is now VMware) and Accenture collaborated to change this. Accenture's hands-on industry and technical expertise in implementing batch architectures, and SpringSource's depth of technical experience and proven Spring programming model, together made a natural and powerful partnership to create high-quality, market-relevant software aimed at filling an important gap in enterprise Java. Both companies worked with a number of clients who were solving similar problems by developing Spring-based batch architecture solutions. This input provided useful additional detail and real-life constraints that helped ensure the solution can be applied to the real-world problems posed by clients. Accenture contributed a previously proprietary batch processing architecture framework to the Spring Batch project, along with committer resources to drive support, enhancements, and the existing feature set. Accenture's contribution was based upon decades of experience in building batch architectures with the last several generations of platforms: COBOL on mainframes, C++ on Unix, and now Java anywhere.
The collaborative effort between Accenture and SpringSource aimed to promote the standardization of software processing approaches, frameworks, and tools that enterprise users can consistently use when creating batch applications. Companies and government agencies desiring to deliver standard, proven solutions to their enterprise IT environments can benefit from Spring Batch.

The following diagram is a simplified version of the batch reference architecture that has been used for decades. It provides an overview of the components that make up the domain language of batch processing. This architecture framework is a blueprint that has been proven through decades of implementation on the last several generations of platforms. Spring Batch provides a physical implementation of the layers, components, and technical services commonly found in robust, maintainable systems that are used to address the creation of simple-to-complex batch applications, with infrastructure and extensions to address very complex processing needs.

Now let's break down each component of this diagram and explain them separately. In batch processing we have several components. First, let's start with the job launcher. The JobLauncher is simply an interface for launching a job with a given set of job parameters (we will see later on what job parameters are and how to pass them). Each job launcher launches a job; the job is a concept I will explain in detail in just a few moments. Each job can have one or several steps (here we can see step 1 through step n), and each step is composed of three main elements.

The first is an item reader. The ItemReader is an abstraction that represents the retrieval of input for a step, one item at a time. When the ItemReader has exhausted the items it can provide, it indicates this by returning null: a null return means we no longer have records or data to read. You can find more details about the ItemReader interface and its various implementations in the readers-and-writers documentation. The item reader is able to read data from, for example, a database, a file, and so on: any storage system you want.

Once we read an item, we pass it to the item processor. The ItemProcessor is an abstraction that represents the business processing of an item. While the ItemReader reads one item and the ItemWriter writes one item, the ItemProcessor provides an access point to transform the item or apply other business processing. If, while processing the item, it is determined that the item is not valid, returning null indicates that the item should not be written out. You can find more details about the ItemProcessor interface in the readers-and-writers documentation as well.

Finally, we have the item writer. The ItemWriter is an abstraction that represents the output of a step: one batch, or chunk, of items at a time. Generally, an ItemWriter has no knowledge of the input it will receive next and knows only the items passed in its current invocation. You can find more details about the ItemWriter interface and its various implementations in the readers-and-writers documentation too.

All three of these parts (the job launcher, the job, and the steps) use and communicate with the job repository. The JobRepository is the persistence mechanism for all of the stereotypes mentioned earlier. It provides CRUD operations for JobLauncher, Job, and Step implementations. When a job is first launched, a JobExecution is obtained from the repository, and during the course of execution, StepExecution and JobExecution implementations are persisted by passing them to the repository. When using Java configuration, the @EnableBatchProcessing annotation provides the JobRepository as one of the components that is automatically configured. So this is the global overview of the batch architecture; now let's move on and break down how a job works.
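Before we go further, the read-process-write contract just described can be illustrated without Spring at all. The sketch below is plain Java with hypothetical names (not the Spring Batch API itself): the reader returns null when exhausted, the processor may return null to filter an item out, and the writer receives a whole chunk at a time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ChunkDemo {

    // Mimics ItemReader: returns null when the input is exhausted.
    interface Reader<T> { T read(); }

    // Mimics ItemProcessor: returning null filters the item out.
    interface Processor<I, O> { O process(I item); }

    // Mimics ItemWriter: receives one chunk of items at a time.
    interface Writer<T> { void write(List<T> chunk); }

    // The chunk-oriented loop: read until null, process, write in chunks.
    static <I, O> List<O> run(Reader<I> reader, Processor<I, O> processor,
                              Writer<O> writer, int chunkSize) {
        List<O> written = new ArrayList<>();
        List<O> chunk = new ArrayList<>();
        I item;
        while ((item = reader.read()) != null) {
            O out = processor.process(item);
            if (out == null) continue;          // item filtered out
            chunk.add(out);
            if (chunk.size() == chunkSize) {
                writer.write(chunk);
                written.addAll(chunk);
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {                 // flush the final partial chunk
            writer.write(chunk);
            written.addAll(chunk);
        }
        return written;
    }

    // Demo: uppercase names, dropping blanks, in chunks of 2.
    public static List<String> demo(List<String> names) {
        Iterator<String> it = names.iterator();
        return run(
                () -> it.hasNext() ? it.next() : null,
                s -> s.isBlank() ? null : s.toUpperCase(),
                chunk -> System.out.println("writing chunk: " + chunk),
                2);
    }
}
```

This is exactly the shape of the loop Spring Batch drives for you inside a chunk-oriented step.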
Now let's break down the job. A job is simply a central entity that encompasses an entire batch process. Configured through XML or Java-based setup (referred to as the job configuration), it acts as the top-level element in the hierarchy, logically organizing multiple step instances. These steps are grouped within the job to form a cohesive flow, allowing for global configuration of properties such as restartability. The job configuration includes the job's name, the definition and ordering of its step instances, and a specification of whether the job is restartable or not.

Within a job we have job instances. A JobInstance in Spring Batch represents a distinct run of a batch job. For example, consider an end-of-day job meant to run daily: despite there being a single end-of-day job, each run is tracked as a separate job instance, like the January 1st or January 2nd run. Even if a run fails and is rerun the next day, it retains its original identity, such as the January 1st run. Each job instance can have multiple executions, but only one job instance is associated with a specific set of job parameters at any given time. The definition of a job instance doesn't dictate data loading; that's determined by the ItemReader implementation. For instance, in an end-of-day scenario, the data may include a column indicating the effective date: the January 1st run loads data only from the first, and the January 2nd run uses data from the second. This decision is often a business choice for the item reader to make. However, reusing the same job instance determines whether state (for example, the execution context) from prior executions is used: starting a new job instance means beginning from the start, while using an existing instance generally means resuming from where you left off.

After the job instance we have the job execution. A JobExecution in Spring Batch represents a single attempt to run a job, which can result in success or failure. The associated job instance is considered incomplete until an execution successfully completes. For instance, in the context of an end-of-day job, if the job instance for January 1st fails on its initial run and is rerun with the same job parameters, a new job execution is generated, but still only one job instance persists. So, globally: a Job defines what a job is and how it is to be executed; a JobInstance is a purely organizational object that groups executions together, primarily to enable correct restart semantics; and a JobExecution is the primary storage mechanism for what actually happened during a run, and it contains many more properties that must be controlled and persisted.

A Step is a domain object that encapsulates an independent, sequential phase of a batch job; therefore, every job is composed entirely of one or more steps. A step contains all the information necessary to define and control the actual batch processing. This is necessarily a vague description, because the contents of any given step are at the discretion of the developer writing the job. A step can be as simple or as complex as the developer desires: a simple step might load data from a file into a database, requiring little or no code (depending on the implementation used), while a more complex step may have complicated business rules that are applied as part of the processing. As with a job, a step has an individual step execution that correlates with a unique job execution. A StepExecution represents a single attempt to execute a step, and a new StepExecution is created each time a step is run, similar to the JobExecution. However, if a step fails to execute because the step before it failed, no execution is persisted for it: a StepExecution is created only when its step actually starts. Step executions are represented by objects of the StepExecution class. Each execution contains a reference to its corresponding step and job execution, as well as transaction-related data such as commit and rollback counts and start and end times.
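The job-instance identity rules above can be sketched in a few lines. This is a fragment, not a complete program: it assumes a configured Spring Batch context in which jobLauncher and an endOfDayJob bean already exist, and the run.date parameter name is purely illustrative.

```java
// The same Job launched with different parameters yields two distinct
// JobInstances; relaunching with identical parameters after a failure
// resumes (adds a JobExecution to) the existing JobInstance.
JobParameters jan1 = new JobParametersBuilder()
        .addString("run.date", "2024-01-01")
        .toJobParameters();

jobLauncher.run(endOfDayJob, jan1); // new JobInstance + first JobExecution

// If that execution fails, running again with the *same* parameters
// creates a second JobExecution for the same JobInstance:
jobLauncher.run(endOfDayJob, jan1);

JobParameters jan2 = new JobParametersBuilder()
        .addString("run.date", "2024-01-02")
        .toJobParameters();
jobLauncher.run(endOfDayJob, jan2); // a brand-new JobInstance
```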
Additionally, each step execution contains an execution context, which holds any data a developer needs to persist across batch runs, such as statistics or state information needed for a restart.

Now let's translate all this theory into practice. Let's go to Spring Initializr and create a new project. I will select Maven as the project type, use the most recent version of Spring Boot, which is 3.2.0, and Java as the language, of course. I'm going to change the group to com.alibou, and as an artifact I will call it batch; the description is "Demo project for Spring Boot batch processing". Let's keep the packaging as jar and Java 17. Now let's add a few dependencies. First I will need Web, because I want to create a simple endpoint to start (invoke) my job. Then I will need Spring Batch, of course: when you search for "batch", you will find Spring Batch under the I/O group. We also need PostgreSQL, because we want to persist some data to our database, and of course Lombok, to reduce the boilerplate code. All right, our project is ready: let's click Generate and open the project in our preferred IDE. For the sake of this tutorial I will be using IntelliJ IDEA Ultimate, but you can also use the Community edition; it is totally enough for this, and you don't need the Ultimate version.

So here we have our project opened in our IDE. The first thing to do is add some configuration. You know my style already: I prefer using YAML configuration, and the first thing I want to configure is the database connectivity. I want my application to be able to connect to a database and persist data there. I will provide the configuration and explain it piece by piece. Also, if you are new to Spring Data JPA, I recommend going back to my channel and checking the Spring Data JPA playlist.
All right, so this is our configuration. We want to configure our data source: here we have the URL, and I want to use a database called file-upload. For the username and password, you can use your own credentials; I will make the username alibou, because my database is already configured that way. Then you can specify the driver. This one is optional, so you don't need to provide it, because Spring will automatically detect the driver we have in our pom.xml and use it, but it is also fine to specify the driver here explicitly. Next is my JPA configuration: I want a ddl-auto of type create-drop, show-sql set to false, and format_sql set to true, in case I decide to show the SQL. Then I specify the database, which is PostgreSQL (you can use a different one if you want), and provide the dialect, which is the PostgreSQL dialect. Now our database is configured and Spring will be able to connect to it.

Next, getting started with Spring Batch: we explained earlier that Spring Batch needs to create a schema (a few tables) in order to store and keep track of the job executions and step executions, generally speaking. To do that, we need to tell Spring Batch how to initialize or create the schema. Here we have several options: we can ask Spring Batch to always create the schema, or we can create our own and just tell Spring Batch to use it (it will automatically try to find the correct tables and use them). So I provide a property called batch, then a property called jdbc, and then a property called initialize-schema. For initialize-schema we have the values always, embedded, or never: embedded means the schema is provided for an embedded database, never means do nothing with the schema, and always means we want Spring Batch to always initialize the schema in case it does not exist. For now I will keep the value as always.

Then there is one more important property: Spring Batch is enabled by default, as you can see here, which means it automatically executes all Spring Batch jobs in the context at startup. This is something we don't want; we want to trigger the execution of the jobs ourselves, so let's set this property to false. Next, I just want to change the server port to 9090, for example. And that's it: our project is configured.

The next step is to prepare the connection to our database (you can skip this step if you want). For people using the Ultimate version, you have the Database tool window: click on it, then on the plus button, and add a new data source. Let's search for PostgreSQL, and then provide the properties needed to connect. First, let's keep everything as it is, because PostgreSQL is already installed locally on port 5432; then I provide the username and password and click Test Connection. Now let's apply. In case you see a warning like "database has a collation version mismatch", I will show you how to fix it: all you need to do is refresh the collation version. There is a query you need to execute, ALTER COLLATION with the collation name, followed by REFRESH VERSION; if you run that query, it will update the collation version and you will no longer see the warning. All right, now let's create a new database and call it, as we mentioned, file-upload, and click OK.
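Put together, the application.yml discussed above might look like this. The database name, credentials, and port are the ones chosen in the video; adjust them to your own setup.

```yaml
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/file-upload
    username: alibou
    password: password
    driver-class-name: org.postgresql.Driver   # optional; auto-detected from pom.xml
  jpa:
    hibernate:
      ddl-auto: create-drop
    show-sql: false
    properties:
      hibernate:
        format_sql: true
    database: postgresql
    database-platform: org.hibernate.dialect.PostgreSQLDialect
  batch:
    jdbc:
      initialize-schema: always   # always | embedded | never
    job:
      enabled: false              # do not run all jobs automatically at startup
server:
  port: 9090
```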
Here I will keep only the file-upload schema, and I want to display the public one. So our database is ready, our application is correctly configured, and we are ready to start coding.

Earlier we said that a batch processing application is composed of the following: a job launcher starts a job, a job is composed of one or many steps, and each step has three parts, the item reader, the item processor, and the item writer. Now let's build this, but from the bottom to the top: we will start by creating the items (the item reader, processor, and writer), then configure the step, then configure the job, and finally launch our job.

First, let's create a new class in a package called config and call it BatchConfig. To make this a configuration class, we need to add the @Configuration annotation, and I will also add @RequiredArgsConstructor from Lombok to generate the constructor for me. We said that within a step we have three things: reader, processor, and writer. Since we want to read a file and upload the data to a database, we start by creating an item reader that will help us read data from a file. For this we create a bean using an implementation called FlatFileItemReader, for which we need to specify the item type; for now I will leave it as a question mark and create the type in just a few seconds. I will call this bean itemReader, and it returns an item reader.

But before doing that, let's look at this class (always click on "download sources" so you can read everything). You can see that FlatFileItemReader extends an abstract reader class and implements ItemStreamReader, which in turn extends ItemReader; so this implementation is already an implementation of the ItemReader interface. If you click through, you will see all the available implementations you might need: for example, a paging item reader, a cursor item reader, and so on. The one we'll be using is FlatFileItemReader, because we want to read a CSV file and upload it to our database.

Before implementing the item reader, we need to specify the type of the data we want to read. We could leave it as a question mark and deal with the type casting ourselves, or we can simply create a class and use it as the type. So I will create a class Student and move it to a package called student. Let's give this Student class some attributes: first a private Integer id (this will also be our entity that we want to persist to the database), then a private String firstName, a lastName, and, for example, a private int age. Those are the attributes; now let's add some annotations. We need the @Getter annotation from Lombok, the @Setter annotation, and the @Entity annotation. The @Entity annotation is not available yet because we did not add the Spring Data JPA dependency, so let's do it manually: go to the pom.xml, add a new dependency, spring-boot-starter-data-jpa, and refresh the project so the dependency becomes available on our classpath. Now we can import the @Entity annotation. Of course, within an entity we need to provide an @Id, and I also want it to be a @GeneratedValue. Now that I have my Student object, let's go back and continue our batch configuration.

Let's create an object of type FlatFileItemReader<Student>, call it itemReader, and instantiate it. For our item reader we need to set the resource: from which location will it read the file? In this case we will use a new FileSystemResource, where we can provide the path to our file. Generally, when we talk about batch processing, you might think of uploading a file from your API or front end and then processing it; but batch processing is mainly about scheduled jobs. For example, we have a bunch of files and we want to process all of their items every day at midnight; what you can do is upload the file, place it in some location, and have the item reader read the file from that specific location. For the sake of this course, though, I will place my file under src/main/resources, so let's create a file there and call it, for example, students.csv. Now I can copy the path of this file and paste it here: this is the path of the file I want to read.

After that, let's give our item reader a name: we can set a name here, for example csvReader. A CSV file will have a header (for example id, firstName, lastName, and age), and when reading the file we can decide how many lines to skip: we can tell the item reader setLinesToSkip(1), meaning we don't want to read the header line, just everything from the next line on. Then we need to set a line mapper: setLineMapper takes a mapper that we will create in just a few seconds, so here I will call a lineMapper() method and create it right now. All that remains is to return the itemReader; the item reader is ready, and now we need to create our line mapper.

For the line mapper I will create a private method of type LineMapper<Student> and call it lineMapper. Inside, we create an object of type DefaultLineMapper<Student>, also called lineMapper. Then we need a DelimitedLineTokenizer (it comes from org.springframework.batch.item.file.transform); let's call it lineTokenizer. For this tokenizer, we first set the delimiter: what separates the columns of my CSV file? My data is comma-separated.
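Assembled, the Student entity created a moment ago looks like this: a sketch using the Lombok and Jakarta Persistence annotations and the attributes described above.

```java
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;
import lombok.Getter;
import lombok.Setter;

@Getter
@Setter
@Entity
public class Student {

    @Id
    @GeneratedValue
    private Integer id;       // primary key, generated by the database
    private String firstName; // maps to the "firstName" CSV column
    private String lastName;  // maps to the "lastName" CSV column
    private int age;          // maps to the "age" CSV column
}
```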
For example, my students.csv looks like this: the header line contains id, firstName, lastName, and age, followed by a data row like 1,John,Doe,25. So the delimiter we are setting is the comma: that's how the columns are separated. Then I set another property to say whether the tokenizer is strict or not; I will set it to false. I also call lineTokenizer.setNames(...) with the names of my different columns: id, firstName, lastName, and finally age, exactly as in the CSV file.

After that we need another object, a BeanWrapperFieldSetMapper of type Student; let's call it fieldSetMapper. Let me explain what this is exactly: the BeanWrapperFieldSetMapper is an object that helps us transform each line we read from the CSV file into a Student object. So we call fieldSetMapper.setTargetType(...) and provide a class, which is our Student.class. Finally, let's call lineMapper.setLineTokenizer(...) with the line tokenizer we just implemented, and lineMapper.setFieldSetMapper(...) with the field set mapper we just created, and return the lineMapper. This is the lineMapper method that maps each line of our file to a Student object; that is its main goal.

Now we are done with the item reader; let's move on and implement our item processor. For the item processor, I want to show you how to write your own implementation. I will create a class called StudentProcessor, and to make it an item processor it needs to implement one interface: StudentProcessor implements ItemProcessor. As we explained before, an ItemProcessor takes an input type and an output type. Our input is of type Student, and for the sake of this example the output is also of type Student; but in a real-world project you might receive some data and want to map it and save it to a different format or a different table. For example, you might receive a list of persons containing a bunch of information and map each one to a Student object, and so on. For now, we keep it simple so that the explanation is easy to digest, which is why both the input and the output will be Student. As you can see, we have a method called process, and all the business logic for transforming or processing your data goes there. Let's rename the parameter to student and simply return the student object: let's keep our StudentProcessor simple. Now let's go back to our configuration.
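Assembled, the reader bean, its line mapper, and the simple pass-through processor described above might look like this. This is a sketch against the Spring Batch 5 API; the file path and bean names follow the choices made in the video.

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import lombok.RequiredArgsConstructor;

@Configuration
@RequiredArgsConstructor
public class BatchConfig {

    @Bean
    public FlatFileItemReader<Student> itemReader() {
        FlatFileItemReader<Student> itemReader = new FlatFileItemReader<>();
        itemReader.setResource(new FileSystemResource("src/main/resources/students.csv"));
        itemReader.setName("csvReader");
        itemReader.setLinesToSkip(1);          // skip the header line
        itemReader.setLineMapper(lineMapper());
        return itemReader;
    }

    // Maps each CSV line (id,firstName,lastName,age) onto a Student.
    private LineMapper<Student> lineMapper() {
        DefaultLineMapper<Student> lineMapper = new DefaultLineMapper<>();

        DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
        lineTokenizer.setDelimiter(",");
        lineTokenizer.setStrict(false);
        lineTokenizer.setNames("id", "firstName", "lastName", "age");

        BeanWrapperFieldSetMapper<Student> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(Student.class);

        lineMapper.setLineTokenizer(lineTokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);
        return lineMapper;
    }
}
```

And the processor class itself:

```java
import org.springframework.batch.item.ItemProcessor;

public class StudentProcessor implements ItemProcessor<Student, Student> {

    @Override
    public Student process(Student student) {
        // Business logic goes here; for now, pass the item through unchanged.
        return student;
    }
}
```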
way as we defined our it item reader we want to define a bean of type item processor so our item processor is our student processor so student processor and let's call it processor and all we need to do is to return a new student processor so that's it with the processor now we need to configure or to create a bean for our next element or next phase of the step which is the writer so the writer is the bean or the class or the implementation you call it whatever you want that will write the data to some system whether it it can be a database it can be a file it can be for example a a bucket in the cloud and so and so forth so all we need to do is to provide our processor okay so in order to do that let's first create a bean and then let's create a bean of type public repository item writer repository item writer and then what we want to write we want to write an object of type student which is our entity as well so let's call it write and then what we want to do here we need to provide an object of type repository item writer again of type student I will call it writer and then equals new repository item writer so now I need to provide few information so for this writer I need to set the Repository so what is the repository and as you can see here it's a crude repository of type student so for now all I need to do is to inject private student repository and I will call it repository or we can also call it student repository so for now we don't have this repository let's go ahead and generate it so I will create an interface and I will place it in the student package now this interface in order to make it jpa interface all I need to do is to do extend jpa repository of type student and then my ID is of type integer already now if I go back here I have my repository and I need the final keyword in order to ask lombok to generate a Constructor with this parameter all right so now let's go back let's set our repository and then within the writer what is the method that 
We want to execute the save method of our repository, so we call setMethodName("save"); just remember, when you type repository-dot you can see this save method. Then we return our writer. After implementing the reader, processor, and writer, we now need to define the step, because as I mentioned earlier, we are building from the bottom to the top. So after preparing these three elements, I'll create a public Step bean; as you can see, Step is an interface that comes from the batch package. I'll call it, for example, importStep, though you can give it another name, and of course we need the @Bean annotation to make it a bean. To create and define a step, we return an object built with StepBuilder, and this step builder needs two pieces of information. First, the step name: let's call it csvImport, or any name you want. If we look at the constructor, let's download the sources, you see that StepBuilder takes a name and then a JobRepository. For the job repository we need to inject an object, and as explained before, the Spring Batch context already provides us with a bean of type JobRepository, so I'll declare private final JobRepository jobRepository and use it right here. After that, we define a few other properties. We have something called chunk: the chunk means how many records or lines we want to process at a time, and for now let's say we want it to be 10. As you can also see, chunk needs an object of type PlatformTransactionManager, so let's inject one the same way: private final PlatformTransactionManager platformTransactionManager, which we then
provide as the second parameter. This chunk call is also generic, so we need to declare the types: the input is Student and the output is also Student, exactly as we did in the processor. Now within the step we define the reader, which is the itemReader bean, the method we implemented earlier; then the processor, which is the processor bean we created before; and then the writer, which is our writer bean, so let's rename the method from write to writer. Then we call build, and that is how we define a step. After the step comes the job itself, so we create another bean: public Job, and let's call it, for example, runJob; this is the job we want to execute, and within a job we can of course define multiple steps. We return a new JobBuilder (not JobStepBuilder, just JobBuilder), and within this job builder we need to give it a name, for example importStudents. Then we again need our job repository, because if you remember from the diagram, the step and the job both use the JobRepository. Next we have a method called start, which takes either a Flow or a Step; in our case we want to execute one step, so we pass our importStep bean. There is also a next method where you can pass a second, third, and further steps to be executed, but in our case we have only one, so we just call build. With that we have our job and all the configuration of our batch processor; let's move on and create an endpoint to upload our CSV file and persist it in the database.
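Under the same assumptions (Spring Batch 5 builders, the bean and step names used in the video), the step and job wiring just described can be sketched as:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.transaction.PlatformTransactionManager;

@Bean
public Step importStep(JobRepository jobRepository,
                       PlatformTransactionManager transactionManager) {
    return new StepBuilder("csvImport", jobRepository)
            // process and write 10 Student records per transaction
            .<Student, Student>chunk(10, transactionManager)
            .reader(itemReader())   // the item reader bean defined earlier
            .processor(processor())
            .writer(writer())
            .build();
}

@Bean
public Job runJob(JobRepository jobRepository, Step importStep) {
    return new JobBuilder("importStudents", jobRepository)
            .start(importStep)  // a single step; chain more with .next(step)
            .build();
}
```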
Now let's move on and create a controller, which will be our entry point to execute this job. I'll close everything here and, inside the student package, create a StudentController. As I mentioned before, creating an endpoint or a controller is not mandatory; in real-world projects you would typically have a scheduled task that executes the job automatically, but for the sake of this tutorial I want to show you how to do it this way. So I'll create a @RestController and give it a @RequestMapping; here I'll simply use /students, though you could also use /api/students, and of course I need @RequiredArgsConstructor from Lombok. Now all I need to do is inject an object of type JobLauncher, the interface we spoke about previously, and call it jobLauncher. We also need the Job we want to execute, so again private final Job job. Then I'll simply create a @PostMapping, because I just want to invoke this endpoint, with a method public void importCsvToDbJob(). First, I need to create job parameters; this is what we spoke about before: for a job, we need to pass some job parameters. So I create an object of type JobParameters, called jobParameters, using new JobParametersBuilder, and then we have an addLong method that takes a key and a long value. We want to set when this job should start, so the key will be startAt, and since we want the job to start immediately, let's take System.
currentTimeMillis(), so when we call this endpoint, the job is executed immediately; then I call toJobParameters(). Next, I need to call jobLauncher.run with my job and my job parameters. This call needs to be surrounded with try/catch, or we need to add throws to the method signature; here, let's select surround with try/catch, and as you can see it catches several exceptions, so let's group them together. We could throw a new RuntimeException, or, for the sake of this tutorial, simply print the stack trace; but in a real-world application, what you want to do is log these issues. Now our REST endpoint is ready; let's move on and start testing. We will also keep improving, because this is not the end of the tutorial: we will see what issues and anomalies come up while importing a CSV file, mainly while importing a huge CSV file with a large number of lines. We are now at the phase of testing what we implemented. For that, I asked ChatGPT, giving it this structure, to generate 100 students with a first name, last name, and age, filled with random values, and I'll paste the data right here; you can see these students. Now all I need to do is run the application. Let me make it full screen: the application is up and running. Before continuing, I want to go to my database, refresh it, and see how the tables are created. First, we see that we have seven tables instead of one, even though we created only one entity in our application; six of those tables belong to Spring Batch: batch job instance, batch job execution, batch job execution context, batch job execution params, batch step execution, and batch step execution context.
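Putting the endpoint together, a minimal sketch of the controller described above (exception handling simplified to one grouped catch; in a real project you would log rather than print):

```java
import lombok.RequiredArgsConstructor;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/students")
@RequiredArgsConstructor
public class StudentController {

    private final JobLauncher jobLauncher;
    private final Job job;

    @PostMapping
    public void importCsvToDbJob() {
        // "startAt" makes each launch unique, so the same job can run repeatedly
        JobParameters jobParameters = new JobParametersBuilder()
                .addLong("startAt", System.currentTimeMillis())
                .toJobParameters();
        try {
            jobLauncher.run(job, jobParameters);
        } catch (Exception e) {
            // grouped for brevity; a real application should log each failure
            e.printStackTrace();
        }
    }
}
```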
Let me open the diagram for you and make it full screen. Here we see the following structure: the batch job instance has job executions, each job execution has an execution context and parameters as well as batch step executions, and each batch step execution has its own context. You can take a closer look at these different tables. Now what we want to do is run our application, so we need to invoke our endpoint, and for that I will simply create an HTTP file that performs a POST call to my REST API. I'll right-click, create an HTTP Request file, and call it demo. All I need to do is write POST, then localhost, my port is 9090, then /students; this is the endpoint we just created. When I run this, I expect to have 100 students in my table, which is currently empty. I'll also clear the console so we can see the execution time. Now if I run it and go back, we see that the job has been executed: the step csvImport executed in 189 milliseconds, which is super fast. If I refresh my table again, I see the 100 students already imported; let me organize the view so you can see them. Next, we want to multiply the number of records we created, so I'll go back and duplicate this data many times to get, for example, 100,000 students; you can see the number of lines right here. We don't care about the IDs, because they are auto-generated by Spring. Let me restart the application, because we need to restart
the application in order to load the new content of the file. The application is up and running; let me clean the console and also double-check that the student table is empty. Now if I run this endpoint again, let's see what happens. First, you see it's still spinning, so it's running, and this is going to take some time, maybe seconds, maybe minutes, depending on the machine where you run this import. Let's wait and see how long it takes to import the 100,000 students. Here it's done, and if I go back and check the output, it took 20 seconds to import 100,000 students. But if I check the table again, we see that we still only have 100 students, because we are reading the ID from the CSV file and Hibernate will automatically perform an update instead of an insert. So let's make some changes and retest. We can do this in two different ways: either we remove the ID column from the CSV file, which would be tedious, or, the easier fix, we go to the StudentProcessor, where what I want to do is call student.
setId(null), because when we pass a null ID value, Hibernate will automatically persist a new entry in our database instead of updating an existing one; this way, you can see that we are even doing some real logic in the processor. I'll restart the application and try the import again: let me clean the console, make sure the list of students is empty, run the endpoint, and wait for it to finish processing all this data. Afterwards, I will absolutely show you how to improve the performance of your Spring Batch processing, because we need to reduce this time: imagine importing or uploading a file containing millions of records; 20 seconds for 100,000 is not acceptable. It's about to be done, and I can show the log right here: once it's done, it will log the execution time. It may take more than 20 seconds now, because we have a longer process: we are always persisting, so it's not just an update, it's an insertion into the database, and the execution takes a bit longer. As you can see, every time I refresh, new data is getting persisted, here for example 91,000, and if I refresh again it's still persisting. It's taking some time, and we absolutely need to improve that; there are many ways to improve the processing. In the end, importing 100,000 students, simple data, nothing complicated, took 6 minutes 35 seconds. So now let's see how we can improve that.
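The processor change just described, nulling the ID read from the CSV so Hibernate inserts rather than updates, might look like this sketch (the StudentProcessor class name and Student entity are assumed from the video):

```java
import org.springframework.batch.item.ItemProcessor;

public class StudentProcessor implements ItemProcessor<Student, Student> {

    @Override
    public Student process(Student student) {
        // The CSV carries IDs of existing rows; clearing the ID forces
        // Hibernate to INSERT a new row instead of UPDATE-ing an old one.
        student.setId(null);
        return student;
    }
}
```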
To improve the import and the batch processing, in our batch configuration we can give the step an object called a TaskExecutor, which lets us define how many parallel threads execute our step. So first, let's define a bean of type TaskExecutor: I'll type public TaskExecutor taskExecutor(), use an object of type SimpleAsyncTaskExecutor called asyncTaskExecutor, then call asyncTaskExecutor.setConcurrencyLimit to set how many threads we want to run, and return our asyncTaskExecutor. Let's understand what this is: if we go to the official documentation, it says this sets the maximum number of parallel task executions allowed, and the default of -1 indicates no concurrency limit at all. When we imported our CSV file the first time, the step was running on a single thread. Just to confirm, the value of this concurrency limit, as you can see here, is set to UNBOUNDED_CONCURRENCY, which already equals -1. Now, after defining this task executor, at the step level we can call taskExecutor and pass our taskExecutor bean. Something important you need to be careful about: you also have to consider the resources and the power of the machine where you execute this batch processing. Now let's restart the application and run the import again. The application is up and running; I'll open my demo HTTP file and just run it again, and we'll see how much this improves the import and how long it takes. The file has been fully processed, so let's check how long it took: 36 seconds instead of 6 minutes 35 seconds. You see the difference just by allowing 10 threads to run; you can increase this to 30, for example, or a different number, and check the difference. Now I also want to show you another way of improving, which might speed up our processing a little more.
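A sketch of the TaskExecutor bean and step wiring just described. One caveat worth adding: a multi-threaded step needs thread-safe components, and FlatFileItemReader is not thread-safe by default, so treat this as an illustration of the idea rather than a drop-in solution:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.core.task.TaskExecutor;

@Bean
public TaskExecutor taskExecutor() {
    SimpleAsyncTaskExecutor asyncTaskExecutor = new SimpleAsyncTaskExecutor();
    asyncTaskExecutor.setConcurrencyLimit(10); // default -1 means unbounded
    return asyncTaskExecutor;
}

// ...and on the step builder, after .writer(writer()):
//         .taskExecutor(taskExecutor())
```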
Let me make this full screen. As you can see, we have this chunk of 10 records per transaction; we can increase it, for example to 1,000, and rerun the application. But first, let's double-check that we have all 100,000 students in our database, and as you can see, we have all of them. So now, having also increased the chunk size, let's restart the application and see whether this further improves the batch processing of our file. The application is up and running; I'll start the import again and see how long it takes. The import is done, so let's go and check: it took only 29 seconds, so we saved another 7 seconds, and believe me, a few seconds in a production, highly scalable, high-demand application really counts. So we saw how we can improve our batch processing; we can also improve it using partitioning, and that is something we will see next.
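Why a bigger chunk helps: each chunk is committed as one transaction, so fewer, larger chunks mean fewer commits for the same number of records. A back-of-the-envelope illustration in plain Java, using a hypothetical helper that is not part of the video's code:

```java
public class ChunkMath {

    // Number of chunk transactions needed to process `records` items
    // at a given chunk size (ceiling division).
    public static long chunksNeeded(long records, int chunkSize) {
        return (records + chunkSize - 1) / chunkSize;
    }

    public static void main(String[] args) {
        System.out.println(chunksNeeded(100_000, 10));    // chunk size 10  -> 10000 commits
        System.out.println(chunksNeeded(100_000, 1_000)); // chunk size 1000 -> 100 commits
    }
}
```

Commit overhead is only one factor, of course; very large chunks also hold more items in memory and make a rollback redo more work, which is why the video settles on a moderate value rather than the maximum.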
Info
Channel: Bouali Ali
Views: 5,849
Keywords: spring boot, spring batch, batch processing
Id: X48mMxEMYps
Length: 60min 36sec (3636 seconds)
Published: Mon Feb 12 2024