IQ | Spring Batch for Beginners | Process Million of Record Faster Using Spring Batch | JavaTechie

Captions
Hi everyone, welcome to Java Techie. In this tutorial we'll understand what Spring Batch is and its architecture, and we'll also develop an application using Spring Batch that processes a huge volume of data in a fraction of the time.

Before we jump into the implementation, let's have a quick look at Spring Batch and its architectural flow. Spring Batch is one of the core modules of the Spring framework, and with it you can build robust batch processing systems. Now you might ask: what is batch processing? Batch processing is a technique that processes data in large groups instead of one element at a time, so you can handle a high volume of data with minimal human interaction.

When do we need a batch processing system? Whenever you want to transfer a huge amount of data from a source to a destination, Spring Batch is the right tool. For example, say you want to design a billing analysis system: you have the billing information as a CSV file and you want to dump that CSV into a database. Here the source is your CSV file and the destination is your database. Without batch processing you would have to insert each and every row of that CSV into the database by writing insert statements, which is a really painful job, isn't it? For this kind of scenario it's better to use batch processing, so the job runs faster and you save time. Another use case is report generation: say every day you want to export a CSV or Excel report by fetching data from the database; here the database is the source and the file is the destination. This can also be done quickly with a batch processing technique. There can be any number of use cases, but in this example we'll demonstrate the first scenario, where we upload a large CSV file to the database within seconds using a task executor.

Now let's try to understand the Spring Batch core components and their flow of execution. The first key component in the Spring Batch architecture is the JobLauncher. The JobLauncher is an interface used to launch Spring Batch jobs; you can think of it as the entry point to initiate any job. It has a method called run() which triggers the Job object. A Job is the work to be executed with Spring Batch; that work may be a simple or a complex task. Once the JobLauncher launches a Job, it also talks to another component, the JobRepository, which maintains the state of the job, whether it succeeded or failed. Suppose a Spring Batch job was running and an error occurred: how does Spring Batch know an error happened and the job needs to be rerun? We need to save the state of jobs, and further executions should take that into consideration. State management is an important aspect when processing large volumes of data, and it is handled by the JobRepository.

Next, the Job talks to another component called the Step. A Step is nothing but the combination of three other components: an ItemReader, an ItemProcessor and an ItemWriter. The ItemReader reads the data from the source; in our scenario the source is the CSV file. The ItemProcessor processes the data: if you want to do any operation between reading and writing, you can do it there. The ItemWriter writes the data to the destination; in our case the destination is the database, because we want to read the CSV and dump it into the database. A job can also have multiple steps, and each step has its own item reader, processor and writer. That is the whole architecture of Spring Batch and an overview of its core components.
Now let's quickly create a new Spring Boot project to demonstrate this scenario. So without any further delay, let's get started.

Create a new project: click File, New Project, then Next. Give the group id as com.javatechie and the artifact id as batch-processing-demo; the project name will be the same. Change the JDK version to JDK 8 and give the package name com.javatechie.spring.batch, then click Next. Let me add all the required dependencies; we are using the latest Spring Boot version, 2.6.7. I'll add the Lombok dependency, the Web dependency and the Spring Batch dependency, and since I want to load the CSV data into a database I also need MySQL and the Spring Data JPA dependency. Click Next and Finish; it will take a few seconds because it has to download all the latest dependencies. The application is imported successfully and all the jars are downloaded.

Let me show the CSV file we want to dump into the database using Spring Batch. If I open it, it has the fields id, first name, last name, email, gender, contact number, country and date of birth, and it contains 1,000 rows. I want to save this CSV file to the DB within a fraction of a second using a task executor and the Spring Batch framework.

For this CSV file I need an entity class, because I need to save it to my database. I'll go to src/main and create a couple of packages: entity, config, repository and controller. To map the CSV information to an object that can be stored in the DB, I'll go to the entity package and create a class called Customer. Inside Customer I need to add a few fields, but first let me annotate the class: this maps to a table, so I add @Entity and the @Table annotation with a table name, something like customers_info. Then I add the fields: id, first name, last name, email, gender, contact number, country and dob — whatever columns are available in my CSV file, because I want to save all of that information to the DB. So every header becomes a field, i.e. a column in my table. The id is the primary key, so I annotate it with @Id.

Next I need a DAO, i.e. a repository. I'll create an interface, name it CustomerRepository, and extend it from JpaRepository, giving the entity type Customer and the type of the primary key, which is Integer.
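For reference, here is a minimal sketch of what the entity and repository described above could look like. The exact field names (contactNo, dob), the table name customers_info and the interface name CustomerRepository are assumptions based on the CSV headers mentioned in the video; the Lombok @Data, @NoArgsConstructor and @AllArgsConstructor annotations are only added later in the video (when a missing getter/setter error shows up), but they are included here so the class is complete.

```java
package com.javatechie.spring.batch.entity;

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Entity
@Table(name = "customers_info")   // table name assumed from the video
@Data
@AllArgsConstructor
@NoArgsConstructor
public class Customer {

    @Id
    private int id;               // primary key, taken straight from the CSV
    private String firstName;
    private String lastName;
    private String email;
    private String gender;
    private String contactNo;     // field names assumed; they must match the CSV column mapping
    private String country;
    private String dob;
}
```

```java
package com.javatechie.spring.batch.repository;

import com.javatechie.spring.batch.entity.Customer;
import org.springframework.data.jpa.repository.JpaRepository;

// Spring Data JPA repository used later by the ItemWriter to save each Customer
public interface CustomerRepository extends JpaRepository<Customer, Integer> {
}
```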
Now I have created my entity and my repository, and I need to connect the application to the database. In application.properties I add all the datasource-related properties: the driver class, URL, username and password, then the Hibernate ddl-auto setting, the dialect and the server port. I also set spring.batch.initialize-schema to always, and I disable running the job at startup: on application startup I don't want the batch job to run automatically, I only want it to run when I trigger it from my controller, which is why I set spring.batch.job.enabled to false.
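As a rough sketch, those properties could look like the following for Spring Boot 2.6.x. The schema name javatechie, the port 9191 and the two spring.batch settings come from the video; the credentials, ddl-auto value and dialect are placeholders you would adjust for your own setup.

```properties
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.url=jdbc:mysql://localhost:3306/javatechie
spring.datasource.username=root
spring.datasource.password=Password
spring.jpa.hibernate.ddl-auto=update
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQL8Dialect
server.port=9191
# let Spring Batch create its metadata tables in the same schema
spring.batch.initialize-schema=always
# do not run the job automatically on startup; we trigger it from the controller
spring.batch.job.enabled=false
```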
Next I need to put the customers.csv file inside the resources folder. You can keep it outside, but let me copy it into resources: I'll go to my desktop, copy the CSV file and paste it into the resources folder. You can see the id, first name and all 1,000 rows in the file. You can add more rows to test the API, but with 1,000 rows I can already show you that it doesn't take much more than a second.

So we have the entity, we have the repository, and we added the CSV file that we want to map to that entity and store in the DB. The next step is to create the Spring Batch components, the Spring Batch configuration. If you remember the architecture, we need an ItemReader that reads the data from the CSV file, an ItemProcessor that processes the data between reading and writing, and an ItemWriter that writes the data to the database. Once all three components are created we give them to a Step, and once the Step is created we give it to the Job object. These are the components we need to configure in our Spring Batch code.

I'll create a class called SpringBatchConfig and annotate it with @Configuration. You also need to enable batch processing; there is an annotation for that, @EnableBatchProcessing, which tells Spring Boot that we want batch processing enabled for this application. Next I'll inject two factory classes, one for the job and one for the step: if you remember the flow diagram we have a Step and a Job, and to create the Step there is a StepBuilderFactory and to create the Job there is a JobBuilderFactory, so I declare a private JobBuilderFactory and a private StepBuilderFactory. In the item writer I want to save the data to my database, so I also inject the CustomerRepository. You could annotate each with @Autowired, but since I have exactly these three fields I'll just use constructor injection with Lombok's @AllArgsConstructor; if you had more than one constructor you would need @Autowired, but as I don't want any other constructor apart from these three attributes, @AllArgsConstructor is enough and Spring will inject these three beans.

Next, as per the flow diagram, we create the reader, processor and writer objects. First the reader: I'll create a reader bean of type FlatFileItemReader with Customer as the generic type. Since we want to read from a CSV file, Spring Batch gives us the FlatFileItemReader class, so I simply create an instance of it. In this item reader I need to tell it where my file is located, so I call itemReader.setResource(new FileSystemResource(...)) with the path of the file, src/main/resources/customers.csv. I also need to set a name on the reader, itemReader.setName("csvReader") — any name will do. Then I tell the reader to skip the first line of the CSV, because that line is the header and I don't want to save it to the database; I only want the data from the second line onwards, so itemReader.setLinesToSkip(1).

Next we need a line mapper; let's understand what the line mapper does. I create a lineMapper() method and inside it create a DefaultLineMapper, again with Customer as the generic type. Look at the CSV file: it is comma-separated values, so in this line mapper we have to say that the delimiter is a comma, extract the values and map them onto the Customer object. How to read the CSV file and how to map its data onto the Customer object — that is the job of the line mapper. So I create a DelimitedLineTokenizer, set the delimiter to a comma with setDelimiter(","), and set strict mode off with setStrict(false). Then we tell it which headers to split out and map onto the object, so I call lineTokenizer.setNames(...) with the headers: id, first name, last name, email, gender, contact number, country and date of birth. This line tokenizer will read the comma-separated CSV using the column names we set here. The next step is to map that information onto the object, and for that Spring provides a class called BeanWrapperFieldSetMapper; I create it with the Customer generic, because it will map the CSV values onto the Customer object, and I specify the target type, which is the Customer class.

So we have the line tokenizer and the field set mapper: the line tokenizer extracts the values from the CSV file, and the field set mapper maps those values onto the target class, which is Customer. Both objects are handed to the line mapper with lineMapper.setLineTokenizer(...) and lineMapper.setFieldSetMapper(...), and finally I return the lineMapper. There was a small compilation error in the reader bean — I had forgotten to return the itemReader — so with that fixed, the reader object is done.
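Putting that together, the configuration class and reader could look roughly like the sketch below. The package names and the column names passed to setNames() are assumptions; they have to match the Customer entity's field names for BeanWrapperFieldSetMapper to map them correctly.

```java
package com.javatechie.spring.batch.config;

import com.javatechie.spring.batch.entity.Customer;
import com.javatechie.spring.batch.repository.CustomerRepository;
import lombok.AllArgsConstructor;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
@EnableBatchProcessing
@AllArgsConstructor          // single constructor, so Spring injects all three fields
public class SpringBatchConfig {

    private JobBuilderFactory jobBuilderFactory;
    private StepBuilderFactory stepBuilderFactory;
    private CustomerRepository customerRepository;

    @Bean
    public FlatFileItemReader<Customer> reader() {
        FlatFileItemReader<Customer> itemReader = new FlatFileItemReader<>();
        itemReader.setResource(new FileSystemResource("src/main/resources/customers.csv"));
        itemReader.setName("csvReader");
        itemReader.setLinesToSkip(1);            // the first line is the header
        itemReader.setLineMapper(lineMapper());
        return itemReader;
    }

    private LineMapper<Customer> lineMapper() {
        DefaultLineMapper<Customer> lineMapper = new DefaultLineMapper<>();

        // split each line on commas and name the resulting columns
        DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
        lineTokenizer.setDelimiter(",");
        lineTokenizer.setStrict(false);
        lineTokenizer.setNames("id", "firstName", "lastName", "email",
                "gender", "contactNo", "country", "dob");   // assumed to match Customer fields

        // map the named columns onto Customer properties by name
        BeanWrapperFieldSetMapper<Customer> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(Customer.class);

        lineMapper.setLineTokenizer(lineTokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);
        return lineMapper;
    }

    // processor(), writer(), step and job beans follow in the next snippets
}
```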
So we created the reader. As per the flow diagram, the next object is the ItemProcessor. I'll create a new class, CustomerProcessor, and implement the ItemProcessor interface; note that ItemProcessor comes from the org.springframework.batch.item package. It takes two generic arguments, the input type and the output type, and in our case both are Customer. Then I implement the process method: the argument is of type Customer and the return type is Customer. For now I am not going to write any logic in the processor, I'll just return the same object; in a moment I'll explain the purpose of this processor and how you can filter information while reading and writing. I also need to register this CustomerProcessor in my batch config class, so I add a method public CustomerProcessor processor() that returns new CustomerProcessor(), annotated with @Bean.

Now we have the reader and the processor; next we need the ItemWriter component. There is a class called RepositoryItemWriter that you can use if you are working with Spring Data JPA, so I'll use that class, again with the Customer generic, name the bean method writer() and annotate it with @Bean. Inside, I create the RepositoryItemWriter object and set the repository on it with setRepository, passing the CustomerRepository we injected earlier. I'm simply saying: in this item writer, whatever values come out of the reader, use my CustomerRepository to persist them. I also need to tell it which repository method to call, so I set the method name to "save" with setMethodName("save"). In other words, the writer uses customerRepository.save() to write the CSV data to the database, and finally I return the writer. By the way, there are several classes you can use here: RepositoryItemWriter, JdbcBatchItemWriter and a number of other ItemWriter implementations provided by Spring Batch; you can find them in the Spring Batch documentation.
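Here is a sketch of the pass-through processor and of the processor/writer beans described above; the package placement of CustomerProcessor is assumed, and the writer bean belongs to the same SpringBatchConfig class shown earlier (RepositoryItemWriter lives in org.springframework.batch.item.data).

```java
package com.javatechie.spring.batch.config;

import com.javatechie.spring.batch.entity.Customer;
import org.springframework.batch.item.ItemProcessor;

// For now the processor simply passes each record through unchanged
public class CustomerProcessor implements ItemProcessor<Customer, Customer> {

    @Override
    public Customer process(Customer customer) throws Exception {
        return customer;   // no transformation or filtering yet
    }
}
```

```java
// beans added to SpringBatchConfig
@Bean
public CustomerProcessor processor() {
    return new CustomerProcessor();
}

@Bean
public RepositoryItemWriter<Customer> writer() {
    RepositoryItemWriter<Customer> writer = new RepositoryItemWriter<>();
    writer.setRepository(customerRepository);  // the injected CustomerRepository
    writer.setMethodName("save");              // call save() for every item in the chunk
    return writer;
}
```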
That's fine — we created the reader, the processor and the writer. In the reader we told Spring Batch to read the file from the source, in the processor we have no logic as of now, and in the writer we specified that the CSV data should be written to the DB, which is the destination. So the three components — ItemReader, ItemProcessor and ItemWriter — are complete.

The next step is to create the Step object and give these three components to it. I'll create a bean method public Step step1() (the Step class comes from org.springframework.batch.core) and annotate it with @Bean. Since we injected the StepBuilderFactory, we can build the step with it: return stepBuilderFactory.get("csv-step"), give the generics <Customer, Customer>, and because we want to process the data in chunks, define the chunk size as 10, meaning 10 records are processed at a time. Next we provide the reader, the processor and the writer — we created beans for all three — and then simply build it. This is exactly the flow we discussed: give the reader, processor and writer objects to the step, then build the Step object.

Now we need to give that Step to the Job. I'll create another bean, public Job runJob() — the Job type comes from batch.core, and you can give the method any meaningful name. Just as we injected the StepBuilderFactory, we also injected a JobBuilderFactory, so I use it to create the Job: return jobBuilderFactory.get("importCustomers") — that's the job name — and then pass the step. You can give any number of steps: as shown in the flow diagram, a job can have multiple steps, so after .flow(step1) there is a .next(...) method you can call with another Step object if you have one. Since I have only one step, there is no step chaining; I have a single flow, so I just call .flow(step1), then .end() and .build(). I believe the flow diagram is clear now: create the three components, give them to the step, create the step object and give it to the job.
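The step and job beans, added to the same SpringBatchConfig class, might look like this; the bean method names step1 and runJob are just illustrative.

```java
// step bean: wire reader, processor and writer together and process 10 records per chunk
@Bean
public Step step1() {
    return stepBuilderFactory.get("csv-step")
            .<Customer, Customer>chunk(10)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}

// job bean: a single-step flow; chain more steps with .next(anotherStep) if needed
@Bean
public Job runJob() {
    return jobBuilderFactory.get("importCustomers")
            .flow(step1())
            .end()
            .build();
}
```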
Now this Job object needs to be handed to the JobLauncher in our controller, so that we can trigger the job ourselves by hitting an endpoint. The configuration is ready, so let me go to the controller package and create a class called JobController. I define a root URL on it with @RequestMapping, something like /jobs. Then I write a method that triggers the job. For that I need a JobLauncher object and also the Job object, so I inject both; with a single constructor you could pass them as constructor arguments and skip @Autowired, but for now let me just add @Autowired. Then I write the endpoint: a public void method with some meaningful name like importCsvToDBJob, annotated with @PostMapping.

Inside it I need a JobParameters object, because to trigger a job with jobLauncher.run() I have to pass the Job as well as the JobParameters. So I build them with the JobParametersBuilder class, adding a long parameter with the key "startAt" and the current timestamp, System.currentTimeMillis(), as the value, and then call toJobParameters(). Now I call jobLauncher.run() with the two arguments, the job and the job parameters. It throws checked exceptions, so you need to handle them with try/catch; there are multiple exceptions, and I'll just collapse them into a single catch block. Now we have the launcher that will trigger the job, and the endpoint is defined: the URL is /jobs/importCustomers.
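A minimal sketch of the controller; the method name importCsvToDBJob and the use of @RestController are assumptions based on what's described in the video.

```java
package com.javatechie.spring.batch.controller;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/jobs")
public class JobController {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job job;

    @PostMapping("/importCustomers")
    public void importCsvToDBJob() {
        // a unique parameter (the current timestamp) so every launch is a new job instance
        JobParameters jobParameters = new JobParametersBuilder()
                .addLong("startAt", System.currentTimeMillis())
                .toJobParameters();
        try {
            jobLauncher.run(job, jobParameters);
        } catch (Exception e) {          // run() throws several checked exceptions
            e.printStackTrace();
        }
    }
}
```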
So we have everything — let me cross-verify: in SpringBatchConfig we created the reader, processor, writer, step and job objects, and we have the processor class, the entity, the repository and the JobController. Now let me run the application from the main class. It fails at first: there is an error in JobController, "required a bean of type Job". We did inject the Job in the controller, but if you check SpringBatchConfig, we created the Step bean and forgot the @Bean annotation on the job method, so let's add it. Now the application starts on port 9191.

If you check the database — I'm using the javatechie schema — you can see that Spring Batch added a couple of extra tables: BATCH_JOB_EXECUTION, BATCH_JOB_EXECUTION_CONTEXT, BATCH_JOB_EXECUTION_PARAMS, BATCH_JOB_INSTANCE, BATCH_STEP_EXECUTION, BATCH_STEP_EXECUTION_CONTEXT and their sequences, and you also have the customer table. If I select from BATCH_JOB_EXECUTION there is nothing yet, because we haven't started our batch job: no job instance id, create time, exit code or exit message. Similarly the step execution and job instance tables are empty, and so is the customers_info table (that's the name we gave in the entity) — there is no entry yet.

Now let's run the batch job through our endpoint, /jobs/importCustomers. I'll open Postman, copy the URL and trigger it as a POST request. The first attempt fails: we missed the getters and setters on our entity — you can see the "invalid setter method" error — so just add Lombok's @Data, @AllArgsConstructor and @NoArgsConstructor to the Customer class and restart the application. Trigger the request again from Postman, and you can see the insert statements still running, because it is processing the 1,000 rows from the CSV file. Finally the step named csv-step is executed in 6 seconds 920 milliseconds, the job importCustomers completes with status COMPLETED, and the startAt parameter is the System.currentTimeMillis() value we passed. So it took around six seconds to process 1,000 rows from CSV to database. If you check the DB with select * from customer info, the records were added sequentially: the ids are 1, 2, 3, 4, 5, 6 … up to 1,000, with no gaps and no shuffling.

If that is the case, then processing a hundred thousand records might take five to ten minutes or more — we never know — and then there is little sense in using Spring Batch at all. You might ask: what is the advantage? The same thing I could do manually, one row after another; now it's automated with Spring Batch, but it still takes time. The reason is that by default Spring Batch is synchronous, not asynchronous. You need to tell Spring Batch to move rows from source to destination concurrently, and for that you define your own task executor and set a concurrency limit on it. In the Spring Batch config class, define a bean public TaskExecutor taskExecutor(), create a SimpleAsyncTaskExecutor, and set the concurrency limit to 10 with setConcurrencyLimit(10) — since I have only a thousand records, I want 10 threads executing in parallel — then return the task executor. Finally we need to tell the step to use this task executor while doing the reading, processing and writing, so I add the task executor with the concurrency limit of 10 to the step.
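A sketch of the task executor bean and of the step updated to use it; the rest of the step definition is unchanged from the earlier snippet.

```java
// bean added to SpringBatchConfig (TaskExecutor and SimpleAsyncTaskExecutor
// come from org.springframework.core.task)
@Bean
public TaskExecutor taskExecutor() {
    SimpleAsyncTaskExecutor asyncTaskExecutor = new SimpleAsyncTaskExecutor();
    asyncTaskExecutor.setConcurrencyLimit(10);   // at most 10 threads processing chunks concurrently
    return asyncTaskExecutor;
}

// the step bean from before, now made multi-threaded
@Bean
public Step step1() {
    return stepBuilderFactory.get("csv-step")
            .<Customer, Customer>chunk(10)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .taskExecutor(taskExecutor())        // chunks are now processed concurrently
            .build();
}
```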
Now I'll delete the data from the customers_info table, because I'm going to upload the same CSV file again: the ids would be duplicated and we might get a constraint violation exception, so better to clear the table with delete from customers_info. But before that, have a look at BATCH_JOB_EXECUTION: you can find job execution id 1 with its version, exit message and last update time. If you check BATCH_STEP_EXECUTION you can see the step name and its details; the first execution is the one where we got the parsing error because the getters and setters were missing — that is the exit message — so its status is FAILED, and the last one is COMPLETED. Then BATCH_JOB_INSTANCE shows the job instances. These tables are created by Spring Batch to track the state of your job: whether it succeeded, how many records were processed, how many failed — you can get all of that information from them. In real projects, while implementing Spring Batch, you will have to work with these tables, checking the job id, its status, and how many rows succeeded or failed.

Now back to the code. Let's restart the application: we deleted the rows from the table, we have the fresh CSV file with a thousand rows, and we'll process it with 10 threads — that is why we added the task executor — so we can see how Spring Batch executes concurrently and gives better performance. On restart we get an error: in SpringBatchConfig the taskExecutor method wasn't defined as a bean, so add @Bean, and make sure the method returns the executor object rather than calling itself. Now the application starts again on port 9191. Let me verify that the table is really empty — yes, no records — then go to Postman and send the request. Now look at the total time: the job is done in 3.13 seconds. And if I check the DB with select * from customers_info, the customer ids are no longer in order: ten threads are concurrently executing the rows from the CSV file, and we never know which thread will pick which row, which is why the records are not in sequence — you see ids like 788, then 366 further down, then 192. That is how it executes concurrently. Let me clear everything again, delete from the table and trigger it once more: this time it takes two seconds to complete, and again the first rows start near the end of the file, around id 949, because a thread never gives you an exactly predictable order; it depends on the thread scheduler which thread gets a chance to execute, and that is why the records are not in sequence. That's fine — what concurrency limit you set, 10, 20, 25 or whatever, depends on your business requirement.

So we've seen how to process a large set of records in a fraction of the time, which is the main motto of this tutorial. In this example, though, we have no control over the threads: we don't know which thread will take which records. If you want to take ownership of the threads — telling thread one to execute the first hundred rows, thread two the next hundred, and so on, setting your own ranges per thread — you can do that with Spring Batch partitioning, where you have full control over the threads; I will cover that in the next tutorial.

Before closing this session, let me give you a small example of the processor, because we wrote the CustomerProcessor but aren't really using it. If I open the table, the customers come from various countries — Bangladesh, China, Iran, United States and so on — and if I filter on United States and count, there are 25 such records. Say you have a requirement to process only the customers whose country is United States: whenever you want to do any processing, validation or filtering between reading and writing, the processor is the place to do it. So in CustomerProcessor I write a condition: if customer.getCountry() equals "United States" (copying the exact value), return the customer object, otherwise return null. Now it won't write every record: for each record it checks the country, and only if the customer belongs to the United States does it return the object, so it reaches the writer and gets saved; otherwise it returns null and the record is skipped.
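The updated processor could look like this; returning null from an ItemProcessor is the standard Spring Batch way of filtering an item out, so it never reaches the writer.

```java
// CustomerProcessor updated to keep only customers from the United States
public class CustomerProcessor implements ItemProcessor<Customer, Customer> {

    @Override
    public Customer process(Customer customer) throws Exception {
        if ("United States".equals(customer.getCountry())) {
            return customer;   // keep the record; it will be passed to the writer
        }
        return null;           // drop the record; Spring Batch filters null items out
    }
}
```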
Let's verify that. Restart the application, and meanwhile delete the data from the table. Once the application has restarted, clear the console, go to Postman and send the request. This time it took about 1,534 milliseconds, because far fewer records are written: the processor filtered the records based on the country. If you check the DB you will find 25 records in total; select * from customers_info shows them all, still inserted concurrently because the order is not sequential, and only the records whose country is United States are there — the count is 25, because that is the filter we added in the CustomerProcessor. This is just one sample filter; depending on your use case you can filter on any condition from the data you read from the source.

So this is how you can process a large volume of data in a fraction of the time using Spring Batch. You can add your own task executor to make it faster: rather than executing in a single thread, synchronously, you can make the step run concurrently with a task executor and get better performance. Do let me know in the comment section if you want to know more about Spring Batch partitioning with an example. That's all for this video — thanks for watching, see you soon with a new concept.
Info
Channel: Java Techie
Views: 93,799
Keywords: Spring batch example, spring architecture, spring boot, spring batch csv to db, spring batch example, spring batch tutorial, spring batch in spring boot, spring batch processing, spring batch interview questions, spring batch example with spring boot, javatechie
Id: hr2XTbKSdAQ
Length: 48min 36sec (2916 seconds)
Published: Sat Apr 23 2022