Spring batch tutorial | Step by Step Guide | #4 Optimise Batch processing with Task Executor

Video Statistics and Information

Captions
Now we are at the phase of testing what we already implemented. For that, I asked ChatGPT: I gave it this structure and asked it to generate 100 students, respecting the first name, last name, and age fields, with random values for each. I will paste the data right here, and these are the students we get. Now all I need to do is run the application, so let me make it full screen; the application is up and running. Before going further, I want to go to my database, refresh it, and see how the tables were created. We see seven tables instead of one, even though we created only one entity in our application. Six of them belong to Spring Batch: batch_job_instance, batch_job_execution, batch_job_execution_params, batch_job_execution_context, batch_step_execution, and batch_step_execution_context. Let me open the diagram and make it full screen so you can see the structure: a batch job instance has job executions; each job execution has an execution context and parameters, as well as step executions; and each step execution has its own execution context. You can take a closer look at these different tables. Now we want to run our application, which means invoking our endpoint. For that, I will create an HTTP file that performs a POST call to my REST API: I right-click, create an HTTP request file, and name it demo. All I need to write is POST, then localhost with my port, 9090, then /students, which is the endpoint we just created. When I run this, I expect to get 100 students in my table, which is empty for now. I will also clear the console so we can see the execution time of the run.
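The HTTP request file described in the narration can be sketched as follows; the file name demo.http, the port 9090, and the /students path all come from the video:

```
### demo.http — trigger the CSV import job
POST http://localhost:9090/students
```

Running this request from the IDE's HTTP client invokes the REST endpoint that launches the batch job.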
Now if I run the request and go back to the console, we see that the job, or more precisely the csvImport step, executed in 189 milliseconds, which is super fast. If I refresh my table again, I see the 100 students already imported; let me organize the view so you can see them. Next, we want to multiply the number of records, so I will go back to the CSV file and duplicate its content many times to reach, for example, 100,000 students; you can see the number from the line count. We don't care about the IDs in the file, because they will be auto-generated by Spring. Let me restart the application, since we need a restart to load the new content of the file. The application is up and running; let me clear the console and double-check that the student table is empty. Now if I run the endpoint again, you can see it is still spinning: the import is running, and it will take some time, maybe seconds, maybe minutes, depending on the machine. Let's wait and see how long it takes to import the 100,000 students. It's done, and the output says it took 20 seconds to import 100,000 students. But if I check the table again, we see that we still have only 100 students. That is because we are reading the ID from the CSV file, and when Hibernate receives an entity with an existing ID it performs an update instead of an insert. Let's make a change and retest. We can do this in two ways: removing the ID column from the CSV file would work but would be tedious; the easier fix is to go to the student processor and set the student's ID to null.
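The processor change just described can be sketched like this. ItemProcessor is the standard Spring Batch interface, but the class name StudentProcessor and the Student entity's accessors are assumptions based on the narration:

```java
import org.springframework.batch.item.ItemProcessor;

public class StudentProcessor implements ItemProcessor<Student, Student> {

    @Override
    public Student process(Student student) {
        // Drop the ID read from the CSV so Hibernate performs an
        // insert for every record instead of an update by ID.
        student.setId(null);
        return student;
    }
}
```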
When we pass a null ID, Hibernate will persist a new entry in the database instead of updating an existing one; this way the processor is actually doing some logic as well. I will restart the application and try the import again: let me clear the console, make sure the list of students is empty, run the endpoint, and wait for the processing to finish. After that, I will show you how to improve the performance of your Spring Batch processing, because we need to reduce this execution time: imagine uploading a file containing millions of records; even 20 seconds for 100,000 would not be acceptable. The run is about to finish, and once it's done it will log the execution time. It may well take longer than 20 seconds now, because the process has changed: we are always persisting, so each record is an insertion into the database rather than an update, and the execution takes longer. As you can see, every time I refresh, new data is being persisted; for example we are at 91,000 now, and if I refresh again it is still writing. This is taking real time, and there are many ways to improve the processing. In the end, importing 100,000 students, simple data with nothing complicated, took 6 minutes 35 seconds, so let's see how we can improve that. In our batch configuration, at the step level, we can define an object called a TaskExecutor, which allows us to define how many parallel threads we want to run for our step. First, let's
define a bean of type TaskExecutor. I will write public TaskExecutor taskExecutor(), create an object of type SimpleAsyncTaskExecutor called asyncTaskExecutor, call asyncTaskExecutor.setConcurrencyLimit() to specify how many threads we want to run, and return the asyncTaskExecutor. Let's understand what this does: the official documentation says setConcurrencyLimit sets the maximum number of parallel task executions allowed, and that the default, -1, indicates no concurrency limit at all; you can see the constant for unbounded concurrency is indeed -1. When we imported our CSV file the first time, the step was running on a single thread. Now that the task executor is defined, at the step level we can call taskExecutor() and pass our bean. One important caveat: be mindful of the resources and power of the machine on which you execute this batch processing. Let's restart the application and run the import again: the application is up and running, I open my demo HTTP file and run the request, and we'll see how much this improves the import. The file has been fully processed; checking the log, it took 36 seconds instead of 6 minutes 35 seconds. You can see the difference made just by allowing 10 threads to run; you can increase this to 30, for example, or another number, and compare. I also want to show you another improvement that may speed up our processing a little more.
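The bean described above can be sketched like this; SimpleAsyncTaskExecutor and setConcurrencyLimit are the Spring class and method named in the narration, and the limit of 10 follows the "10 threads" mentioned in the timing comparison:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.core.task.TaskExecutor;

@Bean
public TaskExecutor taskExecutor() {
    SimpleAsyncTaskExecutor asyncTaskExecutor = new SimpleAsyncTaskExecutor();
    // Allow at most 10 parallel task executions;
    // the default (-1) means unbounded concurrency.
    asyncTaskExecutor.setConcurrencyLimit(10);
    return asyncTaskExecutor;
}
```

The step then picks it up through the step builder, e.g. `.taskExecutor(taskExecutor())` in the step definition.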
Making this full screen, you can see the chunk configuration: currently we process 10 records per chunk. We can increase this to, for example, 1,000 and rerun the application. First, let's double-check that all 100,000 students are in our database; as you can see, they are all there. Now, with the bigger chunk size, let's restart the application and see whether this also improves the batch processing of our file. The application is up and running, I start the import again, and we wait to see how long it takes. The import is done; checking the log, it took only 29 seconds, so we gained roughly another 6 seconds. And believe me, a few seconds really count in production, in a highly scalable application under high demand. So we saw how to improve our batch processing; we can also improve it using partitioning, and that is something we will see next.
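Putting the pieces together, the step definition with the larger chunk size might look like the following sketch, assuming the Spring Batch 5 StepBuilder API. The step name csvImport matches the log output mentioned earlier; the reader, processor, and writer beans are assumed from context and not shown:

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.transaction.PlatformTransactionManager;

@Bean
public Step csvImportStep(JobRepository jobRepository,
                          PlatformTransactionManager transactionManager) {
    return new StepBuilder("csvImport", jobRepository)
            // 1,000 records per chunk instead of 10: fewer, larger
            // transactions between reads and writes.
            .<Student, Student>chunk(1000, transactionManager)
            .reader(itemReader())         // assumed CSV reader bean
            .processor(processor())       // assumed processor bean
            .writer(itemWriter())         // assumed JPA writer bean
            .taskExecutor(taskExecutor()) // run chunks on parallel threads
            .build();
}
```

Chunk size and thread count interact: each thread processes whole chunks, so both knobs should be tuned together against the machine's resources, as the video cautions.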
Info
Channel: Bouali Ali
Views: 2,346
Keywords: spring boot, spring batch, batch processing, batch, spring, item reader, item processor, item writer, job, job launcher, task executor
Id: iBCDJtb6u7A
Length: 11min 19sec (679 seconds)
Published: Mon Dec 04 2023