Spring Tips: Spring Batch and Apache Kafka

Captions
Hi, Spring fans! Welcome to another installment of Spring Tips. In this week's installment we're going to take a look at something that's just been released, a brand new thing that we announced in the last few days on the good old-fashioned, trustworthy, venerable spring.io blog: namely, the new support in Spring Batch for Kafka. Mahmoud Ben Hassine, one of the legendary Spring Batch team members, just announced this, and a large chunk of the contribution came from the community, from Mathieu Ouellette. I'm sure it's French and I should know how to pronounce it, but I don't. Either way: merci, thank you.

So the point is, we have this nice new support in Spring Batch for working with Kafka. I just came back from Brazil a little less than a week ago, and while I was there I talked to a large Brazilian bank, and they're using Spring Batch; they're starting new projects with Spring Batch. Now, I love stream processing, I love messaging, I love integration, I love all these things that have to do with streaming workloads and moving data through processing pipelines. But at the end of the day there are some things that just lend themselves naturally to batch, and I think a question a lot of people struggle with is: how do I bridge these two universes? I've got things that are naturally finite; they are time-boxed, they are windowed, they are limited in whatever way you need to understand them to be. There might be something that gets processed and a report that gets sent out for management, or some sort of analytics that gets done, whatever. So this is a very common question: how do I take my stream-processing pipelines, these things that are producing data all the time based on the online, transactional world we live in, and turn that into batch data? Well, one way to do that, of course, is to forward it on to Spring Batch.

Spring Batch is great because it can process large amounts of data, and it has a lot of stuff built in that just makes life easier, a lot of stuff that I think most people don't really appreciate and kind of take for granted. For example, Spring Batch runs in the background and has a metadata system, and that metadata is stored somewhere, namely in a job repository, which in turn, by default, stores it in a SQL data store. The benefit of this is that a Spring Batch job (which, in terms of its domain model, is a job composed of multiple steps, each of which describes how to read, optionally process, and then write data) has metadata about the state of its execution persisted in the database. So if something should go wrong, as is wont to happen when you work in the real world, there's going to be data that sometimes has an encoding failure or something like that. I think Holden Karau just tweeted, or retweeted, something that I thought was really funny, which was, and I'm paraphrasing here: if we required a schema file for all of our JSON, then we wouldn't have any data. I may very well have taken it out of context, but it made me laugh, because it's true.
There's just this huge world of data out there, and we need to make do with what we can; that's how the large majority of work gets done these days, just normalizing that kind of messy stuff. So what are we going to do? We want to take data from the real world, take it as it comes, and process it, and the easiest way I can think of doing that is Spring Batch. Spring Batch has this metadata system, so if something goes wrong, as it will if you have invalid or badly formatted data, you can, as an operator, intervene in the job and restart it. You can also provide heuristics. You can say: if some percentage of the data, or some number of records, fails, then go ahead and fail the whole job; but if it's just a small subset of the data, then don't bother failing it, just keep going and ignore those records.

It also works in chunks. Steps read data in chunks of, say, X records, where X is whatever size boundary you can afford to both hold in memory at the same time and possibly lose, because if you lose one record in the chunk then the whole chunk could possibly be lost. A chunk is also a unit of performance: how many records do I want to write to the downstream system in a transaction, a logical transaction if not an actual transaction? That's called a chunk.
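To make those skip heuristics concrete before we move on: this is not code from the video, just a minimal sketch of what a fault-tolerant step looks like in Spring Batch. The bean and step names are made up, and the reader and writer are whatever you happen to have.

    // Inside a @Configuration class; skip up to 100 bad records, then fail the step.
    @Bean
    Step faultTolerantStep(StepBuilderFactory stepBuilderFactory,
                           ItemReader<String> reader,
                           ItemWriter<String> writer) {
        return stepBuilderFactory.get("fault-tolerant-step")
                .<String, String>chunk(10)  // 10 records per chunk, one logical transaction
                .reader(reader)
                .writer(writer)
                .faultTolerant()            // opt in to skip/retry semantics
                .skip(Exception.class)      // which failures count as skippable
                .skipLimit(100)             // give up after 100 skipped records
                .build();
    }

Below the limit, bad records are skipped and the job keeps going; above it, the step fails, and because the job's state lives in the job repository, an operator can fix the data and restart it.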
Now, I don't want to review all of Spring Batch here; we've actually done that in one of the earlier Spring Tips videos, indeed one of the more popular ones, a Spring Batch back-to-basics 101. What I think is interesting here is the possibility of bridging batch with messaging through the use of this new Kafka support. So that's what we're going to do today, friends: take a look at this really, really cool stuff. We can get at it from the Spring Initializr, of course, and build a new project as usual. This Kafka support is going to be available in Spring Boot 2.2 M3 when that comes out, and in the meantime it's available in the snapshots; it ships in Spring Batch 4.2.0.M2, and I'm going to be using a Spring Boot 2.2.0 snapshot, which in turn pulls in Spring Batch 4.2.0.M2. I'm just going to call this project "bk", batch-kafka, and we're going to bring in the Kafka dependency, the Spring Batch dependency, and H2. Now, we're not doing anything with JDBC here, but Spring Batch needs to persist its metadata somewhere, so I'll give it an in-memory data store. You can point this at really any modern SQL JDBC data store and it'll do the right thing, but I just want the fast restart cycle. So let's go ahead and generate the project and open it up.

All right, so we need to configure a few things, and I want to get that out of the way first, because it's early enough in the project that I might be able to remember it all; if I punt on it, I might experience odd errors later on. These are just regular Spring Boot and Spring for Apache Kafka properties; there's nothing special here.

First of all, in our code, we'll create an entity, an object, a DTO, called, guess what, Customer. Customer is going to have a Long field called id and a String field called name. And you know what, I think I forgot to bring in Lombok, so I'll go ahead and add that manually, and I'll use Java 11. I should have done both of those things back at the Initializr, but thankfully this is trivial even in the build, because all the dependencies are managed for us; there's not a huge cost in complexity here, as long as I know the group ID and the artifact ID, because the versions and all the exclusions and all that stuff are already managed by Spring Boot. Our entity is going to use all the things Lombok gives us: a generated constructor, getters and setters, toString, equals, hashCode, blah blah blah.

Now, because it's Kafka, we need to specify a key deserializer for the consumer, and I want to use LongDeserializer, because our ID is a long. And again, I could and probably should do a whole video on just Kafka in Spring. You've seen me talk about Spring Cloud Stream, which works with RabbitMQ or Kafka or AWS Kinesis or Google Cloud Pub/Sub or whatever, and I've also done one on Spring Cloud Stream Kafka Streams in particular, which is interesting. OK, good, we've got that. Now I want the value deserializer (make sure I don't misspell it twice), and I'm going to use the JsonDeserializer; that's the one from the Spring project, not the other one. Then we give it a group ID, and I'm going to call this "customers-group". On the producer side, the key serializer will be a LongSerializer; look it up, same idea. I just want a thing that will handle the persistence of the key in Kafka.

Kafka has this concept of a message, and messages have a key and then a body, and the body is the value. The reason you have this concept of a key is that you're really not dealing with a message queue; what you're dealing with is an append-only log, a data store, and this makes it possible for you to use things like Spring Cloud Stream Kafka Streams to do some very interesting things that treat Kafka like a database. A side effect of that is that you have to think about keys, basically, which is fine; not a big deal, we can handle this. So there's the serializer, and we want the client ID as well, while we're at it.

One thing that's going to be an issue: we're going to write two programs, and they're going to send data from one node to another, transiting through Kafka, and as the data pops out on the other side we have to deserialize it. We're going to do so using Jackson, which by default doesn't just deserialize arbitrary objects; you have to tell it to relax, basically, and the way you do that is with a trusted-packages property: spring.kafka.consumer.properties.spring.json.trusted.packages=*. And finally, we're going to be sending messages to a Kafka topic (a topic is the way you say "queue" in Kafka), and we want to specify the default one for the KafkaTemplate. OK, so that's everything that's common to both producer and consumer.
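For reference, here's roughly what the shared application.properties ends up looking like, reconstructed from the description above; the client-id value in particular is my own placeholder:

    # consumer side
    spring.kafka.consumer.group-id=customers-group
    spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.LongDeserializer
    spring.kafka.consumer.value-deserializer=org.springframework.kafka.support.serializer.JsonDeserializer
    # let Jackson deserialize types from any package
    spring.kafka.consumer.properties.spring.json.trusted.packages=*
    # producer side
    spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.LongSerializer
    spring.kafka.producer.value-serializer=org.springframework.kafka.support.serializer.JsonSerializer
    # client id (assumed value) and the default topic for the KafkaTemplate
    spring.kafka.client-id=bk
    spring.kafka.template.default-topic=customers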
Let's now break the examples down. We're going to create a producer application, which is going to be a Spring Batch application that pumps data into Kafka; actually, I'm going to synthesize some data, so it's just going to be a thing that pumps records into a broker, and then something else will come along on the reader side, read the data off the topic, and write it to something else. Now, I don't want to get into the business of configuring file item readers or flat-file item readers and item writers and that kind of stuff, so for the things that don't have to do with Kafka, the invariants in our demo, I'm just going to provide dummy implementations of the expected interfaces, and we're going to look mainly at what happens with Kafka.

So, the producer application: the thing that will pump data into a Kafka broker. It's a Spring Batch application, and the first thing we have to do, besides making Spring Batch and Spring Boot work together, is the usual boilerplate: the application class and SpringApplication.run. Since it's a Spring Batch application, we need to tell Spring Boot to go ahead and create a Spring Batch job repository and job launcher and all that stuff, and in order for that to happen we need to say @EnableBatchProcessing. And that's the hard part; that's most of what we have to do to get Spring Batch working. The only thing left is to provide the definition of a job, and a job needs to have some steps and all that kind of stuff.

So we want a bean that is a Job, and in order to create that bean we're going to inject a few things we need, not least of which are a JobBuilderFactory and a StepBuilderFactory. I tend to just bring these in for every Spring Batch app I build these days; they're just helpful. I'm going to inject them into the constructor of my main application class, which is also my configuration class, but rather than write the constructor by hand I'll use Lombok's @RequiredArgsConstructor. Then: this.jobBuilderFactory.get(...), and I'll just call the job "job" (QuickTime really drags this computer down). We give it an incrementer, a new RunIdIncrementer, which creates a new unique ID for each job instance, so that if I need to point to a particular run and say "restart that one", there's a parameter that makes that job instance unique. And the job starts with a step, whose definition we'll come back to in just a moment. So now we've got a basic job with one step.
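Here's a sketch of that skeleton, plus the Customer DTO from earlier. The class and job names are my own; the rest follows the description above.

    import lombok.AllArgsConstructor;
    import lombok.Data;
    import lombok.NoArgsConstructor;

    // the DTO; the no-arg constructor is there so Jackson can deserialize it later
    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public class Customer {
        private Long id;
        private String name;
    }

    import lombok.RequiredArgsConstructor;
    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
    import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.core.launch.support.RunIdIncrementer;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.context.annotation.Bean;

    @SpringBootApplication
    @EnableBatchProcessing   // creates the job repository, job launcher, etc.
    @RequiredArgsConstructor // Lombok writes the constructor for the final fields
    public class ProducerApplication {

        private final JobBuilderFactory jobBuilderFactory;
        private final StepBuilderFactory stepBuilderFactory;

        @Bean
        Job job(Step step) {
            return this.jobBuilderFactory.get("job")
                    .incrementer(new RunIdIncrementer()) // unique parameters for each run
                    .start(step)
                    .build();
        }

        public static void main(String[] args) {
            SpringApplication.run(ProducerApplication.class, args);
        }
    }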
As I said before, steps have three things: readers, writers, and optionally processors. So our step will be a thing that uses the StepBuilderFactory, and it'll chunk, let's say, ten records at a time, and in order to do that we're going to read and write data of type Customer, our DTO. For the reading, we'll come back to it in a second, but suffice it to say it's going to be a very plain, invariant thing: reader = new ItemReader, of entities of type Customer. So there's our little reader stub; we'll fill it in shortly.

The writer, the ItemWriter, is where I want to use Kafka, so this is where the rubber meets the road. That'll be a separate bean: @Bean, a KafkaItemWriter with keys of type Long and values of type Customer. We're going to use a builder here, so we don't have to do a lot of work to get it working: new KafkaItemWriterBuilder, and we build it with a KafkaTemplate, which we'll inject, parameterized the same way. Spring Boot is going to automatically configure that object for us based on the auto-configuration, so we really just need to ask for it; I'll inject it here as a parameter, why not, even though it thickens the parameter list, since I'll probably use it again. IntelliJ is complaining that it can't find it by type, but that's the tooling, not the actual state of the world. What else? I need to tell it how to take an object and turn it into a thing it can send, a key basically, so I'm going to tell it how to convert a Customer into a key: a Converter from Customer to Long, which is just Customer::getId. That seems easy enough; I think I can do this. And we've already got the topic specified, as the default topic for the template.

So the only thing we need now is our reader. All this thing is going to do is create a new unique ID using an AtomicLong. It's just dummy data; you can get your data from anything for which there's an ItemReader, and that list is a long list: Neo4j and Hibernate, JDBC and JSON, LDIF and lists and repositories and XML and flat files and stored procedures, all these kinds of things. So that part's fine; we don't need to worry about it, I'm just going to synthesize some dummy data here. If incrementAndGet() is less than 10,000, then we return a new Customer, and I'll say: if Math.random() is greater than 0.5, use one name, otherwise use another. And finally, otherwise, we return null, and null means the end of the input.
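In code, the step, the KafkaItemWriter bean, and the synthetic reader might look like this: a sketch living alongside the job bean above, assuming Customer has the Lombok-generated (id, name) constructor, and with the org.springframework.batch.item, org.springframework.batch.item.kafka, and org.springframework.kafka.core imports in place. The sample names are placeholders, not the video's exact values.

    // Inside ProducerApplication, alongside the job bean.
    @Bean
    Step step(ItemReader<Customer> reader, KafkaItemWriter<Long, Customer> writer) {
        return this.stepBuilderFactory.get("step1")
                .<Customer, Customer>chunk(10)   // accumulate 10 items, then write them together
                .reader(reader)
                .writer(writer)
                .build();
    }

    @Bean
    KafkaItemWriter<Long, Customer> kafkaItemWriter(KafkaTemplate<Long, Customer> template) {
        return new KafkaItemWriterBuilder<Long, Customer>()
                .kafkaTemplate(template)         // sends to spring.kafka.template.default-topic
                .itemKeyMapper(Customer::getId)  // derive the Kafka message key from the item
                .build();
    }

    @Bean
    ItemReader<Customer> itemReader() {
        AtomicLong counter = new AtomicLong();
        return () -> {
            long id = counter.incrementAndGet();
            if (id < 10_000) {
                // "Jane"/"John" are placeholder names
                return new Customer(id, Math.random() > .5 ? "Jane" : "John");
            }
            return null; // null tells Spring Batch the input is exhausted
        };
    }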
So again, this is great if you imagine you have a job that writes things to a Kafka queue, or topic, rather, and something comes along in the evening, or every hour, and consumes everything from that topic and does some processing: it writes it to a CSV file, it writes it to a database, any standard ETL. Or, in this case, you're taking data from a streaming system and moving it into persistence, into archival; you want to store it in a database or a warehouse or something like that. Spring Batch is perfect for that, because you can tap whatever data is coming out of the streaming system, send it to a place with a large buffer (so that it can keep, you know, half an hour, ten minutes, five minutes worth of data), and then you run this job every so often and it starts pulling down data and writing it to your back-end system.

Here I'm just creating a reader that will keep producing new records until I have 10,000 records. Once it's done it'll return null, and that tells Spring Batch to stop trying to read. It's going to read one record at a time, by the way: it'll keep calling read() until it reaches the chunk size, at which point it accumulates all those read items into a collection and ships it to the writer, and the writer here is the KafkaItemWriter. So with the chunk size of ten that you saw me specify in the step, it's going to do that a thousand times, basically: accumulate ten at a time into memory and then write all ten through the KafkaItemWriter, which will try to do it in one logical transaction, if not an actual transaction. If you were using JDBC it would be an actual transaction: everything goes in one transaction, and if writing any one of those ten records failed, it would roll back all ten.

OK, that looks normal. There's our code; that's a program that will write data. So what I'm going to do now is set up kafkacat: the topic will be customers, and the broker, with -b, will be localhost:9092. I'm also going to reset my whole Kafka setup; I'm using Confluent's desktop tooling, which makes that nice and easy, so I don't have to worry about it. So I've just reset everything, as if I'd reinstalled from nothing, and you can see the topic doesn't exist; there's no data there, there's nothing. I now need to run this program, and that'll actually initialize my destination. Here's my producer; let's run it. The program ran, and it didn't take very long at all, even with my CPU being pegged by the QuickTime recording. And if I consume the topic, you can see I've got all the data I thought I should have.

Now we have all the data in the Kafka queue, and I want to read it, don't I? So let's create a consumer application. Here we're going to do the same thing in reverse: we're going to read the data that's been written, to the database in other cases, but here, to Kafka.
By the way, a very common thing is to use Spring Batch and Spring Integration together. Maybe you're using Spring Integration and you're doing messaging, and that's the thing producing the data, so you've got a channel that has data in it. There's something called a JobLaunchingMessageHandler, which comes from the Spring Batch Integration project, and what it does is exactly that: you send a message to the JobLaunchingMessageHandler telling it what job you want to launch and what parameters you want to use, and as a result of a message coming in on a channel, it'll kick off a Spring Batch job. By default, the way we're using it right now, when you run the Spring Boot program it automatically launches any Spring Batch jobs that are in the application context; but if you want to run a job as a result of some sort of processing pipeline, maybe when you see a file or something like that, you can have an inbound file adapter in Spring Integration that notices there's a new file in some sort of mount, and that kicks off the job, which does the ETL, which then writes that data to Kafka, or whatever. And if it writes it to Kafka, that can trigger some sort of stream-processing pipeline, which then has output, which then needs to be loaded back into the database, and that's where Spring Batch can help you again, as a listener, as a consumer. That's what we're going to look at right here.

So let's do that. We need the same kind of basic setup as the first time: @EnableBatchProcessing, a public static void main, SpringApplication.run(ConsumerApplication.class), cool. And what do we want to do? Well, we want to have a job. In order to create the job we need the JobBuilderFactory and the StepBuilderFactory, same as before, and I'm going to inject them into the constructor with @RequiredArgsConstructor, which creates a constructor that has parameters and assignments for those fields, exactly what I would get if I hit alt-enter and generated the constructor. Once I have the factories, I'll say jobBuilderFactory.get(...), incrementer(new RunIdIncrementer()), start the step, and build. We need that step, so we'll come back to it in a second.

Our step, again, is going to read data; nothing particularly fancy here, it's going to use the StepBuilderFactory. You should give these things more descriptive names than I do, because remember, this gets persisted to the database; it's how we can look up the state of this particular step later on. So don't do as I'm doing; do better, I know you can. What are we going to write to in this case? I'm going to write to the console; I'll just plug in a dummy ItemWriter that logs all the data, nothing interesting there. So it doesn't really matter what the chunk size is, since there's no real risk of being overwhelmed when I'm just logging, but you should definitely take more care when you set this than I have, because here I don't really need to.
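Here's a sketch of the consumer's skeleton with that logging writer plugged in. The job and step names are mine, and Lombok's @Log4j2 supplies the log field.

    import lombok.RequiredArgsConstructor;
    import lombok.extern.log4j.Log4j2;
    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
    import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.core.launch.support.RunIdIncrementer;
    import org.springframework.batch.item.kafka.KafkaItemReader;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.context.annotation.Bean;

    @SpringBootApplication
    @EnableBatchProcessing
    @RequiredArgsConstructor
    @Log4j2
    public class ConsumerApplication {

        private final JobBuilderFactory jobBuilderFactory;
        private final StepBuilderFactory stepBuilderFactory;

        @Bean
        Job job(Step step) {
            return this.jobBuilderFactory.get("consumer-job")
                    .incrementer(new RunIdIncrementer())
                    .start(step)
                    .build();
        }

        @Bean
        Step step(KafkaItemReader<Long, Customer> reader) {
            return this.stepBuilderFactory.get("consumer-step")
                    .<Customer, Customer>chunk(10)
                    .reader(reader)
                    // dummy writer: just log each item as it arrives
                    .writer(items -> items.forEach(item -> log.info(item.toString())))
                    .build();
        }

        public static void main(String[] args) {
            SpringApplication.run(ConsumerApplication.class, args);
        }
    }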
Not that I'm saving anything important at the moment, but that raises a good point: you can use Kafka as a way to stage writes, a way to absorb them, because Kafka can take it. If you have a lot of data that needs to be written into a SQL data store like Oracle, you can't drop the same amount of data on an Oracle database; you can't write to an Oracle database, even if you drop all the indexes and all that stuff, as fast as you can write to Kafka. So this might be exactly the kind of thing where you have something that drains the queue and then attempts the writes to the Oracle database slowly. And you can do things with Spring Batch like remote chunking, where it actually fans out the work to worker nodes on a cluster and pulls in data from Kafka; I can imagine it pulling data down from a different topic, or different partitions, on each node, for example. Lots of things, lots of opportunities built into Spring Batch, the framework itself, and those only get better with Kafka, which is built for scale as well.

OK, so in the step I plug in our writer, a new ItemWriter of Customer, and I'm going to use a logger, Lombok's @Log4j2, and log out each item: items.forEach, and log each one. How about that. And the real magic here, the thing I want to show on this side, is of course the reader: the KafkaItemReader. Did we provide the bean already? Nope, so let's do that.

First things first, we need a new KafkaItemReaderBuilder, and it's typed: it's a Long and a Customer, isn't it. Which partitions? Well, I'm just going to use the one. At this point we need some consumer properties, and this is where we get some help from the auto-configuration: var props = new Properties(), and what I need is the consumer properties from the application.properties file. So I'll inject the KafkaProperties object, which behind the scenes is kind of a thing that gets leaked from the auto-configuration, but I do need it, so I'll use it: props.putAll(...) with the consumer properties it builds. That looks right. Then give the reader a name; I'll call it "customers-reader", sure. Do we want to save state? You betcha. And we need to tell it where to read from: the topic will be customers, which is what we said in the property file, so I probably should have injected that rather than duplicating it. On the producer side it's the default topic for the template, and on the consumer side it's spelled out here, so they're sharing the same address in the broker. Then build, and plug that into the step as the reader. I think that's it: we've already got data in Kafka, we saw that, and now we need to pull that data out.
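As a sketch, the reader bean might look like this; it assumes java.util.Properties, Spring Boot's KafkaProperties, and the org.springframework.batch.item.kafka builder are imported, and it jumps straight to the zero-indexed partition that the mistake below is about to teach me.

    // Inside ConsumerApplication; reads the customers topic as a finite batch input.
    @Bean
    KafkaItemReader<Long, Customer> kafkaItemReader(KafkaProperties kafkaProperties) {
        // reuse Spring Boot's consumer settings: deserializers, group id, trusted packages
        var props = new Properties();
        props.putAll(kafkaProperties.buildConsumerProperties());

        return new KafkaItemReaderBuilder<Long, Customer>()
                .name("customers-reader") // keys this reader's state in the job repository
                .saveState(true)          // persist offsets so a restart resumes where it left off
                .topic("customers")
                .partitions(0)            // partitions are zero-indexed!
                .consumerProperties(props)
                .build();
    }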
So let's read the data using this consumer. And... "does not host this topic-partition"? Well, that doesn't seem fair. Is it zero-indexed? I should have used zero. What kind of nerd am I that I didn't use zero? A Visual Basic nerd, that's what kind. All right, fixed. Now what have I done? It says it can't cast a String to a Customer. Why would it do that? Did I screw up the property configuration? Let's see: on the consumer side the value deserializer is the JsonDeserializer, the key deserializer is the LongDeserializer, long, long, seems legit, and the trusted packages are correct. The reader is typed Long and Customer, the writer is typed Long and Customer... oh, I see the issue: in the property file I had put the deserializer key under a properties element, which is just a map of arbitrary keys and values that doesn't mean anything to the auto-configuration, so it was trying to use the default deserializer. That's so bad; no wonder it was so upset. A hint: if you can't command-click on a property inside the IDE, something's off. So there we go: when I fix that, it now knows about the right deserializer, and there's all my data.

And that, my friends, has been a very quick look at using Kafka with Spring Batch. Again, the real opportunity here is in the integrations it makes possible: the streaming-to-batch and batch-to-streaming use cases. Imagine taking data off of a message queue and then writing it into some sort of persistent store, archiving it, or putting it in a warehouse for analytics. This is exactly the kind of glue code you need to write to get that kind of work done, and it could also be the place where you do the processing in and of itself. Spring Batch is great for this kind of stuff. All in all, I'm super excited to see this land in Spring Batch. I hope you'll give it a shot and give feedback; obviously we're not yet GA, so the sooner the better. All right, thanks so much for watching, and we'll see you next time.
Info
Channel: SpringDeveloper
Views: 26,402
Rating: 4.8947368 out of 5
Keywords: Web Development (Interest), spring, pivotal, Web Application (Industry), Web Application Framework (Software Genre), Java (Programming Language), Spring Framework, Software Developer (Project Role), Java (Software), Weblogic, IBM WebSphere Application Server (Software), IBM WebSphere (Software), WildFly (Software), JBoss (Venture Funded Company), cloud foundry, spring boot, spring cloud, kafka, batch, spring batch, apache, apache kafka
Id: UJesCn731G4
Length: 44min 15sec (2655 seconds)
Published: Tue May 14 2019