Event-Driven Architectures Done Right, Apache Kafka • Tim Berglund • Devoxx Poland 2021

modern acceptance that sounded pretty cool i think that means we're supposed to get started um so let's do that it's 1:30. that by the way is a picture of my grandson his name is alex he was born nine weeks ago and he's super cute you needed to know that this is the title slide of the presentation that you're here to see hi my name is tim berglund and it is good to be back uh thanks for being here in person i am so happy to be here in person i think this is my third time here at devoxx poland and i hope there's a fourth and a fifth and lots more because this is a great place to be i work for a company called confluent and i work with apache kafka a lot i run the developer advocacy team there at confluent so our job is to make it easy and fun to learn how to use kafka and learn how to build systems with it and that's kind of what we're going to talk about today um uh kafka is this is not a kafka talk really but it's really a talk about the kinds of software architectures that have tended to emerge around event driven systems what we'll broadly call event-driven architecture um i think there's a building consensus that event-driven architecture is a good idea right people are seeing okay there are benefits here and those benefits are more valuable than the costs the things that you're trading off so uh it's a thing that more and more people seem to be excited about uh but most of us are new at it sometimes i get to meet in person with people who are trying to build systems that are they're purely event driven systems and everybody's kind of building their first one right like you know how to build monoliths everybody's good at that but you break the monolith into pieces and you want them to be asynchronous and reactive and all those cool things and everyone's kind of doing their first and it's hard but they're not so new that we haven't been able to learn some lessons so this talk is about some of those lessons some of the things not to do
when building an event-driven system but before we get to the the don'ts and the do's uh i want to give you an idea of what i really mean by event-driven architecture now uh i'm about to go through a diagram that i think i have presented in a different form in a different vaguely related talk on this exact stage now the diagram looks different enough it's it's definitely it's been completely re-mushed and reformed it's not this talk has never happened before in public um but it's it's it's an idea and a kind of a reference architecture that i've been talking about for a while and what i love about it is it's just so dang useful there's a lot we can get out of this so i'm going to show you i've got this little reference e-commerce system and i want to show you what i mean by event driven architecture uh right here with this system so here are some pieces we'll start with these two things that that have our web front end over there on the left you've got a way of placing orders there's some web front end and maybe it's react and it's gonna create little blobs of json and impose them on that orders microservice over there and that orders microservice has its own local database it's not to be shared with anyone else that's nice over on the right you've got the same kind of thing for users people can create accounts they can edit their address and their name and they can delete their account and and go away and be forgotten never to be remembered again and and that database is also local to that service so there you go now let's build out this is not really quite a minimum viable product so let's build out a minimum viable system with a few other services in an event-driven way now when a user um submits an order says here i would like to buy this thing that order service will validate the order and write it to a kafka topic now like i said this isn't really and i i'm saying kafka just got kafka on the brain it doesn't have to be kafka okay we're just talking about 
event driven architectures in general i will constantly go back to kafka just because that's my habit but this is a log okay that thing that i'm showing there in kafka world kafka land it's called a topic but it's a durable log of events it's not a queue it's not something where we put a thing so that somebody can pick it up and then it goes away and it disappears after it's consumed but it's a log that persists for potentially a long time this is remembering the events that have happened in the system that's very important so we've got that durable log in between those two things you put your order in that log and the payments service says oh look i have work to do there's a new uh validated order in my validated orders log so i'll pick that up and i'll do my work what's payments going to do well it's going to go talk to some payment gateway and we like things to be asynchronous in event-driven architectures and all the stuff on the bottom is asynchronous not everything in the world is though right because that payment gateway is probably synchronous right i make a rest call and i sit around and i wait for uh that http uh connection to work itself out and get a result back so that'll happen i'll get an answer back synchronously and now i have settled that order i'm confident i can get money so i'm going to produce a new event to a new topic or a new log that says hey this thing has been paid for this is a settled order now which means we can ship it right and so the shipping service has work to do that's nice and it goes and it gets that object out of that log or consumes a message from a topic if we use kafka language and it says okay great um i have um this uh order to ship and it says it's uh user number 24601 and that needs to be shipped to 24601.
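A minimal sketch of the flow just described, in plain Python with no Kafka client. It is only meant to show the difference between a log and a queue: reading does not remove events, and each reader keeps its own position. The names EventLog, LogReader and fake_payment_gateway, and the event fields, are hypothetical and made up for this illustration. They are not part of any Kafka API.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only log: reading never removes events (unlike a queue)."""
    events: list = field(default_factory=list)

    def append(self, event: dict) -> int:
        self.events.append(event)
        return len(self.events) - 1  # offset of the event just written

    def read_from(self, offset: int) -> list:
        return self.events[offset:]


@dataclass
class LogReader:
    """Each reader tracks its own offset, so readers do not affect each other."""
    log: EventLog
    offset: int = 0

    def poll(self) -> list:
        batch = self.log.read_from(self.offset)
        self.offset += len(batch)
        return batch


def fake_payment_gateway(order: dict) -> bool:
    # Hypothetical stand-in for the synchronous call to a payment gateway.
    return order["amount"] > 0


validated_orders = EventLog()
settled_orders = EventLog()

# Orders service: validate, then write to the validated orders log.
validated_orders.append({"order_id": 1, "user_id": 24601, "amount": 30})
validated_orders.append({"order_id": 2, "user_id": 24601, "amount": 12})

# Payments service: consume validated orders, produce settled orders.
payments = LogReader(validated_orders)
for order in payments.poll():
    if fake_payment_gateway(order):
        settled_orders.append({**order, "status": "settled"})

# Shipping service: consume settled orders.
shipping = LogReader(settled_orders)
to_ship = shipping.poll()

# A reader that shows up later still sees every validated order.
late_reader = LogReader(validated_orders)
replayed = late_reader.poll()
```

In this sketch the payments reader and the late reader both see the same two validated orders, which is the property a queue would not give you.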
i don't know where she lives or what her actual name is i can't ship anything to her i need to go ask the user service now there's this temptation right to make that a synchronous call at this point because it's i just i need i'm kind of blocked on being able to do anything until i have that information and we want to keep the users table in the user's service within that context there's a temptation to do that synchronously right well i don't want to do that so we're going to do this other neat thing here's a way that i can share data in a completely event-driven system i do want a copy of the user data in the shipping service which seems i don't know heterodox a little bit maybe but uh what we're going to do is anytime anybody creates or updates or deletes a user in the user service from that that web front end i will now produce an event to a change log now that change log again with with kafka glasses on that's just another kafka topic okay or if it's some other if it's pulsar or whatever it doesn't matter what it is it's that thing and we put that message in there and it it it uh is the the current version of that user so when i create a user produce a message to that that log change my address produce a new message to that log which lets the shipping service now materialize its own little read-only key value store of users so i i can now effectively in memory look up user 24601 okay that's that's this lady is her address and i can create the shipment and away we go and i would most most most likely then ideally do this with another log i'll produce an event say here is the order and the warehouse can then go asynchronously pick that up and have somebody pick the products and put them in a box and labels get created and all those warehouse things happen uh asynchronously okay so this is the reference architecture right here and this is kind of what i mean when i say event driven architecture uh it it more or less implies microservices these days and this is 
the way we're going to be interacting i want to look at five ways we can screw this up that's your outline for the rest of the talk we're going to have five of these things five ways to go wrong here but all of them are going to be with reference basically to this diagram we're going to pick apart different parts of this diagram which like i said i've been using this story in one form or another to talk about different concepts for a while and it's i think kind of neat how rich this you know completely fake if you work in actual retail or e-commerce and everything you may well be aware that you have more than four microservices in your whole system right but you know with just these few services in here uh there's a lot to unpack so let's do that how can this go wrong um there are five ways so the first way we can go wrong is when we use event-driven architecture and we just shouldn't have now anybody who gives you an architectural paradigm and says this is universally applicable it is always the thing that you should do in all cases is probably a person who is selling something and even if they're selling something they're not doing a very good job of it right that's you shouldn't sell things that way either not that selling is wrong it's just that you know saying this idea works all the time is wrong so there are times we shouldn't we're going to look at a few of those some suggestions of when this is the wrong approach uh sometimes people get a little excited um i was in a meet-up last night and i was talking about something sort of related to this and actually using the same diagram for a different purpose because i do it all the time uh and somebody said hey isn't kafka like a database and that person was being a little bit of a troll uh somebody i knew he's right there um but uh sometimes when we start to internalize the ideas of event-driven architecture we start to think oh hey we just need logs of things we don't need databases
anymore databases aren't cool this is the new thing that's going to cause me to not need relational databases anymore that is an immature approach to event driven architecture so we're going to look at um at too much database aversion that's a mistake we can make schema this kind of sneaks up on you this may or may not be obvious off the bat but uh managing schema is very important just like databases don't go away just because uh event logs become the center of our world rather than database tables being the center of our world that doesn't mean that schema is somehow an obsolete concern schema is a first class concern and we'll talk about that writing in an event-driven system is usually fairly simple reading is usually not reading it might be better to give than to receive but it's more expensive to consume than to produce and sometimes people make the mistake of trying to build too much into the components that read from event logs it's a very specific failure but we want to look at what happens when uh your consumers are too custom uh there are there are good things and again this this actually kind of becomes a fairly kafka specific thing but there are equivalents all over the place however you're doing this you need to make sure that that you're not customizing those consumers too much uh finally scale will look at scale and not properly respecting uh how hard it's going to be um there are really two errors there one is thinking wow we need to design this thing so we can be netflix and scale that much well i know it would be cool you're probably not going to be netflix scale wise you probably don't need that much scale but also just not thinking about scale and trivializing that set of concerns is a problem so we'll talk about that all right again like i said before anybody who presents you an architectural paradigm and says it's the paradigm that you use all the time is a person who's not being trustworthy this is definitely true of event driven 
architecture it's a great idea i actually really believe that it's it's sort of a generation-long uh architecture paradigm shift that is going to affect the default way we build most systems for the next season of life in software development 20 30 years i think it's a big and important thing but it's not always the answer and another painting that i don't know i've i've talked about this painting before this is called the school of athens it's painted by an italian guy named rafael and every single person in here is a person either from renaissance italy or classical antiquity uh it's it's the guys in the middle right there that uh we tend to focus on that's plato the older guy on the left and aristotle the younger guy on the right the guy kind of like you know he's saying you're here you need to bring it to here that guy that's that's aristotle um uh very broadly speaking um when we when we are building things we're putting on the event-driven architecture hat when you begin to think about a system you don't begin by thinking about what is you think about what happens so going through the process of building an event-driven system you think first about what happens um when you're building just say a monolith you start with like a at least my thinking in a new application maybe this is just me but i tend to think of of a schema first right and if i'm thinking of a schema first i'm thinking about what is not what happens then i work out what happens later on well um you you have to have both of these okay and i'm going to come back to this idea a few times as we cover some of these other anti-patterns and and and pitfalls you always have things and you always have events it's just a matter of which one you're going to emphasize just like in actual philosophy i mean it's not obvious the degree to which plato had a correct account of reality or aristotle had a correct account of reality there isn't a consensus on that um i mean people disagree right and so whether 
you think events first or things first you can prioritize one or the other and that priority implies a choice of data infrastructure which in turn implies an architectural style around that data infrastructure um but it's it's not like you always have to go things first or events first and that's what this choice is really about this this um we're talking about the first mistake which is is using event applying event driven architecture when we ought not to so um when your scale is very small when your system is very simple uh you might not need event driven architecture if you're building something that's small initially and it's just not very complex you can really fit the whole thing into your head at once and it's it's not designed for a great deal of scale then use a database and write a monolith pick a framework that you like and build an application around that collection of state and you're going to be fine right don't don't over complicate this and and deploy a kafka cluster and and i need kubernetes because now instead of uh classes uh i'm you know a little application that's got maybe 15 classes that interact those all need to be services and i need to deploy those separately please don't do that right terrible idea just build a monolith it's fine um and i think a a key point there is when we think of the word monolith number one it's bad right we just know it's like profanity monoliths are bad we don't build we're trying to get away from monoliths monoliths are big and imposing and if you're one of these australopithecines and you wake up and you see it there it's like scary because it's a big giant alien thing you know if it's a small program you don't call that program a monolith architecturally it is one and i think i want to give you permission to just embrace that and if you're building something small go ahead and do it and if somehow you don't know the film this is from 2001 a space odyssey strongly recommended outstanding film the last 15 
minutes drugs are definitely involved uh so just be prepared for that to be maybe a little confusing but the rest of the film is is great you should see it um and again i i would say if something is small if if uncertainty is very high like you don't exactly know what the business is asking you to do and you're kind of spiking something and trying to sketch out an application you're going down a path you're not really sure where the path leads then then maybe events are not for you just yet just build something simple architecturally simple and try to learn what you're trying to do before you elaborate the architecture and if it sounds like i'm giving guidelines for when or when not to use microservices to some degree i am right these are these are not equivalent questions but they're very related questions and in my opinion the guidelines and the reasons not to use events sound like the reasons not to use microservices all right um i didn't show you in that reference architecture i showed you the asynchronous version now it's possible to build a system that looks like that not use events not use event logs uh and and have those services all be synchronously coupled right so i still have microservices but they're calling each other with rest or something like that that's not what i mean here that's not a good substitute for event driven architecture so you might say well i'll still build services and i'll just have them call each other synchronously if i if i could just make the claim and for now not defend the claim maybe find me later tonight at the reception and and you can try to get me to defend the claim then um i'll say uh don't do that that's that's not what i mean sometimes a monolith is the right thing to do and just do it all right number two how to go wrong with event driven architecture when you should have used more databases now there are uh centralized databases in the world and their presence is not somehow a failure it's not bad something to be 
embarrassed about uh it's just kind of how things are and so your life as you're building uh your new event-driven system it's probably gonna coexist with legacy systems that involve databases maybe it's a big giant scary mainframe maybe it's just a nice little postgres database that's doing something useful uh and and making people happy uh an observation is that old paradigms and i'm kind of proposing the the data at rest database centric application architecture i am saying that is the received paradigm that's like what has been happening we could say that's the old way of doing it that's a little bit uh judgy i think but we'll just call it that's that's the thing that we've been doing we're going to keep doing that and we're going to build new systems that supplement those old systems that don't replace them that's how it works a little bit before i really got into software or this kind of software early in my career early in the 90s i was writing firmware so i spent like six or seven years completely ignorant of what a database even was um i got to debug software with an oscilloscope that was pretty rad but i just didn't do any of this stuff but early as a younger man the relational databases what we called open systems and all that kind of stuff those were pushing uh mainframes out of the way as the right way to build a new system but they didn't make mainframes go away they're still here right you just built the new thing on the relational database and now we're building the new thing with events and leaving the old systems in place that's fine so uh there's a whole set of tools called change data capture that help this coexistence uh work well and uh there i'll give you a link later on in fact twice i'm going to give you a link in this presentation to a website that's got more things to just dive deeper if you want to know more about change data capture there's there's all kinds of talks we have online for that but that's a good way to sort of extract data 
from a database and put it into an event log i just wanna say it's okay if you have databases doing important work you need to keep that in mind i'm belaboring this point because uh you don't want your new affection for events to convince you that you just don't need databases anymore it's not true or uh well i saw this talk online that tim gave a few years ago where he was talking about how you can get acid transactional properties out of a system with kafka so i guess kafka's a database i don't need databases anymore like just don't say that or if you say that don't say i told you to say it uh because i told you not to say that another um i think uh sort of mature thing here is to think about skills right um maybe i still want to build a database-centric application or use that sort of static data at rest paradigm because events are hard and this is actually a difficult way to think i didn't take a class on this nobody's built one of these before it's actually hard to think about how to solve problems asynchronously if you've been doing synchronous data at rest stuff for your whole career so you might not have the skills as this skilled balance if you've ever really thought about somebody trying to work on a balance beam i have a daughter she just turned 25 two days ago 25 let's go with 25.
two days ago she was a gymnast all through her adolescence and teen years and if you actually watch a child do things on a balance beam it's terrifying because it looks really dangerous there's a lot of skills involved there and you i think the point of this slide is you might not have those skills and so you might say hey let's not go there right now until we can get those skills and i think that's a mature judgment sometimes when there's a new paradigm that's beckoning and you've got this high risk high visibility project you could use it or not and you think maybe not yet maybe it's not time now that will trap you over time you can't live there forever and so oh shameless plug for confluent developer this is a site that is very near and dear to our hearts where i work uh and it's got a lot of great content video courses and library of patterns and and um i can plug it because it's free and in fact you you can't even register so you will you will not be turned into a marketing lead if you use this website we have no idea who you are you just go there and get stuff and hopefully you're happy okay another problem remember we're focusing right now on on when you should have used more databases what's another way you can go wrong here well uh let's look back at this system here and that table over there in the user service okay now straightforward enough uh this is where my users live and uh it's like a database table and it's not that hard um some kafka specific details that's just my where my head is um there's a stream processing api at kafka called kafka streams and it's gonna it will let me basically maintain uh a table basically it's called a k table that's that acts like a key value store right i get one key and i can use that to look things up in memory and that's backed by that change log do i have uh yeah i do okay that guy down there that topic that change log topic i can effectively materialize that in memory as that table and as that table and query that 
in memory and the framework does all that stuff for me right so i could say hey i've got events i've got kafka i've got kafka streams dang it i learned how to use kafka streams it's not easy but i watched that video on confluent developer that tim told us about and i have superpowers and i'm gonna do it and i don't need a database anymore because i'll just make it a k table well maybe okay that's good there's good reasons to do that but it also might be bad because remember in kafka streams a k table is a key value store i can look up by user id that's great all right so i've got that but what if i also need to look up by username or i forgot my password and i need to look up by email or something like you know if i've got multiple keys dang it just make it a database it's okay so um don't be allergic to databases just because we're using events the thing that changes when we adopt event driven architecture is that the system of record the place data goes to be known and stored and remembered is an event log databases are views of the events in that event log they are necessary views but they are not in you know once you've really asked event driven architecture into your heart this is kind of what you're doing uh those databases are things that we materialize from the system of record they are not things that disappear so you'll always have databases as part of your life number three um our third way to screw up with event driven architecture is thinking that schema doesn't matter schema does matter let's look at that guy down there that topic um so i have orders i get them from the outside world and i validate them and then i write them to this event log and why do i write them to the event log well because in my minimum viable product the payment service needs to know what to do and so my thinking as the developer of the order service is i put this event here because either i'm going to go write the payment service or my friend over there is going to
write the payment service and they need something to do so as i'm thinking about my minimum viable system i'm thinking about the producer and the consumer that's in my mind and that's fine okay what else might happen though maybe down the the line a future version of me or down the line somebody else maybe it's somebody i work near maybe i work for a big company and it's somebody i don't even know wants to do some new things like maybe they want to do fraud detection okay now maybe fraud detection has to live in between orders and payments and so we'd have to change things i don't know maybe this is like asynchronous fraud detection and just there's a dashboard that somebody has that says something might be fraudulent but i can just i can write that service somebody can write that service because the the orders are in the log they're they're knowable maybe i have a customer loyalty program so i can send annoying text messages to people that say hey thank you for being one of our platinum customers and i don't know what we sell it's toilet brushes or something and you ordered that you spent more on toilet brushes than anybody else and so you're a member of our shining porcelain club or something that that just came to my mind it's one of those things you you go there and then you wish that you had thought of a better example but it happened and it's it's on youtube now we have to live with it so with as i said the pivot that happens isn't that we expunge databases from our live lives but the event log becomes the system of record this gives us the opportunity for a relatively more evolutionary architecture than we probably would have otherwise realized right we want something that's more like an ecosystem where there are all kinds of plants and animals in there and uh no one person needs to be on top of all of that no one developer you know what you're working on and it serves business goals and it's governed by some kind of process and people know what you're doing 
and how to deploy it and all that stuff and there are other people who do other things but you don't need to fit that whole ecosystem into your head actual living systems like this i think are um much more flexible with respect to schema right there's a few primitive types if you think about it there's like water and oxygen carbon dioxide and some amino acids and like some lipids and glucose and its friends and polymers of it and that's like it right and then things eat things and not everything can eat everything else but there's a surprising amount of flexibility in what can eat what that kind of works out without too much schema we're a lot worse at schema so we have to be a little more systematic um and to get specific a way to do this again kind of with my i didn't explain the shirt by the way i should explain the shirt i always forget if i don't explain it up front so i'm just going to stop right now because i was about to talk about kafka again but this kafka shirt's fire the wheel and kafka those are like the three fundamental innovations of mankind it's super funny um if you're watching the video later on i want you to know that there was a murmur of a little bit of amusement that happened there i don't know that that came through in the recording i won't say it was laughter but it was like mild amusement um all right so a uh fully managed kafka kind of lens on this um this is a topic in a confluent cloud cluster that contains orders and uh they are protobuf orders and here is the schema of those orders there's a component called schema registry and this is in this cloud product you can also just use it for free and manage your own um but we have to track this for two reasons number one i have to be able somehow to look at that topic and know what's in it if i want to write a service that consumes the data there i have to have some kind of automated mechanism to build you know pojos from this that will let me uh you know
write code against the current version of the schema i also need to be able to manage change in the schema over time because well the schema is going to change right because that's what schemas do even if they're perfectly correct at one point in time the world changes and so our schemas have to change and so that's the thing and this is a hint remember the the aristotle plato you had plato pointing up you had aristotle like bro no down here that whole thing this is that tension working itself out all right because the i have to apologize for the gross generalization here but the the sort of aristotelian events first thing that's just terrible i can't even call it aristotelian it's more of um you'd be more like like like william of ockham maybe but that's a different talk um but this events first thing where you're thinking about what happens not what is well it's like the the things instead of events here things are popping their head up out of the stream of events and winking at you they're still there you still have to deal with both those both of these things you can think of things first you can give things priority and choose data infrastructure that prioritizes things and that's a database or you can think of events first and give events priority and choose a data infrastructure that prioritizes events and that is kafka or something like it some sort of event streaming system those are two different approaches to data infrastructure two different approaches to prioritizing the tension between things and events but in both cases if you go events first you still have to know their structure that's a thing if you go things first well the things change those are events and you have to process those so in both cases they're they're always there it just depends which comes first now uh this gets into something something i think is interesting and this is uh data mesh is a new idea but um this could totally be selection bias on my part but i'm seeing it talked 
about a lot so i just want to take a brief little excursion over into data mesh territory for a minute um and and think about this because it's a way of i think helping us navigate this tension more from an analytic standpoint than an operational standpoint i've been thinking i've been talking about application architecture but this brings this over into the analytics world so there are these four key principles of data mesh number one is that data is owned by a domain so like we are the people who build the orders service and so we're responsible for publishing and maintaining and thinking about the order's data from an analytics perspective so anybody in the organization wants to do analytics on orders well we're the team who writes that so this is our data we own it completely we understand it um and we think about it like a product it's not just oh well yeah there are there are rows in a database and data warehouse team you can read that table that's not very nice right like take it if you want it that's not product thinking product thinking would be more like no no no wait this actually matters we're going to document this and there will be a schema and we're going to carefully publish it and manage the quality of it and all that happens you know it's a new thing application developers get to do is is manage that analytics output um that data is available everywhere subject to governance concerns it's a thing that i publish so again with kafka lenses on if there's a topic i can produce my analytics events to that topic and anybody who's able to read that topic now can consume that um and data governance is a we'll get into that it's a it's a it's a data mesh thing there's this principle of federated data governance which is a good idea but this is when i think about schema in an event-driven system my mind goes here maybe a year from now it won't maybe a year from now this will be 15 slides in this presentation because it'll be a much bigger idea if you want 
to know more: the idea is certainly not mine. It's the brainchild of a lady named Zhamak Dehghani. Zhamak gave a keynote at the Kafka Summit Europe in, boy, I want to say April. I don't know, this year is kind of fuzzing out a little bit. But you should definitely take a picture of that QR code and check out that keynote. It's great. I also had the pleasure of interviewing her on my podcast, the Confluent Streaming Audio podcast. So just Google Zhamak and go there; you'll get the correct spelling of her name. Google her and check out data mesh if it sounds interesting.

So schema is pivotal to this effort. We have to understand what the stuff is in there so that things can emerge, so that other services can emerge around it. There you go: don't disrespect schema.

Okay, the fourth way to mess up with event-driven architecture. You know, before the pandemic I was toying with the idea of getting some tailored shirts made. I think people have been dressing down a little bit in the last year and a half, so that just hasn't seemed like a big priority. You can get it done somewhat affordably where I live, so it's not the worst idea. I kind of have to wrestle with myself: do I want to be the kind of person who gets tailored shirts? Then I have shirts that fit better, but I also get tailored shirts. There are no solutions, there are only trade-offs. But that's fine. Maybe, if we talk later on at the reception, you can help me work out that little identity crisis of my own.

When it comes to the programs that read data out of event logs, they're always custom, but you don't want them to be too custom. So let's talk about that. Just for Kafka context, a quick reminder of how this works: I've got a topic, and that's my event log. It's broken up into partitions. And I have a program called a consumer. That consumer can just be a single instance that reads all of those partitions.
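The topic, partition, and consumer model just described can be sketched in a few lines of plain Python. This is a toy in-memory simulation, not the Kafka client API: the `Topic` and `Consumer` classes, the CRC32-based partition choice, and the event names are all assumptions made for illustration.

```python
import zlib


class Topic:
    """Toy stand-in for a topic: a fixed set of append-only partition logs."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Assumption: pick the partition from a hash of the key, so the same
        # key always lands in the same partition. CRC32 is used here only
        # because it is deterministic; it is not Kafka's partitioner.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        return partition


class Consumer:
    """A single consumer instance that reads every partition of a topic."""

    def __init__(self, topic):
        self.topic = topic
        # One read position (offset) per partition, all starting at zero.
        self.offsets = [0] * len(topic.partitions)

    def poll(self):
        """Return all records not yet seen and advance the offsets."""
        records = []
        for partition, log in enumerate(self.topic.partitions):
            records.extend(log[self.offsets[partition]:])
            self.offsets[partition] = len(log)
        return records


topic = Topic(num_partitions=3)
for i in range(6):
    topic.produce(f"user-{i}", f"event-{i}")

consumer = Consumer(topic)
print(len(consumer.poll()), "records read on the first poll")
print(len(consumer.poll()), "records read on the second poll")
```

Reading only moves the consumer's own offsets forward; the records themselves stay in the partition logs.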
Because this is a log and not a queue, I can have more than one of them reading from the same topic, and that's fine. You know, as many as you want. That's kind of what I was saying in the schema section just previously. I can also scale these out: I can add instances of any of those consumers, and the partitions will just get automatically reassigned to those extra instances. So I can scale compute, and that's pretty cool. That's essentially free.

With just completely vanilla Kafka, you have code that looks something like this. This is horizontally, elastically scalable consumer code, but in the middle there, there's not much going on. That API is potentially dangerous: you just get an object, and it's got a key and a value. So if this is all you know, and you go off running and start writing your services, and you're like, "Tim really convinced me that event-driven architecture is a good idea, I'm totally going to do this for the rest of my life," that's great. But if this is all you do, you're going to do things in there. You're going to compute aggregates, which means you're going to have to group over a time window and run some reducing function over the group. You're going to enrich one kind of event with another kind of event, which means you have to do something like a join, and that's hard to do. There's a lot of stuff that's going to happen there, and you're going to be tempted.

(That's supposed to be black. Don't worry, don't panic. I've learned that black slides can be very unsettling, and I don't want you to be unsettled. It's supposed to be black right now.)

So you're going to want to build a lot of framework to get all that stuff done. You know how it is: you write it once, twice, and the third time you write it you're like, "Oh, I'm going to build a framework, it's a good idea." Well, that's a trap, because that's not your job. Your job is to deliver features that
benefit somebody in the business, or your end customer, or whatever. So Admiral Ackbar, famous software architect, is warning you against that trap. You don't want to do that.

So if we think about this: I've talked about this, and I'm talking about particular ecosystem things like Kafka Streams KTables, and I know I haven't spent time introducing that idea. Some of you are Kafka Streams developers and you're right there with me; others of you are just going on the little bit I've given you. But the little bit I gave you was that changelog down there, which has a new message in it for every time anybody has updated a user object. Well, now I need to pull that into some in-memory thing like a hash table. Let's just say it's a Java HashMap. I can turn that into a table, no problem. It's a hash map. Easy.

Well (oops, went the wrong way; hi, Admiral, miss you), that's not so easy, because what if it looks like this? What if I have two instances of that consumer, and some of the hash map is here and some of the hash map is there? That top instance, look at the lines: it has users that are in partition zero and users that are in partition two. And what if I have to scale that out, because I said, "Well, yeah, I have all this horizontal scalability, it's so cool, let's do that"? We scale it out, and now the users that were in partition two have to come out of that hash map in consumer A and go into this completely new JVM, in its own image, somewhere else in the Kubernetes cluster. You don't want to do that. You're not in that line of work. That's the trap: you don't want to start doing distributed state management.

So this is what I mean by going to the tailor for your consumers. Your consumers contain business logic that's unique to you, that serves your customers, that serves your business. That's your job: to deliver valuable features to the people who rely on your software. There are all kinds of
infrastructure temptations which you must avoid, fun as they might be to solve, and state management is chief among them. I also mentioned grouping over time windows. That's super hard. Managing time in an event streaming system is almost certainly not what you get paid to do, so don't do that kind of stuff.

There are tools, again in the Kafka sense of things (I've got the Kafka shirt on, so this is totally fine): Kafka Connect, Kafka Streams, ksqlDB. These are things that you'd use to get that kind of work done rather than build any of that functionality yourself. And again, there are great free video courses and talks and blog posts and executable tutorials and all kinds of stuff you can go do to learn more about those. That's not our goal today, but it's important to know this is a way you can go wrong. Since reading from topics is kind of what you do, you want to make sure that if somebody else has solved a key infrastructure concern in that consume part of the life cycle, you go to the source and do that.

All right, finally, number five: if you think scaling is going to be trivial. Now again, sometimes we over-rotate on scale, because bigger is more awesome, and we think we need to make sure we can scale everything to the nth degree, because wouldn't it be cool? That might force you to over-design things; see our first anti-pattern, when you shouldn't even use event-driven architecture to begin with. But if you are using it, you want to respect scaling. You want to make things such that they will be able to scale, and it's hard to do. You have to ask questions, as I said a minute ago. How much are you really going to do? Make sure that it's going to be some. Does it need to be elastic, or how elastic does it need to be? Do you have very spiky loads, like retail?

At the beginning of the cloud era, when S3 and then EC2 were launched, like 2005, 2006, the mythology that emerged right away was that if you're an online retailer, with EC2 at Christmas time you can
just spin up all these extra instances and pay for that compute, and then in January you take them down and you don't pay for that compute anymore. I don't know if you were building applications in 2006, and if you were, whether they had anything like that kind of elastic scaling capability. Like, 99% chance they didn't. Nobody was building applications that did that then. That's actually really hard to do.

But you do have to think: am I going to need to scale? If so, do I need to do this up-and-down kind of thing? You have to ask those questions. Will you scale storage? Will you scale compute? Those are two separate things, and you have to think about how each will scale.

So, scaling compute. Looking at this diagram again, you need to understand how this is going to work. Scaling on that consume side is rough. We just looked at the state management problem, and we said don't assume responsibility for that. But even if you follow that advice: you're doing this in Kafka, and Tim said Kafka Streams was good because it handled scaling of state for you. Great. Well, you still have this Kafka Streams application that you have to be able to scale. So you're going to have Kubernetes or something, and if that's not already a part of your life, you're going to have to think about how to manage that to do the scaling.

There's also, in the Kafka world, this language called ksqlDB that insulates you from a lot of that scaling problem. Now I've just got this one cluster of ksqlDB servers, and I throw these queries at them, and they do my stream processing for me. I scale that one cluster rather than bringing that into my application code. So I keep my applications simpler, and maybe just use that simplistic consumer API, and have all my interesting stream processing done in SQL. That's kind of a nice thing, because scaling that looks like this: how many nodes do I want in the cluster? And there's a nice web UI for that. I'll take 12 today. Okay, there you go.

Now, when it
comes to scaling storage, you have two choices: you can make a bigger cluster, or you can tier your storage. Now, bigger is the approach taken, say, by the sequoia, the kind of tree that lives in California. They're massive, and that's successful. Sequoias are super successful, but expensive. Sometimes that's what you want to do to scale storage, but sometimes it's not.

We can also take a hint from the way we scale storage in computers. So here is a picture of the Z80, and the expanded version on the right there is the register file of the Z80. This is from Ken Shirriff's blog, righto.com, r-i-g-h-t-o dot com. It's so amazing, check it out; it's just pure fun. But anyway, he's got this picture. So we've got the register file, then there are a couple of levels of cache, then there's RAM, and then there's a disk, and if it's an old-fashioned disk it looks like this. So we have these tiers of storage inside computers, from very high cost and very fast to very low cost and relatively slow.

In event-driven architectures, you're pretty much seeing that same thing emerge. I showed a screenshot from Confluent Cloud: if you use Confluent Cloud, you have tiered storage. You can put old stuff in S3 if you want, or Azure Blob Storage, or whatever it is. Open source Kafka has a thing called KIP-405, and it's doing the same thing.

All right, let's wrap up. Instead of thinking about how it can go wrong, let's think about how we can get it right. Number one: we'll use event-driven architecture only when we should. Number two: we will embrace databases appropriately. Embraces should always be appropriate. Hugs are life, but there are rules. So we want to use databases and not be afraid of databases at all. Manage schemas: you want to think about schema. Just because we're building systems based on what happens doesn't mean "what is" is no longer a question. So manage schemas. You want to use appropriate
compute frameworks in your consumers. Don't do it all yourself. There's ksqlDB, there's Kafka Streams; if you're in the Kafka world, those are important things. And use managed services when you can. You don't want to take on the burden of operating complex and sometimes persnickety distributed systems. (You'll have to ask me later what "persnickety" means. That's a very idiomatic American word. We can talk about that later.) You don't want to maintain a persnickety distributed system if you don't have to.

So we looked at how things can go wrong, and this is how it can go right. I want to say thanks for being here, and have a great conference. [Applause]
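The state-management trap from the talk (a per-instance hash map of users that has to move when consumers scale out) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: three partitions, a round-robin modulo assignment, and made-up user records. Kafka's real partition assignors work differently, and `assign` and `moved_partitions` are hypothetical helpers, not library functions.

```python
def assign(num_partitions, num_instances):
    """Map each partition to an instance index, round-robin (toy scheme)."""
    return {p: p % num_instances for p in range(num_partitions)}


def moved_partitions(before, after):
    """Partitions whose owning instance changed, as {partition: (old, new)}."""
    return {
        p: (before[p], after[p])
        for p in before
        if before[p] != after[p]
    }


# In-memory user table, grouped by the partition each user's updates arrive on.
# The users and addresses are invented for this example.
users_by_partition = {
    0: {"alice": "alice@example.com"},
    1: {"bob": "bob@example.com"},
    2: {"carol": "carol@example.com"},
}

before = assign(num_partitions=3, num_instances=2)  # instance 0 owns 0 and 2
after = assign(num_partitions=3, num_instances=3)   # a third instance joins
moved = moved_partitions(before, after)

for partition, (old, new) in moved.items():
    users = sorted(users_by_partition[partition])
    print(f"partition {partition}: move {users} from instance {old} to instance {new}")
```

With two instances, instance 0 holds the users from partitions 0 and 2, as in the talk's diagram; adding a third instance means partition 2's slice of the table has to be rebuilt somewhere else. Doing that by hand is the distributed state management the talk recommends leaving to Kafka Streams or ksqlDB.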
Info
Channel: Devoxx Poland
Views: 101,324
Keywords: Devoxx, Devoxx Poland, DVX, Java, JVM, Programming, DevoxxPL2021, eda, event driven architecture, microservices, kafka
Id: A_mstzRGfIE
Length: 50min 53sec (3053 seconds)
Published: Thu May 26 2022