Building event-driven (Micro)Services with Apache Kafka by Guido Schmutz

Video Statistics and Information

Captions
[Applause] [Music] Okay, let's start. Welcome, good afternoon, welcome to my talk about building event-driven microservices with the Kafka ecosystem. My name is Guido Schmutz, I have been working for Trivadis for quite a long time, 21 years. I did a lot of things in these 21 years: I started with PL/SQL and the Oracle database, did some service-oriented architecture work in Java and Spring, and in the last five years I did mostly big data and fast data projects with Kafka, with Hadoop, with stream processing — and then of course microservices are never far away. Quickly, show of hands: who was in the Kafka talk this morning? Okay, maybe 70 to 80 percent. Who knows about microservices? Everybody — okay, that's good.

I will start with a short history of where we come from, going back almost 20 years, then I will move to the microservices world and tell you why microservices with events may be better than doing everything synchronously, and then we will come to the chapter on Kafka. This talk is almost entirely about architecture, and I will start in a vendor-neutral way to explain why event-driven is better; only at the end will I dive into the Kafka world. I think it's important on the architectural level to think about architecture and not about products: products can change, whereas the architecture should be a bit more stable.

So where do we come from? Way back we had fat client applications, web applications, rich UI applications, and mostly when we implemented these systems everything was synchronous request/response: we ask for something and wait until we get the response back. HTTP works like that; doing a select on a database works like that. The shared database on the right is also quite important: at the beginning we just had one database where we shared all the data, and by sharing data through one database we didn't have to care much about integration.

Then the world changed: we had more databases, more systems, not all of them under our control. So we said, let's do services — let's expose existing, traditional functionality as services. We added SOAP web services, contract first, and contract first was of course a good approach: think about the contract first and then expose the functionality. The way we mostly did it was by adding technical layers: we talked about a data service, a business entity service, a business activity service, maybe process services. The organization was layered, each layer had its own services and its own contracts, and they were all reusable on each level. We said process services should only call activity services and activity services should only call entity services, but the responsibilities were cut along those technical layers.

Then we added ESBs, service virtualization layers, just to make it a bit more decoupled. But importantly, it was still synchronous — this icon here means synchronous, I forgot to explain it before, but I guess you have seen it — still mostly synchronous request/response. The service virtualization layer just meant that as a consumer you didn't have to know exactly where the web service runs, so we could decouple it a bit. But still, if the service bus is not there, or the service virtualization layer, or the service I want to talk to, then I have a problem: it simply doesn't work, because it's synchronous.
What are microservices? All of you know about microservices, so I will go through this rather quickly. It's more or less the same system I showed you before, with this technical layering of customer, order, product and stock. In a microservices world you organize it differently: you organize around responsibilities, you do separation of concerns, so you will probably talk about a customer microservice, an order microservice, a product microservice and so on, and the applications talk to these microservices. In a way, the term "micro" service, compared to SOA, doesn't really mean the service is smaller, although the name suggests that. In the SOA world the services were often very small, because they only implemented a technical layer, an abstraction over existing logic — and even if you count the logic behind it as part of the service, you still didn't own the whole responsibility of a customer, you just handled storing the data for a customer.

So microservices have code behind them and clear interfaces — we still have interfaces — and the idea is that they are highly decoupled. Maybe it's "SOA done right"; that's not from me, I heard it from other people as well. Especially the abstraction, and especially owning my own data, was in a way always also the goal of SOA. It was just not feasible, because we had to reuse logic which was already there, and at that time there was no concept of containers and no NoSQL databases; the databases we had were very heavyweight, so there was no way to have one Oracle database for each SOAP service. Nowadays it's different, so a microservice can of course own its own data — and I think it's very important to say "own or manage its own data". It doesn't have to be its own infrastructure as well: in my view you can still share a database between microservices, it's just important that you have your own schema, if there is a concept of schema, or in a Cassandra world your own keyspace. If you really have your own infrastructure you are of course very decoupled, but from a decoupling point of view I think it's even more important to think about what interfaces I expose, what technology and semantics I use, and how the microservices talk to each other.

It's clear that from a consumer point of view we have questions for the microservices: we want to know about the customer from the customer microservice, we want to know the status of an order, so we might ask the order microservice, what is the status of my order? But microservices also need data from each other. The order microservice at some point wants to know about the customer: it only knows the customer ID and needs the details. So one question is: does the order microservice have to ask the customer microservice, as shown here in a synchronous way — "just give me the customer data whenever I need it, at the point where I need it"? Of course you can also add API gateways, and then it becomes almost like the service virtualization we saw with SOA, but it's still synchronous: with an API gateway we still talk synchronous request/response most of the time.
The problem with synchronous request/response, at the end, is that it only works if the partner you want to talk to is there; if it's not, you have a problem. If service seven in this picture is not there, and service four wants to invoke it synchronously, and service four got called by service three, service three by service two, and service one also invoked service two synchronously, then a problem that happens very far down, at a very low level, ripples all the way back to the original caller. And it doesn't just mean service seven crashed: it could also mean service seven is overloaded and you get timeouts, or it could be a change of interface — if service seven suddenly changes its interface without communicating it, and maybe without a clear versioning concept, that effect ripples through as well. This is just not as loosely coupled as it could be. HTTP itself has some loose coupling built into the protocol, but it's not as loosely coupled as it could be.

And that brings us to event-driven. Event-driven is not new; I think it's just fashionable nowadays because we now have infrastructure that really supports passing events in a reliable and scalable fashion. We always passed events inside a UI application — ask any UI specialist, there you have tons of events, button-click events — but that was inside one single machine. Then maybe we passed events between applications, still on the same machine, and often these events were not persisted, they were just in flight, in memory. If you want to pass events reliably, you need an infrastructure which supports that. We had infrastructures like JMS, but with JMS you can maybe send 200 or 300 messages per second before the queue can't handle more, whereas with Kafka we talk about much, much higher loads — or, to stay product-neutral, with the concept of an event hub.

So why not event-driven everywhere? Event-driven is not the answer for everything, and that's very important, because sometimes you need an answer. If you want to know the customer from your customer service, you have to ask the customer service: it's a query, and you want the customer back. In the example, you ask the order service for all orders which have been added today and you get a list of orders back. This is a query-style interface; here you probably don't want to think about events, and it cannot really be decoupled, because you want an answer and you want it immediately — if you can't reach the service, events don't help you. On the command level you do have to think about it, because for a command maybe you don't need an answer. Here you see a command: order a product, order an iPad, and the semantics say I want to know true or false, whether it worked, from the order microservice. This is an imperative way of implementing my applications and of communicating between microservices. You could do the same thing with events: you could just raise an event, publish it to an infrastructure which handles these events and reliably stores them, so that nothing gets lost.
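To illustrate the difference, here is a minimal sketch — not from the talk — of what publishing such an order event could look like with the plain Kafka producer API. Kafka is only introduced later in the talk and stands in here for any event hub; the topic name, key and JSON payload are illustrative assumptions.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class OrderEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Instead of calling the order service synchronously and waiting for true/false,
            // we only publish the fact that an order was placed and trust some consumer to handle it.
            String event = "{\"type\":\"OrderPlaced\",\"orderId\":\"o-4711\",\"product\":\"iPad\",\"quantity\":1}";
            producer.send(new ProducerRecord<>("order-events", "o-4711", event));
        }
    }
}
```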
Of course this infrastructure has to be around all the time, but then you are much more loosely coupled, because now you don't talk directly to the order service — "order an iPad, and I'm waiting for an answer" — you just publish an event, and you hope, or rather you assume, that somebody will handle that event and do the necessary work; it's outside of your control. That's the whole idea of event-driven versus request/response. But as I said, there are always the query interfaces, which you can't, or probably don't want to, do in an event-oriented fashion.

So if we change the picture, then on the front end — the API gateway, the applications — we might still be synchronous, but in the back end we are now asynchronous, we are publishing events. Now the order microservice doesn't have to ask the customer microservice for the current picture of the customer anymore: the customer microservice always publishes a new event when a customer is added or changed, and everybody who is interested in customer changes can consume that information. Now you might think — and that's true — if the customer microservice publishes an event, it's the picture of that customer at that given second, and you cannot just leave it there and only look at it when you need to know about a certain customer. So the idea is that the order microservice constantly consumes these customer-changed events and stores its own representation of the customer, everything it needs, on its own side — because, as I said, a microservice owns its data. The order microservice can also store customer data, but maybe it only stores first name and last name, maybe concatenated together, because that's how it needs the information, and everything else is not needed. Now it has the data on its own side, it doesn't have to ask the customer microservice anymore, and it keeps working even if the customer microservice is down or in a maintenance window. It's much more decoupled — that's the idea of event-driven.

Of course, if you have an event store — the yellow thing on the right — then this event store has to be reliable, it should never fail. It has to be a cluster, it has to be distributed, it needs some kind of replication. This component is now maybe more important than the traditional databases you had so far; they are still important, but from a reliability perspective the event store matters more than your individual databases or microservices. It's kind of the central nervous system, as was mentioned this morning.
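To make the consuming side concrete, here is a minimal sketch — again not from the talk — of how such an order microservice could keep its own customer representation up to date with the plain Kafka consumer API. The topic name "customer-events", the group id and the in-memory map are illustrative assumptions; a real service would persist its local view.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class CustomerProjection {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-service-customer-projection");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // The order service's own, reduced view of the customers it cares about.
        Map<String, String> customerView = new HashMap<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("customer-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Keep only what the order service needs, keyed by customer id.
                    customerView.put(record.key(), record.value());
                }
            }
        }
    }
}
```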
Now you could say: what about streaming sources? What about modern IoT flows, social media streams, clickstreams — can I handle them as well? Probably not in a synchronous way: you don't want to receive a tweet and handle it by calling your microservice synchronously, that won't work. The question is not so much "is it a microservice", the question is: what technology am I using? I now move the event store from the right into the middle, because the event store is also a very good first landing point for events: all the streaming data I haven't stored yet, I first store into the event store. Of course I need some kind of sensor logic which accepts data from Twitter or from an IoT device and writes it into the event store, and now these events are available. It's publish/subscribe: the sensor logic publishes into the event store, and everybody can subscribe to it. A microservice could be a subscriber — if it scales, if it can handle the throughput, then why not? But the real question is: what kind of functionality do I need? Stream processing systems provide functionality which you don't get in a plain Java- or Scala-based implementation, like windowing over a stream, like a moving average over a window of time. You have a constant flow of events coming in, and if you want a moving average of some measure over three minutes, a streaming analytics system makes that much easier to implement. And I would say you can treat stream processing as a microservice as well, because microservices are just an idea, and the idea says you are free to choose the language for your microservice — a stream processing system is just another implementation choice. In the picture I would even move these together, or say: these are all microservices, just different implementations we can choose from.

What's important is that every microservice can consume and every microservice can also produce back into the event store. Especially in stream processing you derive information — a moving average over three minutes, for example — and the result of that calculation is again published back into the event store, so somebody else can subscribe to it and maybe do something. Maybe "doing something" is calling a service, like over here, calling a traditional system, because you have found that an anomaly happened, so you want to inform somebody or invoke a certain business process. This is what stream processing, or a microservice as such, can do based on an event.

Now, not everything is new. Usually, unfortunately, you don't have greenfield all the time — it's nice if you do, but if you don't, what about integrating legacy applications into such a picture? Legacy applications often come with a database, often a relational database. They might be a fat client, a rich client or a web application — it doesn't matter here, because what matters is the database. And if you have a relational database, you have the concept of CDC, change data capture. It's very old, but it's quite fashionable again, because in the past CDC was used just to replicate from one database to another — from Oracle to SQL Server, Oracle to DB2 — so you had two instances holding the same data. Now product vendors such as Oracle with GoldenGate, but also newer players, even in the open-source space with Debezium, have extended their change data capture functionality so that it can also send the change events into an event store — the same event store we have seen so far can now receive data from a traditional system in the form of events.
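If Debezium were the CDC tool feeding the event hub, a consumer would typically see each row change wrapped in an envelope with "before", "after" and "op" fields. The following is a minimal sketch, assuming Debezium's JSON converter with the envelope under "payload"; the field contents of the row images depend on the source table and are illustrative.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ChangeEventHandler {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void handle(String changeEventJson) throws Exception {
        // Debezium wraps each row change in an envelope; with the JSON converter it sits under "payload".
        JsonNode envelope = MAPPER.readTree(changeEventJson).path("payload");
        String op = envelope.path("op").asText();  // "c" insert, "u" update, "d" delete, "r" snapshot read
        JsonNode before = envelope.path("before"); // row image before the change (null for inserts)
        JsonNode after = envelope.path("after");   // row image after the change (null for deletes)

        switch (op) {
            case "c", "r", "u" -> System.out.println("upsert customer: " + after);
            case "d" -> System.out.println("delete customer: " + before);
            default -> System.out.println("ignoring op " + op);
        }
    }
}
```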
Change data capture just sits on the redo log, or change log, which every database has, and picks up all the insert, update and delete commands executed on a table. On a low level, that is an event: an insert event, an update event, a delete event. Of course it's very fine-granular: if your customer view consists of a customer or person table, an address table and a contact table, you get events for all of these single tables — but they are events, and you immediately know that a customer has been added. With a very old system you often can't integrate any other way; well, you can integrate by querying the database and doing a full export, but then you have to do that, say, every five minutes. Now you get the changes in a much more real-time, event-oriented fashion, and you can treat them like an event from an IoT device or from a microservice. At the end these are all events, and they can be consumed by stream processing or by a microservice — and, as I said, you can call all of these microservices if you like. So in the picture we just add these traditional systems, and here you can see it's just tables or databases; we don't care about their logic, because we integrate on the table level and get a constant change data capture stream. The load is usually not heavy — on these kinds of systems a customer maybe doesn't change every day, so this stream is nothing like a sensor stream — but the interesting part is that you now get events in one central place, and you can mix and match events from IoT devices with events from traditional systems, which give you changes of status information and master data. This is still on the architectural level; we'll get to Kafka in a moment, and I will maybe speed up a bit.

You also have CQRS and event sourcing, and I'll try to explain them in my own words, because these terms are probably well known, but in my experience not everybody understands the same thing by them. I tried to take different sources and merge them into one picture — maybe I'm right, maybe I'm wrong, you can tell me after the talk, I'm genuinely interested — and of course you can also mix and match the two principles. Command Query Responsibility Segregation, CQRS for short, is also quite old. The idea is that you separate writes from reads, and by doing that you can also use different data stores for each. On the write side, for example, you get the writes from a UI application, maybe as commands, and you store them in a write data store — it could be a NoSQL database optimized for writes, because maybe you get a lot of writes and only few reads. Not that nobody ever asks for the data — then you wouldn't build the system at all — but you don't get many reads: in a time-series system collecting IoT data, temperature values, you store everything in a time-series database, and only if something happens, if a customer thinks something is wrong, does someone read the time series. It's a completely different access pattern. So you store everything in a write data store.
On the read side, at the bottom, you then build materialized views through a projection service, and these materialized views are optimized for reading — you could use a different database — and you have a query service which is used to query the data. Of course you can say this is not transactional, it's eventually consistent, because it takes a while until the projection service has done its work and the data is available on the read side; you can write something and not immediately read it back. In these scenarios you would pick a use case where that doesn't matter, and it doesn't mean it takes an hour — you can build it so that the view is updated within a few milliseconds, up to maybe a few seconds.

Event sourcing, on the other hand, means treating the data which arrives as events and storing them in an event store, and an event store just stores the events in the order in which they arrive. If you then want to know the current state of your application, you need a replayer: the replayer queries the event store and derives the picture from it. So you could say: I started four years ago, I have all the changes of a customer stored in the event store, and the replayer has to start four years ago and read everything — basically insert, update, update, update — and derive the current picture of the customer. Now you'll say this is not fast, and of course it's not, which is why event stores may have the concept of a snapshot: a snapshot simply means that, say, one year ago I captured the state, so I can use that as a starting point and only replay what came after. You can of course also publish everything that has been added to the event store to other applications, similar to what I showed you before.

The combination of the two, event sourcing plus CQRS, means I have an event store and I use it as the source for the projections. Now I create materialized views, and I create them just to speed things up, so I don't need the replayer at query time anymore — the replayer is essentially the projection service. And if the event store never forgets, I can also create new materialized views much, much later: I just read from the beginning and bootstrap the new materialized view from the start of the event stream, reading all the events I'm interested in into the materialized view, and after a while it is ready and I simply keep consuming, because now I'm getting the updates happening right now. So bootstrapping and staying up to date is one single piece of logic if you do it this way, and then you have a query service which queries the view — this is the combination of the two.
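As a sketch of the replayer idea — not code from the talk — the following folds a list of customer events into the current state, optionally starting from a snapshot instead of the very beginning. The event and state types are hypothetical and only illustrate the mechanics.

```java
import java.util.List;

public class CustomerReplayer {

    // Hypothetical event and state types, just to illustrate the fold.
    record CustomerEvent(long customerId, String type, String firstName, String lastName) {}
    record CustomerState(long customerId, String firstName, String lastName) {}

    /**
     * Derives the current state by replaying events in arrival order,
     * optionally starting from a snapshot instead of "four years ago".
     */
    static CustomerState replay(CustomerState snapshot, List<CustomerEvent> eventsSinceSnapshot) {
        CustomerState state = snapshot; // null means "replay from the very beginning"
        for (CustomerEvent e : eventsSinceSnapshot) {
            switch (e.type()) {
                case "CustomerAdded", "CustomerChanged" ->
                        state = new CustomerState(e.customerId(), e.firstName(), e.lastName());
                // further event types would be applied here
                default -> { /* ignore events this projection does not care about */ }
            }
        }
        return state;
    }
}
```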
You can already see it in this picture, but I would like to drill down a little. If you work with an event store and with state — the state being the materialized view — then you might want to write to the event store plus to the state, and if one piece of logic has to handle both writes, you have the problem that these new systems are not transactional: there is no way to combine the two writes in one single transaction. So if you do it the way shown on the left — which I would not suggest — the first write may succeed while the second one fails, and now you have a problem and cannot roll back. What you should do instead is use the event store inside your microservice as well: everything you know about the world, about changes, you store in the event store, and inside your microservice you become a consumer of what you have produced. Now it's decoupled: the publisher only writes to the event store, and somebody else, the writer of the state, consumes from the event store — it's just another consumer, and of course you can have many other consumers outside your microservice handling the same events. By doing it this way you don't have the dual-write problem, and at the end this is simply the picture of event sourcing inside a microservice: you have the boundary around it, with an event store, event handlers, projection handlers, state and query logic.

Now, from the microservices principle where I said every microservice owns its own data — and other people say every microservice owns its own database — you could have the same discussion on the event store level: does every microservice have its own event store, or is there one central event store for all microservices? It's a responsibility question: if it belongs to the microservice, the microservice is responsible for it; if it's central, it's part of the infrastructure. And if we talk about the central nervous system, it has to be central, because somebody has to run that event store and make sure it's always available. Central means the goal is to have a single one; because of security, or because you also go to the cloud, you may end up with more than one, but you definitely don't have one per microservice, and not one per consumer/producer pair.

Before we go to Kafka, one more question: what about historical data analytics? So far we said we consume events, do something and produce events, but sometimes you also want historical analytics — big data, BI, deep learning, machine learning, it's mostly based on historical data. So what about this? I just add one thing to the architecture: the big data cluster, and the big data cluster is just a consumer — there's only one line, it consumes from the event store. If everything goes through the event store, it can simply consume; it doesn't have to consume everything, but it can, and then you have everything in the big data cluster as raw events, even the events which flow between microservices — which can be interesting from an analytics perspective as well. You don't have to think about loading your data warehouse or big data cluster in various different ways anymore: if you also want to know what one microservice exchanged with another, there is one single way to get it, through the event store.

Okay, that leaves me maybe 15 minutes for Kafka. As I said, it was important for me to talk about architecture first, and most of what I showed you is supported by Kafka, because Kafka is one way to implement an event store — or, I would now say, an event hub; people today use the term event hub, and in the end it's just a term. But Kafka is not the only way of doing it — you have other options as well — and everything I said so far would still be valid; I hope that was helpful.
Now to Kafka. Most of you were in the talk this morning, so I can skip parts of this. Kafka is a cluster with multiple brokers and it does publish/subscribe. What's important — and this is also a huge change compared to a traditional message broker like JMS — is that when you consume, you as a consumer have to know the offset, and the offset is nothing more than a pointer into the log. The consumer tells Kafka the position up to which it read last time — "so far I have read, now give me the next messages". By that you can have different consumers with different consumption rates: a big data consumer might consume only every 15 minutes and pull a million events, because that's what queued up in those 15 minutes, while another, near-real-time consumer always wants the latest information, so it sits at the end of the log and just waits for the next message. Kafka can handle both. Kafka stores the data in its own files, but it's clever enough not to read from disk for a real-time consumer: it uses the file buffer cache, because data that has just been written is of course still in the cache. If you come 15 minutes later as a batch consumer, the cache no longer holds everything and Kafka has to go to disk — but you're a batch consumer, so in that case it doesn't matter.

That concept also allows you to always go back as far as you have data. Kafka never deletes based on a consumer pattern: Kafka doesn't know who the consumers are or who has consumed up to which point — it could know, but it doesn't care — and because it doesn't care, it also can't automatically delete data once everybody has consumed it. Instead, Kafka has data retention based on size or time, and you can also say "never delete". If you never delete, you get an event log — your event store — which you started four years ago, with all the events still in it, and you can always go back to the point four years ago, start from offset zero and reread everything. I'll come back to that.
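Because the offset is just a pointer the consumer controls, "going back four years" is simply a seek to offset zero. A minimal sketch with the standard consumer API — the topic and the single partition are illustrative; a real reprocessing job would handle all partitions of the topic.

```java
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.util.Collections;
import java.util.Properties;

public class ReprocessFromZero {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "reprocessing-job");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("customer-events", 0);
            consumer.assign(Collections.singletonList(partition));

            // Batch / reprocessing consumer: start again at offset 0 and reread all retained history.
            consumer.seekToBeginning(Collections.singletonList(partition));

            // A near-real-time consumer would instead jump to the end and wait for new events:
            // consumer.seekToEnd(Collections.singletonList(partition));
        }
    }
}
```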
Kafka is more than just a broker, more than just that storage. Kafka of course has consumers and producers; you can write them in most of the languages you probably want to use today — Go, Java, C# and so on. There is also Kafka Connect — the green boxes are Kafka Connect — so whenever you come from outside into the Kafka world, or go from Kafka out to the world, within the Kafka ecosystem it's Kafka Connect doing that. There are other options outside of Kafka which can support you here as well, such as Spring Cloud Data Flow, NiFi or StreamSets; Kafka Connect is simply Kafka's own answer. Very important, in my view, is the Schema Registry: it allows a producer and a consumer to agree on the data you pass around, and the stream processing would also retrieve its schemas from the Schema Registry. These are Avro schemas, and the idea of Avro is to have a schema which defines what a valid event looks like, so consumer and producer know what to expect. It also gives you backward compatibility: you can introduce new versions of your schema, and a consumer doesn't have to change immediately — it can keep consuming with an old schema, as long as it's compatible, while the producer already produces with the new one. That's quite important in these systems, especially when you communicate across microservice boundaries through Kafka.

Data retention I already mentioned: it can be never, time-based or size-based, and there is one more which I think wasn't covered this morning, called log compaction. It's quite interesting: log compaction is based on the key — a message you put into Kafka is always key/value; the value is the event, the key is optional, and you can use the key for log-compaction-based retention. What does that mean? You can have two different views, two different streams. One is an event stream where you see every single event — here, say, on person level: person 11 got added in October 2015, then changed in May 2017 and again later in 2017, so one insert and two updates; person 12 only had an insert. This is the event view. But maybe you're only interested in the latest state, the latest representation of your customer, and reading through the whole history and applying insert, update, update every time just to get the latest view costs a lot of time and resources. So you can tell Kafka to treat the topic as a state stream, a changelog stream, meaning Kafka keeps only the last event per key — you would use the customer ID, 11, 12, 21, as the key — and Kafka does that behind the scenes, as a background process which runs constantly. Now you have a compacted topic, which is smaller, and if you start from the beginning you have to read less data to get a complete picture of your customers. This only works if every change contains all the data: if a change only says "the address changed", it won't work, so that's something you have to control.

You can do it the left or the right way. Compaction is a property on the topic level, and maybe you want both: the event stream view, so somebody can still get the historical view, and the compacted view. If you need both, you have two topics in Kafka, one with the full history and one compacted. You can make the producer write to both, or you can say the producer always just sends the full picture, the plain events, so it doesn't have to care, and you add an internal consumer/producer which reads from the full topic and writes to the topic that gets compacted. Now you have two topics, and a consumer who needs the information can choose which one to consume from — and the producer feeding such a compacted topic can, for example, be change data capture.
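Making a topic compacted is a topic-level configuration. A minimal sketch using the Kafka AdminClient — topic name, partition count and replication factor are illustrative assumptions.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class CreateCompactedDriverTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Log compaction keeps only the latest value per key (here: per driver id).
            NewTopic driverTopic = new NewTopic("driver", 3, (short) 3)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singletonList(driverTopic)).all().get();
        }
    }
}
```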
Such a compacted topic is interesting, for example, for customer information — or, in this case, for driver information. We have position data from buses driving around, and you usually don't get the driver name with that data; the driver is stored in a master data system, maybe in PostgreSQL, and you can get it through change data capture. Now you have a driver topic, and that topic would be compacted, because you're only interested in the current information, and you could use it, for example, to store the drivers in Elasticsearch — because the responsibility of that microservice is to provide a new, better way to search for drivers, and you need Elasticsearch for full-text search. The interesting point is that you can add such a microservice much, much later: you could come up with the idea today, and if a compacted topic with all the drivers has been sitting there for four years, you just implement your microservice and bootstrap it from that topic. You start at offset zero, read everything available for your drivers — just the latest information — and when you reach the end you are on the current change stream: your Elasticsearch is populated, and from now on you only get the changes. If a driver changes today, you immediately get an update and apply it in Elasticsearch. So the initial load and staying up to date use the same logic, and that simplifies things a lot, because doing an initial load from the source system and then synchronizing it with the updates, without losing a single change, is not rocket science, but you'd better test it before you run it — things can go wrong. Here things can go wrong as well, but if it's one and the same logic it's much easier to test.

Of course we can add stream processing to the picture, and the idea here is just to show what Kafka Streams can provide — we learned this morning that there is also the KSQL abstraction on top of it — and again, this is just one choice for implementing a microservice. I'd like to show you what is possible with Kafka Streams, or with stream processing frameworks in general. In stream processing you always have a stream, otherwise it doesn't make sense. So you have a stream — say a bus position topic, the buses driving around and constantly reporting their geo position — and you have the driver topic, which you get from the PostgreSQL database through change data capture; one is master data, the other is the stream, and you want to bring them together to enrich the stream. The bus position only contains the geo data and the driver ID, and you enrich it with a join — that is what stream processing gives you: you can join tables, state and streams — and you write the result to a new topic, which now contains the enriched events consisting of driver and bus position. You can also aggregate — that's what I meant by windowing — so you could compute, say, bus position or engine metrics aggregated over a certain time window; you do the join between the topics first and then you aggregate.
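A minimal Kafka Streams sketch of this join-then-aggregate idea — not the speaker's code. It assumes the bus position stream is keyed by driver id, treats all values as plain strings, and uses illustrative topic and store names.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.state.WindowStore;

import java.time.Duration;
import java.util.Properties;

public class BusPositionEnricher {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // The compacted "driver" topic read as a table: latest driver record per driver id.
        KTable<String, String> drivers = builder.table("driver");

        // Stream of bus positions, assumed to be keyed by driver id.
        KStream<String, String> positions = builder.stream("bus-position");

        // Enrich each position event with the matching driver record (join on the key).
        KStream<String, String> enriched =
                positions.join(drivers, (position, driver) -> position + " | " + driver);
        enriched.to("bus-position-enriched");

        // Windowed aggregation: number of position events per driver over 3 minutes,
        // materialized into a local (RocksDB-backed) state store.
        enriched.groupByKey()
                .windowedBy(TimeWindows.of(Duration.ofMinutes(3)))
                .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("bus-metrics-per-driver"));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "bus-position-enricher");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder.build(), props).start();
    }
}
```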
What Kafka Streams does quite well here is the concept of local state stores, because an aggregation is always state, and state you don't want to lose, especially if your window has a certain length. If your window is an hour, you have an hour of state; if that state lives only in memory and you lose it, there's no way to rebuild it — or the only way would be to reread an hour of data, and since new data keeps coming in, you want that state to survive. That's what Kafka Streams does with local state in a RocksDB database, and all changes to that RocksDB store are in turn written into Kafka as a changelog, so if the process dies and you restart it, RocksDB can be repopulated from that change stream — it's just another topic, used internally. This is quite clever; I can't go into more detail for time reasons.

Now we're almost at the end — this is the second-to-last slide. That result store: it's the same picture, we just add an API and an application, and such a result store is also queryable — Flink can do that as well, and other stream processing frameworks will probably follow. What's the idea? You have a result store, a view over a window of time of a stream, of a certain calculation, and you want to show it on a dashboard; the dashboard should constantly change and reflect the latest information, and with an application you can query that state directly. So you don't have to store it in yet another database, a Cassandra or Elasticsearch, in order to get at that information — you can query it straight out of the state store. Of course you only have the latest picture, it constantly changes, there's no history, but if your dashboard should only show the current information, that's exactly what you want, and it's much simpler to achieve. This also works in a distributed fashion over a cluster.
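Querying the materialized store directly from the application is what Kafka Streams calls interactive queries. A sketch, assuming the "bus-metrics-per-driver" window store from the previous example, a single application instance, and the current store() signature (the exact API varies between Kafka Streams versions); with multiple instances you would additionally have to route the query to the instance owning the key.

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyWindowStore;
import org.apache.kafka.streams.state.WindowStoreIterator;

import java.time.Duration;
import java.time.Instant;

public class DashboardQuery {

    /** Returns the most recent 3-minute count for one driver, read straight from the local state store. */
    static long positionsInLastWindow(KafkaStreams streams, String driverId) {
        ReadOnlyWindowStore<String, Long> store = streams.store(
                StoreQueryParameters.fromNameAndType("bus-metrics-per-driver", QueryableStoreTypes.windowStore()));

        Instant now = Instant.now();
        long latest = 0L;
        try (WindowStoreIterator<Long> windows =
                     store.fetch(driverId, now.minus(Duration.ofMinutes(3)), now)) {
            while (windows.hasNext()) {
                latest = windows.next().value; // keep the newest overlapping window
            }
        }
        return latest;
    }
}
```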
So, to summarize, I basically end with the architecture slide again, and the only difference is that things have been moved together: you have an event hub, you have different processing units consuming and producing events, whereas maybe the one at the top, the batch system, only consumes and does its work — in some cases it might also publish back to the event hub, more as a signal that it has finished, but the real information flows between stream processing and microservices. You can also combine all of this, and the idea of combining it into one block is really the containerization story: with containers you can run everything on one single infrastructure, and you could even add the event hub to that same infrastructure, because an event hub can also run on a containerized Kubernetes or Mesos kind of platform. Okay, that was it. Maybe we have one minute left for a question, but I'm also around afterwards — if there are more questions, just come to me. Thank you.

[Question] I have a very simple question, maybe a dumb question, but these topics — are they schema-less or do they have a schema? [Answer] It's not a dumb question, it's a very important one, and nobody has mentioned it today. They don't have a schema, so it's kind of schema-on-read: a topic is only a byte stream, and you as a producer define what you write to it. You can write JSON to it, you can write binary information to it, and the consumer has to know what you have written. By using a schema language such as Avro — which lives outside of Kafka — you basically use the Avro schema to serialize the data in a way that producer and consumer can work with together, but the schema is not stored with the message; Kafka itself stays schema-less.

[Question] Is Kafka always persisting the events prior to making them available to consumers? [Answer] Yes, it's always persisted. What you can choose, as a producer, is how much you care about whether that happened: you can produce and tell Kafka "I don't care", and then, if something fails in the storage — these are edge cases, of course — you don't know about it and you lose messages. You can tell Kafka, when producing, that you want to wait for the leader of the partition to acknowledge, and you can also say, on the producer level, that you want to wait for all replicas. The more you wait, the longer it takes for you as a producer, so you trade throughput against guarantees — you have that choice, but the data is always stored. Okay, so thank you, and if you have more questions I'm here. Thank you. [Applause]
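As a footnote to that last answer, the producer-side guarantee being described maps to the standard "acks" setting of the Kafka producer; a minimal sketch of the three options.

```java
import org.apache.kafka.clients.producer.ProducerConfig;

import java.util.Properties;

public class ProducerGuarantees {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // "0"   : fire and forget - fastest, but lost messages go unnoticed
        // "1"   : wait for the leader of the partition to acknowledge
        // "all" : wait for all in-sync replicas - strongest guarantee, highest latency
        props.put(ProducerConfig.ACKS_CONFIG, "all");
    }
}
```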
Info
Channel: Devoxx
Views: 106,681
Rating: 4.9012628 out of 5
Keywords: vdz18
Id: IR1NLfaq7PU
Length: 50min 54sec (3054 seconds)
Published: Mon Mar 12 2018