Building microservices with event sourcing and CQRS

Just a little disclaimer first: some of you might wonder why I am standing here and not Chris Richardson. Chris had to cancel his talk because he called in sick, and Peter and Jay, the organizers of the conference, asked me whether I could step in with a similar topic. Since I do quite a few talks on event sourcing and CQRS, I already had a slide deck, so I'm filling in for Chris now. Please bear with me: the slide deck isn't 100% compliant with the slide masters you usually see at the SpringOne conference. The reason for that is the limited time, but I think the content is more important than any template. So let's kick off with the topic.

Let's start with a review of a classical, I would call it old-school, n-tier software architecture. What we see here is something every one of you has learned in university and has seen many times throughout her or his career, so there shouldn't be a lot of surprises for anybody in this room. Usually we have some client, we have some SOAP or REST endpoints with some business logic and some data access, and with these layers come different kinds of models. The characteristics of this kind of architecture might seem extremely obvious at first glance. The first characteristic is that we read and write data through the same stack of architectural elements: both reads and writes go through the business services down to the data access logic, the repositories and whatnot. The second characteristic is that we use the same data model for business logic processing, for reading data and for writing data. That might seem like a no-brainer at a first look, but as we will see later, there are scenarios where this might not be the best choice. The third characteristic is the
deployment unit: usually we deploy at least one WAR or EAR file to some Tomcat or application server, or we have a Spring application that contains the web part, some business logic and some persistence logic. And of course we all run against the same data store. If we change data, we usually (especially with relational storages) change data sets directly: if I update the row with ID 1 to another value, the data set gets updated directly. The database may internally hold something like a transaction log, but from the viewpoint of the application, the old value is gone after the transaction has committed.

Now, you could easily say: we've done that all the time throughout the history of IT, and we have been very successful and very productive with this approach. Yes, that's true. There are many, many applications that work really well with this approach, and not just applications but also teams that are used to it and write decent, good software with it. Please don't get the impression that I'm going to tell you in this talk that everything we've done in the past is crap. That's not the case; we had very good reasons for doing things the way we've been doing them. But there are drawbacks to this kind of architecture, and in quite a few cases an alternative solution might be more suitable. Let's take a look at these drawbacks.

One is that the data model we have is a compromise, because we compromise between business logic processing, reading data and writing data. Take the read side as an example: many applications rely on database views, possibly even materialized views, in order to improve read performance against the data storage, so they actually have a separate model for the read side already. But in the end we use the same domain model and the same storage for reading and writing this data.

In this setting we also can't scale reads and writes independently, because we deploy everything together on the same software architecture stack. If you have an application that is, let's say, 90 percent reads in terms of calls, CPU time, memory usage or whatnot, and 10 percent writes, we can't scale out the read part on a significant scale while keeping a rather scaled-down write part. If we want to scale the application with this approach, we have to scale the whole stack: the deployment unit contains read and write code, both run through the same layers, and we have to scale up both parts in parallel even if it's unnecessary.

Another thing, which many applications solve on their own through additional functionality, additional libraries or triggers in the database, is that in its essence this approach doesn't retain any historic data. When I perform an update on a relational table, I can only read the new value; the old value is gone from the viewpoint of the application. So I can't just roll my application back to a certain state and say: please give me the data state of the last triple-witching Friday, to take an example from the banking business, or give me the data state of tomorrow at 6 p.m.
I simply can't do that, and if I want that in this approach, I have to take care of it myself: I have to write the code that gives me these features.

And this approach tends to create huge monoliths. Obviously we deploy these things together, we have very coarse-grained deployment units, and we try to merge things together as much as possible. We usually start off with a tiny, nice example project, the typical Spring Boot starter, and after, let's say, two or three years of project craziness, we end up with a monolithic mess in these kinds of architectures, unless you really take a lot of architectural care.

One approach that addresses these challenges is event sourcing. Event sourcing is actually nothing new, and it isn't something that I have invented, far from it; I'm just talking about it. Event sourcing is an architectural pattern that represents the state of the application as a series of events. We don't just have one state stored in a relational storage; we have the state computed from a sequence of events. This might sound awkward at a first look, and a lot of questions come up: how about performance, how do we query these events? We will address all of that. But let's take a little break first and really think about how many business domains are structured from a genuine business perspective. A classic example is the tracking of letters and packages sent through some logistics provider. What's the lifecycle of a package being sent? It has been ordered for pickup at some location, it got picked up by the driver of the logistics company, it has been sent to a sort facility, it has been sorted, it has been dispatched to another location, it has arrived at another sort facility, it has been put on another car, and it has been delivered to the customer. Of course we could model this in the classic architectural approach, but if you think deeply about the nature of this business domain, these are actually all events, things that happened in the past, and so this model might be much more natural to many business domains if you really think clearly about the domain itself.

The typical building blocks of an application that relies on event sourcing are these: we have one or more applications, and we have an event queue onto which we put events. This event queue stores the events in an event store, and then we have some event handler logic that does things with these events; it might validate them, it might process them further, and so on. The sequence of events being passed through the event queue to the event store is also called the event stream, so we have a constant stream of events from our application into these event sourcing building blocks.

Let's take an easy example: an incident management system. We have an IncidentCreated event: the user clicks "create new incident" in the bug tracker or some other incident management system and says "hey, my mouse is broken, it doesn't work anymore". We model that as an IncidentCreated event. In the classical architecture this would usually be an INSERT INTO incident with the values above. After that we change the text of the incident: it is allowed to alter, for instance, the description of an incident, whereas it wouldn't be allowed to alter the reporting user. So we fire off an IncidentTextChanged event saying it's specifically the left button of the mouse that is broken. And then we resolve or close the incident with the solution that the mouse has been replaced for the user. This is a very easy example of an event stream. If we modeled it in the classical architecture, we would have an INSERT statement for the IncidentCreated event, then an UPDATE statement that changes the text, and then another UPDATE statement that changes the status and adds the solution to the incident.

What is very important here is that an event is something that happened in the past; it is not something that will happen in the future. That matters when we think about the naming of these events, and it is a common best practice in event sourcing blogs, literature and talks that events are named in past tense. The names of the events should also be very expressive. I am personally a huge fan of the Domain-Driven Design book and the principles behind it; you will see the letters DDD, which stand for Domain-Driven Design, various times throughout this talk, because in my eyes the names of events must be part of the ubiquitous language of the project. So take great care with how you name your events. Good examples are, for instance, a ShipmentDelivered event, a CustomerVerified event, or a CartCheckedOut event if we talk about an online shop. If we look at these names we instantly know what has happened: delivered, verified, checked out, all past tense. A bad name for an event would be, for instance, CreateCustomer. CreateCustomer is command-style syntax, and a command usually processes the data and then issues the event; only after the business logic has been processed has the customer actually been created.
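The incident example just described can be sketched in code. The record and interface names here are my own invention, not from any specific framework, and note that every event name is in past tense:

```java
import java.time.Instant;
import java.util.List;

// Sketch of the incident event stream from the talk.
// All type names are illustrative; every event name is past tense.
public class IncidentEvents {

    sealed interface IncidentEvent
            permits IncidentCreated, IncidentTextChanged, IncidentResolved {}

    record IncidentCreated(String incidentId, String reporter, String text,
                           Instant occurredAt) implements IncidentEvent {}
    record IncidentTextChanged(String incidentId, String newText,
                               Instant occurredAt) implements IncidentEvent {}
    record IncidentResolved(String incidentId, String solution,
                            Instant occurredAt) implements IncidentEvent {}

    // The event stream for the "broken mouse" example.
    static List<IncidentEvent> exampleStream() {
        return List.of(
            new IncidentCreated("INC-1", "jdoe", "My mouse is broken",
                    Instant.parse("2015-01-01T09:00:00Z")),
            new IncidentTextChanged("INC-1", "The left mouse button is broken",
                    Instant.parse("2015-01-01T09:05:00Z")),
            new IncidentResolved("INC-1", "Mouse was replaced",
                    Instant.parse("2015-01-02T10:00:00Z")));
    }

    public static void main(String[] args) {
        exampleStream().forEach(e -> System.out.println(e.getClass().getSimpleName()));
    }
}
```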
Another bad example is a WillSaveItem event. Why will you save the item? Haven't you saved it yet? Is something still happening until the item has been saved? And the worst case of all is the DoStuff event. You could build a complete system with just this one event, and nobody knows what's going to happen: is the event going to order beer for me, or is it going to ship something to a customer? So please take great care in how you name these events.

Let's take a look at a very easy code example of how such an event could look. We have an event ID, which is usually modeled as a UUID or something like that. We must have an event date stating when the event happened, in order to be able to recreate the sequence of events, and this date should be a timestamp with the most precise resolution available in our programming language or system. And then we add, for instance, a customer number and some comments, some business payload, to the event.

A very interesting question is how fine-grained we should model our events, because in usual systems of usual complexity we find ourselves with many, many entities, let's say four or five hundred. Do we really want events grained at the entity level, describing what happened to each entity in the business domain? I wouldn't do that, because when you take a closer look, such events tend to have a CRUD type of character most of the time. Here, again, a Domain-Driven Design pattern helps: events usually happen on the basis of aggregates. Aggregates in Domain-Driven Design are a collection of entities that are bound to a specific business context. You must have one root entity, and this root entity is responsible for the lifecycle of the whole object graph underneath it.
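The event structure just described (an event ID, a precise timestamp and some business payload) might look like this as a plain Java class. The field and class names are illustrative assumptions, not from a specific library:

```java
import java.time.Instant;
import java.util.UUID;

// Sketch of the event structure described in the talk: a technical id,
// the most precise timestamp available, and some business payload.
// Field names are illustrative, not taken from a specific framework.
public class CustomerVerifiedEvent {
    private final UUID eventId;          // usually modeled as a UUID
    private final Instant eventDate;     // when the event happened
    private final String customerNumber; // business payload...
    private final String comments;       // ...and some more payload

    public CustomerVerifiedEvent(String customerNumber, String comments) {
        this.eventId = UUID.randomUUID();
        this.eventDate = Instant.now();
        this.customerNumber = customerNumber;
        this.comments = comments;
    }

    public UUID getEventId() { return eventId; }
    public Instant getEventDate() { return eventDate; }
    public String getCustomerNumber() { return customerNumber; }
    public String getComments() { return comments; }
}
```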
A good example is a credit application: I apply for a mortgage loan because I want to buy a house, so I have the mortgage loan application as the root entity. Underneath it I have, for instance, income information (how much money I earn and from which sources), how much money I spend on other credits, how many credit cards I have, whether I still have to pay off another house, and so on. These other entities, or value objects, are irrelevant without the mortgage loan application, because they heavily depend on it. So you can wrap these entities up into an aggregate, a logical construct you might say, and apply the business logic to this part of the system. What this usually gives us is a rich and very expressive event model, because we are not too fine-grained; we have a higher-level abstraction, and this abstraction is bound to a specific business context, for instance mortgage loan applications or customer management.

It is very important that you always treat these events as immutable. After you have fired off an event, you never change it. The only thing I would say you might change is infrastructure processing metadata, such as when the event was processed by the event queue, but you never change any business data. Basically I would go with the rule: you never change data. This also applies to the deletion of events. We never delete an event from the event store, never; a delete operation is just another event. We create the incident, we change the incident's text, and after that we mark the incident as removed.

However, there is one exception where you may be forced to delete data, and that is when regulatory requirements step in. I'm based in Germany, and Germany has very, very strict data protection rules.
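As a rough sketch of the aggregate idea, here is the mortgage example with the application as the root entity controlling the lifecycle of its value objects. All names and invariants are hypothetical simplifications of my own:

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Sketch of the mortgage example as a DDD aggregate: the application is the
// root entity; the income records are value objects that only live inside it.
// The invariants here are invented for illustration.
public class MortgageLoanApplication {

    record IncomeInformation(String source, BigDecimal monthlyAmount) {}

    private final String applicationId;
    private final List<IncomeInformation> incomes = new ArrayList<>();
    private boolean submitted = false;

    public MortgageLoanApplication(String applicationId) {
        this.applicationId = applicationId;
    }

    // All changes to the object graph go through the root entity.
    public void addIncome(String source, BigDecimal monthlyAmount) {
        if (submitted) throw new IllegalStateException("application already submitted");
        incomes.add(new IncomeInformation(source, monthlyAmount));
    }

    public void submit() {
        if (incomes.isEmpty())
            throw new IllegalStateException("at least one income source required");
        submitted = true;
    }

    public BigDecimal totalMonthlyIncome() {
        return incomes.stream().map(IncomeInformation::monthlyAmount)
                      .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    public boolean isSubmitted() { return submitted; }
    public String getApplicationId() { return applicationId; }
}
```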
For instance, if Facebook were based in Germany, Facebook would be illegal, and according to German law they would basically go to jail for what they do with the data; the same applies to Twitter and others. So there may be cases where you really have to delete customer data out of the events. I've been working on an event-sourced system where we discussed that requirement very intensively, and we basically discussed two options: do we delete the events containing the customer data, or do we just anonymize the customer data? We decided for anonymization of the customer data: we kept the events but blanked out first name, last name, addresses and so on, so that after a certain time you could no longer see this data, and we were compliant with the law. But legal or regulatory compliance is the only reason where you might consider touching the events in the event store after they've been stored.

Usually the event bus or event queue is implemented with a message broker architecture. I work as a consultant for software architectures, and what I see in a lot of projects is that they start off with an existing system. I mean, who starts greenfield projects? They happen, but they are a rarity in our industry; usually we have to deal with some kind of legacy. For instance, we've been playing around with service-oriented architectures for ages and bought a high-priced enterprise service bus, so our manager might come along and say: hey, that's awesome, let's reuse the enterprise service bus from the failed SOA project, it has a ton of features. No. Don't do that. Really, don't do that. What I would actually prefer are dumb pipes and smart endpoints as a suitable event sourcing architecture. Just use lightweight message brokers; there are great options like ActiveMQ, RabbitMQ and so on.
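A minimal sketch of the anonymization approach described above, assuming events whose payload is a simple map of fields. The field names are examples only; a real system would do this inside the event store's storage layer:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch of anonymizing stored events instead of deleting them: the
// personal fields of the payload are blanked out, everything else stays.
// Field names are illustrative assumptions.
public class EventAnonymizer {

    private static final Set<String> PERSONAL_FIELDS =
            Set.of("firstName", "lastName", "address");

    // Returns a copy of the payload with all personal fields blanked.
    public static Map<String, String> anonymize(Map<String, String> payload) {
        Map<String, String> result = new HashMap<>(payload);
        for (String field : PERSONAL_FIELDS) {
            if (result.containsKey(field)) {
                result.put(field, "***");
            }
        }
        return result;
    }
}
```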
These brokers are built for scale, for easy integration and for easy operation. Don't use one of those monstrous enterprise service buses that lock you into infrastructure, into some enterprise tooling whose label might as well read "it isn't meant to be fun, it has to hurt in order to work". I'm not dissing specific companies here, but I think you get my point.

The advantage of following this event sourcing principle is that we instantly gain the ability to rebuild arbitrary states of the application; we get that almost for free. Obviously there is currently no off-the-shelf "event sourcing suite"; event sourcing, and the same goes for microservices, is no free lunch. You have to do a bit more thinking and a bit more ramping up of a working infrastructure than just deploying a Tomcat server and putting your WAR files in there. But once you've done that, you have the ability to rebuild the data set for a specific point in time, instantly. Think about a bigger project in an enterprise: how often have you been in meetings discussing which data set is needed for the business test, on which environment that data lives, and how it can be loaded over there? With an event-sourced system we just say: okay, we want the data as of the end of the first quarter, so we do a rebuild from the events, and we're there.

You can also run temporal queries. That means I can ask, for instance, how the customer aggregate looked on February 15th, 2014 at 12 o'clock. That is very interesting for meeting audit requirements; your data auditor will love it when you can say: no problem, we have the event store, we can tell you exactly what happened to this piece of data and when. And you can do an event replay, which is interesting for various reasons, the most interesting one being debugging, or finding really nasty business logic bugs. You can rebuild your data set to a point in time just before some bug reported by your user happened, then replay event by event and see where the bug occurred, and then step into the specific piece of code. In the classical architecture we are stuck with a relational store whose data set reflects the state after the bug happened, and we have a hard time replaying the game and the business logic in our minds. With an event-sourced architecture, we rebuild the data set to just before the bug and then replay event by event. That is very convenient.

Well-known examples of event-sourced architectures are version control systems: if you take a look at Subversion or Git, you instantly see this event character, because you can replay a repository commit by commit. In the relational world we have the same characteristic in database transaction logs: if you look at how a relational database works internally, you will see that the transaction log consists of a series of events in most relational storages.

So, quite obviously, the event store has a very high business value, not just because of the features I just mentioned, but also because when you gain deeper insight into your business domain and discover new analytics about its behavior, you can apply these analytics to all the data. Imagine you run an online store and you play around with some AdWords campaigns; you advertise on Instagram, Twitter, Facebook, Google and whatnot, and you gain insight into how users behave based on these ads. With a classical architecture you can only apply these analytics to the current state of the application.
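A temporal query of the kind just described can be sketched as a replay with a cut-off instant: fold every event up to that point into a state object. The event types and the state shape are illustrative assumptions:

```java
import java.time.Instant;
import java.util.List;

// Sketch of a temporal query: replay the event stream up to a cut-off
// instant and fold it into the state at that time. All types are invented
// for illustration.
public class TemporalReplay {

    record Event(String type, String data, Instant occurredAt) {}

    record IncidentState(String text, String status) {}

    // Replays every event up to (and including) the cut-off into a state.
    public static IncidentState stateAt(List<Event> stream, Instant cutOff) {
        IncidentState state = new IncidentState("", "NONE");
        for (Event e : stream) {
            if (e.occurredAt().isAfter(cutOff)) break; // temporal cut-off
            state = switch (e.type()) {
                case "IncidentCreated"     -> new IncidentState(e.data(), "OPEN");
                case "IncidentTextChanged" -> new IncidentState(e.data(), state.status());
                case "IncidentResolved"    -> new IncidentState(state.text(), "RESOLVED");
                default -> state; // ignore unknown events
            };
        }
        return state;
    }
}
```

Replaying with an earlier cut-off gives you the aggregate exactly as it looked at that time, which is the audit and debugging use case from the talk.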
With an event store, however, if you get new insights into how these analytics work, you can apply them to the past as well and verify your assumptions; that is of very, very high value.

Of course, if we just stick to the model I've shown so far, we might ask: don't we run into performance problems, for instance with temporal queries against the event store? Yes, of course. The performance of the system as I have presented it until now would be horrible, because for every query we would have to replay every event to get the current state of the application, which is without a doubt a very bad idea. But the idea goes a little further: we can derive an application state, for instance the current state of the application, from the event store through some event handler logic. We can even work with several application states, for instance the state as of the previous day, the previous week, or the end of today, and we could query all of them, because we can easily derive any of these application states from the event store.

A way to create this pre-processed application state is the CQRS pattern. CQRS stands for Command Query Responsibility Segregation. What does this mean? We're now taking a closer look at the read and write characteristics of our application. Some of you might have wondered why I stressed the read and write behavior of the classical architectural approach in the intro of my talk; the reason is exactly this. Commands, in the context of the CQRS pattern, are write operations. Yes, I'll come to that later; the question from the audience was whether the event store can be an RDBMS, and I'll address that later on. Let me complete the overall picture first, and then I'll discuss the challenges that arise from these kinds of applications, among them which technologies are good choices, transactions and so on; I'll get to that within 20 to 30 minutes. So: commands are write operations, and queries are obviously read operations, no big surprise there. The responsibility segregation is that we separate our read operations from our write operations.

If we now revive our classical old-school architecture and apply CQRS to it, the first step is rather easy: we take an axe and split the application in half. We split up the query part and the command part of our application. If you have an existing application in a business domain that might be suitable for this approach, this would also be your first refactoring step towards event sourcing and CQRS: you separate the logic that reads from the logic that writes. You will obviously have to rewrite some code; the write side, for instance, still needs queries for business validations, cross-field validations and the like.

The basic idea is really simple. If we look at an easy code example written in Java, we have a typical service interface with a saveIncident method, an updateIncident method, a retrieveBySeverity method and a retrieveById method. The first step towards CQRS-ifying this interface is splitting it into a query service and a command service, which then hold the read operations and the write operations respectively. The issue we still have in this state is that both services run against the same data storage, and we're still working with the same model. So far, in this current state, we don't gain a lot.
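Written out, the interface split from the talk might look like this. The method names come from the spoken example; the Incident type and the in-memory implementation are stand-ins I added for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// The CQRS interface split described in the talk: one mixed service
// becomes a command service (writes) and a query service (reads).
public class CqrsSplit {

    record Incident(String id, String severity, String text) {}

    // Commands: write operations only.
    interface IncidentCommandService {
        void saveIncident(Incident incident);
        void updateIncident(Incident incident);
    }

    // Queries: read operations only.
    interface IncidentQueryService {
        List<Incident> retrieveBySeverity(String severity);
        Incident retrieveById(String id);
    }

    // Tiny in-memory implementation of both halves, just to show the shape;
    // in a real CQRS system these would be separate deployment units.
    static class InMemoryIncidents implements IncidentCommandService, IncidentQueryService {
        private final List<Incident> store = new ArrayList<>();

        public void saveIncident(Incident incident) { store.add(incident); }

        public void updateIncident(Incident incident) {
            store.removeIf(i -> i.id().equals(incident.id()));
            store.add(incident);
        }

        public List<Incident> retrieveBySeverity(String severity) {
            return store.stream().filter(i -> i.severity().equals(severity)).toList();
        }

        public Incident retrieveById(String id) {
            return store.stream().filter(i -> i.id().equals(id)).findFirst().orElse(null);
        }
    }
}
```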
This intermediate state of the refactoring process is, I would say, quite insufficient, because we just add complexity without gaining substantial advantages, so we shouldn't stop there. What a really good combination of CQRS and event sourcing looks like is this. Let's start with the command part, on the right side from your viewing direction. The command part issues events (and here the event sourcing idea comes into play) to an event store, from the command handlers or the data access logic; you can also fire off commands that in turn fire events to the event store. On the event store, on this queue, we also have an event handler listening. What this event handler does is derive the application state into a separate read storage. The read storage can be, for instance, a relational database, but usually, in most applications, it wouldn't be; you would work with an in-memory data grid like Terracotta or Hazelcast, or, if you have a very graph-oriented business domain, you might choose Neo4j as your read storage because it fits the nature of your queries very well. The read part of this architecture always queries the read storage; we never query the event store from the read part. A substantial advantage we get here is that we don't have to reuse the event model from the write side in the read storage: we can have a highly optimized structure that suits our queries and makes them really fast. For instance, I've worked on a project where, for one single aggregate, we had 15 representations in the read storage.
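A sketch of such an event handler on the read side, with a plain map standing in for the read storage (an in-memory data grid in a real system). The event types are the ones from the incident example; the view format is my own invention:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the event handler that listens to the event stream and
// maintains a denormalized view in a separate read storage. Here a plain
// map stands in for the read store; the view format is illustrative.
public class IncidentProjection {

    // Denormalized read model: incident id -> display text for one query.
    private final Map<String, String> readStore = new HashMap<>();

    // Called for every event coming off the queue.
    public void handle(String eventType, String incidentId, String data) {
        switch (eventType) {
            case "IncidentCreated", "IncidentTextChanged" ->
                readStore.put(incidentId, data);
            case "IncidentResolved" ->
                readStore.put(incidentId,
                        readStore.get(incidentId) + " [resolved: " + data + "]");
            default -> { /* ignore unknown events */ }
        }
    }

    // The read side only ever queries the read store, never the event store.
    public String display(String incidentId) {
        return readStore.get(incidentId);
    }
}
```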
These were highly optimized for certain queries, queries that would otherwise be very complex and very long-running and crunch a lot of data, so we pre-aggregated the data for them in the read storage.

What we also get from this is that we can now scale and deploy these things individually, and this is where, for instance, a microservice architecture comes into play. We might have read-archetype microservices and write-archetype microservices in this architectural stack, and we can scale them out totally independently. If we have a lot of reads, we scale out the read microservices on a larger scale than the write microservices. Or, if we know that due to some end-of-day processing in our business domain we need more write capability (because at the end of the day we write more data than during the day), we scale out the write instances in, let's say, Cloud Foundry or AWS for two hours and scale them down again afterwards. So we get a very individual and very powerful means of scaling and deploying our application.

In addition, we have a high degree of freedom in choosing our technologies. If we have a very complex business domain on the write side, we could write that part in Scala or Clojure or whatnot, whereas on the read side we usually just fire off queries, so we might do that with Grails, with Groovy, or with plain Java. But don't get me wrong: I don't encourage what I always call a technology zoo. Organizations should make wise decisions about which technologies to use and provide a thought-out toolset that on the one hand gives a certain freedom of choice, but on the other hand also manages the set of technologies in use. If you don't limit these things at the beginning, you end up with, let's say, 50 technologies after two years. I know us developers love working with different technologies, it's great, but from a CTO or CIO perspective it's going to be a pain in the long run. So give some freedom of choice, give some options, but think them through very well.

This kind of architecture also fits very well with the idea of the bounded context from Domain-Driven Design, because we can define business contexts for these microservices. These contexts can be bound at the aggregate level, but a bounded context might also include four or five aggregates, and that fits very well too. If you look at the discussion around microservice granularity, there is a lot going on in the community: there's the idea of nanoservices, and there are people saying a microservice shouldn't have more than 200 lines of code. I frankly disagree with that, to be honest; that's my personal opinion. I think you should bind your microservice granularity to the bounded contexts of your business domain, and that's something you read in many sources and hear in many discussions: always let the business domain drive your software architecture, not technology.

So these things, event sourcing, CQRS and, in the broader field, microservices, are very interesting architectural options. But there was that one question: what about the RDBMS? Can we use an RDBMS as an event store? Could we use an RDBMS as a read store as well? What about caching? What about consistency in this architecture? We haven't discussed consistency at all so far. And what about validation, where do we validate data? The question of where we validate is quite easy when we validate data that lives inside the aggregate upon which we built the events, because we usually have all of that data available at the command level.
But you may also have validations that include other aggregates, or even other business domains, so there is some complex validation logic out there. And what about parallel updates, parallel processing of events, and so on?

Let me take an audience question first: what about the representation language of events? You have basically complete freedom of choice. An event can be a serialized Java object, in the worst case I would say. I usually tend to represent my events as JSON strings or as XML, data formats that can easily be processed and stored by various storages. But you can also represent your event as, for instance, a table in a relational storage. So you have a lot of freedom of choice, but I personally tend towards representations that are very flexible with regard to the technologies and the logic that consume them, so my default choice of representation is usually JSON.

Another question: is there a chance of redundancy between read and write? That is exactly the first point I'll discuss right now, in terms of consistency. The answer is yes: systems based on CQRS and event sourcing are mostly eventually consistent. I would actually go as far as to say that nearly every distributed system in the world is, to a certain degree, eventually consistent; there is hardly full consistency in distributed systems. Eventual consistency can arise, for instance, when storing to the event store. If you do a synchronous store, you might be able to open a transaction against the queue, and the queue might give you a certain degree of guarantee that the event gets stored; but if you just do fire-and-forget, you have no guarantee that the event ends up in the event store. So there is a consistency risk on this end. There may also be a consistency risk on the event handler's end.
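By way of illustration, here is an event serialized to a JSON string, written by hand to stay dependency-free; a real system would use a JSON library such as Jackson, and the field names are illustrative:

```java
import java.time.Instant;
import java.util.UUID;

// Sketch of representing an event as JSON, the flexible default
// representation mentioned in the talk. Hand-rolled serialization here
// only to avoid dependencies; use a JSON library in real code.
public class JsonEventRepresentation {

    public static String toJson(UUID eventId, Instant eventDate,
                                String customerNumber, String comments) {
        return "{"
            + "\"eventId\":\"" + eventId + "\","
            + "\"eventDate\":\"" + eventDate + "\","
            + "\"customerNumber\":\"" + customerNumber + "\","
            + "\"comments\":\"" + comments + "\""
            + "}";
    }
}
```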
instance if we if we don't update the read storage immediately if we go ahead and say okay we do these processing on of the events against the read store every minute or every five minutes or something like that so you might run into a risk right there as well or you might run into a risk with updates against the read storage how transactional are these updates are these updates actually a part of the issuing of the event against the event store usually not so you have a consistency risk as well and depending on the technology that you use on the read store you might again run into consistency issues here for instance if you use a highly distributed in memory data grid well what tells you how consistent the updates to other nodes are and if you read the blog call me maybe has anybody ever heard about that blog yeah some if this is a very very interesting blog that sometimes makes you shrug when you read that depending technologies that you use this blog especially deals with consistency and replication partition status and so on and there is in terms of consistency this EAP theorem that says you can even have a consistency availability and partition tolerance but you can only have two of them so you exactly have to choose which way you want so if you want if you favor availability of obviously you want to have some sort of partition tolerance so the choice is do you want availability or consistency if you favor consistency you might go ahead and do some adjustments to that system that Drive you to a very consistent approach if you favor availability you have a lot of freedoms of choice how much you want to limit your consistent guarantee in the system and that is something and and the the next slide might be a little bit shocking or something like that - I would say CQRS and event sourcing purists if you really want to and I would then if you really want that consider if event sourcing is a suitable architectural choice versus you could go ahead and build a while 
fully is a little bit exaggerated a very consistent system which follows event sourcing principles and there comes your relational storage in place we could go ahead and place the event store and the read storage in a relational database on the same note and we could add the updates and the inserts to the event storage and the research in one transaction and there we would have the idea of the event system with the events on the event storage the idea of the read storage system and with a I would say very high transaction I guarantee but this system scale very well no it won't and that's the point that I want to stress this this approach here with the like very good I should actually call that slice very good consistency this system here will be quite consistent but will it be a very good option for scaling out and for availability no it won't but you can also you should consider your business domain what level of consistency you really need don't do eventual consistency just because you read it on Twitter or in some blogs around the corner like every five minutes nowadays no inspect your business domain and the business requirements there are regulatory requirements that you have and also the business value of failed transactions do they really hurt you and how much money costs that to my organization in order to derive a suitable level of consistency for your business domain and what can be done for instance is let you go ahead and do the insert into the event store extremely transactional so that you add a lot of that you add a transaction to this insert then you know okay this part is I would say quite consistent here but on the read storage okay we do this update asynchronously and we can live with data that is let's say deprecated for five to ten to twenty minutes it depends on the data that you for instance a classic example - this is the country list I mean I would say it is a very rare vocation that a country changes its name that a new country arises or 
that a country disappears so if if we update for instance these kinds of data like every six hours which is an eternity in terms of application server and application lifecycle we're totally fine off but can we live with updates on on the read storage with a let's say five minute guarantee on a stock trading system probably not that might be quite difficult actually so what I want to say is there is no standard solution to that topic you really have to inspect your business domain and you really should be aware of the options that you have going back to technology choices yes you can use the relational storage is it a perfect choice for storing events mm-hmm I don't think so if it is it a good way for the read storage maybe but there might be other options like a graph database or a in-memory data grid with distributed maps that the the queries are basically key lookups to maps for instance it might be a choice you have to inspect your business domain and let your business domain drive these decisions not the fanciness of some technology so far we have now discussed the point of consistency let's talk about validation validation is a very interesting thing it might seem I would say quite obvious at a first glance because validation tends to be easy okay I have my not null fields then the one field should be a credit card number or a completely filled out address and there is min and Max validations I mean these are the easy validations you can easily deal with them in the business logic or even when entering into a rest call or something like that those are the easy things the the more interesting things our uniqueness validations for instance let's take a look at this example domain here so we have a user which has an email and a password and we split this up into two commands you register user command or user it yet reduce the user command change email command and after these commands have been processed they fire off user registered event and email changed event 
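The command/event split just described might be sketched as follows. This is a minimal illustration with invented names (`RegisterUserCommand`, `UserRegisteredEvent`, a `registered_emails` validation bucket), in Python rather than the Java/Spring stack the talk assumes, and it is not the speaker's actual code; it also shows the JSON event representation mentioned earlier.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class RegisterUserCommand:   # imperative: "register the user"
    email: str
    password: str

@dataclass
class UserRegisteredEvent:   # past tense: "the user has been registered"
    email: str

# Read-side "validation bucket": a distinct list of registered email
# addresses, kept up to date asynchronously by an event handler.
registered_emails = {"bob@example.com"}

def handle(cmd: RegisterUserCommand):
    """Validate against the read storage, then emit an event."""
    if cmd.email in registered_emails:
        return None  # reject: email already taken, as far as the bucket knows
    # Eventual-consistency risk: if the bucket lags behind the event store,
    # a duplicate can still slip through here.
    event = UserRegisteredEvent(email=cmd.email)
    # store a technology-neutral JSON representation in the event store
    serialized = json.dumps({"type": "UserRegistered", **asdict(event)})
    return event, serialized
```

Usage: `handle(RegisterUserCommand("bob@example.com", "pw"))` is rejected because the bucket already contains that address, while a fresh address yields an event plus its JSON form ready for the event store.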
And here you see a very nice example of the naming of these things: the commands are in the present tense, register the user, change the email; after the command has been processed, the user has been registered and the email has been changed. So we can register a user and we can change the email. Now we have the requirement that we need to process more than two million user registrations per day, and a user can change her email address, but the email address must be unique. If we think again in our event-sourced system, which validation risks do we have? Upon registering the user, we need to do a query against the read storage, for instance against a list of distinct email addresses. We have one data bucket, registered email addresses: is the email the user comes along with already registered? If yes, the user has to choose another email address, or has to reactivate her account, for instance. If no, everything is fine, we insert. And the same validation takes place when we want to change the email address: is the new email address unique in the system?

The risk that we might run into with this transaction is that the update to the read storage hasn't occurred yet. For instance, another user has just used my email address in order to register, and the read storage says: no, this email address doesn't exist yet. So we store this user created event in the event storage, and now we have an inconsistency in our event storage, because at this point there are two users with the same email address in the event store. And if we talk about this scenario with business people, we often, and I would say often means 95 percent of the time, get the answer: this must not happen under any circumstances, please take huge technical measures to prevent it. Of course we can take big technical steps to prevent these things from happening, by increasing the consistency level in the overall architecture, but we would also sacrifice availability. And if we go to the business managers and our own manager with the availability argument, we often get the answer: oh yeah, you have to take care of availability as well.

But let's take a step back: how high is the probability that this validation will fail? Does this happen very often? In this use case, yeah, it might happen some time, but is it the regular case? I doubt it; I don't think it's going to happen very often. Let's say there is a 0.05 percent chance that this happens. Then you should take into account which data you need in order to do this validation, and where and how this data is stored. Maybe we can mitigate the probability with quicker updates to the read storage if these things have a high business criticality. But the end of the story is basically: what is the true business impact of a failure in this validation process? Does it cost us a lot of money, do we really need to deal with it on a technical basis? Take the duplicate user case: how can we deal with it? Okay, there is the event in the event store, we now have two users with the same email address in our event storage. But couldn't we just go ahead in the event handler? When we create this distinct list of email addresses, we will find the error pretty quickly; couldn't this event handler then just issue another event to some customer service representative, who emails the user and says: hey, we have a hiccup in the data, sorry, please change your email address with this link again? And we're fine: an easy, organizational solution. We could also have a compensation event that just rolls back the email changed event to the old email address, and we send the user an email telling him: hey, something went wrong in processing your email address change, please try again. A fine solution for many of these things. So again: the business domain is the main driver for the consistency level in the system.

Another thought that we can apply to this topic is: against what would we want to validate? One option is, and please don't shrug, I have a second slide right after this one, to validate from the event store; usually the event store is more up to date and more consistent than the read store. We could validate from the read store, which I would actually favor, or we could have some additional validation in the event handler. What I would strongly favor, and that should be your 98 percent choice: validate from the read storage, and just have what I always call validation buckets in the read storage, which get updated depending on the business criticality of that kind of business operation. You are usually pretty well off with that. What I would never do is validate from the event store, because you will run into performance problems with that in the long run; especially in high-volume, high-transaction systems you will kill yourself in terms of performance.

So far we have talked about consistency and validation; now let's talk about parallel updates. Again we're dealing with our business domain, the user with the email and the password, the register user command, the change email command, and then the two events. But what happens when Alice and Bob share an account and both update the email address at the same time? Scenario: Alice and Bob are having a divorce, and Alice tries to change the account to her email address while Bob tries to change the account to his address. What's going to happen? The first thing might be to take a look at how this issue is dealt with in the classical old-school architecture. What we could do there, in the database, is pessimistic or optimistic locking on these data sets. With pessimistic locking we usually issue the classical select for update statement, have fun with scalability on that one; the data set is locked for that transaction, we apply the change, and so on and so forth. Or we can use a version increment or timestamp mechanism with optimistic locking, so that we increase the version with every update, and when something goes wrong we get, in typical Hibernate terms, an optimistic locking exception or something like that. Again, the locking quality should be driven by our business domain, by what we really need.

A pessimistic lock on the data level will obviously hardly work in an event sourcing architecture. What would we pessimistically want to lock, the event queue, the read storage? It won't work. And let me be very clear: in my eyes, and I think this is also the strong opinion of the overall community, pessimistic locking is a bad practice. You usually don't do pessimistic locks even in classical systems; obviously sometimes on update operations you have some optimistic locks, but if you look at patterns like transactional write-behind, where you try to push the DML statements, the data manipulation language statements, as far as possible towards the end of the transaction, we try to minimize pessimistic lock contention in our databases. If we really want to pessimistically lock something in an event-sourced architecture with CQRS, we could issue a user locked event to the event store, which gets replicated very fast to the read store, into a bucket: is the user locked, can the user be worked with? We can do that, but as a business rule. For instance, if I implement a mortgage loan system where the directors of the bank approve or disapprove your credit application, a director might do a business lock and take the application into an "in work" status; that's what usually happens in these kinds of systems. So if another director wants to take this application in work, they can't, because there is some business logic that prevents them from doing so. I wouldn't call that a pessimistic lock, because a pessimistic lock is always on a row level for me; that is a business lock, and business locks might work very well in these kinds of applications. But do we really need these locks? Really question that. And if you think about classical, old-school architected systems, they usually run very well with optimistic locks. With pessimistic locking we pessimistically say things will go wrong; with optimistic locking I say: I guess most of the time things will work out fine, and if something goes wrong, well, then I have to do something. Usually that means you detect the collision of the parallel update on the insert, and then you notify the user: sorry, somebody has already changed the data, please go back, refresh your data and do your change again. So again some sort of organizational or operational fix of the issue.

If we want to introduce an optimistic locking approach, we can add a version field to our user, and obviously also carry the version in the events, in the email changed event. Why not in the user created event? Because that one always implies version zero; a freshly created user has version zero, or one, whichever you prefer. Each writing event then increases the version. So we register, we have version zero; we change the email address, which means we check the version, and this email changed event should, for successful processing, carry the version that matches the current state of the application, which means zero. Yeah, that's a success, that works. The user is now basically in version one; we issue another event which carries version one, that's a success, that works. And now suddenly we come along again with an email changed event on the user in version one. That is the parallel update from Alice and Bob, and it will obviously be detected when we update our read store, because again, here we are not fully consistent: we don't validate against our event store, we validate against the read store. Then we just raise another event, an email change failed event, and this event might trigger an email being sent to the user: hey, sorry, we detected that something went wrong. And we're fine.

So to sum this up, as always: you should be as consistent as your domain requires. What I think is very nice about these kinds of systems is that they force you to think very carefully about your business domain. I've mentioned it quite a few times: it is the business domain that drives these decisions, not fancy technology, and you get to think very thoroughly about the business domain when you deal with these kinds of questions. I established some sort of best practice for myself here, because when I started working with these patterns, whenever I saw a problem, a challenge, a requirement that needed to be addressed, I would dig into the business domain and derive the decision, the level of consistency, the level of validation quality, the level of locking quality that I need, from the real business requirements, and always talk to the business people: what is the cost of things going wrong? If the cost is substantial, we need to do something, of course we need to address it. If the cost is not really substantial, if the cost is an automated email being sent to the user saying: try changing your email again, something went wrong, we can live with that, that's fine.
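The versioning flow just described could be sketched like this. It is a toy illustration with invented names (`read_store`, `apply_email_changed`, `EmailChangeFailed`), not the speaker's system: the read-model update checks the version the event was issued against, and a stale version produces a failure event instead of an overwrite.

```python
# Read model: current email and version per user (names are illustrative).
read_store = {"user-1": {"email": "old@example.com", "version": 0}}
raised_events = []  # events raised by the projection, e.g. on conflicts

def apply_email_changed(user_id, new_email, expected_version):
    """Apply an EmailChanged event; detect parallel updates via the version."""
    user = read_store[user_id]
    if user["version"] != expected_version:
        # parallel update detected: raise a compensating event, don't overwrite
        raised_events.append({"type": "EmailChangeFailed", "user": user_id})
        return False
    user["email"] = new_email
    user["version"] += 1
    return True

apply_email_changed("user-1", "alice@example.com", 0)  # Alice wins, version goes to 1
apply_email_changed("user-1", "bob@example.com", 0)    # Bob's event is stale: failure event
```

The failure event could then drive the organizational fix discussed above, such as an automated email asking the losing user to retry.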
Okay, I'm done with the talk so far, and I think we have quite some time for questions. Yes, there's a question here. Wait a minute, I need to come closer to you because I can't hear you, and I'll repeat the question for the audience.

Okay. The question was: why did I put the validation on the event part? I think maybe my slide wasn't very clear about that. I would always do the validation in the command, against the query storage; but you detect the collision later: if your validation says you can store this user with this email address, and the query against the read store worked with stale data, this will probably be detected when the event handler runs over the events and analyzes the data, and that is where you handle these things.

Yes, another question. The question was: in another talk there was the pattern, or the mention, of checkpoints as a starting point for updating the read model. Basically, I'm a huge fan of that, I use it very often. You don't always rebuild the read model from scratch; instead you have, let's say, old snapshots onto which you then apply the events, and these snapshots are obviously the checkpoints you start from. A very good pattern, I would call it a best practice, absolutely.

Yes. It depends on the business domain that you have. If you have an event where you detect later on that the validation failed, I would usually raise another event which then marks this piece of the business domain as: hey, something needs to be done here. From there on you can have event handlers that do something with that. One option would be to send out an automated email, so no human interaction is required; the user just gets notified: something went wrong, please try again. Or, if it's a more complex domain, you could route it to a service team that deals with these kinds of problems and can then introduce measures that mitigate the situation.

The question was about the user registered event. Okay: if you detect the problem beforehand, you don't save the event, you just go back to the user. But if the event has already been saved, you have to deal with it afterwards. Yes, over there; I'll come back to you later.

Okay, yes. It depends on the database that you use, of course. If that is the case, and if these things happen very often, you might have added complexity. But it always depends on whether this really hurts you. Sometimes the database has absolutely no issues scaling with these kinds of things, because the data just gets written away and that's no issue, but sometimes you might run into problems, and then you have to rethink how you want to validate, absolutely.

Yes. The question was about the use of, for instance, Spring Data and the facilities that Spring Data offers when working in an event sourcing and CQRS environment. I would say it depends. For instance, you can have a great Spring Data project on top of a, let's say, NoSQL read storage, where you can easily use all the query facilities against this read storage; you just write data differently. But of course, yes, this moves away from the typical Spring Data approach, which is pretty much CRUD-centric on a piece of data. You would basically split things up: you would fire off events on the write side, and you would just query through Spring Data, for instance.

Okay, another question. The question was: if you have a legacy system, what are the best practices and the recommended way to migrate from the classical approach towards a CQRS and event sourcing approach? First, a disclaimer: CQRS and event sourcing are no catch-all patterns. You can easily pick out just one business domain where this is essential and very suitable, and still work with the rest of the domain in the classic approach. So you can have a step-by-step approach: you would usually start with a very business-uncritical thing, which actually fits very well, so you can collect your experiences and create a proof of concept on top of it. What I would do is split up the read/write part first, and then it depends on the risk that you want to take. If you want a conservative approach, you could use the relational storage for the event storage and for the query part. But what I would actually do is externalize the event store, for instance towards some Redis storage or something like that, and you might replicate into highly aggregated tables in the relational storage.

Okay, yes. The question was: what is the best starting point for event sourcing and CQRS, with starter code that you could easily clone? To be honest, I can't name you a reference. I think there are some open source projects that address this, especially some CQRS frameworks; one is, yeah, exactly, Axon, and that could be a good starting point, excellent. I think Martin Fowler wrote a very interesting blog post on the topic with plenty of code examples, but I think his examples are written in Scala, as far as I know. Very interesting tooling is also the Akka Persistence project in Scala; I hope I don't get beaten up at a Groovy conference for mentioning Akka Persistence and Scala, but yeah.

Okay, yes. The question is: when I'm validating in a high-volume system, won't the validation on the command end be a bottleneck? No, or I would say not in most cases, because I can create a very accurate, very high-performance read storage, which reduces my read costs significantly, and that should usually give pretty good performance. Yep, let me just repeat what you said: he said, just go ahead and set up very simple, highly available, highly performant data buckets for these validation queries. And I actually worked on an event-sourced system where we modeled our queries, the query parameters, as map keys; it was just a distributed in-memory map, or rather several highly distributed in-memory maps, the query parameters were just the map keys, and that was really fast.

Okay, there was one last question. Okay: if the read storage is a highly distributed in-memory data grid, how do I deal with aged data? Well, I can easily have read storages and read buckets, let's say from last week, last year and so on, if I need that for my business domain. You can be very creative and very flexible in terms of the read buckets that you have. And he also asked if I could elaborate a little more on the concept of these snapshots and checkpoints: you can, for instance, save a read storage snapshot at twelve o'clock, one o'clock, two o'clock and so on, and then apply only the new events on top of such a snapshot. That's the pattern, yeah.

Okay, just two last remarks: you can follow me on Twitter, @bitboss is my Twitter handle, and I will upload the slides of this talk to slideshare.net/mploed. As soon as I've done that, I'll publish it on Twitter, also through the S2GX hashtag. I want to thank you very, very much for the great discussions after the talk, and for all of you coming to the talk and showing interest in the topic. I'll be around today at the conference, so if you have further questions or anything else, just say hi. And yeah, thank you very much.
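To close with something concrete: the two read-model ideas from the Q&A above, checkpointed snapshots and query-parameters-as-map-keys, can be combined in one small sketch. Everything here is hypothetical (the event shapes, the `users_by_country` bucket, the `rebuild` helper), a toy illustration in Python rather than the speaker's actual system.

```python
# Append-only event store; "pos" is each event's position in the log.
event_log = [
    {"pos": 0, "type": "UserRegistered", "email": "a@example.com", "country": "DE"},
    {"pos": 1, "type": "UserRegistered", "email": "b@example.com", "country": "DE"},
    {"pos": 2, "type": "UserRegistered", "email": "c@example.com", "country": "FR"},
    {"pos": 3, "type": "UserRegistered", "email": "d@example.com", "country": "DE"},
]

# Snapshot taken after events 0 and 1 were applied (checkpoint position 2).
snapshot = {
    "position": 2,
    "buckets": {("users_by_country", "DE"): ["a@example.com", "b@example.com"]},
}

def rebuild(snapshot, events):
    """Start from the checkpoint and replay only the events after it."""
    buckets = {k: list(v) for k, v in snapshot["buckets"].items()}
    for ev in events:
        if ev["pos"] < snapshot["position"]:
            continue  # already contained in the snapshot
        key = ("users_by_country", ev["country"])  # query parameters as map key
        buckets.setdefault(key, []).append(ev["email"])
    return buckets

buckets = rebuild(snapshot, event_log)
# A "query" is now just a key lookup into the map:
print(buckets[("users_by_country", "DE")])  # ['a@example.com', 'b@example.com', 'd@example.com']
print(buckets[("users_by_country", "FR")])  # ['c@example.com']
```

The query cost collapses to a single map lookup, which is the point of the "query parameters as map keys" design, while the checkpoint keeps rebuild time proportional to the events since the last snapshot rather than the whole log.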
Info
Channel: SpringDeveloper
Views: 93,139
Rating: 4.9451113 out of 5
Keywords: Web Development (Interest), spring, pivotal, Web Application (Industry) Web Application Framework (Software Genre), Java (Programming Language), Spring Framework, Software Developer (Project Role), Java (Software), cqrs, microservices, event sourcing
Id: A0goyZ9F4bg
Length: 84min 33sec (5073 seconds)
Published: Tue Mar 22 2016