The hardest part of microservices: Your Data by Christian Posta

Captions
I'm assuming everyone here has heard of microservices; I put that in the title slide hoping it would resonate. We've been talking for the last two or three years now about microservices and about breaking our monolithic applications down into smaller pieces so that we can go faster, reduce the time to value when we make changes, and let our teams and our applications scale. What I've been seeing over those couple of years is that when people actually try to break down their applications, they very quickly realize that while there are definite complexities in the application code, once you get down to where the data for that application lives, in a myriad of different database and storage technologies, breaking the application apart at that level becomes painful and difficult and raises a lot of red flags. What do you do when your application is happily talking to one database that has all the tables and can do all kinds of complex joins, queries, and transactions, and now you're talking about breaking that application into small services where each service owns its own data and its own database? So in this talk we're going to look at some of the challenges around data that come up when you try to break down your applications, and confront them with possible solutions.

This is an abbreviated version of the slide deck; you can get the full deck at the link shown. It's something I'm constantly updating, so even if you've seen it in the past, I'm constantly adding more content to it. This is the version of the talk that fits in about 50 minutes, but if you want the full slide deck, feel free to grab it there.

Let me introduce myself. My name is Christian Posta. I'm a principal architect at Red Hat, and I travel all over North America meeting with our enterprise customers and helping them build large, complex distributed systems to solve their business problems. I recently, in June, wrote a book called "Microservices for Java Developers". The book is a hands-on, step-by-step introduction to using Java frameworks, things like Spring Boot, WildFly Swarm, and Dropwizard, to start building microservices and deploying them onto technology like Docker and Kubernetes. If you go to developers.redhat.com you can get a free copy, and I have some hard copies here, a couple of different books related to microservices and to data, about eleven or so. If you go to the Red Hat booth in the expo hall we have a lot more, so feel free to pick up a copy, and I'll hand these out to people who ask questions at the end; you can pick which one you want, although I only have four of the database one and seven of the other.

When I meet with these customers I try to draw on the experience I had working at Zappos.com and their parent company, which you may have heard of: Amazon. Back in 2012 I went to work with their integration teams and their services teams, and I was really excited to work with Zappos and Amazon.
Amazon at the time was the vanguard of SOA, before SOA was the hyped hotness, and when I first started there and saw their architecture I thought: this is not SOA, this looks like a huge mess, this doesn't make any sense, this is not what the Thomas Erl books taught us about SOA. All of their developer tooling and the way they did deployments seemed kind of cool, but it was closed-source, in-house stuff, and I wasn't as interested in that because it wasn't open source. But a lot of the things I learned and the principles I saw there, looking back, are exactly what Netflix and LinkedIn and Twitter and all these companies now describe as microservices. That's exactly what Amazon was doing, and Amazon is of course one of those unicorns as well. So what I do is try to draw on that experience and help our enterprise customers understand what microservices might mean for them, because what it looks like at Amazon or Netflix doesn't mean it's going to look like that at your company or at the enterprise companies I work with. I try to help them understand the path, the journey they need to go down, to get to that state where they're going faster, doing more builds, and able to experiment and learn. That's the goal, not "doing microservices" or REST or whatever the current buzz is.

So I'm going to be talking about how we take our applications apart and what strategies we can use to confront some of the problems we'll run into with data. When we talk about microservices, Adrian Cockcroft says this perfectly, and I use it in all my slides: we can't just start breaking up applications, start using Netflix OSS, and then say, look, we're doing microservices now, we're much better. It's a journey, and it crosses technological boundaries, organizational boundaries, cultural boundaries, and process boundaries. As developers we love the technology, we love learning about distributed systems, and we will talk about the hard parts related to the technology, but the process of getting there is more important than the technology that you use.

When we talk about microservices we're talking about optimizing IT, optimizing the business, for speed, and this is quite different from how IT operated in the past. IT used to be about supporting business functions: setting up email servers, automating human tasks, automating away paper processes. What we're talking about with microservices is optimizing so we can make changes to our technology systems much faster and deliver value through technology, which is a totally different way of looking at technology. I'm sure you've seen the slides: software is eating the world, everyone's a software company now, and so on. Microservices is about optimizing for speed. But how does a potentially hundred-year-old company, with all the legacy that comes with it, go fast? Not every part of the organization has to go this fast, but the question remains.
Whether it's your organization or a piece of code, making it go faster and optimizing it for speed is different from optimizing it for other concerns, like cost, the way traditional IT did. It comes down to managing, and in some cases isolating and reducing, the dependencies we have between the components in the system. The more dependencies we have, the harder it will be to make changes and go fast, which is the goal. There are a lot of different kinds of dependencies. We can look at the service level: services talking to each other. We can look at the team level: if I want to make a change to my UI, I may have to change the backend services as well, and that might cascade down into changes to the database, so now I have to file tickets and coordinate with a bunch of other teams. There are dependencies at that level, and on and on. But I think data is one of the major dependencies.

Before we can talk about what sort of dependency that is, or how we break or alleviate it, we should first ask: what is data? We have to start somewhere. The best definition I've heard is from William Kent in a book called "Data and Reality", where in the first couple of paragraphs he explains that data is like the conversations we have with each other. When we explain concepts to each other in conversation, we can resolve any discrepancies or ambiguities in real time. When we talk about data, what we're really talking about is a human trying to encapsulate an idea and explain it to another human, just like we do in natural language, but by putting it into the computer first. We tell the computer what the concept is, and if we explain it right, we hope that the human on the other end who reads this data, or does something with it, will be able to understand it and make sense of it. That sounds trivial, and I'm going to use a very trivial example, but it ends up being far more complicated.

So, what is one thing? How do you describe one potentially physical object, in this case a book? I wrote a book, but how would you describe that in a data system, maybe an online bookstore or a bibliographic database? Would you have one entry per author? One entry per title? I have about seven copies of my book up here; is each one of those things "a book"? Some books get so big that you have to break them down into smaller volumes; are all of those "the book", or is the book just one physical thing? Does a book have to be hardcover or softcover? Newspapers might not be considered books, and so on. Just this one simple concept, and how we describe it to another person and write it down in the computer in some form, is not as straightforward as one would think. Now take that and extrapolate it to what we deal with in our enterprises: accounts and customers and claims and all the different concepts we try to model in our business software, and this becomes far more complicated.
If we take the book example one step further, and I don't know how clearly the text shows up, you can see that in certain contexts, in certain conversations you might have about a book, for example buying a book or checking out a book, each individual physical copy is a big deal. We need to keep track of it and think of a book as an individual copy, because that's how we charge and make money. But if we're building a search engine, we might not care about every individual copy; we might just care about the titles. If we're building a recommendation engine, we might not care about any of that; we might just care about metadata related to the book, or how it relates to other books. How we capture and describe this in the computer will be different depending on the conversation we're having. There's an amazing set of patterns and practices from the domain-driven design community that helps us model these differences and be very specific about them across the different conversations we have about our domain. Domain-driven design is not new; Eric Evans's book has been out since around 2003, and the study of data and information systems goes back much farther than that.

I've had people ask me: okay, so domain-driven design helps us identify models, identify boundaries, and put context around those boundaries, and if you squint, that starts to go down the path of breaking our systems into microservices, so this boundary-driven, context-driven approach helps. But Netflix never mentions domain-driven design, LinkedIn doesn't mention domain-driven design, so why are you talking about it? I think the answer is that those companies had to solve very difficult problems at a scale that a lot of enterprises don't see, in the number of users and the amount of data they were collecting. Writing a tweet on Twitter is pretty easy, but if you have five hundred million followers, or some gigantic number of followers, it's technically a very complex problem to solve. Now, if you shift away from the internet unicorn companies that built microservices at that scale and don't talk about domain-driven design, and you look at our enterprises, you start to notice a very clear juxtaposition: how complex is the domain at Twitter versus how complex are our financial services companies, insurance companies, and healthcare companies? They are far more complex. Writing a tweet or updating a LinkedIn profile is pretty simple; displaying it, aggregating it, and doing everything behind the scenes is technically challenging. But in the enterprise we have to deal with both: complexity in the domain, and complexity at some scale, maybe not Amazon scale, but definitely some scale.

Okay, so we start going down the path of domain-driven design, and we use the concepts from the literature: bounded contexts, aggregates, domain models, all that really good stuff. If we're lucky, we build our applications, we elicit those models in the application, we put them on top of a database, and we're able to get by and do actually pretty amazing things. Think about some of your companies that are not doing microservices:
they probably make tons and tons of money; they're doing fine. So how do they get to that next level? Well, first we have to stop and acknowledge that doing things this way has some pretty astounding advantages. If we go a little deeper: when we take our domain models and actually materialize them in the database, normalization and SQL querying are incredibly powerful. They let developers store the data in one normalized format and then do some very powerful querying on top of it, queries you probably didn't know ahead of time that you were going to write or need, and you can do this because of the way the data is stored. If we move away from that, using NoSQL stores like Cassandra or MongoDB, and I wonder if you've run into this, you have to know your queries ahead of time; you have to do the query planning up front. So this is quite amazing flexibility that we have.

Then there are the ACID properties; I'm sure people have heard of ACID. Atomicity allows developers not to have to worry about what happens when things fail partway through; partial failure is abstracted away, and it's a really nice abstraction. Durability means that when things get written to the database, they ideally don't get lost. Isolation lets the developer act as if they have the full database to themselves and everything appears in a nice sequential order. Consistency is an overloaded word, and we'll talk about it in a bit, but it's not the same consistency that's in the CAP theorem, and it's not necessarily something the database itself provides. ACID is very comfortable for developers; you feel good when you're using this stuff, and you're abstracted away from some of the very nasty problems that the database community has solved for you over the last forty years. So my recommendation is to stick with these conveniences, these safety guarantees, for as long as you can, because after that things get pretty nasty.
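To make that concrete, here is a minimal sketch of the kind of safety a local ACID transaction gives you. The connection URL, credentials, and the accounts table are made up for illustration, but the shape is the same for any JDBC database: either both updates commit together or, on a partial failure, neither is applied.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class LocalTransactionExample {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection details; any JDBC-compliant database behaves the same way here.
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/shop", "app", "secret")) {
            conn.setAutoCommit(false); // start a local transaction
            try (PreparedStatement debit = conn.prepareStatement(
                         "UPDATE accounts SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                         "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {

                debit.setBigDecimal(1, new BigDecimal("10.00"));
                debit.setLong(2, 1L);
                debit.executeUpdate();

                credit.setBigDecimal(1, new BigDecimal("10.00"));
                credit.setLong(2, 2L);
                credit.executeUpdate();

                conn.commit();   // both updates become visible atomically
            } catch (SQLException e) {
                conn.rollback(); // partial failure: neither update is applied
                throw e;
            }
        }
    }
}
```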
All right, so we stick with it as long as we can, but we still end up running into issues. Maybe all the data no longer fits in a single database and we need to scale the database out. Maybe the joins and the querying are getting way too complicated and we need to look at denormalizing the data. Maybe we want to reduce dependencies between teams so they can work on different pieces of the system independently and move at their own cadence. But we have to be very clear in our own minds about what we're saying to ourselves and to the infrastructure and operations folks: hey, we developers know there have been forty or fifty years of database research, theory, and implementation, but we've got it from here; we're going to build microservices, everything is going to have its own database, and everything is going to be awesome. When we go down that path, we have to reconcile the fact that we are now building a distributed system, a fully data-centric distributed system, and we need to understand some things about data in a distributed system that we didn't have to reason about in our nice, safe world.

One of the things we have to think about is what we actually know about the data we see. Inside a single service, inside a monolith for example, we can query data and know that it's the data in the database right now. If we start breaking things up, each individual service still has that nicety: the data inside my service is current, I know that. But data outside my service, once I start sending it, for anybody who receives it, is not "now" anymore. There's a very stark difference in how you interpret data once it has left the service: it is stale data. How stale? I don't know, but it's not the current data. So we have this notion of data on the inside of a service, which is transactionally consistent and represents the current "now", and data we send outside the service, which is a point-in-time snapshot. If you start to squint, you start to think: this sounds like events, immutable data that has left the service. If you read about it and model it that way, we can probably call these things events, but we'll come back to that.

Another thing we take for granted is that our services can just talk to each other over the network. In some cases we've written libraries or used frameworks that hide the fact that there's even a network; it looks like a nice local call. But that's not what the call graph really looks like when we make these calls over the network. These services are communicating over IP, packet-switched networks, which are asynchronous networks. The opposite would be a synchronous network, where everyone shares exactly the same notion of time and data is transmitted and accepted at a common point in time. An example would be the chipset inside our computers, where everything is clocked by the same CPU and runs at the same rate. Another would be the traditional landline telephone, where you picked up the phone and an actual circuit was created and allocated between you and the other end. But in these packet-switched networks, everyone has their own concept of time. There are actually no guarantees at all about time between A and B in the system. If I send a message to B, that message could be arbitrarily delayed before it gets there; it could never get there; it could get there and B's response could be lost; or B could think it's responding while the machine is simply dropping packets on the floor. There's no shared concept of time; each individual service has its own time. So when we look at data that has left a service, it's data on the outside; it was true back then, in the past. And when we have these services communicating at their own cadence, on their own concept of time, we want to model that and take it into account.
Because of this discrepancy in time, we don't really know when there's a failure: we don't know if something failed or if it's just taking a long time. So we have to model these concepts into the system as a first-class design concern, a first-class citizen.

So how do you read and update data in an environment where there's no shared concept of time, and where there are things that look like failures and may actually be failures? We start to see solutions that are a little bit suboptimal. For example, say we need to update a customer's address: their address changed, they're moving, so we update it in the system. It's not as simple as calling the database and telling it to update the address, because that address impacts other parts of the system. When we change locations, that could affect how taxes are calculated, it could affect in-flight deliveries, it could affect how we send out promotional material. So we want to tell other systems that the address has changed, not just update it locally and leave them on their own. But this is not a very good way to do it: start a transaction, update locally, commit the transaction, and then send off the update. What if the process fails after the commit and never actually sends the update? So we could move the send inside the transaction, but what if we send the update and then fail or roll back? Now we've told everyone about a change we never actually made locally. We can try to mitigate that, and what we're really trying to do here is write to multiple infrastructure services, multiple data services, at once.

We could mitigate it with some sort of coordination between the resources we're using, but between services, between these time boundaries and domain boundaries, we don't want to do that. Inside a single domain, a single boundary, a single time boundary, that sort of coordination is somewhat acceptable, but in general, trying to coordinate between two systems using XA or two-phase-commit transactions adds a lot of management overhead. Where do we store the transaction logs? What happens when there are deadlocks? What happens when there are heuristic outcomes and we give up? It ends up becoming a mess that is difficult to manage at scale, with a lot of services, and a lot of smaller services, which is exactly what we're talking about with microservices.
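To make the dual-write problem concrete, here is a hedged sketch of the fragile pattern just described for the address change. The table, the Kafka topic name, and the wiring of the connection and producer are illustrative assumptions, not anything from the talk; the point is that whichever order you choose, a crash between the two writes leaves the local database and the downstream listeners out of sync.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AddressService {
    private final Connection conn;                         // JDBC connection (setup omitted)
    private final KafkaProducer<String, String> producer;  // Kafka producer (setup omitted)

    AddressService(Connection conn, KafkaProducer<String, String> producer) {
        this.conn = conn;
        this.producer = producer;
    }

    // The fragile "dual write": update the local database, then notify other systems.
    void changeAddress(long customerId, String newAddress) throws Exception {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE customers SET address = ? WHERE id = ?")) {
            ps.setString(1, newAddress);
            ps.setLong(2, customerId);
            ps.executeUpdate();
            conn.commit();
        }
        // If the process dies right here, the address changed locally
        // but tax, delivery, and promotion services never hear about it.
        producer.send(new ProducerRecord<>("customer-address-changed",
                Long.toString(customerId), newAddress));
        // Moving the send() *before* commit() just flips the failure mode:
        // the event goes out, then the local transaction rolls back.
    }
}
```

Change data capture, discussed later in the talk, sidesteps this by making the committed database change itself the source of the event.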
Another example where we see something similar: a customer updates their profile, and now we need to call back-end systems and say, okay, let's update the recommendation engine, let's update promotions, the relevant search engines, and so on. We end up making calls to all of these individual services, and that's fine, but when things fail we have to think: do we need to roll back some of the changes we already made? Are we doing this properly with the saga pattern, or compensating transactions, or something like that? Are we storing the state? Are we basically implementing a transaction manager inside our service? And what happens when we do run that compensating transaction, that compensating action? If you've modeled it correctly in the way you've architected your services, it's probably fine. But what you're really doing is writing data and then rolling it back, and somebody could have seen that data in the meantime and made decisions based on the change you later decided to undo. In database terms that's a read-uncommitted change: data that was never meant to be seen was used to make decisions going forward. That can have some pretty nasty effects if you're not careful about what the compensating transaction is doing.

Another thing that often comes up is when we make queries out to our services and get back a potentially unbounded list of objects, and for each one of those objects we then need to iterate through and call a different service to enrich it with more data about those individual elements. That's the N+1 problem, illustrated in the sketch below. The solution, or part of it, that people start to implement is: we'll just create a bulk API for that. Everyone ends up creating a bulk API, they all do it a little differently, and then they realize: wait, because we want to process this all at once, we don't want N+1 calls, but now we need more fine-grained sorting and filtering and exclusion criteria, so we'll write our bulk APIs with the ability to pass criteria for how to process the request. We keep doing that, layering all these strange options on top of the bulk API, and at some point we just give up and say, fine, here's a generic query, just run it for me, and we end up implementing a database in a really crappy way.
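Here is a minimal sketch of that N+1 access pattern; the OrderClient and CustomerClient interfaces are invented for illustration and stand in for remote service calls.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderReport {
    // Hypothetical remote clients; each method call crosses the network.
    interface OrderClient    { List<Long> findOrderIdsForDay(String day); }
    interface CustomerClient { String findCustomerNameForOrder(long orderId); }

    static List<String> buildReport(OrderClient orders, CustomerClient customers, String day) {
        List<Long> orderIds = orders.findOrderIdsForDay(day);      // 1 call returns the list
        List<String> lines = new ArrayList<>();
        for (long id : orderIds) {                                  // N additional enrichment calls
            String customer = customers.findCustomerNameForOrder(id);
            lines.add(id + " -> " + customer);
        }
        return lines;                                               // total: N + 1 remote calls
    }
}
```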
Another thing we might resort to is caching, and this at least makes the problem of time and state very obvious: we're talking about cached state, so there is going to be some staleness, and it brings that into the design, which is good. The problem comes when things start to change behind the scenes. We know we're stale, but now we also can't keep up: there are no change triggers on the downstream service to say, hey, this thing was updated, your copy is stale beyond an acceptable limit. What we need is to be able to model these failures, and model time, as first-class citizens, and we also have some expectations about data consistency when we make changes to our system.

Once we start talking about distributed systems, consistency, time, and failures, this should sound very familiar. Who's heard of the CAP theorem? A handful of people; actually a lot more, good crowd. The CAP theorem is probably a good place to start when we reason about these struggles, but it's not an ideal way of discussing them. CAP says we have three things, consistency, availability, and partition tolerance, and we just need to pick two. That sounds very simplistic, especially because you can't really give up partition tolerance: that's a given. We just looked at our networks: there are no guarantees on time, there are failures, there's queuing, so we can't give up partition tolerance. So in the face of partitions we have to pick either C or A. But CAP defines C as the strictest consistency you can get, and A as the most availability you can get, the far ends of what is actually a spectrum. If we only look at it through the lens of picking the most extreme of either case, we're doing ourselves a disservice and overlooking other possible solutions.

Consistency is really just about how you read back what may have happened in the past. At the top, the most strict, linearizability says that when things change, everybody immediately sees it, and everything happens in a nice strict order, a nice straight line, with no staleness. Take a step down and sequential consistency says you still have that nice line, but you may not see it right when it happens; there might be some lag. You'll see everything in the same order, but you might not see it right away. Keep going down and causal consistency says you may not see a perfect line, but the things that are related to each other you'll see in the right order: you'll see the blog post before you see the blog post's comments, because it doesn't make sense the other way around; you see the causal relationship between events. And on and on, down to monotonic reads, which says you might not see everything, but whatever you do see keeps moving forward in time, all the way down to eventual consistency, where what you read back could be anything, as long as it eventually converges. That may or may not be good enough for the use cases you're exploring.

There's an awesome paper I highly recommend from Doug Terry at Microsoft Research, where he explains these consistency models in real-life terms, in terms of the game of baseball: what sort of consistency model you might need depending on who you are. If you're the scorekeeper, you want to know what the score is, when the game ends, who wins, and whether the game is tied so it needs to go to extra innings; but as the scorekeeper you might really just need read-my-writes consistency. I want to see all the writes that I made; how other people see the game is up to them. So as the scorekeeper I might just need read-my-writes, not full strict consistency. As a sportswriter, I might not write my article until the next day; I certainly don't need strict linearizability, maybe a moving window, a bounded staleness, maybe an eventually consistent model is fine. Down to the radio announcer case: you might catch only some of the radio updates, but you never want to hear scores move backward in time, so you want monotonically increasing reads of the score, always moving forward. The point is that CAP might not be a good way to reason about these issues; there are different levels of consistency you might need, and different trade-offs you'll make for availability.
But this is all theoretical, so how does it apply? Can we use relaxed consistency between systems? I would argue yes. I would argue that in real life there are very few examples of things that are strictly consistent. We communicate by passing messages, alerts, and events between each other, and we react to them; there is eventual consistency in our interactions. So what would this look like in an IT system? If a customer makes updates to their profile and we want to share that with downstream systems, instead of trying to run two-phase or three-phase commit protocols to get consensus across these different systems, across boundaries where we have no real influence over services that live in their own time, we can use something like sequential or causal consistency to communicate those updates downstream. In this case we use a log that ends up being sequentially consistent: we constantly append data to it, and it will eventually be seen, in the right order, by the systems downstream.

I'll get to a couple more examples, but what we're doing by saying "we'll manage the data ourselves, we'll tune the consistency models to what we need" is building a distributed, very data-intensive system at the application layer, where all of this used to live in the infrastructure. Is this practical? Do people do this? There are at least three examples, actually a lot more, but at least three, where internet companies have done exactly this: they model changes to their system as a set of events, driven through a consistency model that is tuned down much lower, so they can scale out and react to changes and make changes to one part of the system without impacting other parts. They open sourced a couple of these; the Yelp one has a big blog post describing how they built out their data pipeline. But if you were to start using any one of these, there are drawbacks. They were built for their owners: the Yelp one, for example, was built for Yelp, for their infrastructure, for their observability teams, for streaming data out of their MySQL databases. What we want is something similar, built on best-of-breed open source technologies in a modular way, so that we can pick and choose the pieces we want and construct the data pipeline that we want. That's exactly what this open-source project, Debezium, is. Debezium is a change data capture engine that allows you to expose changes in your microservices to the outside world, in a "data on the outside" kind of way, by capturing changes in the database as they happen. We're not trying to do weird application-level two-phase-commit things when events happen, and we're not manually figuring out how to put triggers on this table and that table. What we're saying is: any time the database makes a change, let's capture that change and put it somewhere. Specifically, Debezium is meant to be a modular, component-driven, best-of-breed open source engine.
It's built on top of Apache Kafka. What we're able to do with Debezium is read the database transaction log natively and pipe each one of those change records into a Kafka log that consumers downstream can read and react to: they can store the data in their own systems, interpret it, or join it with other streams using stream processing technology. We're taking the database and turning it into a stream. Debezium itself, and this is what I meant by modular and component-driven, is an engine that has plug-ins for different databases. The architecture is pluggable, so we can build new connectors for different database technologies. The Yelp pipeline, for example, was just MySQL; LinkedIn's had MySQL and Postgres, but they were commingled with the rest of their infrastructure. With Debezium we have connectors today for MySQL, MongoDB, and Postgres; we just merged a community contribution for the Postgres connector about three weeks ago, and we'll be adding things like Oracle and Microsoft SQL Server and others. The way it works is that we read the database's transaction log: we decide which Debezium connector we want to use, capture data changes from that database, and stream those tables and those changes into Kafka topics.

Are there any questions about that? Yes: the question was, can you be very specific and configure which tables, schemas, databases, and so on you're interested in? The answer is yes. You can specify whitelists and blacklists, include these, don't include these, which databases in MySQL lingo, which schemas in other databases' lingo; you can very finely control which parts of the database you capture. And, to the follow-up, both actually: Debezium also tracks DDL changes, schema changes, and keeps an in-memory representation of the schema, because as it reads the DML it needs to know which version of the schema goes along with each change. And yes, exactly: you can store the different schemas out-of-band somewhere like a schema registry. We have integration with the Confluent Schema Registry, using Avro in that case; producers store the Avro schemas in the registry, and consumers pull down the schema that matches the version of the data they're reading.

Some of the benefits of this become clear if you go back and think through the hardships of trying to coordinate data changes between systems. Here's an example with a cache: we put a cache in front of our service, and part of the problem with that was cache invalidation. We can use Debezium and Kafka as a live stream of updates into our cache, so we always have a hot cache. It's still going to be somewhat stale, because any data that comes out of the service is by definition stale, so there's some time lag, but you don't have to build application-level machinery just to do cache invalidation.
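As a rough sketch of what configuring such a connector might look like, here is the embedded-engine style of running Debezium that comes up again in the Q&A below. This is based on the newer DebeziumEngine API, so class and property names may differ from what existed at the time of the talk (older releases used, for example, table.whitelist and database.server.name instead of table.include.list and topic.prefix), and the hostnames and credentials are placeholders.

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

public class ChangeStreamExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("name", "inventory-engine");
        props.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        // Where the embedded engine remembers how far it has read in the binlog.
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");
        // Placeholder connection details.
        props.setProperty("database.hostname", "localhost");
        props.setProperty("database.port", "3306");
        props.setProperty("database.user", "debezium");
        props.setProperty("database.password", "change-me");
        props.setProperty("database.server.id", "85744");
        props.setProperty("topic.prefix", "inventory");
        // Only capture the tables we actually want to expose (the whitelist question above).
        props.setProperty("table.include.list", "inventory.customers,inventory.orders");
        // A real MySQL setup also needs schema-history storage settings, omitted here for brevity.

        DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
                .using(props)
                .notifying(record -> System.out.println(record.value())) // one change event per row change
                .build();

        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(engine); // runs until the executor is shut down
    }
}
```

In the usual deployment the same connector properties are handed to Kafka Connect instead, and the change events land on Kafka topics rather than in an in-process callback.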
Now that we have Debezium streaming data changes out of the database, we can do things like this: maybe we've separated our application into multiple bounded contexts, following domain-driven design patterns, and we've decided that the orders-and-booking bounded context has a domain model that is very specifically tuned for taking orders, while the administration side of the website is where, if we're selling books, we add new titles and authors and update book information. When we make changes to some of those books, we can automatically publish just the part of the data that's interesting to outside consumers (this goes back to the gentleman's question earlier), publish it through Debezium and Kafka, and have our search service automatically pick it up, process the stream, and store that data in Elasticsearch, which is much more highly tuned for querying. We can build audit services, recommendation engines, and all sorts of other functionality based on the data in these streams; we just add new consumers.

What you end up with in a microservice architecture is two distinctly different types of services: services that are aligned very nicely to your bounded contexts, where the domain model is consistent and you do things like order processing or search or administration; and then services that aggregate, that watch these data streams and build new functionality on top of aggregated data and present data in a very denormalized way. What we're trying to get away from is this: when we distribute our data across all these services, having to call each and every one of them just to display something on a web page or a phone, effectively doing joins inside our services, across network calls, in memory, every time a request comes in. We're turning that upside down: precompute the denormalized data, precompute the joins and the queries we know we want to run, and put them in a cache, in Elasticsearch, in whatever store you need. Each service has its own database, so it can store the data however it wants, and the queries into the system become much simpler; we're turning the joins inside out. I have a much more detailed example of this on my GitHub, using a project we typically use to demo Java EE concepts called Ticket Monster.

If you're asking whether anybody actually uses Debezium, the answer is yes, there are actually a lot of folks in the community, but one company specifically just published a case study of their learnings: WePay uses Debezium to build this sort of reactive, event-driven denormalization of data across microservices, and they have a really nice write-up describing exactly how they did it. You asked about registries: they tied it in with Confluent, and they explain in gory detail exactly how they implemented all of that.
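As a hedged sketch of that second kind of service, the one that watches change streams and maintains a denormalized view, here is a plain Kafka consumer that keeps an in-memory map up to date from a change topic. The broker address, topic name, and the use of a map are assumptions for illustration; a real Debezium event also carries an envelope with before/after state and schema information, and a real service would write to Elasticsearch or another store rather than a map.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class BookViewUpdater {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");        // placeholder broker address
        props.put("group.id", "book-view-updater");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        // The denormalized "read model": book id -> latest JSON payload.
        Map<String, String> bookView = new ConcurrentHashMap<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("inventory.books"));       // assumed change topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    if (record.value() == null) {
                        bookView.remove(record.key());            // tombstone: the row was deleted
                    } else {
                        bookView.put(record.key(), record.value()); // upsert the latest state
                    }
                }
            }
        }
    }
}
```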
So I'm kind of out of time; I was going to show a demo, but we have three minutes and I went a little long. Do I have a link? I absolutely do: you can write it down if you like, but otherwise I'm going to literally tweet it right now.

If there are any questions I'll be happy to take them, and I'll be here for a little while afterward, so please ask questions.

The idea of doing change data capture in databases isn't terribly new, but the solutions for it have always been very rigid and proprietary. Oracle GoldenGate, for example, has been able to do change data capture from Oracle for a while, and in the past you had to hack Postgres to expose change data capture information, but that isn't the way Debezium implements it. Who else had a question? Can you hand the microphone back, thanks. Any other questions? I have three more books.

No, this is just one way to get the data out of the database; at that point you can do stream processing, put it back into a database, aggregate it, however you want.

Actually, no: the implementation we envision uses Kafka, but it doesn't have to. The connectors for the different databases can be run in embedded mode, outside of the Kafka Connect framework, so you can run them in your own applications and stream the data into whatever you want, into JMS, wherever.

However the data becomes materialized in a cache or a database or some downstream system is application specific, and if you squint, this whole idea of taking changes, taking events, denormalizing them so they're easier to read, and precomputing these joins is kind of like CQRS, the command query responsibility segregation pattern. In CQRS the read side is very application specific, and how you get the data into that application-specific form is application-specific stream processing: you read the data format and schema format that's on the messaging topic and do the transformations you need to fit it into your own model. That part is outside of Debezium; Debezium takes changes out of the database and puts them into Kafka, and once it's a stream you can treat it however you'd like. At Red Hat we have a product called JBoss Data Virtualization, which is a data federation framework, and we are soon going to be using Debezium under the covers, in embedded mode, like the gentleman over here mentioned, to keep materialized views of our federated data up to date.

The question was whether you only get data from the moment you start, or whether you can synchronize all the data from the beginning of time. With Debezium, at least with the MySQL connector specifically, when we first connect to MySQL we're able to take a consistent snapshot (MySQL has MVCC and so on), stream that out, and then catch up in the binlogs from that point in time.

Foreign keys, that is a good question, because when we start breaking these systems up and elevating some of the entities or value objects in our domain, we have to make identity very explicit in those pieces. Within a single service we can leverage
the database technology and the features it has to help enforce integrity constraints and all of that, but as soon as data leaves the service and gets brought into materialized views, in Elasticsearch or wherever else, there needs to be a more universal way of identifying that piece of data and what it represents. Again, I'll go back to the domain-driven design patterns: the community has great patterns for how you handle identity, how you do identity management, how you capture it in the domain, and how you associate it across different services. Domain-driven design, I would say, explicitly recommends using these eventually consistent systems between bounded contexts, and because of the eventually consistent nature of these systems, you're not going to have very strict invariants across systems when changes happen, unless you try to coordinate all those changes with a distributed transaction or something like that.

Well, thank you very much for joining me.
Info
Channel: Devoxx
Views: 13,541
Rating: 4.8873239 out of 5
Keywords: DVUS17
Id: EdtZnmV_D-g
Length: 55min 24sec (3324 seconds)
Published: Wed Apr 19 2017