[MUSIC PLAYING] JUSTIN GRAYSTON:
Hello, everyone. Welcome to the last session. You made it. Woo! AUDIENCE: Woo! HUSSAIN AL MUSCATI:
Hi, everyone. My name's Hussain Al Muscati. JUSTIN GRAYSTON: And
I'm Justin Grayston. HUSSAIN AL MUSCATI: And
we're both customer engineers based out of London. So what does a customer engineer do? Well, we work with customers
to design and build solutions for the cloud. And to be honest,
we still see a lot of customers struggling to
deal with their monoliths. And they're trying to figure
out how to migrate them to a microservices world. So you guys look to
be very tech savvy. I mean, you're here. Does anyone here not know what a microservice is? Can we see a show of hands? JUSTIN GRAYSTON:
Well, that's good. That's handy. [LAUGHS] I'm going to
do the reverse, right? Who knows what a monolith is? OK, we got a good
show of hands there. HUSSAIN AL MUSCATI:
Yeah, that's good. JUSTIN GRAYSTON: OK, yeah. I'm going to do some
more data extraction. OK, so who here, as part of your day job, works with a monolith? OK, that's quite a few of you. Wow. OK, now, this is a select
group, but we've all been here. Who here secretly knows in their
heart of hearts what they work on is a monolith, but
everybody else decides to ignore the fact? That is a few. That's good. That's good. OK, so we're going to try and-- we haven't got long, and
it's a massive topic, but what we're going
to try and do today is we're going to walk
through an example and hopefully give
you some tips. But while we're doing this,
what we're going to do is we're going to use
serverless compute. Now, obviously there's been some
great announcements this week, you know, that
opens up serverless to Kubernetes and GKE. But serverless really
enables us to concentrate on the migration and not the
standing up of the DevOps. We just want to be able
to migrate and scale. So let's kick off and start
looking at our monolith. HUSSAIN AL MUSCATI: All right. So monolith applications have
several things in common. They tend to be
standalone, isolated. They're self-contained
in the sense that all the logic's contained
within a single instance or a single unit. And they tend to be large and
very complex to work with. And a lot of the time,
it's even difficult to figure out where
do you actually start when you
want to understand how your application works. So we realize that this is
a very opinionated topic, and there's a lot of
content out there. We're going to be focusing on
showing you a few examples, showing you a few
techniques on how to break off different
parts of your monolith, convert them to microservices. And hopefully this is
something you can take with you to your organization. So we're going to start with-- as an example, we're
going to be using an e-commerce application. Why an e-commerce application? Well, we think that a lot of the
properties in this application resemble other
monoliths out there. So there's a user interface
that's generated for the user to see and interact with. We have a load balancer
that handles requests from different locations
and distributes them among instances. All the application logic is
contained within a single unit. State is associated within
the instance itself. And in terms of
storage, well, you're probably using
something really ancient as a catch-all solution
for all your data, maybe something like
a relational database. So let's take a
step back and think. When do you actually want
to move to microservices? What motivates it? There are several good
candidate applications that are good for microservices. Web applications, for example. They tend to be simple
but require the ability to handle a lot of traffic,
so they need to scale. Enterprise applications tend
to be a lot more complex, very huge, and they tend
to be applications that need to do everything. But what drives it? In the end, it's the
business requirements. Essentially, what does
the business actually need to succeed? Does it need the application
to be scalable such that it can handle increased
requests and volumes of data? Or do you need to be able to
optimize the application such that you would only
be using the resources that you actually need? Or is it more about being able
to remain competitive and have the ability to develop
features very quickly, so you need high development velocity? There are a lot of reasons. Let's look at an example
to showcase this better. This is our monolith. And it has all the
main components, has a UI, a load balancer,
application, an application instance, and a database. And imagine that you have a
celebrity that tweets about one of the products
that you're selling on your e-commerce application. And let's say that
tweet goes viral. So suddenly, you
have a huge number of requests coming in for people
searching for that product, wanting to put it
in a shopping cart, and maybe purchasing it as well. So you need to be able to
handle this increase in volume of requests. What do you do? You scale. You can scale vertically. That will only take
you so far, so you have to scale horizontally. This is how it would look. Basically, you're
adding instances. But think about this for a sec. What are you doing here? You're replicating the whole
application on each instance. Do you actually need that? The scenario might only need
you to scale certain bits, but you're scaling the
whole application, which means you're using
a lot of resources that you don't actually need. And at the same time, we
said that state is associated with the application instance. How do you do that? Well, you can set
up the load balancer to have a sticky session
with that instance for every specific user. What if that application
instance falls over? Then everything's
lost for that user. You could move the
state to the database, but then you're overwhelming
the database even more. And if that falls over, you have
a whole set of other problems to deal with. So imagine that you also want
to be able to add features. And let's say you
do a small UI tweak, but because this is a
monolith, you actually have to deploy the
whole application. And let's say you
made a mistake. Your whole application goes down
because of a simple UI tweak, and there's nothing
you can do about it. The thing here is that the main idea behind microservices is that it helps you bypass all that, because it gives you the ability to do partial deployments, published in small iterations. And it gives you the
ability to better handle scale and optimization. JUSTIN GRAYSTON:
So all good reasons to move to microservices,
but should we microservice the world? I mean, does everything
need to be in microservices? So this is a talk
about migration to microservices,
but one of the things you've got to do, you've got to
make sure that you sense check, that you're not just
following the internet or what somebody else has done. If you're not trying to
fix a problem of scale or optimization, then
what are your motivations to go into microservices? Is it a complexity thing? Well, microservices
may not actually make your life any easier. I'm sure most of you
are aware of that. You might be running specialist
software or hardware or maybe both. Would you then go
to microservices? What are you actually gaining? And just like you've got to have
the business reasons to move to microservices, well, let's
think of the business reasons to stick with a monolith. If your monolith is serving
the business well, it's secure, it's stable, then what is
the business motivation? And I think anybody
who undertakes this journey is going to
require a lot of money, time, and emotional toil. You must make sure that
you sense check first. But with our example, with our
web app, our e-commerce app, we're taking a scenario
that we actually-- it was built some time ago. And we had a regional
customer base. And now we've gone
global, which is great. But the problem is
our system's now suffering from stability
and deployment issues that Hussain has
just highlighted. We've also noticed that
our competition is outperforming us. They're deploying new
features faster than us, and they're eating
our breakfast. So we have plenty
of justification to move to microservices. So let's begin. HUSSAIN AL MUSCATI: Yeah. So what we're going
to do is start by focusing on capabilities. What are capabilities? They're essentially what
the application does or what the code does,
not what the code is. Code can be
rewritten, refactored. A lot of the time,
you're going to have to rewrite stuff from scratch. That's not the point here. What we need to
focus on is being able to take these
capabilities that are in your current
monolithic application and move them to your
microservices application. So that's going to be the
focus of this discussion. So what capabilities are
we going to talk about? We're going to talk
about a bunch that are common among monoliths. Storage, do we actually
stick with one database or look at several? We're going to be
discussing what we call edge capabilities.
be detached from the monolith. We're also going to be talking
about sticky capabilities. These keep you stuck
to the monolith. And we're going to also talk
about networking in the sense that how the different
parts of the application actually talk to each other. So let's take a closer look
at data and our data store. In our monolithic application,
in the e-commerce application, we had one data
store for everything, which is relational. What happens when you
have both an increase in the number of requests
and the amount of data? You need to be able
to scale to handle that, but it's relational, so you can probably only scale it vertically. I've actually seen
customers trying to mimic horizontal scaling
by creating multiple database instances and then sharding
the data across them and then writing some
logic to try to figure out where the request would go. That's not a good idea. Don't do that. And I've also seen
some customers that actually take
a NoSQL database, but then use that as a
catch-all solution, which still doesn't help, and
try to write some SQL queries over that. That also doesn't
work very well. The idea is you
need to think about the problem you're
trying to solve and the right solution for it. It could be multiple solutions,
not a single solution. So what do we mean by that? Imagine that you have a huge
increase in the number of users and the number of products. Let's say a few million
users and hundreds of millions of products. How do you deal with that? What do you use? How do you store that? You can use a NoSQL
database, something like a document database. That would fit very well,
because for each you can have these little
documents, and each document would be the user or the product
and any number of properties. That's a good fit. So that's one solution.
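To make that concrete, here is a rough sketch of one of those documents, assuming Firestore as the document database; the collection and field names are just made up for illustration.

    # Hedged sketch: one document per product, carrying whatever properties it needs.
    # Assumes the google-cloud-firestore client; all names are illustrative.
    from google.cloud import firestore

    db = firestore.Client()

    # Write a product document keyed by SKU.
    db.collection("products").document("sku-12345").set({
        "name": "Celebrity-endorsed widget",
        "price_cents": 1999,
        "tags": ["widgets", "viral"],
    })

    # Read it back as a plain dict.
    product = db.collection("products").document("sku-12345").get().to_dict()

Users would get the same treatment in a collection of their own.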
What if you want to be able to, let's say, retrieve the 10,000 or 1,000 most popular products? Then you need an
in-memory database, something that acts like a
cache for your hard data. What if you want to be able to
see what the user is actually doing in terms of where
are they clicking, what are they searching for,
what are they purchasing, and maybe use that data to
try to predict what they're going to be doing, predict what
they're going to be buying? That's where a data warehouse
solution would fit, something like BigQuery.
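As a rough illustration, streaming one of those behaviour events might look something like this, assuming BigQuery and its Python client; the project, dataset, table, and schema are hypothetical.

    # Hedged sketch: stream user behaviour events into BigQuery for later analysis.
    import json
    from google.cloud import bigquery

    client = bigquery.Client()
    TABLE = "my-project.analytics.user_events"  # hypothetical table

    def record_event(user_id, event_type, payload):
        # event_type might be "search", "click", or "purchase".
        errors = client.insert_rows_json(TABLE, [{
            "user_id": user_id,
            "event_type": event_type,
            "payload": json.dumps(payload),
        }])
        if errors:
            raise RuntimeError(f"BigQuery insert failed: {errors}")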
What about the other stuff that might still fit well in your SQL database? Stuff like transactions,
which relational databases are kind of fit for. You can keep using that for
those kinds of transactions. But look at a managed solution. There are a lot of
those on the cloud now. One thing also to
mention is that do we want to actually
start by implementing all these different NoSQL solutions,
put in an in-memory database right away, build up or
set up a data warehouse? That seems like a lot of
work, and you don't really want to do this in one go. You want to do it in
an iterative way such that you don't overwhelm yourself. So the idea is you
have all your data sitting in your legacy store. And as you move forward, you
chip away different parts, find the right solution
to solve that problem, and use that solution to solve that problem. What about edge capabilities? These are capabilities
that are easy to migrate. These are sort of your low-hanging fruit, or easy wins, let's say. They have certain
properties that kind of make them stand out. One thing is that they're
them from the monolith or break them off the
monolith, they don't really break the monolith itself. At the same time, they have
minimal dependency back to the monolith. So let's look at a few examples. Image upload, that seems
like an easy thing to do. All you do is
basically take an image and push it to a
storage location. That can be a service on its own
that just responds to requests. It can even be an
external service. That's something
that's not really tied that much to the
e-commerce app that we have. Another example is
thumbnail creation. That's also something
that can easily be detached without
much dependency back to the monolith. Both of these are easy to break out as separate services. Let's look at something
a bit more complex. We're going to be looking
at one that is not easy. It tends to be everywhere when
it shouldn't be everywhere. Any ideas? It's going to be your
HTML, CSS, and JavaScript. What is that essentially? That's your UI. Think of it this way. You have this big
monolithic application and in different parts
of that application, depending on the feature
or the capability, you're going to have HTML,
CSS, JavaScript embedded there. And the UI that the
user interacts with is kind of a combination of all
that HTML, CSS, and JavaScript from all across your
monolithic application. So what do you do here? How do we actually break
this off to microservices? So the idea is we need to
have all that HTML, CSS, and JavaScript--
basically all your code-- in one service. So lets do that. Here we have it, a
monolith and a new service called the front-end UI that's
running all of your UI logic. It has all the CSS,
JavaScript, and HTML. But there's a problem here. Any ideas? The thing is it doesn't actually
tie back to your monolith. It's a standalone service. So it's static. The UI doesn't
really do anything. So how do we deal with this? We need some sort of API
layer on the monolith such that front-end UI can
interact with that there. And that layer would interact
with the different pieces in the monolith, maybe
how it's showcased here and the different capabilities. So how would something
like that look? Something like
this, where you're looking at each capability
or each part of the monolith where you extracted that
HTML, CSS, and JavaScript and have some sort of API
interface to interact with it. That seems pretty neat.
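For illustration, one of those endpoints bolted onto the monolith might be sketched like this, assuming the monolith is Python and using Flask; the route and the stand-in product lookup are hypothetical.

    # Hedged sketch: expose an existing monolith capability as a JSON API
    # so the separate front-end UI service can call it.
    from flask import Flask, jsonify

    app = Flask(__name__)

    # Stand-in for the monolith's existing catalogue logic.
    PRODUCTS = {"sku-12345": {"name": "Widget", "price_cents": 1999}}

    @app.route("/api/products/<product_id>")
    def get_product(product_id):
        product = PRODUCTS.get(product_id)
        if product is None:
            return jsonify(error="not found"), 404
        # Return JSON for the front-end UI instead of rendering HTML here.
        return jsonify(id=product_id, **product)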
We didn't really have to change that much code, and we managed to separate
the service all on its own. But there's a problem here. Does anybody know what that is? Maybe Justin knows what that is. JUSTIN GRAYSTON:
OK, so this takes us onto sticky capabilities. And it sounds a bit
gross really, doesn't it? But what do we mean by
sticky capabilities? Well, they're hard to
pull from the monolith. If you can imagine
a piece of gum, and you're trying
to take it, you're always being pulled
back to that monolith. Now why are we
being pulled back? Well, the code might
take an approach where, if you had nice modular code, you could go and lift that out and create it as a sort of service,
maybe not a microservice, but that could be great. But actually, some capabilities
here, you know, monoliths? That code is not necessarily
nicely encapsulated. So we've also got state. State could be in-memory
in the monolith. It could be in the
database, you know? If we want to
create a service, we need to be able to have
that stateless service, and we need to be
able to not have to go back to the monolith. We want to actually move the
capability completely away from that monolith. OK, so these were
the capabilities we got in our system. We've got rid of
the UI, the main UI. That's gone. Hussain's also found a
couple of edge capabilities. There may be more
than that, but we're going to get rid of those. Who's near a microphone? Right, who wants to take a shot? And this is really high level. There's no detail of the system. Who wants to maybe
point out what they think might be a sticky
capability in the system? Anybody? HUSSAIN AL MUSCATI: Come on. Don't be shy. JUSTIN GRAYSTON: Authentication. Yeah, great Yeah,
OK, maybe search. Well, authentication, that
was the one I was looking for. And we're going to come to that. The reason why is
because of the state. So there's a few others
that we identified. Now, we'd hope that,
you know, if you've got teams working on a monolith,
they should know the code. So there should
be some easy wins. They should really
understand it. But for this point, for
the point of this exercise, some of these are
really high level. I mean, what does
schedule tasks do? I mean, that could be Bash
scripts doing who knows what, which is absolutely critical
for the monolith to stand up. How does that turn into a
microservice, you know? Stock tracking, shipment
management, returns management. That all sounds very stateful. We're going to have
to track things. Maybe we're having to
talk to other systems. All of these things are
not edge capabilities. All of these
things, you're going to have to look at
breaking them down and really understand
what it is they need to do and how we can move the state
to somewhere else, because we don't want to be coming
back to this monolith. The other one is
whatever the database is. And in our case, it's this
single large relational database. I've seen a couple of
migrations from monoliths to microservices, where all
the monolith code is completely gone. And maybe there's
people here that will know this kind of scenario. All that code's gone. Monolith is deprecated. But what's still there? In the middle of that is
one single large database, which is actually the phantom
legacy of the monolith, because it wasn't tackled
while you migrated. And although we
don't want to do-- as Hussain pointed out,
we don't want to go ah, there's all these
database technologies. It's going to be great. Every single microservice,
let's use another one. We don't want to do that. But what we do
want to do is as we start to pull these
hard services out, we need to really
consider how we move that state
to somewhere else, like whether it's
in-memory, NoSQL. But you want to consider
that as you go, right? Don't try and do it all at once. And then we got
user auth points. So user auth. The monolith is handling log in. It's handling log out. It's handling
session management. So how are we going
to deal with that? Well, let's take that one first. So we had this notion that
we can put APIs everywhere. But there's a big
problem with this. Well, we haven't actually
moved it from the monolith. We haven't actually
done anything. We've just created an
API here, and we really want to make sure that
our platform going forward is on the left-hand side,
not in the monolith. So let's move it over there. That was very easy. I press the clicker and now
my user auth's over there. I want one of these. So how do we do that? Well, if you are lucky
and your monolith has nice encapsulated
code, all nice and modular, maybe you could
just lift and create a mini monolith, which just
handles the authentication. You could still
use that database. Maybe you could take the user
tables out of that database and just have a smaller
relational database. You know, you just keep that
as a domain-specific database. We didn't do that. We decided to use Firebase auth.
pitch just for Firebase auth just because it's Google. You know, you could
choose anything. But what we chose is
to move completely away from the monolith. Now, we've got all
these users over here, but with Firebase,
we can use the CLI tool to import those existing
users into Firebase auth. Now, the reason why we chose
this is because, actually, it was a lot less work, and
it gave us extra benefits. So it fully broke us
away from the monolith. We're no longer going back to
that monolith for anything. It gives us more login options
for very little effort. We didn't have native apps
before, but now we can. And we've got one single
authorization plane. And it potentially gives
us real-time possibilities in the future and our
clients if we wanted to. So how does this actually work? Well, we have a user API. We've created an
interface that everything is going to use going forward
for your user authentication. And on there, it's basically
using the Firebase Admin SDK to authorize. We've got another
little benefit here, because our monolith was
using session cookies, but Firebase uses JWT tokens. And JWT tokens are
a really great way to pass identity between
microservice calls. Now, session cookie
would not be so great. So that's what we chose to do.
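Roughly, the check in the user API might look like this sketch, assuming a Python service using the Firebase Admin SDK; sending the ID token in the Authorization header is just one common convention.

    # Hedged sketch: verify the Firebase ID token (a JWT) sent with each request,
    # instead of relying on a monolith session cookie.
    import firebase_admin
    from firebase_admin import auth

    firebase_admin.initialize_app()  # picks up default credentials on GCP

    def user_id_from_request(request):
        # Assumes the caller sends "Authorization: Bearer <ID token>".
        header = request.headers.get("Authorization", "")
        token = header.split("Bearer ")[-1].strip()
        decoded = auth.verify_id_token(token)  # raises if invalid or expired
        return decoded["uid"]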
But we have another problem. It's always problems, isn't it? We have a problem now, because we created all those APIs for the quick wins for all
the different capabilities in the monolith to get
our front-end UI working. And what we've done
is we've broken it by taking user auth
out of the monolith. Not great. So how do we fix that? Well, we thought of a
few different scenarios. We could put the Firebase Admin
SDK all over the monolith. That sounded like a pain. What we chose to do is we
altered the monolithic code to make remote procedure calls
out to our new user service. And what this has done is now
that's flipped the situation. Instead of auth being
pulled back to the monolith, now the monolith is going to
our new service environment. That's a big win,
because once you've done that, it makes it easier
to migrate other services.
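Inside the monolith, one of those remote procedure calls might be as simple as this sketch; the user service URL, path, and payload are hypothetical.

    # Hedged sketch: the monolith no longer validates users itself;
    # it calls out to the new user service instead.
    import requests

    USER_SERVICE_URL = "https://user-service.example.com"  # hypothetical

    def verify_user(id_token):
        resp = requests.post(
            f"{USER_SERVICE_URL}/v1/verify",
            json={"token": id_token},
            timeout=2,  # fail fast rather than hanging the monolith
        )
        resp.raise_for_status()
        return resp.json()["uid"]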
There's one more thing to do with auth. HUSSAIN AL MUSCATI: So one
important thing to remember is that auth is atomic. What does that mean? Let's take a step back
and think about this. Every request coming
into your application needs to be authenticated. And in fact, a lot of the
requests between services need to be
authenticated as well. Why? Because now you don't have
one monolithic application. You have these
microservices talking to each other over a network. So there's more into it in
terms of the added network latency because of this. Network latency
has a huge impact on how your application
would behave. Think of this scenario. You have your
monolithic application, and whenever you
send a request to it, it responded at a certain
rate or with a certain delay. Now that you went
to microservices, you're actually
seeing a bigger delay. That shouldn't be
the case, but why? It's because of the
latency that's added. There are a number of things
we need to think about. How many services do you
have talking to each other? Do you have a continuous
chain of one service calling the other? That would introduce
a lot of latency. You need to think
about location. Are your services talking to
each other across the Atlantic or are they within
a specific region? They should be within a
region, because having them spread apart would add
a lot of latency as well. You need to take
advantage of things like caching, because
if you have services that communicate a lot
with each other, each time, that's latency. That's added more latency. So taking advantage of some
caching would reduce that.
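A rough sketch of that kind of caching, assuming a Redis instance such as Cloud Memorystore sits in front of a downstream service; the host, the product service URL, and the TTL are illustrative.

    # Hedged sketch: cache another service's responses so repeated calls
    # don't pay the network latency every time.
    import json
    import redis
    import requests

    cache = redis.Redis(host="10.0.0.3", port=6379)  # e.g. a Memorystore IP
    PRODUCT_SERVICE = "https://products.example.com"  # hypothetical

    def get_product(product_id, ttl_seconds=60):
        key = f"product:{product_id}"
        cached = cache.get(key)
        if cached:
            return json.loads(cached)
        resp = requests.get(f"{PRODUCT_SERVICE}/v1/products/{product_id}", timeout=2)
        resp.raise_for_status()
        data = resp.json()
        cache.setex(key, ttl_seconds, json.dumps(data))
        return data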
And at the same time, you need to make sure you have really good
monitoring and alerting, because things will go wrong. And when they go
wrong, you need to be able to figure out
what went wrong, because you have a much more
complex system now with, like we said,
services interacting with each other over a network. So you need good
monitoring and alerting to be able to detect that. We talked a lot about how
to actually take a monolith, break out some pieces off, and
turn them into microservices. In the beginning, Justin
said we should recommend using a serverless
platform, something that's completely managed. But we see that
there's an opportunity to talk about
event-driven architecture, because it gives
you the capability to do things very quickly,
especially since we're rewriting a lot of stuff. Using event-driven
architecture can help us add a lot of features or
a lot of services very quickly. Let's look at a few examples. This is a case where
we are creating a user. So what happens? You get a request,
create request. That goes to the user API. And the user API does its
thing and updates the database. At the same time, it
kicks off a message through our messaging
queue, pops up. And that triggers
off a few functions. There's one function that
sends out a welcome email through the mailer API. Another function signs the
user up for the newsletter. And to be honest, you can
add more and more functions very easily. And think about this here. Each function is actually
acting as its own microservice, because it's isolated from
the rest of the application, and it's providing a service. And it can scale based on
the demand, because you're doing it on functions, which
is a serverless, scalable platform. Let's look at another example. You're uploading an image. That image, let's say, is
Let's look at another example. You're uploading an image. That image, let's say, is uploaded to object storage, something like GCS. And that results in a trigger. That also kicks off a
bunch of functions. One function to store image
URLs in the user profile. Another function creates
thumbnails from that image. And again, you can add
more and more functions. And each one of these
is acting as its own microservice and is scalable,
because you're running it on a serverless platform. That was very easy to do.
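The thumbnail function might be sketched like this, assuming a Python Cloud Function on the upload bucket's object-finalize trigger and Pillow for the resizing; the bucket names and sizes are assumptions.

    # Hedged sketch: runs when an image object is finalized in the uploads
    # bucket and writes a thumbnail to a second bucket.
    import io
    from google.cloud import storage
    from PIL import Image

    THUMB_BUCKET = "my-app-thumbnails"  # hypothetical

    def create_thumbnail(event, context):
        """Entry point for the google.storage.object.finalize trigger."""
        client = storage.Client()
        source = client.bucket(event["bucket"]).blob(event["name"])
        image = Image.open(io.BytesIO(source.download_as_bytes())).convert("RGB")
        image.thumbnail((200, 200))
        out = io.BytesIO()
        image.save(out, format="JPEG")
        client.bucket(THUMB_BUCKET).blob("thumb_" + event["name"]).upload_from_string(
            out.getvalue(), content_type="image/jpeg")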
There's also an opportunity here to talk about event sourcing,
which is essentially when you want to be able to
update state and update state in different locations
within a sequence. So look at this as an example. A user purchases something. That creates a payment
notification, which tells the cart service,
hey, this product has been purchased, or this
item has been purchased. That marks it and
basically kicks off a message that triggers
a few functions as well. One function is updating
the purchase log, which is in some data solution. In this case, Bigtable. Another function in the
same sequence updating state in a different location,
like in the user purchase database. So see here, we're
making use of things like functions to do
things very easily.
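As a sketch, assuming the purchase event arrives over Pub/Sub, the function that appends to the purchase log might look like this; the project, Bigtable instance, table, and column family names are assumptions.

    # Hedged sketch: one function in the sequence appends the purchase
    # to a Bigtable purchase log.
    import base64
    import json
    from google.cloud import bigtable

    def log_purchase(event, context):
        """Entry point for a hypothetical 'purchase-made' Pub/Sub trigger."""
        purchase = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
        client = bigtable.Client(project="my-project")  # hypothetical project
        table = client.instance("purchases").table("purchase_log")
        row = table.direct_row(f"{purchase['user_id']}#{purchase['order_id']}")
        row.set_cell("order", "product_id", purchase["product_id"])
        row.set_cell("order", "amount_cents", str(purchase["amount_cents"]))
        row.commit()

The function that updates the user purchase database hangs off the same message in the same way.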
And when you're building your new microservices, taking advantage of
these sort of patterns will help you a lot
in terms of being able to add new capabilities
or new features very quickly. JUSTIN GRAYSTON: OK. So, that's a nice,
big red slide. When things go wrong,
they will always-- so one thing that's
always mentioned in these kind of things
is the culture change that happens if you're
running a monolith dev team to a microservice dev team. And as we migrate more of those
services over to the left-hand side, team's going to-- that number is
going to grow, yeah? This case, we may have like six. But what if that was 600? The teams basically-- they
take ownership of that tech stack [INAUDIBLE]
whether that service is. So right up to pager duty. Now if you have 600
microservices, how on earth is the dev team looking
after user auth going to have any information
about how the whole wider system works? One of the things that
you don't want to have is dev teams that
are working in silos. Nobody likes that, and
that's never productive. So what we know is we're moving
to microservice and complexity increases. And as they own the whole of
the service, what we don't want to have is accidents. We will have accidents, but
we want to try and avoid them. So one particular example of
this is the recursion problem. Hey, service team, the team
that looks after service A, we're going to
deploy a new feature. Great, we're going
to use service B, because that's really cool. We didn't know that service B
called service C that actually calls service A. And now,
Hussain's latency problem is a real big problem. So how do we fix this? Now, one thing that's
really important is that developers
document their service. And I'm not talking about
inline documentation, which all of its developers
did that other time. But they're not
very good at this. I'm talking about having some
sort of open and accessible place for every
single service as you migrate, where
everybody knows where to find information about it-- What it does, so what
you expect it to do, what dependencies it has, and
include which other service it calls and make it discoverable. This is not technical,
but this is important. It's adding a lot of complexity. We keep using that word. Complexity,
complexity, complexity. And we're thinking, OK, we've
got the business justification to move to
microservices, but it's sounding like a
pretty hard, long slog to move to microservices. So why do it? Well, I'm going to repurpose
a real situation that happened to me quite a few years ago. I'm going to restyle it
in the e-commerce example. But it was running
on App Engine. It was a small monolith, right? We're using Datastore. State was in Memcache,
you know, and Datastore. It was a distributed monolith. It wasn't terrible. Code base, quite small. Nothing-- no surprises there. You can see how old it is. If any front-end
developers out there, can you spot something which sort of takes us back? There's a nice
grunt file in there that really takes us back. If you're not a front-end
dev, you probably don't care. Anyway, so all of this
had simple scripts that run on the deployment
pipeline, a few minor tasks, bundled up all the JavaScript, deployed out to App Engine. And in this case, we'd done a major deploy two days ago. It wasn't on a Friday,
and it wasn't 5:00 PM. So we did have
actually the ability to know that it
wasn't going to crash, and it seemed to be
performing really well. So, great. We can start working
on the next features. And I go home in the evening. You know where this
is going, right? Yeah? I go home in the
evening, go to bed. 2:00 AM, mobile goes off,
a really unhappy person on the other end of the phone. Website's down. And I'm thinking, hang on. Who's deploying it
at 2:00 AM in the morning? Luckily, being a monolith,
I've got one place to look. And being in App Engine,
I go straight to the logs, and I can see that there is
one file to break them all. And it was config.py. I know that doesn't
sound that great, but what was in there
was some hacky code. Nobody ever has that, right? We had some hacky
code that worked, was in the backlog to be
fixed, and to make less hacky. But because it worked,
it never got prioritized. And what that code
did is it told the front-end what
state it should be in, what mode it should be in. So let's say for e-commerce,
it was on this date. From this date, we should
be in Black Friday mode. On this date, Cyber Monday. You get it. On this date, blah, blah, blah. The problem was, in that deploy two days ago, it somehow went unnoticed that some dev had changed the date to the last date, and midnight had just passed. And because there was no
exception around the error, the whole application went, ah. I don't know what state
the front-end should be in, so I'm going to show
a nice 500 error. And it's the App
Engine 500 error. If you haven't got a
default template set up, you know how ugly that is. Luckily, it's easy fix. Just fix the date, deploy it. Up and running, no problems. Go back to bed. Well, afterwards we did
a quick post-mortem. There were two key things that we decided we'd learned out of this. Our alerting needed
to be much better. Why did our alerting
need to be much better? Because somebody had to
phone me, which wasn't great, because it was my boss. But the fact is why did
it take two hours for me not to notice that? Well, we had front-end cache
headers on all of the logged out pages. So all of the non-user sessions,
yeah, it looked like it worked. And at 2:00 AM,
there aren't very many logged-in users buying stuff. So our alerting needed to be much better, because we need to know as soon as that happens so we can handle it. And it was a really
terrible user experience. The first one is actually
a lot easier in a way when you have a single
service App Engine. If you had 600
microservices running, you need to really make sure
that your alerting is better and you can understand
the problems. OK, so what do we do next? What we did, we got all display
logic, stuck it in an Angular wrapper, and gave it its own service, completely 100% static, no Python anymore. The backend, we quickly
made APIs for everything. We had-- well, it
says less monolithic. We're still a monolith, it just
wasn't working quite the same. No display logic in it. We still have the same
problem, though, right? One file, one exception. Whole thing goes down. But we have a real
good improvement now. The front-end
client can go, hey, the backend has decided to die. I'm going to show a picture
of a puppy or whatever. I can do something better than
the default App Engine page. So what do we do next? Well, we actually
started breaking the app into domain-specific APIs. And this gives us the ability
to more intelligently handle errors and keep the users
unaware that everything's on fire. OK, so we got six things there. And now, I'm going back
to the 600 things, right? That's going to be
really hard to manage the combination of
errors that possibly could happen in a
600-microservice environment. So you need to
identify your critical paths in your microservice
environment. So what does a critical
path look like? And for us, we want
people to buy things, because otherwise
there's no money to pay for the web development,
and we're all unemployed. So user comes to
the checkout page, and the user cart
service, the thing that tells what the user has
put in their cart is down, OK? Well, what we can do is we could
have multiple projects for App Engine, and we could
deploy those services out to multiple regions. And the client can go-- I'm in US East. I'm going to try US West. OK, that's OK. What happens if that disappears? OK, so what do we do now? Well, we need to make sure
that people buy things. So shopping carts, they used
to use cookies to track. In a multi-device era,
that doesn't work, so that's why we
have the service. But let's not discount it. We could use local
storage, and we could keep the client keeping a record. Maybe we know when the
two services are out, they're actually-- we'll show a message to say it may not be the latest information, but
the key thing is the user can carry on
and make that purchase. So you can see there that
wasn't the microservice team necessarily that
actually had to fix that. It was the people who were
looking after the front-end UI. So you need to make sure that
the critical paths are jointly owned and actually
properly owned by the teams that it affects. You know, don't be blinkered
down one technical path either. There may be multiple
solutions, and the thing that should drive
those solutions should be the business. What is it you are trying to do? I can' see we're running
a bit out of time, so I'm going to speed up. OK, so the user
cart service was down. Right, that's really simple. But actually, why was the user cart service down? It was because the user service was out. And let's say we
didn't go to Firebase, and we still have that
relational database. That relational database is out. OK, so now we have even more
teams on the critical path. We have the people looking
after the database, we have the user service. What can we do? Well, you know, we could have
an explosion of services dying, because, as Hussain said,
the user service is atomic. So now, we have
lots of complexity on what has just happened. One service is down and our
whole platform is going out. And it's feeling like microservices is hard work. OK, so we'll fail over
to a read replica. We'll limp on. We'll be in read-only mode. We won't do logins, and all the
clients will stop doing logins. But users that are currently
in the system, that's fine. OK, so region one, we
send to region two. Region two is now
sending both things. And yeah, welcome to your
self-created denial of service attack on region two. And everything is out. Hey, I thought this
was a good idea, right? OK so let's just take a
step back a little bit. Things you ought to remember. Again, six services, not 600. Let's think how complex
this is with 600. I'm not going to
do that, actually. Let's keep it simple with six. You can see that
actually request three to-- even a service that
isn't directly relying on it is going to be out. Everybody has to understand
what the error situation is. And what you need to do
is you need to fail fast. What if you had timeouts set
at five minutes for all these? All the requests will balloon. Oh, that's OK. We're using serverless, right? We can scale. Oh, you're paying
for that, you know? So you need to fail fast. You need to send an error
message so that everything in your system-- and it should be part
of your documentation too-- understands what it
should do in this situation. In this situation,
actually, we're going to throw in an
extra piece of technology. And this is probably a good
time to throw in an extra piece of technology. We're going to use an
in-memory database. We know we can scale that. We could use Cloud Memorystore. I mean, we know we're not
going to DoS another region. So to do this, maybe we used
the event sourcing idea. We have two cloud functions
pushing to both databases. What we can do,
and the reason why it's amber is the user service knows that we're in failover mode. And it's going to send
an error message, which tells everything, that we're
only in a read-only situation. Hopefully, that's
given you enough time to restore that database. Now, if this was a user
service and it was me, I would probably make
the in-memory database the primary database and have
another database as a fallback, because, as Hussain pointed
out earlier, it was atomic. It's going to be
very, very chatty. So the advantages of having
that really low latency is going to be important. OK, speeding through. So microservice and faults. It's complicated, right? But with that complexity,
we have more options, right? We can safely say that even
though that is more complex, we can give the user a much
better user experience. If the user has a much
better user experience, they're more likely to stay with
your website or your platform. Circuit breakers. If you all know
microservices, I'm not going to explain what one of those is. But it's important
that you have them. And maybe you limit
the amount of languages you have in your platform, so
you have standardized circuit breakers. You know, fail fast
and have a plan B.
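Here's a minimal sketch of the circuit breaker idea in Python; in practice you'd more likely standardize on a library or a service mesh for this, and the thresholds are arbitrary.

    # Hedged sketch: after enough consecutive failures, stop calling the
    # downstream service for a while and fail fast instead.
    import time
    import requests

    class CircuitBreaker:
        def __init__(self, failure_threshold=5, reset_after_seconds=30):
            self.failures = 0
            self.failure_threshold = failure_threshold
            self.reset_after_seconds = reset_after_seconds
            self.opened_at = None

        def call(self, url):
            # While the circuit is open, fail fast instead of waiting on timeouts.
            if self.opened_at and time.time() - self.opened_at < self.reset_after_seconds:
                raise RuntimeError("circuit open: go to plan B (cache, degraded mode)")
            try:
                resp = requests.get(url, timeout=1)
                resp.raise_for_status()
                self.failures = 0
                self.opened_at = None
                return resp.json()
            except requests.RequestException:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.time()
                raise

The point is the caller gets an immediate, well-understood error it can plan around, rather than a five-minute timeout.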
Bulkheads. Create partitions between the domains. Make sure that when you have
that catastrophic failure, your whole system
doesn't go down. Now, the bulkheads thing comes
from boats or a submarine, where you put a bulkhead
between the different partitions to stop the whole boat sinking. But I want you to take away-- if you're going into your
board, you're going to your CTO, or you're advising a
customer to go into-- you're going to go
on this journey. You don't want to show
them these slides, right? Because that may
frighten them, right? So how can you explain this
migration process and handling all of this to a
non-technical person? OK, so is it anybody's
birthday today? No? [AIR BLOWING] Sure it's nobody's birthday? Let's say the business objective
is to hold a volume of air. This is your volume of
air, thank you very much. This is your monolith, and the
volume of air is your logic. [BALLOON POPS] There we go, one error. It's gone down. Sorry. Did that land on your head? [HUSSAIN LAUGHS] This is bubble wrap, right? We can hold the same volume
of air with our bubble wrap. But wasn't it a lot easier
to understand the balloon? It had a nice shape. We could understand it. Microservices, I don't
know what this is. It's multi-layered, but I'm
holding the same volume of air, roughly. Caveat, big needle. And probably if I get
addicted to popping the cells. You get the point, right? We're only going through
those certain things, and the rest of the
system stays functioning. That has got to
be your objective. And as you migrate
from your monolith with all of the culture
change and with everything, try and keep that in your
head, because that's what you want to try and achieve. And with that,
I'll pass you back. Hang on. HUSSAIN AL MUSCATI: Yeah? [LAUGHTER] JUSTIN GRAYSTON: Oh, yeah. I've got one more thing. Hold on, Hussain. Semantic versioning. Got five minutes. We're good. Semantic versioning. Everybody knows
what this is, yes? Anybody doesn't know
this, put your hands up. Don't be shy. OK, go look it up
on the internet. It's right there. Take a picture. [LAUGHTER] Why is it important and
why am I doing that? Well, because this is
your contract to everybody who uses your service. You never had to do
this with a monolith. Well, actually, you might
have had a version number, but it could have been
kind of meaningless. Here, if that service team
A wanted to use service B, they could see which version
they had started using. You probably want to
have a deprecation policy throughout the
organization, so you understand that if somebody
changes the major version and they have a
breaking change, you know you have a year, right? Or six months or
whatever you choose to put that on the
backlog and get that changed in all of the services. Without this, you have
the classic problem of somebody making a
backward-incompatible change, deploying that into production. You're back to that situation,
where everything dies. So make sure you use
semantic versioning. HUSSAIN AL MUSCATI: So
let's review where we are. What have we achieved with our
monolithic application that we started with, the
one that only had a load balancer and that one
instance and a data store? We added a new web front-end UI. We added a bunch of
new APIs and migrated something difficult like
auth into a microservice. We identified a bunch of easy
wins with the edge capabilities and identified
the difficult ones with the sticky capabilities. We also looked at error
handling and identified what critical paths are. And right now, we have
a deeper understanding of what the system is. JUSTIN GRAYSTON: Hopefully. HUSSAIN AL MUSCATI:
Yeah, hopefully. So what do we do next? What's after this? You need to continue
your journey. We don't really have a
full-fledged microservice system, nor do we have a monolith. We have something in between. The idea is to take
what we've done here and iterate, follow
the same process. Extract services,
extract capabilities out, turn them to microservices. And it's a continuous
way to evolve. You kill different
parts of the monolith and create new
microservices and do it at your own pace to avoid
any of the issues we saw. And one thing that's
important to think about is the culture change that this will have on your organization-- how this will impact
your organization. Your organization could
have been a monolith, where you have one big
team that manages this monolithic application. JUSTIN GRAYSTON: And we could
have done 15 minutes just on that, right? HUSSAIN AL MUSCATI: Yeah. JUSTIN GRAYSTON: So-- HUSSAIN AL MUSCATI: And right
now, what's going to happen is that you're going
to have services, different microservices. And you're going to have
different teams managing those microservices as well. So make sure to embrace
that, because that's the best way to move forward. So good luck on your journey. Understand why you're
making the decisions you are and be aware that things
will go wrong, so be careful while you're moving along. Go at your own pace. Don't try to do
this all at once. This is an iterative process. And make sure you balance
between business motivations and whatever trade-offs
and risks you're taking. Thank you very much for
attending this talk. JUSTIN GRAYSTON: We
have two minutes. Does anybody want to do a QA? HUSSAIN AL MUSCATI: Yeah. [APPLAUSE] JUSTIN GRAYSTON: Ah, thank you. [MUSIC PLAYING]