[MUSIC PLAYING] JUSTIN GRAYSTON:
Hello, everyone. Welcome to the last session. You made it. Woo! AUDIENCE: Woo! HUSSAIN AL MUSCATI:
Hi, everyone. My name's Hussain Al Muscati. JUSTIN GRAYSTON: And
I'm Justin Grayston. HUSSAIN AL MUSCATI: And
we're both customer engineers based out of London. So what does a customer engineer do? Well, we work with customers
to design and build solutions for the cloud. And to be honest,
we still see a lot of customers struggling to
deal with their monoliths. And they're trying to figure
out how to migrate them to a microservices world. So you guys look to
be very tech savvy. I mean, you're here. Does anyone here not know what a microservice is? Can we see a show of hands? JUSTIN GRAYSTON:
Well, that's good. That's handy. [LAUGHS] I'm going to
do the reverse, right? Who knows what a monolith is? OK, we got a good
show of hands there. HUSSAIN AL MUSCATI:
Yeah, that's good. JUSTIN GRAYSTON: OK, yeah. I'm going to do some
more data extraction. OK, so who here, as part of your day job, works with a monolith? OK, that's quite a few of you. Wow. OK, now, this is a select
group, but we've all been here. Who here secretly knows in their
heart of hearts what they work on is a monolith, but
everybody else decides to ignore the fact? That is a few. That's good. That's good. OK, so we're going to try and-- we haven't got long, and
it's a massive topic, but what we're going
to try and do today is we're going to walk
through an example and hopefully give
you some tips. But while we're doing this,
what we're going to do is we're going to use
serverless compute. Now, obviously there's been some
great announcements this week, you know, that
opens up serverless to Kubernetes and GKE. But serverless really
enables us to concentrate on the migration and not the
standing up of the DevOps. We just want to be able
to migrate and scale. So let's kick off and start
looking at our monolith. HUSSAIN AL MUSCATI: All right. So monolith applications have
several things in common. They tend to be
standalone, isolated. They're self-contained
in the sense that all the logic's contained
within a single instance or a single unit. And they tend to be large and
very complex to work with. And a lot of the time,
it's even difficult to figure out where
do you actually start when you
want to understand how your application works. So we realize that this is
a very opinionated topic, and there's a lot of
content out there. We're going to be focusing on
showing you a few examples, showing you a few
techniques on how to break off different
parts of your monolith, convert them to microservices. And hopefully this is
something you can take with you to your organization. So we're going to start with-- as an example, we're
going to be using an e-commerce application. Why an e-commerce application? Well, we think that a lot of the
properties in this application resemble other
monoliths out there. So there's a user interface
that's generated for the user to see and interact with. We have a load balancer
that handles requests from different locations
and distributes them among instances. All the application logic is
contained within a single unit. State is associated within
the instance itself. And in terms of
storage, well, you're probably using
something really ancient as a catch-all solution
for all your data, maybe something like
a relational database. So let's take a
step back and think. When do you actually want
to move to microservices? What motivates it? There are several good
candidate applications that are good for microservices. Web applications, for example. They tend to be simple
but require the ability to handle a lot of traffic,
so they need to scale. Enterprise applications tend
to be a lot more complex, very huge, and they tend
to be applications that need to do everything. But what drives it? In the end, it's the
business requirements. Essentially, what does
the business actually need to succeed? Does it need the application
to be scalable such that it can handle increased
requests and volumes of data? Or do you need to be able to
optimize the application such that you would only
be using the resources that you actually need? Or is it more about being able
to remain competitive and have the ability to develop
features very quickly, so you need high development velocity? There are a lot of reasons. Let's look at an example
to showcase this better. This is our monolith. And it has all the
main components, has a UI, a load balancer,
application, an application instance, and a database. And imagine that you have a
celebrity that tweets about one of the products
that you're selling on your e-commerce application. And let's say that
tweet goes viral. So suddenly, you
have a huge number of requests coming in for people
searching for that product, wanting to put it
in a shopping cart, and maybe purchasing it as well. So you need to be able to
handle this increase in volume of requests. What do you do? You scale. You can scale vertically. That will only take
you so far, so you have to scale horizontally. This is how it would look. Basically, you're
adding instances. But think about this for a sec. What are you doing here? You're replicating the whole
application on each instance. Do you actually need that? The scenario might only need
you to scale certain bits, but you're scaling the
whole application, which means you're using
a lot of resources that you don't actually need. And at the same time, we
said that state is associated with the application instance. How do you do that? Well, you can set
up the load balancer to have a sticky session
with that instance for every specific user. What if that application
instance falls over? Then everything's
lost for that user. You could move the
state to the database, but then you're overwhelming
the database even more. And if that falls over, you have
a whole set of other problems to deal with. So imagine that you also want
to be able to add features. And let's say you
do a small UI tweak, but because this is a
monolith, you actually have to deploy the
whole application. And let's say you
made a mistake. Your whole application goes down
because of a simple UI tweak, and there's nothing
you can do about it. The thing here is that the main idea behind microservices is that it helps you bypass all that, because it gives you the ability to do partial deployments, published in small iterations. And it gives you the
ability to better handle scale and optimization. JUSTIN GRAYSTON:
So all good reasons to move to microservices,
but should we microservice the world? I mean, does everything
need to be in microservices? So this is a talk
about migration to microservices,
but one of the things you've got to do, you've got to
make sure that you sense check, that you're not just
following the internet or what somebody else has done. If you're not trying to
fix a problem of scale or optimization, then
what are your motivations to go into microservices? Is it a complexity thing? Well, microservices
may not actually make your life any easier. I'm sure most of you
are aware of that. You might be running specialist
software or hardware or maybe both. Would you then go
to microservices? What are you actually gaining? And just like you've got to have
the business reasons to move to microservices, well, let's
think of the business reasons to stick with a monolith. If your monolith is serving
the business well, it's secure, it's stable, then what is
the business motivation? And I think anybody
who undertakes this journey is going to
require a lot of money, time, and emotional toil. You must make sure that
you sense check first. But with our example, with our
web app, our e-commerce app, we're taking a scenario
that we actually-- it was built some time ago. And we had a regional
customer base. And now we've gone
global, which is great. But the problem is
our system's now suffering from stability
and deployment issues that Hussain has
just highlighted. We've also noticed that
our competition is outperforming us. They're deploying new
features faster than us, and they're eating
our breakfast. So we have plenty
of justification to move to microservices. So let's begin. HUSSAIN AL MUSCATI: Yeah. So what we're going
to do is start by focusing on capabilities. What are capabilities? They're essentially what
the application does or what the code does,
not what the code is. Code can be
rewritten, refactored. A lot of the time,
you're going to have to rewrite stuff from scratch. That's not the point here. What we need to
focus on is being able to take these
capabilities that are in your current
monolithic application and move them to your
microservices application. So that's going to be the
focus of this discussion. So what capabilities are
we going to talk about? We're going to talk
about a bunch that are common among monoliths. Storage, do we actually
stick with one database or look at several? We're going to be
discussing what we call edge capabilities.
be detached from the monolith. We're also going to be talking
about sticky capabilities. These keep you stuck
to the monolith. And we're going to also talk
about networking in the sense that how the different
parts of the application actually talk to each other. So let's take a closer look
at data and our data store. In our monolithic application,
in the e-commerce application, we had one data
store for everything, which is relational. What happens when you
have both an increase in the number of requests
and the amount of data? You need to be able
to scale to handle that, but it's relational, so you can probably only scale it vertically. I've actually seen
customers trying to mimic horizontal scaling
by creating multiple database instances and then sharding
the data across them and then writing some
logic to try to figure out where the request would go. That's not a good idea. Don't do that. And I've also seen
some customers that actually take
a NoSQL database, but then use that as a
catch-all solution, which still doesn't help, and
try to write some SQL queries over that. That also doesn't
work very well. The idea is you
need to think about the problem you're
trying to solve and the right solution for it. It could be multiple solutions,
not a single solution. So what do we mean by that? Imagine that you have a huge
increase in the number of users and the number of products. Let's say a few million
users and hundreds of millions of products. How do you deal with that? What do you use? How do you store that? You can use a NoSQL
database, something like a document database. That would fit very well,
because for each you can have these little
documents, and each document would be the user or the product
and any number of properties. That's a good fit. So that's one solution.
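To make that concrete, here is a rough sketch of one of those documents, assuming Firestore as the document database; the collection and field names are just made up for illustration.

    # Hedged sketch: one document per product, carrying whatever properties it needs.
    # Assumes the google-cloud-firestore client; all names are illustrative.
    from google.cloud import firestore

    db = firestore.Client()

    # Write a product document keyed by SKU.
    db.collection("products").document("sku-12345").set({
        "name": "Celebrity-endorsed widget",
        "price_cents": 1999,
        "tags": ["widgets", "viral"],
    })

    # Read it back as a plain dict.
    product = db.collection("products").document("sku-12345").get().to_dict()

Users would get the same treatment in a collection of their own.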
What if you want to be able to, let's say, retrieve the 10,000 or 1,000 most popular products? Then you need an
in-memory database, something that acts like a
cache for your hard data. What if you want to be able to
see what the user is actually doing in terms of where
are they clicking, what are they searching for,
what are they purchasing, and maybe use that data to
try to predict what they're going to be doing, predict what
they're going to be buying? That's where a data warehouse
solution would fit, something like BigQuery.
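As a rough illustration, streaming one of those behaviour events might look something like this, assuming BigQuery and its Python client; the project, dataset, table, and schema are hypothetical.

    # Hedged sketch: stream user behaviour events into BigQuery for later analysis.
    import json
    from google.cloud import bigquery

    client = bigquery.Client()
    TABLE = "my-project.analytics.user_events"  # hypothetical table

    def record_event(user_id, event_type, payload):
        # event_type might be "search", "click", or "purchase".
        errors = client.insert_rows_json(TABLE, [{
            "user_id": user_id,
            "event_type": event_type,
            "payload": json.dumps(payload),
        }])
        if errors:
            raise RuntimeError(f"BigQuery insert failed: {errors}")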
What about the other stuff that might still fit well in your SQL database? Stuff like transactions,
which relational databases are kind of fit for. You can keep using that for
those kinds of transactions. But look at a managed solution. There are a lot of
those on the cloud now. One thing also to
mention is that do we want to actually
start by implementing all these different NoSQL solutions,
put in an in-memory database right away, build up or
set up a data warehouse? That seems like a lot of
work, and you don't really want to do this in one go. You want to do it in
an iterative way such that you don't overwhelm yourself. So the idea is you
have all your data sitting in your legacy store. And as you move forward, you
chip away different parts, find the right solution
to solve that problem, and use that solution to solve that problem. What about edge capabilities? These are capabilities
that are easy to migrate. These are sort of your low-hanging fruit, or easy wins, let's say. They have certain
properties that kind of make them stand out. One thing is that they're
them from the monolith or break them off the
monolith, they don't really break the monolith itself. At the same time, they have
minimal dependency back to the monolith. So let's look at a few examples. Image upload, that seems
like an easy thing to do. All you do is
basically take an image and push it to a
storage location. That can be a service on its own
that just responds to requests. It can even be an
external service. That's something
that's not really tied that much to the
e-commerce app that we have. Another example is
thumbnail creation. That's also something
that can easily be detached without
much dependency back to the monolith. Both of these are easy to break out as separate services. Let's look at something
a bit more complex. We're going to be looking
at one that is not easy. It tends to be everywhere when
it shouldn't be everywhere. Any ideas? It's going to be your
HTML, CSS, and JavaScript. What is that essentially? That's your UI. Think of it this way. You have this big
monolithic application and in different parts
of that application, depending on the feature
or the capability, you're going to have HTML,
CSS, JavaScript embedded there. And the UI that the
user interacts with is kind of a combination of all
that HTML, CSS, and JavaScript from all across your
monolithic application. So what do you do here? How do we actually break
this off to microservices? So the idea is we need to
have all that HTML, CSS, and JavaScript--
basically all your code-- in one service. So lets do that. Here we have it, a
monolith and a new service called the front-end UI that's
running all of your UI logic. It has all the CSS,
JavaScript, and HTML. But there's a problem here. Any ideas? The thing is it doesn't actually
tie back to your monolith. It's a standalone service. So it's static. The UI doesn't
really do anything. So how do we deal with this? We need some sort of API
layer on the monolith such that front-end UI can
interact with that there. And that layer would interact
with the different pieces in the monolith, maybe
how it's showcased here and the different capabilities. So how would something
like that look? Something like
this, where you're looking at each capability
or each part of the monolith where you extracted that
HTML, CSS, and JavaScript and have some sort of API
interface to interact with it. That seems pretty neat.
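For illustration, one of those endpoints bolted onto the monolith might be sketched like this, assuming the monolith is Python and using Flask; the route and the stand-in product lookup are hypothetical.

    # Hedged sketch: expose an existing monolith capability as a JSON API
    # so the separate front-end UI service can call it.
    from flask import Flask, jsonify

    app = Flask(__name__)

    # Stand-in for the monolith's existing catalogue logic.
    PRODUCTS = {"sku-12345": {"name": "Widget", "price_cents": 1999}}

    @app.route("/api/products/<product_id>")
    def get_product(product_id):
        product = PRODUCTS.get(product_id)
        if product is None:
            return jsonify(error="not found"), 404
        # Return JSON for the front-end UI instead of rendering HTML here.
        return jsonify(id=product_id, **product)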
We didn't really have to change that much code, and we managed to separate
the service all on its own. But there's a problem here. Does anybody know what that is? Maybe Justin knows what that is. JUSTIN GRAYSTON:
OK, so this takes us onto sticky capabilities. And it sounds a bit
gross really, doesn't it? But what do we mean by
sticky capabilities? Well, they're hard to
pull from the monolith. If you can imagine
a piece of gum, and you're trying
to take it, you're always being pulled
back to that monolith. Now why are we
being pulled back? Well, the code might
take an approach where, if you had nice modular code, you could go and lift that out and create it as a sort of service,
maybe not a microservice, but that could be great. But actually, some capabilities
here, you know, monoliths? That code is not necessarily
nicely encapsulated. So we've also got state. State could be in-memory
in the monolith. It could be in the
database, you know? If we want to
create a service, we need to be able to have
that stateless service, and we need to be
able to not have to go back to the monolith. We want to actually move the
capability completely away from that monolith. OK, so these were
the capabilities we got in our system. We've got rid of
the UI, the main UI. That's gone. Hussain's also found a
couple of edge capabilities. There may be more
than that, but we're going to get rid of those. Who's near a microphone? Right, who wants to take a shot? And this is really high level. There's no detail of the system. Who wants to maybe
point out what they think might be a sticky
capability in the system? Anybody? HUSSAIN AL MUSCATI: Come on. Don't be shy. JUSTIN GRAYSTON: Authentication. Yeah, great Yeah,
OK, maybe search. Well, authentication, that
was the one I was looking for. And we're going to come to that. The reason why is
because of the state. So there's a few others
that we identified. Now, we'd hope that,
you know, if you've got teams working on a monolith,
they should know the code. So there should
be some easy wins. They should really
understand it. But for this point, for
the point of this exercise, some of these are
really high level. I mean, what does
schedule tasks do? I mean, that could be Bash
scripts doing who knows what, which is absolutely critical
for the monolith to stand up. How does that turn into a
microservice, you know? Stock tracking, shipment
management, returns management. That all sounds very stateful. We're going to have
to track things. Maybe we're having to
talk to other systems. All of these things are
not edge capabilities. All of these
things, you're going to have to look at
breaking them down and really understand
what it is they need to do and how we can move the state
to somewhere else, because we don't want to be coming
back to this monolith. The other one is
whatever the database is. And in our case, it's this
single large relational database. I've seen a couple of
migrations from monoliths to microservices, where all
the monolith code is completely gone. And maybe there's
people here that will know this kind of scenario. All that code's gone. Monolith is deprecated. But what's still there? In the middle of that is
one single large database, which is actually the phantom
legacy of the monolith, because it wasn't tackled
while you migrated. And although we
don't want to do-- as Hussain pointed out,
we don't want to go ah, there's all these
database technologies. It's going to be great. Every single microservice,
let's use another one. We don't want to do that. But what we do
want to do is as we start to pull these
hard services out, we need to really
consider how we move that state
to somewhere else, like whether it's
in-memory, NoSQL. But you want to consider
that as you go, right? Don't try and do it all at once. And then we got
user auth points. So user auth. The monolith is handling log in. It's handling log out. It's handling
session management. So how are we going
to deal with that? Well, let's take that one first. So we had this notion that
we can put APIs everywhere. But there's a big
problem with this. Well, we haven't actually
moved it from the monolith. We haven't actually
done anything. We've just created an
API here, and we really want to make sure that
our platform going forward is on the left-hand side,
not in the monolith. So let's move it over there. That was very easy. I press the clicker and now
my user auth's over there. I want one of these. So how do we do that? Well, if you are lucky
and your monolith has nice encapsulated
code, all nice and modular, maybe you could
just lift and create a mini monolith, which just
handles the authentication. You could still
use that database. Maybe you could take the user
tables out of that database and just have a smaller
relational database. You know, you just keep that
as a domain-specific database. We didn't do that. We decided to use Firebase auth.
pitch just for Firebase auth just because it's Google. You know, you could
choose anything. But what we chose is
to move completely away from the monolith. Now, we've got all
these users over here, but with Firebase,
we can use the CLI tool to import those existing
users into Firebase auth. Now, the reason why we chose
this is because, actually, it was a lot less work, and
it gave us extra benefits. So it fully broke us
away from the monolith. We're no longer going back to
that monolith for anything. It gives us more login options
for very little effort. We didn't have native apps
before, but now we can. And we've got one single
authorization plane. And it potentially gives
us real-time possibilities in the future and our
clients if we wanted to. So how does this actually work? Well, we have a user API. We've created an
interface that everything is going to use going forward
for your user authentication. And on there, it's basically
using the Firebase Admin SDK to authorize. We've got another
little benefit here, because our monolith was
using session cookies, but Firebase uses JWT tokens. And JWT tokens are
a really great way to pass identity between
microservice calls. Now, session cookie
would not be so great. So that's what we chose to do.
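Roughly, the check in the user API might look like this sketch, assuming a Python service using the Firebase Admin SDK; sending the ID token in the Authorization header is just one common convention.

    # Hedged sketch: verify the Firebase ID token (a JWT) sent with each request,
    # instead of relying on a monolith session cookie.
    import firebase_admin
    from firebase_admin import auth

    firebase_admin.initialize_app()  # picks up default credentials on GCP

    def user_id_from_request(request):
        # Assumes the caller sends "Authorization: Bearer <ID token>".
        header = request.headers.get("Authorization", "")
        token = header.split("Bearer ")[-1].strip()
        decoded = auth.verify_id_token(token)  # raises if invalid or expired
        return decoded["uid"]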
But we have another problem. It's always problems, isn't it? We have a problem now, because we created all those APIs for the quick wins for all
the different capabilities in the monolith to get
our front-end UI working. And what we've done
is we've broken it by taking user auth
out of the monolith. Not great. So how do we fix that? Well, we thought of a
few different scenarios. We could put the Firebase Admin
SDK all over the monolith. That sounded like a pain. What we chose to do is we
altered the monolithic code to make remote procedure calls
out to our new user service. And what this has done is now
that's flipped the situation. Instead of auth being
pulled back to the monolith, now the monolith is going to
our new service environment. That's a big win,
because once you've done that, it makes it easier
to migrate other services.
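Inside the monolith, one of those remote procedure calls might be as simple as this sketch; the user service URL, path, and payload are hypothetical.

    # Hedged sketch: the monolith no longer validates users itself;
    # it calls out to the new user service instead.
    import requests

    USER_SERVICE_URL = "https://user-service.example.com"  # hypothetical

    def verify_user(id_token):
        resp = requests.post(
            f"{USER_SERVICE_URL}/v1/verify",
            json={"token": id_token},
            timeout=2,  # fail fast rather than hanging the monolith
        )
        resp.raise_for_status()
        return resp.json()["uid"]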
There's one more thing to do with auth. HUSSAIN AL MUSCATI: So one
important thing to remember is that auth is atomic. What does that mean? Let's take a step back
and think about this. Every request coming
into your application needs to be authenticated. And in fact, a lot of the
requests between services need to be
authenticated as well. Why? Because now you don't have
one monolithic application. You have these
microservices talking to each other over a network. So there's more into it in
terms of the added network latency because of this. Network latency
has a huge impact on how your application
would behave. Think of this scenario. You have your
monolithic application, and whenever you
send a request to it, it responded at a certain
rate or with a certain delay. Now that you went
to microservices, you're actually
seeing a bigger delay. That shouldn't be
the case, but why? It's because of the
latency that's added. There are a number of things
we need to think about. How many services do you
have talking to each other? Do you have a continuous
chain of one service calling the other? That would introduce
a lot of latency. You need to think
about location. Are your services talking to
each other across the Atlantic or are they within
a specific region? They should be within a
region, because having them spread apart would add
a lot of latency as well. You need to take
advantage of things like caching, because
if you have services that communicate a lot
with each other, each time, that's latency. That's added more latency. So taking advantage of some
caching would reduce that.
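A rough sketch of that kind of caching, assuming a Redis instance such as Cloud Memorystore sits in front of a downstream service; the host, the product service URL, and the TTL are illustrative.

    # Hedged sketch: cache another service's responses so repeated calls
    # don't pay the network latency every time.
    import json
    import redis
    import requests

    cache = redis.Redis(host="10.0.0.3", port=6379)  # e.g. a Memorystore IP
    PRODUCT_SERVICE = "https://products.example.com"  # hypothetical

    def get_product(product_id, ttl_seconds=60):
        key = f"product:{product_id}"
        cached = cache.get(key)
        if cached:
            return json.loads(cached)
        resp = requests.get(f"{PRODUCT_SERVICE}/v1/products/{product_id}", timeout=2)
        resp.raise_for_status()
        data = resp.json()
        cache.setex(key, ttl_seconds, json.dumps(data))
        return data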
And at the same time, you need to make sure you have really good
monitoring and alerting, because things will go wrong. And when they go
wrong, you need to be able to figure out
what went wrong, because you have a much more
complex system now with, like we said,
services interacting with each other over a network. So you need good
monitoring and alerting to be able to detect that. We talked a lot about how
to actually take a monolith, break out some pieces off, and
turn them into microservices. In the beginning, Justin
said we should recommend using a serverless
platform, something that's completely managed. But we see that
there's an opportunity to talk about
event-driven architecture, because it gives
you the capability to do things very quickly,
especially since we're rewriting a lot of stuff. Using event-driven
architecture can help us add a lot of features or
a lot of services very quickly. Let's look at a few examples. This is a case where
we are creating a user. So what happens? You get a request,
create request. That goes to the user API. And the user API does its
thing and updates the database. At the same time, it
kicks off a message through our messaging
queue, pops up. And that triggers
off a few functions. There's one function that
sends out a welcome email through the mailer API. Another function signs the
user up for the newsletter. And to be honest, you can
add more and more functions very easily. And think about this here. Each function is actually
acting as its own microservice, because it's isolated from
the rest of the application, and it's providing a service. And it can scale based on
the demand, because you're doing it on functions, which
is a serverless, scalable platform. Let's look at another example. You're uploading an image. That image, let's say, is
Let's look at another example. You're uploading an image. That image, let's say, is uploaded to object storage, something like GCS. And that results in a trigger. That also kicks off a
bunch of functions. One function to store image
URLs in the user profile. Another function creates
thumbnails from that image. And again, you can add
more and more functions. And each one of these
is acting as its own microservice and is scalable,
because you're running it on a serverless platform. That was very easy to do.
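The thumbnail function might be sketched like this, assuming a Python Cloud Function on the upload bucket's object-finalize trigger and Pillow for the resizing; the bucket names and sizes are assumptions.

    # Hedged sketch: runs when an image object is finalized in the uploads
    # bucket and writes a thumbnail to a second bucket.
    import io
    from google.cloud import storage
    from PIL import Image

    THUMB_BUCKET = "my-app-thumbnails"  # hypothetical

    def create_thumbnail(event, context):
        """Entry point for the google.storage.object.finalize trigger."""
        client = storage.Client()
        source = client.bucket(event["bucket"]).blob(event["name"])
        image = Image.open(io.BytesIO(source.download_as_bytes())).convert("RGB")
        image.thumbnail((200, 200))
        out = io.BytesIO()
        image.save(out, format="JPEG")
        client.bucket(THUMB_BUCKET).blob("thumb_" + event["name"]).upload_from_string(
            out.getvalue(), content_type="image/jpeg")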
There's also an opportunity here to talk about event sourcing,
which is essentially when you want to be able to
update state and update state in different locations
within a sequence. So look at this as an example. A user purchases something. That creates a payment
notification, which tells the cart service,
hey, this product has been purchased, or this
item has been purchased. That marks it and
basically kicks off a message that triggers
a few functions as well. One function is updating
the purchase log, which is in some data solution. In this case, Bigtable. Another function in the
same sequence updating state in a different location,
like in the user purchase database. So see here, we're
making use of things like functions to do
things very easily.
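As a sketch, assuming the purchase event arrives over Pub/Sub, the function that appends to the purchase log might look like this; the project, Bigtable instance, table, and column family names are assumptions.

    # Hedged sketch: one function in the sequence appends the purchase
    # to a Bigtable purchase log.
    import base64
    import json
    from google.cloud import bigtable

    def log_purchase(event, context):
        """Entry point for a hypothetical 'purchase-made' Pub/Sub trigger."""
        purchase = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
        client = bigtable.Client(project="my-project")  # hypothetical project
        table = client.instance("purchases").table("purchase_log")
        row = table.direct_row(f"{purchase['user_id']}#{purchase['order_id']}")
        row.set_cell("order", "product_id", purchase["product_id"])
        row.set_cell("order", "amount_cents", str(purchase["amount_cents"]))
        row.commit()

The function that updates the user purchase database hangs off the same message in the same way.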
And when you're building your new microservices, taking advantage of
these sort of patterns will help you a lot
in terms of being able to add new capabilities
or new features very quickly. JUSTIN GRAYSTON: OK. So, that's a nice,
big red slide. When things go wrong,
they will always-- so one thing that's
always mentioned in these kind of things
is the culture change that happens if you're
running a monolith dev team to a microservice dev team. And as we migrate more of those
services over to the left-hand side, team's going to-- that number is
going to grow, yeah? This case, we may have like six. But what if that was 600? The teams basically-- they
take ownership of that tech stack [INAUDIBLE]
whether that service is. So right up to pager duty. Now if you have 600
microservices, how on earth is the dev team looking
after user auth going to have any information
about how the whole wider system works? One of the things that
you don't want to have is dev teams that
are working in silos. Nobody likes that, and
that's never productive. So what we know is we're moving
to microservice and complexity increases. And as they own the whole of
the service, what we don't want to have is accidents. We will have accidents, but
we want to try and avoid them. So one particular example of
this is the recursion problem. Hey, service team, the team
that looks after service A, we're going to
deploy a new feature. Great, we're going
to use service B, because that's really cool. We didn't know that service B
called service C that actually calls service A. And now,
Hussain's latency problem is a real big problem. So how do we fix this? Now, one thing that's
really important is that developers
document their service. And I'm not talking about
inline documentation, which all of its developers
did that other time. But they're not
very good at this. I'm talking about having some
sort of open and accessible place for every
single service as you migrate, where
everybody knows where to find information about it-- What it does, so what
you expect it to do, what dependencies it has, and
include which other service it calls and make it discoverable. This is not technical,
but this is important. It's adding a lot of complexity. We keep using that word. Complexity,
complexity, complexity. And we're thinking, OK, we've
got the business justification to move to
microservices, but it's sounding like a
pretty hard, long slog to move to microservices. So why do it? Well, I'm going to repurpose
a real situation that happened to me quite a few years ago. I'm going to restyle it
in the e-commerce example. But it was running
on App Engine. It was a small monolith, right? We're using Datastore. State was in Memcache,
you know, and Datastore. It was a distributed monolith. It wasn't terrible. Code base, quite small. Nothing-- no surprises there. You can see how old it is. If any front-end
developers out there, can you spot something which sort of takes us back? There's a nice
grunt file in there that really takes us back. If you're not a front-end
dev, you probably don't care. Anyway, so all of this
had simple scripts that run on the deployment
pipeline, a few minor tasks, bundled up all the JavaScript, deployed out to App Engine. And in this case, we'd done a major deploy two days ago. It wasn't on a Friday,
and it wasn't 5:00 PM. So we did have
actually the ability to know that it
wasn't going to crash, and it seemed to be
performing really well. So, great. We can start working
on the next features. And I go home in the evening. You know where this
is going, right? Yeah? I go home in the
evening, go to bed. 2:00 AM, mobile goes off,
a really unhappy person on the other end of the phone. Website's down. And I'm thinking, hang on. Who's deploying it
at 2:00 AM in the morning? Luckily, being a monolith,
I've got one place to look. And being in App Engine,
I go straight to the logs, and I can see that there is
one file to break them all. And it was config.py. I know that doesn't
sound that great, but what was in there
was some hacky code. Nobody ever has that, right? We had some hacky
code that worked, was in the backlog to be
fixed, and to make less hacky. But because it worked,
it never got prioritized. And what that code
did is it told the front-end what
state it should be in, what mode it should be in. So let's say for e-commerce,
it was on this date. From this date, we should
be in Black Friday mode. On this date, Cyber Monday. You get it. On this date, blah, blah, blah. The problem was, in that deploy two days ago, it somehow went unnoticed that some dev had changed the date to the last date, and midnight had just passed. And because there was no
exception around the error, the whole application went, ah. I don't know what state
the front-end should be in, so I'm going to show
a nice 500 error. And it's the App
Engine 500 error. If you haven't got a
default template set up, you know how ugly that is. Luckily, it's easy fix. Just fix the date, deploy it. Up and running, no problems. Go back to bed. Well, afterwards we did
a quick post-mortem. There were two key things that we decided we'd learned out of this. Our alerting needed
to be much better. Why did our alerting
need to be much better? Because somebody had to
phone me, which wasn't great, because it was my boss. But the fact is why did
it take two hours for me not to notice that? Well, we had front-end cache
headers on all of the logged out pages. So all of the non-user sessions,
yeah, it looked like it worked. And at 2:00 AM,
there aren't very many logged-in users buying stuff. So our alerting needed to be much better, because we need to know as soon as that happens so we can handle it. And it was a really
terrible user experience. The first one is actually
a lot easier in a way when you have a single
service App Engine. If you had 600
microservices running, you need to really make sure
that your alerting is better and you can understand
the problems. OK, so what do we do next? What we did, we got all display
logic, stuck it in an Angular wrapper, and gave it its own service, completely 100% static, no Python anymore. The backend, we quickly
made APIs for everything. We had-- well, it
says less monolithic. We're still a monolith, it just
wasn't working quite the same. No display logic in it. We still have the same
problem, though, right? One file, one exception. Whole thing goes down. But we have a real
good improvement now. The front-end
client can go, hey, the backend has decided to die. I'm going to show a picture
of a puppy or whatever. I can do something better than
the default App Engine page. So what do we do next? Well, we actually
started breaking the app into domain-specific APIs. And this gives us the ability
to more intelligently handle errors and keep the users
unaware that everything's on fire. OK, so we got six things there. And now, I'm going back
to the 600 things, right? That's going to be
really hard to manage the combination of
errors that possibly could happen in a
600-microservice environment. So you need to
identify your critical paths in your microservice
environment. So what does a critical
path look like? And for us, we want
people to buy things, because otherwise
there's no money to pay for the web development,
and we're all unemployed. So user comes to
the checkout page, and the user cart
service, the thing that tells what the user has
put in their cart is down, OK? Well, what we can do is we could
have multiple projects for App Engine, and we could
deploy those services out to multiple regions. And the client can go-- I'm in US East. I'm going to try US West. OK, that's OK. What happens if that disappears? OK, so what do we do now? Well, we need to make sure
that people buy things. So shopping carts, they used
to use cookies to track. In a multi-device era,
that doesn't work, so that's why we
have the service. But let's not discount it. We could use local
storage, and we could keep the client keeping a record. Maybe we know when the
two services are out, they're actually-- we'll show a message to say it may not be the latest information, but
the key thing is the user can carry on
and make that purchase. So you can see there that
wasn't the microservice team necessarily that
actually had to fix that. It was the people who were
looking after the front-end UI. So you need to make sure that
the critical paths are jointly owned and actually
properly owned by the teams that it affects. You know, don't be blinkered
down one technical path either. There may be multiple
solutions, and the thing that should drive
those solutions should be the business. What is it you are trying to do? I can' see we're running
a bit out of time, so I'm going to speed up. OK, so the user
cart service was down. Right, that's really simple. But actually, why was the user cart service down? It was because the user service was out. And let's say we
didn't go to Firebase, and we still have that
relational database. That relational database is out. OK, so now we have even more
teams on the critical path. We have the people looking
after the database, we have the user service. What can we do? Well, you know, we could have
an explosion of services dying, because, as Hussain said,
the user service is atomic. So now, we have
lots of complexity on what has just happened. One service is down and our
whole platform is going out. And it's feeling like microservices is hard work. OK, so we'll fail over
to a read replica. We'll limp on. We'll be in read-only mode. We won't do logins, and all the
clients will stop doing logins. But users that are currently
in the system, that's fine. OK, so region one, we
send to region two. Region two is now
sending both things. And yeah, welcome to your
self-created denial of service attack on region two. And everything is out. Hey, I thought this
was a good idea, right? OK so let's just take a
step back a little bit. Things you ought to remember. Again, six services, not 600. Let's think how complex
this is with 600. I'm not going to
do that, actually. Let's keep it simple with six. You can see that
actually request three to-- even a service that
isn't directly relying on it is going to be out. Everybody has to understand
what the error situation is. And what you need to do
is you need to fail fast. What if you had timeouts set
at five minutes for all these? All the requests will balloon. Oh, that's OK. We're using serverless, right? We can scale. Oh, you're paying
for that, you know? So you need to fail fast. You need to send an error
message so that everything in your system-- and it should be part
of your documentation too-- understands what it
should do in this situation. In this situation,
actually, we're going to throw in an
extra piece of technology. And this is probably a good
time to throw in an extra piece of technology. We're going to use an
in-memory database. We know we can scale that. We could use Cloud Memorystore. I mean, we know we're not
going to DoS another region. So to do this, maybe we used
the event sourcing idea. We have two cloud functions
pushing to both databases. What we can do,
and the reason why it's amber is the user service knows that we're in failover mode. And it's going to send
an error message, which tells everything, that we're
only in a read-only situation. Hopefully, that's
given you enough time to restore that database. Now, if this was a user
service and it was me, I would probably make
the in-memory database the primary database and have
another database as a fallback, because, as Hussain pointed
out earlier, it was atomic. It's going to be
very, very chatty. So the advantages of having
that really low latency is going to be important. OK, speeding through. So microservice and faults. It's complicated, right? But with that complexity,
we have more options, right? We can safely say that even
though that is more complex, we can give the user a much
better user experience. If the user has a much
better user experience, they're more likely to stay with
your website or your platform. Circuit breakers. If you all know
microservices, I'm not going to explain what one of those is. But it's important
that you have them. And maybe you limit
the amount of languages you have in your platform, so
you have standardized circuit breakers. You know, fail fast
and have a plan B.
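Here's a minimal sketch of the circuit breaker idea in Python; in practice you'd more likely standardize on a library or a service mesh for this, and the thresholds are arbitrary.

    # Hedged sketch: after enough consecutive failures, stop calling the
    # downstream service for a while and fail fast instead.
    import time
    import requests

    class CircuitBreaker:
        def __init__(self, failure_threshold=5, reset_after_seconds=30):
            self.failures = 0
            self.failure_threshold = failure_threshold
            self.reset_after_seconds = reset_after_seconds
            self.opened_at = None

        def call(self, url):
            # While the circuit is open, fail fast instead of waiting on timeouts.
            if self.opened_at and time.time() - self.opened_at < self.reset_after_seconds:
                raise RuntimeError("circuit open: go to plan B (cache, degraded mode)")
            try:
                resp = requests.get(url, timeout=1)
                resp.raise_for_status()
                self.failures = 0
                self.opened_at = None
                return resp.json()
            except requests.RequestException:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.time()
                raise

The point is the caller gets an immediate, well-understood error it can plan around, rather than a five-minute timeout.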
Bulkheads. Create partitions between the domains. Make sure that when you have
that catastrophic failure, your whole system
doesn't go down. Now, the bulkheads thing comes
from boats or a submarine, where you put a bulkhead
between the different partitions to stop the whole boat sinking. But I want you to take away-- if you're going into your
board, you're going to your CTO, or you're advising a
customer to go into-- you're going to go
on this journey. You don't want to show
them these slides, right? Because that may
frighten them, right? So how can you explain this
migration process and handling all of this to a
non-technical person? OK, so is it anybody's
birthday today? No? [AIR BLOWING] Sure it's nobody's birthday? Let's say the business objective
is to hold a volume of air. This is your volume of
air, thank you very much. This is your monolith, and the
volume of air is your logic. [BALLOON POPS] There we go, one error. It's gone down. Sorry. Did that land on your head? [HUSSAIN LAUGHS] This is bubble wrap, right? We can hold the same volume
of air with our bubble wrap. But wasn't it a lot easier
to understand the balloon? It had a nice shape. We could understand it. Microservices, I don't
know what this is. It's multi-layered, but I'm
holding the same volume of air, roughly. Caveat, big needle. And probably if I get
addicted to popping the cells. You get the point, right? We're only going through
those certain things, and the rest of the
system stays functioning. That has got to
be your objective. And as you migrate
from your monolith with all of the culture
change and with everything, try and keep that in your
head, because that's what you want to try and achieve. And with that,
I'll pass you back. Hang on. HUSSAIN AL MUSCATI: Yeah? [LAUGHTER] JUSTIN GRAYSTON: Oh, yeah. I've got one more thing. Hold on, Hussain. Semantic versioning. Got five minutes. We're good. Semantic versioning. Everybody knows
what this is, yes? Anybody doesn't know
this, put your hands up. Don't be shy. OK, go look it up
on the internet. It's right there. Take a picture. [LAUGHTER] Why is it important and
why am I doing that? Well, because this is
your contract to everybody who uses your service. You never had to do
this with a monolith. Well, actually, you might
have had a version number, but it could have been
kind of meaningless. Here, if that service team
A wanted to use service B, they could see which version
they had started using. You probably want to
have a deprecation policy throughout the
organization, so you understand that if somebody
changes the major version and they have a
breaking change, you know you have a year, right? Or six months or
whatever you choose to put that on the
backlog and get that changed in all of the services. Without this, you have
the classic problem of somebody making a
backward-incompatible change, deploying that into production. You're back to that situation,
where everything dies. So make sure you use
semantic versioning. HUSSAIN AL MUSCATI: So
let's review where we are. What have we achieved with our
monolithic application that we started with, the
one that only had a load balancer and that one
instance and a data store? We added a new web front-end UI. We added a bunch of
new APIs and migrated something difficult like
auth into a microservice. We identified a bunch of easy
wins with the edge capabilities and identified
the difficult ones with the sticky capabilities. We also looked at error
handling and identified what critical paths are. And right now, we have
a deeper understanding of what the system is. JUSTIN GRAYSTON: Hopefully. HUSSAIN AL MUSCATI:
Yeah, hopefully. So what do we do next? What's after this? You need to continue
your journey. We don't really have a
full-fledged microservice system, nor do we have a monolith. We have something in between. The idea is to take
what we've done here and iterate, follow
the same process. Extract services,
extract capabilities out, turn them to microservices. And it's a continuous
way to evolve. You kill different
parts of the monolith and create new
microservices and do it at your own pace to avoid
any of the issues we saw. And one thing that's
important to think about is the culture change that this will have on your organization-- how this will impact
your organization. Your organization could
have been a monolith, where you have one big
team that manages this monolithic application. JUSTIN GRAYSTON: And we could
have done 15 minutes just on that, right? HUSSAIN AL MUSCATI: Yeah. JUSTIN GRAYSTON: So-- HUSSAIN AL MUSCATI: And right
now, what's going to happen is that you're going
to have services, different microservices. And you're going to have
different teams managing those microservices as well. So make sure to embrace
that, because that's the best way to move forward. So good luck on your journey. Understand why you're
making the decisions you are and be aware that things
will go wrong, so be careful while you're moving along. Go at your own pace. Don't try to do
this all at once. This is an iterative process. And make sure you balance
between business motivations and whatever trade-offs
and risks you're taking. Thank you very much for
attending this talk. JUSTIN GRAYSTON: We
have two minutes. Does anybody want to do a QA? HUSSAIN AL MUSCATI: Yeah. [APPLAUSE] JUSTIN GRAYSTON: Ah, thank you. [MUSIC PLAYING]