Principles of Microservices by Sam Newman

Captions
So I hope you had an awesome keynote. I got here a bit late, but I hear it involved robots and jetpacks, and one of the guys wiring me up said "good luck following that", which is just what you want to hear before speaking to a large audience. But thanks for coming along today. Unfortunately, the conference looks awesome and I'm literally here for this talk, because I am bad at scheduling, so hopefully, if my flight is taking off from Brussels, depending on what Lufthansa is doing, I'll be shooting off straight from here.

My name is Sam Newman. I work for a company called ThoughtWorks; if you don't know what we do, you can email me or look us up on the net, because we're on the internet now. I'm also the author of a book called Building Microservices; if you enjoy the talk, there are copies available at the O'Reilly stand, which you can find later on.

But we're actually here to talk about these things, because they are all the rage: the only thing more buzzworthy in 2015 than microservices is, of course, Docker. I'm sure lots of you have Docker stickers on your laptops; I'm sure some of those people with Docker stickers on their laptops have even run Docker. But this is what we get to talk about, and these are microservices. I draw them as hexagons because it's a nice shape, and they're my slides and I get to pick the shapes, but also as an homage to Alistair Cockburn's paper on hexagonal architecture. They have names: customer, shipping, inventory. These give you an idea of what the architecture might be about.

This is the definition I use for them: small autonomous services that work together, modelled around a business domain. I normally say small, independently releasable services that work together, modelled around a business domain. You might have your own definition of microservices, and that's great, and you can put that in your own book. This is what I say they are: they are separate processes, and they communicate over network ports, but they're really all about independent evolution; about being able to make a change to a service and deploy it into production by itself.

I've spent a lot of time working with organisations on this, because I come from a background of having spent a good chunk of the last ten years of my career working with service-oriented architectures; I see microservices as nothing more and nothing less than an opinionated form of service-oriented architecture. So I was intrigued as to why microservices worked, and what organisations had to do to make these things work well. There are a lot of downsides that come with microservices, a lot of complexity that we add; how do you chart a path around the pitfalls and get the valuable stuff out of it?

One of my colleagues, James Lewis, talked about microservices in the context of an architecture that buys options for you. That is to say, you invest in having these smaller, finer-grained architectures, and in exchange you get the ability to make lots of different choices. Choices can be good, but when we come from a background of working with more monolithic software, often we're only used to making one or two major decisions a year: we have one main technology stack we use for that monolithic system, maybe only one type of persistence store, maybe only one main idiomatic design used in that system. With a microservice architecture you get to make a lot more choices, and this can actually be a source of a large amount of friction.
As always, when you make decisions, if you approach every single one from scratch, thinking through the pros and cons, it can become exhausting to go through that the whole time, and it can lead to situations where you make different decisions in similar situations and end up with a whole host of inconsistencies in your architecture. Quite often we use a set of framing principles to help guide our decision-making: a set of value statements that decide how we do things around here.

For example, Heroku have their twelve factors. These are principles that guide decision-making when working on the Heroku platform. All these sets of principles exist to achieve some goal: follow this stuff well and hopefully your application will work well on the Heroku platform. It's actually a mix of principles, design decisions, and constraints (the constraints of the Heroku platform itself), but nonetheless, when you're building a system on Heroku, this guides your decision-making.

This is a set of principles put together by a colleague of mine, Evan Bottcher. The things we normally talk about as architectural principles are typically what you see in the central column, but Evan really highlighted the fact that these principles, these things that drive how we design our software, exist for a reason, and here they exist to drive the company forward. In the leftmost column we've got a description of what the organisation is trying to do: this is an organisation that's trying to go fast, to expand rapidly into new markets. The architectural principles, therefore, are about going fast; there's much less emphasis on being consistent and much more emphasis on empowering teams. And over on the right, Evan pulled out, as distinct from principles, the idea of practices: the mechanisms by which you implement a principle.

Evan made the observation that where your company is going doesn't change that often, maybe once a year, once every two years. Our architectural principles change a bit more: we learn stuff, we realise that some of our systems weren't great, and that modifies them on a slightly more frequent cycle. And over on the right, the practices, the actual detail, change quite a bit, because technology changes all the time. Nonetheless, by breaking these things apart, this allowed a fairly large organisation (they now have over 200 developers) to more or less have a shared sense of how things are done around here, and to make sure those principles are driving towards an end goal, that end goal being the company being successful. The twelve factors for Heroku have a goal too: your application should work on Heroku.

So when I was doing my research into microservices, I was thinking about what organisations do in order to achieve their end goal, which is namely to get enough of the good stuff out of microservices for it to be worthwhile. What are the principles we need to follow to build these small autonomous services that work together? I've distilled it down (there's an earlier version of this in the book; this is a newer version) to eight principles.
The first is modelling things around a business domain, because we've found that gives us more stable APIs. Embracing a culture of automation, to manage the fact that we've now got a lot more deployable units. Hiding implementation details, to allow one service to evolve independently of another. Decentralising as much as possible, both decision-making power and architectural design concepts. Deploying independently, probably the most important principle up there: the idea that you can make a change to a service and deploy it into production without having to change anything else. Consumer first: services, it turns out, exist to be called, and maybe we should think about that; so thinking outside-in, not inside-out. Isolating failure: making sure the systems we build are not more flaky than their monolithic counterparts, which is very easy to do. And making sure our systems are highly observable: making sure it's easy to understand how they hang together and how they behave.

So let's dive into the first principle: modelling things around a business domain. I said earlier that I draw these things as hexagons; the more important thing is the names. They have names that have meaning. When you look at an architecture for a microservice system, you should have some idea of the domain in which it operates. Compare that to a lot of the architectures we saw coming out of service-oriented architecture, where people took the horizontal technical layers within a process boundary and said, right, they're going to become new services; you ended up with presentation services, business logic services, and back-end data storage services. The nice thing about those architectures is that you can use exactly the same architecture diagram for an oil rig, a banking system, or a charity, because it's the same diagram. It's not very useful, though, because often when you want to make a change to systems that have been split horizontally, the change has to cut all the way through. Something as simple as adding a new field to a user interface may require changes in two or three services, and when those services are owned by different teams, that's coordination across teams.

With microservices, instead of slicing things horizontally, we're slicing things vertically: the unit of decomposition is effectively the business domain. We've found that services modelled around a business domain are much more stable; the APIs themselves don't tend to change fundamentally that often. Changes across service boundaries are expensive, so we want to avoid them. We also find that by exposing these finer-grained seams it's easier for us to create different sorts of user interfaces, because we can recombine the functionality in different ways for a mobile device or a web application. And teams that own these services become experts in that part of the business domain, rather than experts in some arbitrary technical decomposition of the whole; we now get teams that really understand how invoicing works, that understand how the accounts process works.

Finding these seams in existing monolithic systems can be difficult, but there's a lot of work from domain-driven design that can help us here. In many ways the same principles that applied to modular decomposition in the seventies still apply, but taken with a healthy dose of domain-driven design thinking as well, helping us look for things like bounded contexts and subdomains, you can actually find these service boundaries. So implementing domain-driven design is a good place to start if you're interested in using this as a way of understanding the domain you're operating in.
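As a rough sketch of the difference (all the names here are hypothetical, not from the talk), a vertically sliced service boundary expressed in Java speaks the language of the business rather than of a technical layer:

```java
import java.util.List;

// Hypothetical sketch: the boundary is named after a business capability,
// and its operations use the domain's own vocabulary. Compare this with a
// "business logic service" or "data access service", which could belong
// to any system at all.
public interface InvoicingService {
    record OrderId(String value) {}
    record InvoiceId(String value) {}
    record CustomerId(String value) {}
    record Money(long pence) {}
    record Invoice(InvoiceId id, CustomerId customer, Money total) {}

    Invoice raiseInvoice(OrderId order);
    void recordPayment(InvoiceId invoice, Money amount);
    List<Invoice> outstandingInvoicesFor(CustomerId customer);
}
```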
Let's talk about our next principle: embracing a culture of automation. You need to be pretty relentless about this if you're going to use microservices at scale. At the moment you start on this journey, with a small number of services, you can probably get away with manual provisioning of machines and manual deployments; that won't last.

Let me talk about a client of ours who had been using microservices for many years, before we even had the word for it: a company called REA in Australia. They'd spent a couple of years investing in a deployment platform on Amazon, primarily to let them cheaply provision test and dev environments, so they already had a fairly good degree of rigour and discipline around automation. But they wanted to go a bit further and embrace the Amazon idea of the two-pizza team: services being owned and operated by teams, where the team deploys the infrastructure, deploys the service, manages that service in production, and eventually tears it down when it's no longer needed. From a standing start they got two of these services up into production inside of three months, which I think is very good going; most organisations wouldn't get that turned around as quickly as they did. That went really well for them, and they thought, right, we're really going to go fast now, we can ramp this up. It then took them another nine months to get just seven more services up, because all the way along they had to invest in tooling and in creating a platform that lets them do this efficiently. All of this is about reducing the transaction cost of having and managing more services, and it's not always easy to see what things you're going to need when you start that journey. So you see fairly flat growth at first, and then six months later they had 60 services in production. The key thing to understand is that this is 60 different types of service; each of those services may itself be scaled out further. So you see this sort of hockey-stick explosion in the growth of services.

You see similar growth patterns from Gilt, who have shared their numbers over time. That's an organisation that moved from a monolithic Rails application to a decomposed, often JVM-based platform. For many years they had a low double-digit number of services, but once sufficient investment in the platform kicked in, things spiked up.

So when we're thinking about automation, we're thinking about things like infrastructure automation: can I write a line of code and provision an isolated operating system, or provision a service? Have I got sufficient testing in place to help me understand whether I can release my software? Am I treating every check-in as a release candidate? Have I really got rigour around that stuff? These are the things you're going to have to invest in if you want to use microservices at scale, and be relentless: there will be some upfront work required to get this working, and it will require ongoing investment as well.
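The talk doesn't name tooling for this, but as one concrete illustration of the "can I provision an isolated environment with a line of code" test, here's a minimal sketch using Testcontainers from Java; the image name is just an example:

```java
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.utility.DockerImageName;

public class ProvisioningSketch {
    public static void main(String[] args) {
        // One line to provision an isolated database for a test or dev
        // environment; it is torn down automatically when the block exits.
        try (PostgreSQLContainer<?> db =
                     new PostgreSQLContainer<>(DockerImageName.parse("postgres:13"))) {
            db.start();
            System.out.println("Provisioned: " + db.getJdbcUrl());
        }
    }
}
```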
Let's talk now about one of the trickiest things to get right, which is hiding implementation details. We're in a very small, cosy environment here, this is a safe space, there's only one or two or six hundred of my closest friends, so I feel confident in sharing with you the world's most commonly used service integration pattern outside of the internet, and that is this: I have a service that talks to a database. This is okay; I'm okay with this; databases are good things. But then I want to spin up another service, and so I do this: I point it at the same database. It's very easy to do, it's very quick to do, and it now allows two services to share information. This is very common. With two services it's not too bad. It is quite bad if I want to make a change to that schema: maybe rename a column because the name is bad, maybe restructure the schema to hit different performance targets. Can I do that safely, knowing that other parties are reaching in and looking at my database? The answer is that I can't. Effectively, when you expose a database to another service in this way, you have exposed internal implementation details, and you don't get to decide what is shared and what is hidden. With two services things aren't too bad, but I worked on a platform where we had 40 separate services integrating on a schema that we owned. We couldn't track down who all those people were, so we had to turn the database off during the day and wait for the phone calls. This drastically impacts your ability to change and evolve the design of your systems.

So this is what we want instead: if you want to get information from another service, or you want to change the data it holds, you make a request. You make an API call, or you send a message to it. That way, at the API layer, the people owning that service get to decide what is hidden and what is not, which allows them to change the internals of that system safely. So hide your databases. This will be one of the biggest things to get right in allowing these services to evolve independently.
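As a minimal sketch of the alternative, using Java's built-in HTTP client and a hypothetical endpoint: the consumer asks the owning service for data through its published API, and the schema behind that API is free to change:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CustomerClientSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Ask the owning service through its published API rather than
        // reaching into its database. The owning team now decides what is
        // shared, and can rename columns or restructure tables safely.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://customer-service/customers/1234")) // hypothetical
                .header("Accept", "application/json")
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```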
But even once we've done that, even once we've got a nice API boundary, we still have to think about what that API shares as well. I mentioned some of the ideas behind domain-driven design earlier. There's one idea we talk about a lot with service design, and that's the bounded context. A bounded context is an explicit boundary within a domain; you have models that you share between those boundaries, and models which only really need to exist inside one of them. This example is from Martin Fowler's post on bounded contexts. On the left we have a collection of functionality around sales, with concepts like territory, pipeline, and opportunity; on the right we have a bunch of stuff about support, with things like defects, product versions, and tickets. The nice thing about having diagrams like this, again, is that you get a sense of what the domain might be. But there are two things that are shared: customer and product. The thing to understand here is that what customer means inside a sales context is different from what customer means inside a support context, even though it might be the same person. A customer in sales is somebody I have sold to or might sell to; a customer in the context of support is somebody who's raised a ticket. So when you're thinking about sharing information, you've got to understand: what do I really need to share? What is the information that anyone else actually cares about?

Let's imagine, if these were two services, how I might implement this. I have an object which is the customer. It has fields, so I can see information about it: maybe the tickets they've raised, and there's a little defects field in the object. I run my serialiser on it to transform it into a highly efficient and very human-readable format like JSON, and it runs and follows all the references and creates its nice big JSON payload, and I send that over the wire, and along with it go the tickets, and along with it go the defects that person has raised, and so on and so forth. When you expose internals like that, again, it becomes very, very hard to change. Exposing information is costly: it's easy to expose information you've previously hidden, but very hard to hide information you've previously exposed. So you need to think very carefully about what is shared and what is hidden; that's what a lot of the bounded context ideas are about.
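Here's a hypothetical sketch of that separation in Java: the rich internal model of the support context stays private, and the representation that crosses the wire carries only what other contexts actually care about:

```java
import java.util.List;

public class SharedModelSketch {
    // Internal model inside the support context: rich, full of detail that
    // nobody outside this service should be able to depend on.
    record Ticket(String id, String summary) {}
    record SupportCustomer(String id, String name, String internalNotes,
                           List<Ticket> tickets) {}

    // What we deliberately put on the wire: only the fields other contexts
    // care about. The internals can now change without breaking anyone.
    record CustomerResource(String id, String name) {}

    static CustomerResource toResource(SupportCustomer customer) {
        // tickets and internal notes stay hidden, on purpose
        return new CustomerResource(customer.id(), customer.name());
    }
}
```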
Let's talk about one of the fuzzier ideas now, and that's decentralising all the things. The reason this is important is that microservices are an architecture which optimises for autonomy; autonomy of teams, predominantly, rather than individuals, but autonomy nonetheless. To achieve that hoped-for goal of going faster, of deploying more quickly into production, you have to actually push power outwards. The definition of autonomy I use in this context is giving people as much freedom as possible to do the job at hand. So we need to think: what can we do to make the teams owning these services more in control of their own destiny? It starts with things like self-service: do I have to raise a ticket to get a machine or provision an environment, or can I just do it myself? That's a very simple thing. Governance is also important; I actually don't think governance is necessarily a dirty word. It's about having a place where people can collectively come together, look at the cross-cutting concerns, and ask, okay, do our principles need to change? But you want to find a way for that governance process itself to be shared, so that rather than having some centralised architect who sits over the whole thing, you have members of the teams coming together to talk and share ideas. Some organisations call these shared communities of practice. The slide here is referencing a blog post from Gilt from a couple of years ago; the structure they talk about didn't really stick for very long, but it's nonetheless an interesting example of how you can have a more collective sense of governance and ownership.

It also comes into our architectures. How many people have an architecture like this, hands up? A nice, simple, magical bus that manages the communication of all our services, and it looks like a nice diagram. The problem, of course, is that it's often hiding a lot of problems. I have nothing against message brokers; I have no problem with things that get messages from A to B and do so in a reliable way; a large amount of my IT career has been spent using such things. But I don't like it when those message brokers start taking on more and more functionality and more and more behaviour. IBM MQ Series was a good queue in 1995, but they kept adding things on top. We make these buses domain-aware, we use them to implement consistent data models, we put more and more smarts into this wonderful magical mystery bus in the middle, and suddenly, to make a change, we need to change not just the service but also the message bus itself, which is now being managed by a separate team. If you're going to use messaging middleware, keep it dumb: keep it about the pipe, keep it about going from A to B, and keep the smarts in the services. And this doesn't just apply to messaging middleware: if you look at the current trend around API gateways, they are fast becoming the Enterprise Service Bus of the microservice era, because the reality is, when we look inside these things, they look nice on the surface, but there's some sort of hellish landscape of death and destruction lying just beneath it.

We're at the halfway point now, so let's talk about probably the most important principle, and that is deploying independently. This is the idea that it should be the norm, not the exception, that you can make a change to a service and deploy it into production without changing anything else. If you have five services right now, and you always have to deploy all five of those services together, fix that before you add a sixth. You will thank me later.

Getting this right can require a lot of things, but it often starts with even simple things, like how your services are mapped to the underlying infrastructure: how many services per host do you have? When I say the word host here, I'm really talking about an isolated operating system and collection of resources; that could be a physical machine, it could be a virtual machine, it could be a container. We have the model where I have one service per host, or the model where I have multiple services per host. Over on the right, where I have multiple services per host, this is the world you'll be in if, say, you're using a Java application container; this is where you're using JBoss, this is where you're using IIS. It's often an approach that's optimising for having a small number of hosts; it's the world you'll be in if the cost of provisioning a new host is too high, if you only have physical infrastructure, or if you have to raise lots of tickets to provision a virtual machine. The issue is that the world on the right is a world of side-effects. That world on the right is where I deploy a service that has a bug that uses up all the CPU on the machine, and suddenly all the other services stop working. It's where I deploy some prerequisites the service needs on that machine, and suddenly those prerequisites clash with the other services on that box, and those other services stop working. The world on the right is more confusing to think about from an operations point of view, and it doesn't really help us with independence. You don't have to start with one service per host, but virtually everybody I've met who uses microservices at scale, and by at scale I mean more than one microservice per developer, ends up on the left, because it's a simpler world and it's much easier to reason about. This is partly why people are so excited by Docker and things like it: it lowers the cost of creating isolated operating environments like this.
But we also have to think about making changes. We want to avoid breaking other services: when I make a change to a service and deploy it into production, the key thing I'm asking myself is, have I broken one of my consumers? That's often why people resort to releasing all their services together; they say, I've tested these ten services together, I know they work together, so I'll just release them all at once, and that process becomes enshrined as the way to do things. But that slows down how quickly you can get functionality out and makes for riskier deployments. If I want to make a change to one service, say the inventory service, the key thing to understand is whether I've broken my consumers: if I deploy a new version of inventory, is shipping still going to work in production? There's a way we can validate that before deployment, without having to do large end-to-end testing, and that's a technique called consumer-driven contracts.

If you think about this communication, the shipping service has expectations about how the inventory service is going to behave. The issue is that those expectations are implicitly modelled: there are calls in the application code that we could look through and, if we could distill them down, that's the contract that we have; but that contract isn't explicit anywhere. What we do with consumer-driven contracts is make that contract explicit, and make it executable. The consumer team, here the shipping team, creates a set of tests that represent the expectations they have of the inventory service. Those tests are then run as part of the CI build of the inventory team: every time I check in, I run my consumer-driven contract tests, maybe bring up the inventory service on my CI node, and execute the expectations against it for the various different consumers I've got. If one of them breaks, I know not only that I shouldn't go into production, but which consumer I've broken. This is a very good technique that we use quite a lot. There are some test tools that can be jury-rigged to support this, but a few concerns are quite tricky in this area, so the tool I like a lot in this space is one called Pact, which is built from the ground up for this purpose. Beth, who runs the project, has even got a companion project called Pact Broker, where you can store the expectations for multiple different versions, which means you can validate the expectations of multiple different versions of the same consumer before going into production, which is something you often want to do. It's well worth a look. This allows you to do independent, isolated testing of a service, validate that you're not going to break consumers, and go into production without the need for big end-to-end testing.
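As a sketch of what this can look like with Pact's JVM bindings and JUnit 5 (the endpoint, payload, and state here are illustrative, not from the talk), the shipping team writes its expectation down as an executable test:

```java
import au.com.dius.pact.consumer.MockServer;
import au.com.dius.pact.consumer.dsl.PactDslJsonBody;
import au.com.dius.pact.consumer.dsl.PactDslWithProvider;
import au.com.dius.pact.consumer.junit5.PactConsumerTestExt;
import au.com.dius.pact.consumer.junit5.PactTestFor;
import au.com.dius.pact.core.model.RequestResponsePact;
import au.com.dius.pact.core.model.annotations.Pact;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import static org.junit.jupiter.api.Assertions.assertEquals;

// The shipping team's expectations of the inventory service, made explicit
// and executable. The test runs against a mock provider that Pact stands up.
@ExtendWith(PactConsumerTestExt.class)
@PactTestFor(providerName = "inventory")
class InventoryContractTest {

    @Pact(provider = "inventory", consumer = "shipping")
    RequestResponsePact stockLevel(PactDslWithProvider builder) {
        return builder
                .uponReceiving("a request for the stock level of an item")
                .path("/stock/1234")                       // illustrative endpoint
                .method("GET")
                .willRespondWith()
                .status(200)
                .body(new PactDslJsonBody().integerType("quantity"))
                .toPact();
    }

    @Test
    @PactTestFor(pactMethod = "stockLevel")
    void canReadStockLevel(MockServer mockServer) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(mockServer.getUrl() + "/stock/1234")).build(),
                HttpResponse.BodyHandlers.ofString());
        assertEquals(200, response.statusCode());
    }
}
```

The pact file this generates is then verified against the real inventory service in that team's CI build, which is exactly where a broken consumer shows up.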
The problem is that sometimes you do actually need to break consumers. You don't want to, but you have to. The key thing here is that if we want to embrace the idea of independent deployability, we can't force consumers to upgrade at the same time as we produce a new version of our service's API, so we have to think about different models. One of the models I like a lot is coexisting endpoints. I'm going to introduce a breaking change, so what I do is map existing API calls to my version-one endpoint; this could be a different namespace, or with RPC it could even be different ports, and that's where my old traffic keeps going. I put my breaking API up as a new version, version two, and I expose that somewhere else that's cleanly identifiable. I then give my consumers time to upgrade; once they've made the switch, which they can do as a separate release a few weeks from now, I can retire the old endpoint. I've used this model quite a few times; at one point we even had three different APIs exposed on one service to allow consumers time to upgrade. This works very well in terms of keeping your deployments simple and keeping service discovery simple, and it works well when you've got some control over your consumers: the ability to ask them, at some point, to upgrade to a new version.

Another model you can use when you introduce a breaking change is to produce a brand-new version of your customer service, so maybe I've got version 1 and version 2 running side by side, serving different consumers. That model works well when you can't change the consumer; they just need that API. The problem with having multiple different versions of a service live at once is that those versions are effectively branches in code: if I now have to fix a critical bug, I may have to fix it in multiple different places. It also complicates service discovery; I now need to find not just my customer service, but a particular version of the customer service, and if these services are also stateful, that can be a bit tricky. But nonetheless, a mix of coexisting endpoints like this and having multiple different versions of the same service live are ways in which you can break an API without breaking your consumers. This is, in a way, a version of the expand/contract pattern: once nobody is using an old version of my service, I turn it off; once no one is using an old version of my API, I remove that code.
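A minimal sketch of coexisting endpoints with JAX-RS annotations (hypothetical resource and payload shapes): both versions are served by the same deployable while consumers migrate, and /v1 is deleted once its traffic stops:

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;

// One service, two coexisting versions of the same resource. Old consumers
// keep hitting /v1 while they migrate, at their own pace, in their own release.
@Path("/customers")
@Produces("application/json")
public class CustomerEndpoints {

    @GET
    @Path("/v1/{id}")   // the old shape, kept alive during the migration window
    public String getV1(@PathParam("id") String id) {
        return "{\"customerName\":\"...\"}";
    }

    @GET
    @Path("/v2/{id}")   // the new, breaking shape, cleanly identifiable
    public String getV2(@PathParam("id") String id) {
        return "{\"name\":{\"first\":\"...\",\"last\":\"...\"}}";
    }
}
```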
Let's talk now about putting the consumer first. Services exist to be called. With a user interface, I suspect most of us are now quite comfortable with the idea that it might be good to have a real user, or a fake user, look at our design and help us tweak and iterate it. Some of you may even have done guerrilla testing: going out there, watching people use your application, filming them while they're doing it, and getting that great feedback. APIs are the same. An API is a user interface; its user is just another team of developers. So you need to think: what do I need to do to make it easy for them to work with my service?

It's very unsexy, but one of the easiest things you can do to make your consumers' lives easy is to have good documentation. Swagger is winning, if it hasn't already won, the battle in this space as a way of defining documentation for APIs; most web API frameworks you'll use will support exposing the JSON for this stuff. It can often be a very easy thing to do: you put a little bit of information on your endpoints, and you can produce nice, shiny documentation. Swagger can go a bit further for you, because you can use the Swagger UI to actually execute those endpoints from within your browser. If you think about it from the point of view of the person consuming this API, you want to explore it, you want to understand how the payloads work. Being able to go to a Swagger UI, see the documentation, see example templates of what to execute, paste them in, change the fields, and hit execute, maybe against a developer version of that service, is great feedback for someone writing a service to consume your API.
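As an illustration with the Swagger annotations of that era (the resource and payload are hypothetical), a little metadata on an endpoint is all the generated documentation and the Swagger UI need:

```java
import io.swagger.annotations.Api;
import io.swagger.annotations.ApiOperation;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;

// A little bit of information on the endpoint is enough for the Swagger
// tooling to generate browsable documentation and a "try it out" UI.
@Api("customers")
@Path("/customers")
@Produces("application/json")
public class CustomerResource {

    @GET
    @Path("/{id}")
    @ApiOperation(value = "Look up a single customer by id",
                  notes = "Returns only the representation we deliberately share")
    public String get(@PathParam("id") String id) {
        return "{\"id\":\"" + id + "\",\"name\":\"...\"}";
    }
}
```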
Other things can help too. Even knowing what's actually running out there can be useful. Many of you may have heard of service discovery tools. I don't tend to like introducing service discovery tools very early on, because I think they're really about scale, but nonetheless these sorts of systems give you information about what is running where; I tend to favour Consul in this space. These tools are really designed around having one machine talk to another machine, though; they're not often very useful for human beings, but they do expose information that can be useful to a consumer. I can now get hold of what's running where; then we need to do a little bit of work to get that information out and present it in a nice fashion. A colleague of mine in Australia, Halvard, coined the term "humane registry": a registry not designed for other machines, but a registry designed for human beings. He started off with a wiki page. If you've got information and documentation about your service via Swagger, and actual runtime dynamic information about your services held in something like a service discovery system, just create a wiki page per service and pull that information into one place. As a consumer, I go to that page and can find out even odd things like who I should email when it doesn't work; I've been in a few organisations where you don't even know who created the thing, and that's quite scary, believe me. But I can see the documentation, I can see how it's running, maybe I can even see some stats. Creating things for human beings is quite important in the microservice world, and we'll touch on this idea of making things easier to understand again in a moment.

Let's talk now about isolating failure. There's an unfortunate misunderstanding of distributed systems that some people have: they assume that just by breaking a set of functionality up across multiple machines, their systems will automatically be more resilient. That's not true; it's actually much easier to make things less resilient. Think about it: if your application is running across more machines, and machines have a failure rate, there are now more machines in your system that could fail. There are more network boundaries, more networks that could partition or time out. You've effectively expanded your surface area of failure, and unless you've also built your application to handle that failure, your system will be less resilient.

True story: a couple of years ago I was working with a client who'd taken a monolithic .NET application, split it up into 12 pieces, and were running it in production, and they said to me, "Sam, whenever one of these services stops working, everything stops working." A friend of mine has likened this to a distributed single point of failure; somebody else described it as taking your brain, chopping it up into 12 pieces, and putting it into 12 different jars. My suggestion was to merge it all back together, because that would probably be more resilient. The issue was that they hadn't thought about what failure means now.

And it's not just other people that do this; I did this too. I was the lead on a project a few years ago for a classified ads website. They worked in multiple verticals; you could buy a guitar and a cement mixer from them. They had built up all these old legacy applications to support the different verticals over time, and we were working to move these onto a new technology stack. We used a pretty common migration pattern, actually a pattern you'll use a lot if you're looking to move towards microservices, and that's the strangler application. Effectively, a strangler application is something that intercepts calls to the old system and potentially redirects them to the new code, and over time you get rid of the old code until only your new code exists. In this example, we were proxying requests through to the downstream legacy applications.
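Here's a deliberately minimal sketch of that strangler proxy in Java, with entirely hypothetical URLs: intercept every call, send the migrated paths to the new code, and pass everything else through to the legacy system:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Strangler sketch: the proxy sits in front of everything, routing the
// verticals we have already rewritten to the new system and leaving the
// rest with the legacy code until they are migrated too.
public class StranglerProxy {
    static final HttpClient client = HttpClient.newHttpClient();

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/", exchange -> {
            String path = exchange.getRequestURI().getPath();
            String target = path.startsWith("/guitars")   // hypothetical migrated vertical
                    ? "http://new-system" : "http://legacy-system";
            try {
                HttpResponse<byte[]> upstream = client.send(
                        HttpRequest.newBuilder(URI.create(target + path)).build(),
                        HttpResponse.BodyHandlers.ofByteArray());
                exchange.sendResponseHeaders(upstream.statusCode(), upstream.body().length);
                exchange.getResponseBody().write(upstream.body());
            } catch (InterruptedException e) {
                exchange.sendResponseHeaders(502, -1); // upstream call interrupted
            } finally {
                exchange.close();
            }
        });
        server.start();
    }
}
```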
And that was fine. For the verticals that got less traffic, that were less valuable to the organisation, we were leaving things alone and focusing on where the money was. We ran about ten production nodes. Normally we'd have only about 30 to 60 concurrent requests at any given point in time per node at peak; we'd have about six to eight thousand requests per second coming in, but most of that was very aggressively cached, so those 30 to 60 were, by design, the cache misses. The peak was during the day, which was very good for me, because during one of these peaks the whole system went down. The load on these nodes went from handling 30 to 60 concurrent requests to handling over 800, in the space of 15 minutes, and when you've got a request equalling a thread, you get some idea of what might happen to circa-2009 hardware. The whole system went down, and it went down very quickly.

It turned out this was an example of a cascading failure, the kind of thing you really need to protect yourself against. What was happening was that one of the downstream services was failing in the most annoying way anything can fail in a distributed system: it was failing slowly. When things fail slowly, they tie up resources, and in a distributed system they have the potential to tie up resources across whole chains of calls; we can have multiple services with resources locked up, and that's dangerous. That's what took our system down. Because this thing was failing slowly, the thread pool we were using to proxy calls became exhausted: all the threads were blocked, waiting for it to time out. The thread pool therefore had no more workers available, which was annoying because, although the rest of the downstream applications were working just fine, no traffic could get through. And because the thread pool was full, all the requests coming in from the outside world kept building up, blocked and hanging there, waiting. Those requests coming in at the top were what caused the huge spike in the number of concurrent requests and took the whole system down. So this was just one of our applications, a very old one, that one day decided to be slow, and it cascaded up and took out our entire system in 15 minutes. That's not good.

So we fixed it, in a few different ways, and ended up, although I didn't realise it at the time, with a very common set of patterns that you'll use to make these systems more resilient. The first thing we did was recognise that the timeouts were hopelessly wrong. We were waiting two minutes for these downstream services to respond; no human being, even in 2009, waits two minutes for a web page to load. The people who had requested a page had already gone off and done something else while we were still waiting for something to time out. So we really shortened those timeouts: we took them from two minutes to about two seconds. What you can do is look at your normal response time percentiles and stick your timeouts in a fairly healthy place relative to those; our 90th-percentile response time was fine, and we accepted that we might occasionally be aggressively timing out things that would have succeeded. The reality was that the application that had started failing on us generated a very small amount of our traffic anyway, so we felt it was more important to keep the whole system running. That's the first thing: when you're thinking about timeouts, ask what they are and what they should be. So we brought those right down.
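As a sketch with Java's built-in HTTP client (the endpoint is hypothetical, and the two-second figure just echoes the story above), timeouts are something to set deliberately from your response-time percentiles rather than inherit from a default:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

public class TimeoutSketch {
    public static void main(String[] args) {
        // Pick timeouts from what you know about normal response times,
        // somewhere healthily above the 90th percentile, not a default
        // that no real user would ever wait for.
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(1))
                .build();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://downstream-service/listings")) // hypothetical
                .timeout(Duration.ofSeconds(2)) // fail fast instead of waiting minutes
                .build();
        // client.send(request, ...) now gives up after two seconds
    }
}
```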
But we realised that even then we still had a single point of failure: that thread pool. Even if we could put those threads back quickly, if we only had one thread pool for all of our downstream services, we still had the situation where traffic to one downstream application could stop traffic going to the others. So we added one thread pool per downstream application. This is an example of what's called bulkheading, a very important pattern in resilience engineering. The way to think about this: you've got a big ship, you hit a rock, and water starts pouring into the hull. You go down into the hull and close off the compartment that's flooded; that compartment is lost, but the rest of the ship carries on. In this situation, if one of these thread pools becomes exhausted, the other thread pools can still serve requests to the other downstream applications. That's good.

The third thing we did was add what are called circuit breakers. Circuit breakers in a networking sense work just like they do in your house: you get a surge of electricity coming into your house, the circuit breaker opens, it stops the flow, and it protects your appliances. Here, the way the circuit breakers work is that after a certain number of errors or timeouts, the circuit breaker opens and requests stop getting sent. That gives the downstream service the ability to recover if needed, especially if you've also got, say, exponential back-off on your retries. But it also allows your code to fail fast: rather than waiting for a timeout or an error, I can say the inventory service is down, and that allows you to programmatically degrade functionality. In our case, that meant that when a circuit breaker blew open, we would actually close off the part of our user interface related to that vertical, or pop up an error message. So now not only are you keeping the site running, you're keeping information flowing; you're giving a clear indication to the user of what's happening.

Circuit breakers are also useful not just for handling unplanned outages; they're good for handling planned outages too. Just like the fuses in your house: before you start drilling into the walls, you open the circuit breakers to stop yourself electrocuting yourself while you're drilling. So when we needed to deploy a new version of these downstream applications, we'd flip the circuit breaker open, the site would degrade its functionality based on that service no longer being available, we'd deploy the new version, and then we'd reset the circuit breaker. By putting in something that was there to deal with unplanned outages, we also gave ourselves a mechanism to handle near-zero-downtime deployments.

All three of these patterns are helpfully described in Michael Nygard's book Release It!. If you buy one book this year, buy my book; if you buy two books this year, buy my book and Mike's book. It's a really excellent book on resilience engineering. This is the stuff you have to think about. These three patterns will come up time and time again, but you have to ask: what happens if every single thing I depend on fails? I would apply circuit breakers around database connections as well. There are good libraries for this: you've got Hystrix for Java, you've got Polly and Brighter for .NET, and there are probably about 15 different implementations in Ruby, some of which may even work, so do your research on that one. But do read Mike's book as well.
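Since Hystrix is named above, here's a minimal sketch of a Hystrix command (the service and names are hypothetical): the wrapped call gets a timeout, a bulkheaded thread pool, and a circuit breaker, and the fallback is where you choose how to degrade:

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

// The command wraps the remote call. After enough errors or timeouts the
// breaker opens, calls fail fast, and the fallback degrades gracefully.
public class InventoryLookup extends HystrixCommand<String> {
    private final String itemId;

    public InventoryLookup(String itemId) {
        // commands in the same group share a bulkheaded thread pool
        super(HystrixCommandGroupKey.Factory.asKey("inventory"));
        this.itemId = itemId;
    }

    @Override
    protected String run() throws Exception {
        // the real remote call to the inventory service would go here
        return callInventoryService(itemId);
    }

    @Override
    protected String getFallback() {
        // breaker is open or the call failed: degrade rather than hang
        return "stock level unavailable";
    }

    private String callInventoryService(String itemId) {
        throw new UnsupportedOperationException("illustrative only");
    }
}
```

Calling new InventoryLookup("1234").execute() then either returns the real answer or fails fast with the fallback once the breaker is open.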
On to the last principle now, that of making things highly observable, and I don't just mean in the sense of making it really easy for you to look at all the machines you've got running. This is what I used to do when I ran production systems: I used to have lots of xterms open, lots of green-on-black text. It was great. I'd have top, my favourite thing, running on all my machines, like I was in The Matrix: look, look what I'm doing, I'm supporting the systems. And if I wanted to check the logs for errors, every day or two I'd log on to all the machines, do a grep for errors, and see if there were any odd patterns coming up. That's fine when you've got six machines. Then we got more machines, and I had a real problem; I couldn't manage them any more. So I got a second monitor, so I could have more windows open. That doesn't scale so well.

After a certain point, you really need to move away from the idea that monitoring and observing and understanding system behaviour is about logging into the machine. You need to be gathering all the information you can out of those nodes and storing it in one place where it can be viewed; we're talking about aggregation of all the things. Start off by getting the logs out of the system. If you've got money, buy Splunk; you'll need a lot of money to buy Splunk, but it's an awesome, fantastic log aggregation tool. If you want to host something yourself, the ELK stack is great; if you want something off-premise, you can use Papertrail or Sumo Logic. Just get all of your logs in one place. It makes it very, very easy to see what's happening across your entire fleet, and most of those aggregation tools will also do things like reporting on error rates, which can be really useful.

Do the same thing with your stats. Get things like the response times off every single machine, so you can look at latency as well; get the state of the circuit breakers off those nodes; get it all somewhere central. Traditionally you'd do something like Graphite if you're hosting it yourself; I'd go with Prometheus nowadays. New Relic, AppDynamics, any of those systems handle aggregation of stats in a really nice fashion. With stats aggregation, what you're often looking for is the ability to see things over time, so you want a good time-series-based system, and you want the ability to drill down within the aggregation: you want to see the overall pattern, but when you want to dive into what a particular service or a particular machine is doing, you need to be able to navigate in. Often these systems will come with some kind of query language to make that possible.

It's not just about aggregation, though; we've also got to think about how services are connected to each other, and make it easy to understand what's happening and how our systems are behaving. For example, I've got some interconnected services: I click a button, which calls a service, which calls a service, which calls a service, and deep in that call stack I get an error. As an application developer, I might have enough information at the service itself about the call that caused the error, but will I understand the context in which that call happened, all the other things that led up to that error? What if that error happened as part of a long-lived business transaction, and I've now got to work out what's broken as a result, and whether I need to unpick something manually? Here we just reuse an old idea from event-based systems: the correlation ID. When I start some action, when I click a button, for example, I generate an ID, and that ID flows downstream with every subsequent call; for every call I record that correlation ID and some information about it. Some people use tools like Zipkin, which gives you tracing of latency across calls. I actually like taking these correlation IDs and putting them into log files, because then, when I get a stack trace, I see the correlation ID, I put that into my log aggregation system, and I can now see all the log statements related to all the calls through my entire call stack. This stuff is really, really useful.

The unfortunate thing about correlation IDs is that by the time you've got a system complicated enough to need them, the effort to put them in is non-trivial, because, by definition, you've got a complicated system. So I often say: let's just come up with a convention right at the start and put them in. It's going to be a header; here's how we generate them; here's where we expect to find them in the logs. Even starting that off with a simple system is going to be useful. One of my clients, a very overachieving sort of person, came up with a really nice idea: we'd been talking about logs as being like data, and so he wrote a little program that, given a correlation ID, would actually draw pictures of the services and how they were communicating. There's having a piece of documentation that says this service talks to this service, and then there's drawing pictures of it based on correlation IDs coming out of your live production logs. That stuff is just really useful.
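Here's a sketch of the "pick a convention and put them in early" advice, using SLF4J's MDC so that every log line carries the ID; the header name is just a common convention, not something the talk prescribes:

```java
import org.slf4j.MDC;

import java.net.http.HttpRequest;
import java.util.UUID;

public class CorrelationIdSketch {
    static final String HEADER = "X-Correlation-Id"; // the agreed convention

    // At the edge of the system: reuse the inbound ID if a caller sent one,
    // otherwise mint a new one, and make it visible to every log statement.
    static String ensureCorrelationId(String inboundHeaderValue) {
        String id = inboundHeaderValue != null
                ? inboundHeaderValue
                : UUID.randomUUID().toString();
        MDC.put("correlationId", id); // log pattern then includes %X{correlationId}
        return id;
    }

    // On every downstream call: pass the same ID along, so the whole call
    // chain can be stitched back together in the log aggregation system.
    static HttpRequest.Builder withCorrelation(HttpRequest.Builder request) {
        return request.header(HEADER, MDC.get("correlationId"));
    }
}
```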
So let's summarise, and talk about our eight principles again. Modelling things around a business domain leads to services that have more stable API boundaries, makes it easier to reuse them in different ways for different user interfaces, makes the teams that own them experts in the domain as well as in the service itself, and avoids lots of cross-cutting changes. Embracing a culture of automation is key to allowing you to manage all these multiple different services that are flying around. Hiding implementation details is essential if you want to evolve the internals of one service without breaking others. Decentralise all the things: avoid smart middleware, see if you can put decision-making into your teams, and lower the barriers to entry for teams to look after and manage things themselves. Deploy things independently: this is actually the golden rule. If the only thing you remember out of this presentation is one thing, it's to buy my book; if there are two things you remember, it's this: you need to be able to make a change to a service and deploy it into production in isolation from everything else, and if you can do that reliably, you're in a very good place. Put your consumer first; it's a very soft thing, but services exist to be called, so think outside-in. Make sure you understand where your sources of failure are: every single communication between one service and another is a potential place where something can go wrong, so plan for that, understand it, and know what you're going to do about it. And finally, make sure you build your systems to be observable: building in things like correlation IDs and aggregating stuff in a consistent, standard way is very important.

If you want more information about the book, you can find it at buildingmicroservices.com. There are a bunch of copies I've signed that you can buy, and 40% off the print copy, and you can also find links to my blog, where I'm writing about new patterns and new research I've done subsequent to this. Thank you very much for your time; I think we've got some time for questions. I can't see anything, but I think there's a question over there.

Yes, that's a really good question: if you're slicing vertically, what do you do about the user interface? The challenge is that user interfaces are normally, fundamentally, aggregations of functionality. I've seen a few different models for organisations whose primary delivery is over the web. If you're doing old-school websites, not single-page-app-style sites, there's a very easy way to do it: each service serves up a collection of pages, and you use a very, very thin scaffolding layer to pull that stuff together. That's what Gilt do, that's what REA do; effectively, I work in this area, this is our part of the UI, and then you have someone keeping an eye on it all coming together correctly at the top. Orbitz actually use microservices that serve up components in their pages, which are then pulled together.

Things get tricky around mobile. With mobile, you often can't just make loads of calls to these back-end microservices, because that's very expensive in terms of battery, data plans, and the like. You could do the single-page-app thing, which I've also seen, where you make the calls straight into the services, but for mobile that's often not efficient. So there's a pattern called backends for frontends: you effectively have an edge service that is there to handle the server-side communication for a certain user interface. You make a coarse-grained call to that backend-for-frontend, which in turn makes calls to the microservices. The key thing with that pattern is that the BFF is tightly coupled to that one user interface, and often owned by that team: if there's a mobile team, it's theirs. REA, for example, have one BFF for their Android app and a different BFF for their iOS application. I've got a blog post on that coming out, I think next week, a fairly big piece that should go into a bit more detail. I hope that's useful.
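As a sketch of that BFF shape (hypothetical services and payloads): the mobile app makes one coarse-grained call, and the BFF fans out to the fine-grained services server-side:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

// Backend-for-frontend sketch: one coarse-grained call from the mobile app;
// the BFF fans out to the fine-grained services and returns a single payload
// shaped for that one user interface (and owned by that one team).
public class MobileBff {
    static final HttpClient client = HttpClient.newHttpClient();

    static CompletableFuture<String> orderScreen(String customerId) {
        CompletableFuture<String> orders =
                fetch("http://order-service/orders?customer=" + customerId);
        CompletableFuture<String> shipping =
                fetch("http://shipping-service/status?customer=" + customerId);
        return orders.thenCombine(shipping,
                (o, s) -> "{\"orders\":" + o + ",\"shipping\":" + s + "}");
    }

    static CompletableFuture<String> fetch(String url) {
        return client.sendAsync(HttpRequest.newBuilder(URI.create(url)).build(),
                        HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::body);
    }
}
```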
Cool, next question. So the question was: I've got a customer service, and I've got two different services that need different things from the customer, where one is storing one set of information about a customer and another is storing other information. Yes, something like the customer is a fantastic example of what often doesn't make sense as a service in its own right, and I use this example a lot in the book, which is where it gets confusing. I don't think it makes sense for a customer service to necessarily store all information about the customer, because if you follow the links, that could end up being all of your data. The way I like to think about it: let's think of the British government. I'm always thinking about the British government; it's part of my patriotic duty. If you go to the DVLA, the driving licence authority, they store information about me: my car registration, my driving licence. Her Majesty's Revenue and Customs stores information about my tax returns, which, by the way, are late. And the NHS stores information about my medical health records. They have very, very different needs, and I actually like the idea that it's not all in one place, because a lot of it is sensitive information. What they're working through now is the idea that it's still me, that I still have an identity, so the information stored about me is effectively federated across those different places.

For something like a customer service in a very simple system, I might be inclined to have most information there, but over time I think something like a customer service ends up storing a very, very small amount of information about me: maybe just enough to handle authentication, maybe just enough about my identity. Then those local services that have their own needs for your data might hold a pointer to your identity, but they'll store their own local records about you. I think that's a very natural progression you get when you go beyond more trivial systems. The customer, the user, is always a great example of where that question comes up, but I hope that was useful.

I've probably got time for one more question: how do you handle shared data? I think very clearly about who owns what. If I've got a piece of information, rather than copying data around, I use caching, so that I have information about how often I should refresh it. I don't like copying other people's data into my own database, because it's not clear who owns what. So I'd put cache headers on the resources I'm sending out, allow services to make decisions about whether they cache, and then re-request that information; that's how I'd handle it. If you really, really, really need things to be consistent, then you can't cache, in which case you have to go back to a consistent data source for it anyway, so that's your trade-off. I'm short on time so I can't explain that very well here, but I do talk about it in the book. And I've got Christmas presents to buy!

Okay, thank you very much for your time. If you want to ping me questions, I'll try to follow up on my Twitter handle, @samnewman, but I've now got to get to Brussels and hopefully catch the Lufthansa flight to Munich, so wish me luck.
Info
Channel: Devoxx
Views: 270,520
Keywords: Devoxx, DevoxxBE15, DevoxxBE, Microservices
Id: PFQnNFe27kU
Length: 56min 13sec (3373 seconds)
Published: Thu Nov 12 2015