Monoliths vs Microservices is Missing the Point - Manuel Pais and Matthew Skelton

Video Statistics and Information

Captions
My name is Matthew Skelton. And I'm Manuel Pais. We're here today to share with you some thoughts about software architecture and how that relates to teams. The talk today has four sections. First we'll look at monoliths and microservices. We'll then look at something we've called team cognitive load and how that relates to building software systems. Manuel will then take us through some case studies that apply some of these ideas, and finally we'll have a little look at how to get started with some of these ideas in your organization.

We're authors of this book, Team Topologies, published by IT Revolution Press. There are copies available in the book stand, and this evening we have a book signing, at 7:15 I think it is, in the Chelsea Theater, along with all the other authors from IT Revolution who are signing today. So if you're interested in what you hear today, come and get your free copy signed by us to take back with you.

We hear quite a lot about monoliths and microservices at the moment, in terms of software architectures for cloud native or to enable teams to deliver very rapidly, but we think this is a bit of a false distinction. Let me try and explain why. Sometimes it feels a bit like Street Fighter or Mortal Kombat or something. Over here we've got someone like Tammer Saleh saying, start with a monolith and extract microservices. On the other side we've got Stefan Tilkov saying, don't start with a monolith when your goal is a microservices architecture. And then we've got someone in the middle, like a guru, Simon Brown, saying, well, if you can't build a monolith, what makes you think microservices are the answer?

Something is missing here, right? There's an angle on this problem which we're missing. So what should we focus on in order to make this stuff effective? I think Daniel
Terhorst-North puts it very well when he says we should think about software that fits in your head. Can we understand the software that we're building ourselves? If it doesn't fit in our head, it's too big for us and we've got a problem. In the context of the talk today, and of the book that we've written, Team Topologies, we like to extend what Daniel has said and say: software that fits in our heads when we're working as a team.

Why is this important? Who has a copy of Accelerate? Who will have a copy by this evening? Every hand in this room should be up. Okay, you need to get yourself a copy of the Accelerate book. There are four key metrics in the Accelerate book that are indicators for high-performing organizations: lead time, deployment frequency, mean time to restore, and change fail percentage. I'm not going to go into these now; that's for Nicole, and for you to chat to her this evening.

However, the problem we have is that if software does not fit in our heads, there's a real danger that each one of these four key indicators is going to get worse. If the software is too big for our heads, then the lead time, which, depending on how you measure it, is the time from starting to work on a new feature to it being in production, is in danger of getting longer. There's a danger that the deployment frequency will decrease rather than increase: we will deploy less frequently because we won't have the confidence to deploy frequently. There's a danger that we'll not be able to restore service in production as quickly, because it's too complicated, it's too involved. And likewise there's a danger that the percentage of deployments that result in failure will increase, when we're trying to drive that down. So that's why we think that the framing around monoliths
and microservices is sort of the wrong way to look at it, and a useful way to look at it is this phrase that Daniel Terhorst-North came up with: software should fit in our heads. Software that is too big for our heads works against organizational agility. This is the key. If you want to sleep through the rest of the talk, just take a picture of this slide; that's the only thing you really need to worry about. This is the thing I want you to take away from the session today: we do need to think about the size of software that teams work with, and I'm talking about teams, not individuals, because there's a direct impact on organizational agility.

So how do we approach this? In the book we talk about team cognitive load. Let me talk you through a little bit of background first. Cognitive load is a concept that was defined by John Sweller in 1988, and he defined it as the total amount of mental effort being used in the working memory. When we're building or working with software systems, we've got a lot of stuff in our working memory as we juggle concepts and try to put those into code, or work out how to shape a data set, or whatever. Cognitive load comes into play a huge amount as we're working with software systems.

There are three kinds of cognitive load: intrinsic, which is something fundamental to the way we're working or to the kind of problem domain; extraneous, which is stuff that effectively gets in the way and prevents us from really thinking about the problem at hand; and germane, which is useful stuff about the problem domain that actually helps us to solve a particular problem. In a software development context, intrinsic could be remembering how classes are defined in Java. Extraneous would be, for example, how the hell do I deploy
this application again? It's really complicated, and we shouldn't have to think about it. Germane, if we're working on a financial services application, might be how bank transfers work. We need to keep that cognitive load on the people who are working on it, because they need to be thinking about the details of, in this case, how bank transfers work in order to be able to write code effectively. In a software delivery context you could see it like this: intrinsic is the fundamental skills that we bring as engineers, extraneous is the kind of mechanism which we shouldn't really have to think about, and germane is the important stuff about the business domain that we're working with. It's a bit of a simplification, but you can think of it like that for now.

We have to work with the intrinsic cognitive load; that's just the nature of the beast. We're trying to squeeze down as much as possible the extraneous cognitive load, which doesn't add value and gets in the way, and we're trying to give ourselves as much space as possible for the germane cognitive load, the stuff that is really business differentiating. I've tried to represent that in this slide here. If you want to know more about this, by the way, search for talks and slides called Hacking Your Head by Jo Pearce; you'll find some interesting talks, slides, and blog posts around that.

What all this means is that if we want to enable organizational agility, we need to explicitly limit the size of software services and products to the cognitive load that the team can handle. As soon as we exceed the cognitive load of the team, there's a danger for those four metrics, if you remember, from Accelerate: the danger that we're going to be driving bad decisions, we're going to be increasing bugs, we're going to be making it
more difficult to diagnose and redeploy, and so on. That's not where we want to be.

So this is a very different starting point for software architecture and for ways of thinking about team responsibility boundaries and so on. We've started to think now about what an effective size of software is. Well, the size of software should be no more than the owning team can handle, based on its cognitive load. It's certainly not something that many organizations have been explicitly doing. Many organizations have been thinking about this, but perhaps not in exactly these terms until more recently. Again, this is the software that fits in our heads concept: if it fits in our heads, we're more able to own it as we build and run it in production.

So we're starting with a team. This is very much a team-focused way of thinking about software responsibilities, boundaries, architecture, and so on. When we say team, we mean a long-lived group of people with a shared purpose and backlog, probably fewer than nine. In some organizations with very high trust you might be able to get away with a team being more like 15 people. But certainly in our book, in the Team Topologies book, team means something very specific: a long-lived collection of individuals who work together over a long period of time, multiple months, possibly years, with a common purpose, and who work together as a team rather than just being a collection of individuals with the same manager. The reason for that is that a high-performing team is far more effective than just a collection of individuals. So if we want to be a high-performing organization, we use teams to do the work.

There's a really important point here: each service or application, each part of the software estate, must be fully owned by a team with sufficient cognitive capacity to build and operate it. There are no applications or services which are shared,
or which don't have an owner, or which only have a BAU team keeping them ticking over. Every application or service has full ownership from one team that builds and runs it, and that team has sufficient cognitive capacity; we haven't exceeded its cognitive load. So we're not just piling more and more services onto the same team, because at some point that team will have reached its limit of cognitive load.

There are some techniques we can use these days, which we know work, to help us do all these things. There are whole-team techniques like mobbing, where the whole team comes around a single keyboard and brings multiple viewpoints to solving a problem. We solve that problem with very high quality, we reduce the likelihood of downstream problems and bugs, and then we move on to the next feature. That's a very whole-team approach to getting work done. We can use techniques like domain-driven design (DDD) to help us establish effective boundaries between different parts of the business domain, and therefore assign responsibilities to teams to match those domain boundaries. We can emphasize developer experience, sometimes called DevEx, where we've got a strong emphasis on the experience that developers and other engineers have of using other parts of the software estate, platform, tools, this kind of thing, so that we're making sure there's as little friction as possible in using various tools and parts of the platform.

We also need to focus on operator experience. Whoever is running these systems in production, whether it's the same team or a separate team, maybe SREs or ops people, we do need to understand what their experience should be, because if their experience is terrible when there's an outage, we're going to be hurting that mean time to recovery from the Accelerate metrics. We need to be building in operability
as a first-class thing for our software, so that the operator experience is excellent.

In the book we talk about something called a thinnest viable platform. This is the concept that we are going to need some sort of platform underneath what we're building. We might choose to ignore it, but it will be there. We're not looking to build a platform which is absolutely huge and all-singing, all-dancing. We're looking for just the smallest amount of platform to accelerate teams who are building application software and services, and to make it safe and rapid to do the right thing. We'll come back to this one a little bit later on.

In the book we talk about four fundamental topologies. These are four team types which, as far as we can see, are the only four types of team that we need in a modern organization building and running software systems. We've tried hard to find more types that are necessary, but we've not yet found them, so if you're sure you've got another team type, please come and tell us; we'd like to hear about it. But based on our experience, this is what we've come up with.

The most important one is the stream-aligned team. Because we're trying to optimize for a fast flow of change, we want to make sure we've got a team that is aligned to the stream of change coming from the business, and we've used things like DDD to help us get boundaries between these different teams, these different streams, so that the team is able to take an idea or a change from concept all the way through to production and running it. So the stream-aligned teams build and run applications and services, and the other three types of team are there effectively to reduce the cognitive load on the stream-aligned team. If the stream-aligned team needs to understand a new kind of technology, let's say a new kind of database, we might have an enabling team, shown in green there, the second one.
The enabling team will come in, perhaps as database experts, and work with the stream-aligned team to help them get to grips with this new kind of database technology for a period of time. Perhaps it's two months, perhaps it's just two weeks. At some point the enabling team will move to a different team and start to help them with this new technology. They're not there permanently; they're not a permanent support function.

The complicated-subsystem team is optional, but if there is a part of the system which is really awkward and requires really highly specialist knowledge, then we might give that particular chunk of work to a team with that extra expertise. And then at the bottom, underneath, we've got a platform. There's always a platform, but we need to define it very well and make sure that the way in which we build this platform is focused on enabling the stream-aligned teams to deliver rapidly and safely. So the people in the platform treat the stream-aligned teams as their customers, and in some organizations they even use things like Net Promoter Score so that the stream-aligned teams can rate aspects of the platform as if it were a kind of public service.

Let's say that in an organization we've got three stream-aligned teams running on a platform. Two of the teams are using a component which is quite complicated, so there's a specialist team looking after that, on the left, in red. And the top two teams are having some help from an enabling team to get to grips with some new technology; perhaps it's databases, perhaps it's machine learning, or something else. You can immediately see that the kinds of interactions between different teams are different depending on what they're doing. We don't have exactly the same kinds of interactions, needs, and dependencies between different teams; they vary depending on what the teams are doing in the organization,
and the way in which those teams interact can also be different, and needs to be different. The top two teams there have this enabling team working with them, and that enabling team is going to be facilitating those two teams. Those interactions will feel very different from the way in which the component is being used by the bottom two teams, for example. The bottom two teams just want to consume this component as a service, if you like, so they've got a nice clean interface and an easy way to install it, test it, and access it. There's very little additional interaction that's really needed there. Likewise, all three stream-aligned teams here can just consume stuff from the platform in a very straightforward way: there's a nice API and nice documentation for it.

The stream-aligned team at the bottom, however, is collaborating with the platform on something new. Perhaps they're moving cloud provider, or perhaps they're changing the way they do infrastructure automation. They need to interact with the platform team in a different way. So we've got different kinds of team interactions in different parts of the organization at the same time, depending on what's happening. This is just a snapshot; in six months' time the interactions will look different because the teams will be doing something different.

So there's an important point: the platform, the enabling team, and the complicated-subsystem team are there to reduce the cognitive load on the stream-aligned teams, to enable them to own their parts of the system effectively. We're expecting to interact differently with other teams in the organization, and this starts to help us move towards the concept of environmental scanning. In this case it's our internal
environment within the organization. Dr. Naomi Stanford, who's one of the world's foremost experts on organization design, talks about environmental scanning as a really crucial aspect of how organizations should expect to set themselves up for success, and the patterns we're talking about today start to touch on that. So let's have a look at some case studies now.

Thank you, Matthew. I'm going to talk about two case studies. The first one is from a large worldwide retailer. They're still growing into new markets, and they realized: we're a kind of traditional enterprise, our delivery cycles are very slow, so we want to do something different. They had a specific market that they wanted to enter, and they said, we need a new mobile experience, so we're going to create a cross-functional team and give them the autonomy to decide on whatever architecture they think is best for this. This team had all the good practices around DevOps, continuous delivery, using public cloud, and so on, and had an iterative approach, so they were very quickly able to deliver something working and then iterate and improve over time. It's a very concrete success story for this organization, the kind of success story you'd put in a presentation like this.

What happened next is that, because they were successful, they were asked to do another mobile experience for another market. Here you can see they start to have a bit more complexity in terms of backends, and they needed a CMS to control different types of changes for different markets. This went on for quite a while. About a year to a year and a half later, you can see the team has grown considerably, and so has the system they're responsible for. You start to have more backend services, a product catalog, a framework with shared services between different mobile applications, and so on. The interesting thing here is that a couple of people in this team, the more senior architects, were realizing
that actually our delivery cadence is slowing down, and we're starting to have more dependencies within the team. What's happening here, as you can see, is that this is becoming a bit of a monolith that the team is working on. You start having people who are specializing in certain parts of the system, so those people become bottlenecks: if we need to change this part of the system, only one or two people know how to do it effectively. You start having different work streams within this larger system, and some of them are blocking each other.

So what they decided to do, and at first there was a lot of pushback against this decision, was to split the team into two smaller teams. On your right side, one of the teams is more focused on the front-end experience and the product catalog, and on the left side the team is more focused on the backend services. Because the team was working quite well before, they were not really very happy with this split, but they did it, and it turned out quite well, because most of the time they could work independently on their part of the backlog, on their features. Obviously there were some features that cut across the two teams. Between the two teams you can see those two blue bars; that means they have a very considerable amount of communication between the two teams. You could almost see it as a pair of teams that come together for specific needs. There will be some features, some changes, where they need to synchronize and actually work together for a period of time, but this is intentional, it is explicitly designed like that, and the rest of the time they can work more independently.

This worked out quite well for them. They even went on to split further. I believe they now have front-end teams almost aligned to a single market, so they can go as fast as possible to meet the needs of that
specific market, and on the backend they also split, and they aligned to almost one service per team. What was happening here is that as the team grew and the system grew, it was becoming more monolithic, and the flow of work was being blocked within the team. But they were able to listen to some of these triggers that said, okay, we need to evolve; what was working before, the structure we had before, is not working anymore. Those triggers were: software growing too large; overspecialization, so people like Brent in The Phoenix Project who are the only ones who know how to change parts of the system; and an overall increased need for coordination, spending more time coordinating different changes, even within the team.

The other case study I want to talk about is from OutSystems. They are a low-code platform vendor, and they have also grown considerably in the last few years. In particular, they had one team called the engineering productivity team, which was helping the product teams get better in the domains of continuous delivery, test automation, build and continuous integration, and infrastructure automation. Over time they were acquiring more responsibilities in these different domains, and what happened was that, again, people had to specialize in one or at most two domains. Although they wanted everyone to be able to work on everything, in reality people had to specialize because it was too much cognitive load. They realized people were getting demotivated and not engaged with the work, because there was so much happening and they were just trying to stay alive and respond to the product teams; that alone was very hard.

So again, they also decided to split into smaller teams. Each of these smaller teams is aligned to a single domain, and they don't have a team lead anymore, so it's a flat structure within the team. This quickly proved to be very useful, very
successful for them. Think about the intrinsic motivators for individuals. If anyone has read the book Drive by Daniel Pink, he talks about three intrinsic motivators: autonomy, mastery, and purpose. Each of these teams was in a much better place to have those motivators. They had a shared purpose, a single domain of focus that they were engaged with and interested in. They had more autonomy to decide: what are the priorities for this domain, where do we want to go, what are we missing as an organization? And they had mastery, in the sense that they had the autonomy to allocate effort to improve their knowledge, learn new techniques, maybe go to conferences, try out new tools, and so on.

This worked out quite well for them. You can see again that there are cross-cutting concerns. Some requests will need people from different teams to come together because they span different domains, but that's the exception. What they do in that case is create a kind of micro-team for a period of time to work specifically on this request or feature that is cross-domain. Most of the time, though, they're able to work independently, as they are aligned to a single domain.

Ironically, this engineering productivity team was created to reduce the cognitive load on the product teams, but they themselves fell victim to too much cognitive load, too many responsibilities. So it's not always just about software size. Some teams are more like support teams or productivity teams, so they have domains of responsibility, and you need to be careful that those are not overbearing for the team. If you aim for teams with this kind of high cohesion internally, with the shared purpose, autonomy, and mastery that we talked about, that can be quite powerful. Between teams there will always be a need for coordination and communication, but you can try to make that low-bandwidth, the minimal communication that you
need, and most of the time they are independent and can work on their own backlogs. So again, they were listening to triggers for evolution: awkward interactions within the team; people not invested, where some people were at the point of almost burning out and leaving the organization because they didn't like how the work was being done; and frequent context switching. Every time we switch context, we need to load into our working memory the skills and the domain knowledge necessary for the problem or feature we're working on. And back to Matthew.

Thanks, Manuel. Technically we're out of time, so if you're happy to leave, thank you for coming. Otherwise I'll take about two minutes to run through a few extra things. Here are some ideas for getting started. Go and ask your team how confident they are, or rather how anxious they are, how much anxiety they have about the software that they're working on, so you can get a sense for that. Try to get to the point where they feel comfortable giving you an honest answer, because the anxiety about the software they're working on is a leading indicator for potential problems in production, and we want to use leading indicators rather than lagging indicators. If we can assess how confident the team is that they understand everything about the software they're working on, that can be a powerful indicator for whether we're likely to get problems later on.

Have we exceeded the team's cognitive load? Do we therefore need to pull some things into a platform? Maybe, maybe not; it depends. Are there skills or capabilities missing within the team? These things are signals. If we've gone beyond the team's cognitive load, that might indicate that there are other things we need to change around that team. These slides are obviously going to be available online.

Think about what your platform is. How is that defined? How do the teams understand what they are consuming from
that platform? How good is the documentation? How good is the developer experience for using the stuff that's in that platform? Because if any of that stuff is not first-class, then you're increasing the cognitive load on the stream-aligned teams that are supposed to be developing software, and why would you do that? We need to minimize that kind of extraneous cognitive load that has evolved around questions like, how do I deploy this component, or how do I update the package, whatever it is. So work out how easy it is for teams to use that platform and to understand how to use it. That's the developer experience.

So here's the book, with the book signing at, I think, 7:15 this evening in Chelsea. If you're interested in training, get in touch; we have some options available. We are also looking for industry case studies. We're talking to three organizations at the moment who have started to use the patterns and ideas from the Team Topologies book: a global manufacturing company, a large government, actually two large government agencies, and a company involved in global financial services. But if you're working in a situation where you think you've got some interesting dynamics in your software delivery challenges, then do get in touch. If you find the material useful, we've got a newsletter; sign up if you like. Thank you for coming. [Applause]
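As a rough illustration of the four Accelerate indicators named in the talk (lead time, deployment frequency, mean time to restore, and change fail percentage), here is a minimal Python sketch that derives them from a list of deployment records. The `Deployment` record, its field names, and the `four_key_metrics` function are hypothetical and invented for this example; neither the talk nor the book prescribes this shape. Lead time is measured here the way the talk describes one option, from starting work on a change to it being in production.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; all field names are assumptions for illustration.
    work_started: datetime               # when work on the change began
    deployed: datetime                   # when the change reached production
    failed: bool = False                 # did this deployment cause a production failure?
    restored: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deployments, period_days):
    """Compute the four indicators for one team's deployments over a period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    # Lead time: hours from starting work to running in production.
    lead_hours = [
        (d.deployed - d.work_started).total_seconds() / 3600 for d in deployments
    ]

    # Failed deployments, and hours taken to restore those with a known restore time.
    failures = [d for d in deployments if d.failed]
    restore_hours = [
        (d.restored - d.deployed).total_seconds() / 3600
        for d in failures
        if d.restored is not None
    ]

    return {
        "lead_time_hours": mean(lead_hours),
        "deploys_per_day": len(deployments) / period_days,
        "mean_time_to_restore_hours": mean(restore_hours) if restore_hours else None,
        "change_fail_percentage": 100 * len(failures) / len(deployments),
    }
```

Tracking these per owning team, rather than per system, fits the talk's argument: if a team's numbers drift in the wrong direction as it takes on more software, that may be one signal that its cognitive load has been exceeded.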
Info
Channel: IT Revolution
Views: 1,781
Rating: 4.8571429 out of 5
Keywords: devops enterprise summit usa, does 2019, devops, does19 las vegas, devops enterprise summit, does19 us, devops enterprise summit las vegas, Manuel Pais, Matthew Skelton, Team Topologies
Id: jyaS1gy1XkM
Length: 32min 42sec (1962 seconds)
Published: Fri Nov 08 2019