HashiTalks: Deploy - Day 2

Captions
employ this is amazing well i'm glad we didn't pick friday oh we're live um you don't want to deploy on fridays this is an unspoken rule no matter how much you trust your infrastructure wait are we live we're live this is awesome sorry about that welcome to day two of hashitak's deploy this is going to be exciting um we were just reviewing technology and deployment days uh and also the schedule for today but also the schedule for yesterday if you were here yesterday you had the great foresight of enjoying a day of really really good sessions i can tell you which one was the best one but looking at the youtube stats you all seem to like everything um i appreciate that our speakers appreciate that so i want to start out by saying real quick that without you the audience this event would make sense thank you for being here without our speakers it wouldn't make sense to do this event because you would have to listen to taylor myself and we both know you're not here for that yeah stand-up is a different thing that's uh stand-up is definitely a different thing so with that in mind thank you all for contributing to this my colleague cole posted something on twitter gave me a shout out i appreciate that a lot but i'm not the only one making this event happen i want to give abby rosemary taylor a huge shout out also the rest of our team who helps us review these talks helps us edit things and sometimes helps us catch the things that we want to be catching it doesn't always happen this way but we're humans we learn from this and we will deploy better the next time so before we get into the schedule for today quick word on our code of conduct get the rough stuff out of the way first as with all of our events we have a code of conduct we require you to be welcoming inclusive and friendly this is a professional environment so please conduct yourselves in the same way that you would if you were in the company of your peers because you very much are be considerate and be respectful if you see something that makes you go like hey this is a red flag tag us on twitter let us know send us an email where how she talks at hashicorp.com somebody will take care of this we want everyone to feel safe and welcome and this is very very important to us next up uh if you want to join a conversation for other reasons hashtag hashitalks on twitter the youtube chat most of you have already found the way there i can already see uh angel is there arnold abby wow do we also have people not with an a tobias there we go this is beautiful so many people so much good content and i am very very excited to kick off today's session with part two of what ray talked about yesterday jamie who also works on the waypoint team will be talking about deploying on day one as a waypoint user on day two of hoshi talks deploy with that switch over fantastic we'll get this all loaded up and we'll see you in just a bit welcome to deploying on day one as a waypoint user my name's jamie and i'm an engineer on the waypoint team here at hashicorp where i work alongside ray whose talk on using waypoint with oidc you may have seen yesterday i'm coming to you from my mum's house in the east of england so if you hear the sound of children running around in the background you'll have to endeavor to take it in stride as will i for the purposes of this talk we're going to step back into the world that ray created yesterday and i'm going to adopt the role of jamie intern at cranscorp as an intern i've been given the traditional task of creating the 
company's intranet. I've been told that I need to deploy it to the company's Kubernetes production cluster. The only little problem is that I don't really know what Kubernetes is, I don't know where the production cluster lives, I don't know how to access it, and, if we're being honest, I'm not sure I even should have access to it. What I do have, though, is this email from Ray telling me that when I'm ready to start shipping I should check out Waypoint. Well, I think I'm ready to start shipping, so let's check out Waypoint. It looks like I can authenticate using the CransCorp identity provider that I'm already familiar with, so let's try that: the familiar blue... the familiar green login screen [laughter]. I'll put in my credentials, and we're authenticated. Okay, that was easy. Before we deploy anything, let me show you the app first. I know how to boot it up locally, so let's see that: yarn dev, and it looks like it's built, so let's go visit it in the browser. What do you think? Quite handsome, I think. Granted, I pulled most of it from Tailwind UI, but as a starting point I think this will do nicely. So let's see about deploying it. "Create your first project" here. Project name: let's call it cransnet. And we're all about GitOps at CransCorp, so I'll connect it to a repo for sure. Okay, Git source URL: I guess I can go to the repo and grab this, so let's do that; I'll copy that, take it back over here, and paste it in. Git ref: let's go with main rather than HEAD. And I've already set up an SSH key for this purpose, so let's use that: username git, and paste the key in just here. Okay, waypoint.hcl config location: well, I know I don't have it in the project repository, so I suppose let's go with the Waypoint server. I think now is probably a good time to go read the documentation. Let's see: waypoint.hcl overview. Rather than watching me read all this, let's jump forward in time to the point where I've figured out the minimal config, and let's talk through it. I've called the project cransnet. I've declared a couple of variables here, docker_username and docker_password; these are variables that we'll use further down inside this config file. Then I've declared an app called web, and I want to build it using pack, which is the Cloud Native Buildpacks plugin; apparently it doesn't require any configuration. I want to push it to the Docker Hub registry using this other built-in plugin, docker: I'll label it jgwhite/cransnet, I'm going to tag it with this gitrefpretty helper function, and then I'm going to make use of those two variables I declared above. And then, once things are built, we will deploy and release using kubernetes, which also apparently needs no configuration. Okay, I can't believe this will work right off the bat, but let's try initializing. ...Okay, yeah, there was no way. Let's see: "unset variable docker_username: a variable must be set or have a default value." Oh, of course, I declared the variables but I forgot to give them any values, so let's go do that under input variables. Let's add our docker_username in here, jgwhite, save that, and then let's add docker_password to go along with it. Please don't steal my token. Okay, let's see what kind of effect that had. Initializing again... and the web app has appeared. I guess that worked. Let's take a look at what's inside it. Nothing so far, but there is a big inviting blue button in the corner, so let's press it. Authing with SSH, as we said, and building web.
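For reference, a rough sketch of what that minimal waypoint.hcl could look like, reconstructed from the walkthrough above. The project and image names (cransnet, jgwhite/cransnet) come from the demo; the exact shape of the Docker Hub credentials block (shown here as an auth block using the two input variables) may differ slightly between Waypoint versions.

```hcl
project = "cransnet"

# Values are supplied as input variables on the Waypoint server.
variable "docker_username" {
  type = string
}

variable "docker_password" {
  type = string
}

app "web" {
  build {
    # Cloud Native Buildpacks: no configuration required.
    use "pack" {}

    # Push the built image to Docker Hub.
    registry {
      use "docker" {
        image = "jgwhite/cransnet"
        tag   = gitrefpretty()

        auth {
          username = var.docker_username
          password = var.docker_password
        }
      }
    }
  }

  deploy {
    use "kubernetes" {}
  }

  release {
    use "kubernetes" {}
  }
}
```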
Maybe if I have a look in that Builds tab I can see what's going on. Building with pack, that's right, and we get some output. Okay, well, this is going to take a couple of minutes, so let's skip forward into the future again, by which time Waypoint has built the app web, deployed it, released it, and emitted copious build output from Cloud Native Buildpacks. Let's go and take a look at that deployment. Deployment v1, deployed by kubernetes; it says it's available, all two resources are reporting ready, and... dare I click on that URL? CransNet! It lives! It's all there, present and correct, and even better, I've got a real URL that I can share with my manager, the designers, my colleagues, Ray herself. This is wonderful. What next? Well, if we know that the waypoint.hcl works, I guess I could now commit it to the repository, so let's do that. If we go back to project settings, let me copy this waypoint.hcl, go back to my editor, stop that dev process, make a waypoint.hcl file, and paste that in there. Yeah, it's all there. Okay, let's commit that, "add waypoint.hcl", and push. Now let's flip back over to the Waypoint UI, switch over to pulling the config from the project repository, and we might as well turn on automated deploys; I don't see any reason why not. I feel like I'm getting pretty comfy with the Waypoint UI here, but I imagine there's even more power to be unlocked by using the CLI, so let me do that. I've seen that button in the top right this whole time, luring me in. Looks like it's a one-liner; I'll copy that, go back over here, and paste it in. Okay, that all looks like it worked. How will I try this out? Maybe the project list command: there's my project. How about waypoint status? Status... that's looking good; let's zoom out a little bit. Great, we've got that running. So now I'm kind of curious: what does my deployment actually consist of inside of Kubernetes? Let's go and see if Waypoint can tell me. Here's my deployment; let's click in here. Deployed by kubernetes, it's available, we've already seen some of this; it's telling me what image it used, that gitrefpretty tag, and the commit that triggered this build as well. And then below we've got some resources, the resources created by this deployment: a Kubernetes Deployment, a Kubernetes Pod, and then, for the release, a Kubernetes Service. Let's take a look inside the Pod; I assume that's where the action will be. Now I can see the health specifically for the Pod, some labels, and lots and lots of state JSON, and I imagine if I knew anything about Kubernetes I'd know what some of this meant. Let's take a look at the Service just for comparison: similar kind of thing. Okay, well, how about we take a look at it in the CLI? I happen to know that if you pass the app flag you get a bit more info, and here we can see an application summary, a deployment summary, and resources for the deployment and for the release. Great, I feel like I'm really building up some momentum with Waypoint. Next challenge: let's change the app a bit and deploy again. Let's do something simple, really really simple: we'll add a new announcement to the list over on the right. Here's the data in the intern-deploys app, so let's commit this, say "add announcement", and then push, and then we will magically jump forward in time again, at which point we find a second deployment has automatically appeared. It's available. Do we dare? Click the link again... and there's the new announcement. Okay, let's check the resources for this new deployment and see what that's all about.
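As an aside, the CLI exploration described above comes down to just a few commands. Roughly, and assuming the status command behaves as it does on Waypoint's main branch at the time of the talk (the -app flag spelling here is an assumption):

```shell
# After installing the CLI with the one-liner from the UI:
waypoint project list        # shows the cransnet project
waypoint status              # high-level summary of what's running
waypoint status -app=web     # app summary, deployment summary, and the
                             # Kubernetes resources behind the deploy/release
```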
So: a web v2 Deployment and a web v2 Pod, and it looks like it's sharing its release with the previous one, so just that one Service called web, but that makes sense. What do I do next? What's the next challenge? This is going too quickly. I know: let's try loading data from a real database. Let's see what we need to do in here. Let's take away these statically defined announcements, and then we'll need to send them in as a prop; that should be all the changes we need to make here. Now let's jump up to the next thing above and paste in some data-loading logic right here. So I'm importing the client library, implementing this getServerSideProps function, connecting to the database, performing the SQL statement, and then returning the response. Next I think I'll need my main page function. I don't know very much about this framework, but I'm led to believe that I need to plumb these props in right here, and then I guess I need to send them further on down into that Home component. [Music] Okay, I think that's everything; let's try it out locally. Seems to have compiled; let's go visit it in the browser. This is just using a local database at this point, and there's my announcement. Okay, this works locally, so let's commit those changes: git commit, say something like "load announcements from database". Now let's figure out how to get that to work on Waypoint, so let's read the docs again. I basically want to set up that environment variable, so let's have a look at how you configure environment variables. We've got App Configuration, Dynamic Values, Internal Values; let's start with the overview, and, as we've done before, let's jump forward in time to where I've read enough of the docs to understand what I need to do: let's add a config stanza to my project. Now, I've been told by the SRE team that there is a database for me to use in this production cluster, and there is a Kubernetes secret that Helm puts in place which I can pull the password from; my app has permission to do that. So let's first set up a variable that just pulls that password. This is an internal variable that we can use elsewhere, and it's going to dynamically pull its value from Kubernetes: a secret called db-postgresql, and the key within the secret, postgresql-password. Now what we'll do is define the env vars we want Waypoint to manage. We've got one called DB_URL, which you saw earlier when we did this locally; we're going to interpolate config.internal.db_password into this overall URL, and I know the database URL because, again, I've been told it by the SRE team. Okay, let's make those changes, commit them, something like "DB_URL config at waypoint", and push. Well, again, this will take a little while to rebuild and redeploy, so let's skip to the future, where I've got a new deployment. It's available. Dare I click the URL? I dare. There it is: it's live, with my one announcement from the database in the production cluster, hooked up to my app, all managed by Waypoint. I'm feeling quite proud of myself at this point. What might be interesting to see is whether my app has access to that datab... that database; maybe I can access it from within the context of the app. Let's try that. If I run waypoint exec bash, that will give me a shell in the context of my deployment. Note the heroku in there; that's because we're using Cloud Native Buildpacks, which are built on Heroku tech. So now I can use that DB_URL that Waypoint is managing, and I can see my announcements table, select the announcements from that table, and there it is.
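Pieced together from the description above, the config stanza would look something like the following sketch. The secret and key names (db-postgresql, postgresql-password) are the ones mentioned in the demo; the host, port, and database name in the URL are placeholders, since the actual URL the SRE team supplied is never shown, and the exact parameter names accepted by the kubernetes config sourcer may vary by Waypoint version.

```hcl
config {
  # Internal value: not exposed to the app directly, only used for interpolation below.
  internal = {
    "db_password" = dynamic("kubernetes", {
      name = "db-postgresql"        # secret created by the Helm chart
      key  = "postgresql-password"  # key inside that secret
    })
  }

  # Environment variables Waypoint manages for the app.
  env = {
    # Hypothetical host/database values; the real URL came from the SRE team.
    "DB_URL" = "postgres://cransnet:${config.internal.db_password}@db-postgresql:5432/cransnet"
  }
}
```

With that in place, waypoint exec bash drops you into a shell where DB_URL is already set, which is how the announcements table gets queried at the end of the demo.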
So let's recap what we've managed to achieve. I went from an app that I only knew how to run locally, to deploying it in a way that gave me publicly available URLs, to deploying a next version and getting two distinct deployments, both of which I could view, to hooking up a real database and having Waypoint manage the config variable that makes that database available to the app, all in a completely declarative way, without having to know anything about Kubernetes along the way. Ultimately I probably would like to learn Kubernetes, but for right now my main concern is deploying the application. So I hope you enjoyed everything you saw. That's all from me, and it's also all from CransCorp, the fictional company for this demo. The eagle-eyed amongst you will have spotted numerous places throughout the talk where timestamps in the UI betrayed me and gave away the fact that this wasn't really a continuous recording, but I hope the fiction was upheld anyway. Thank you so much, enjoy the rest of HashiTalks: Deploy, and I hope you check out Waypoint soon. Oh, one other note: some of the things you saw in this demo are from the main branch of Waypoint, not necessarily in the current release version, but they will be in a released version very, very soon. So, without further ado, I hope you enjoy the next talks, and see you very soon. Bye bye.

Ah, truly fantastic, that was just good. I love the "dare I" part; we've all been there. The thing I enjoyed most was Ray yesterday hinting at 0.7 features and today getting a preview of 0.7 features, and now it's here. Well, it's not here, but it's here on screen, and by the time you view this recording in the future, 0.7 will have landed. This is all just very exciting. With that, I think it's time for our first Nomad talk of the day. Very excited to introduce Forrest Anderson, who's been a great help in our community, running a community Discord, helping people out in his spare time, figuring out how our tools work, and bringing people together. This is my favorite kind of work. Forrest, over to you. Perfect, let's hear more on Nomad for students.

Hello everybody, today I'm going to be giving a lightning talk on Nomad for students. A little bit about myself: my name is Forrest, I'm a student in Ottawa at Carleton University, I'm a Rustacean, so I like writing Rust code, and I really like Nomad. A little bit of context for this talk: I've been using Nomad at Carleton for several months now to deploy a few applications, and I want to take a look through what this process looks like, but more from the student's perspective. A few tech specs of what we have: we have three server VMs going, and we have three client VMs that are strong enough, like 32 vCPUs and 64 gigs of RAM, so not a massive deployment, but we can deploy quite a few projects on it. We have one ingress IP, and we have definitely less than 100% uptime; that's not the end of the world for us. When I look through the Nomad documentation and it says "this is not recommended for production environments," I'm like, but we're not a production environment. From the student's perspective, students have projects a lot of the time, right? Like, if they're doing something for a resume, or if they just want to try something that's pretty cool, oftentimes they'll be doing some type of web project, something that will need some type of back-end deployment or
front-end deployment or whatever and so this is sort of what the objective of our nomad cluster is is being able to give students the the ability to to do this um but but from a student's perspective who's just come in and just done their first ruby on rails tutorial or react tutorial and they want to get something deployed they're going to hear um quite a lot of things uh when once they got started in the deployment world and so one of the the obviously like the biggest things you're going to hear is like what's this cloud thing right and so like the cloud can sort of be generalized as just a lot of computers that are somewhere else and you can connect to them and you can run stuff on them um but that's really what they are and then i mean the next step that you would take once you go down this rabbit hole a little bit is like this idea of cloud native what's cloud native well um it's sort of like a lot of services that are running together but the cloud native is also like a lot of levels of subtraction on top of that right like we want to be able to reason about a lot of these services how are they being run how are they being started how are they connecting to one another and so there's a massive world of things under the umbrella of cloud native um but it's really about like the art of uh running services like at bulk and so of course like the next thing that students are going to hear about um when they when they go down this path of looking to deploy and finding the um sort of like largest way like the most popular way to do this they're they're going to hear about kubernetes um hopefully they hear about docker first but let's say that like they're going to hear about kubernetes somewhere and so um the thing for myself is when i got started learning about a lot of this this world of um cloud native and this world of deployment um it seemed to me like kubernetes is really the only option that i had um like to be able to use and so i think that for myself i really started by going out and looking in the kubernetes community and seeing that there's like a lot to not just do with like production usage but home lab usage and side projects and a lot in the world of kubernetes um and so this is like a really great thing to see there's like a massive community so many people are using it it's used at these big companies but then as soon as you start going down the rabbit hole of trying to learn it yourself there's a lot you have to do there's a lot of configuration to set up your your first cluster there's a lot of terms that you have to understand even for like the most basic deployments um and if you have if you want to have any production environment at all you have there's a long list of steps that you have to go through um from installing kubernetes on your your machine uh or on your cluster to being able to provision certificates to having uh different like control planes and all of this stuff and from the outside you can go and watch a lot of tutorials but there there's a there's a lot to do and so this is what brought me to nomad is that i was looking for other alternatives out there and i wanted to find the term for that thing like i i just want something to run run my uh my services i have my docker containers i really like using docker compose on my one computer but when i want to deploy i i need what what is it what is it that i need to be able to run these services and so the word i was looking for was orchestrator and there's really not a ton of options out there for 
orchestrators, but in looking around I found Nomad, and this was sort of my entry into the HashiCorp world, where I then found a lot of the other products that are available as well. Essentially, for me, the biggest thing I found really nice about it was that it was a refreshing exit from all of this terminology that I didn't want to know right off the bat. It gave me a single binary that I was able to start running some applications with in a simple dev mode, and a single purpose, so I don't necessarily have to consider all of the things that I have to do to get this one thing running. Instead I can just say, okay, I'm going to start by running one service, and it ran, and I'm like, oh okay, that's pretty cool, let's just try a little bit more, and a little bit more. Of course that leads you down the path of a lot of different tools that can work together, but the very core principle of orchestration is what it focuses on. At Carleton we run several projects out of our Nomad cluster, and these stem from projects that students have started, like something that a student has made for students to use, but we also have a few other services running here and there, to help either with some introspection into the cluster or some backend stuff or anything like that. We also keep all of our code open source, and the idea behind this is we want students to be able to come and learn how stuff is done, and then be able to sort of enter this world of becoming their own SRE; people can go and set this up themselves. So just a quick overview of the sites that we have running. This one right here, discretemath.ca, was created by students who wanted to help create practice problems for the second-year discrete math course; this is one that a lot of students struggle with, and the site just has previous midterms that the professors have posted, all aggregated together, and you can go in and do questions. So students saw this problem, students created an application to solve this problem, and then needed some way to host it, and we have university infrastructure, and this is sort of where I bridge the gap. I mean, if we take a look at what's being run here, we have the backend, the frontend, and Postgres, and you can run that on a virtual machine just by itself, but then as you start scaling, and as you start adding more, potentially microservices, or you want to build a second project or something, then there's a lot of technical burden that can come up down the road. The second project we have is called Merged, and this is a way to show off projects or events that are being run at the university by the different technical clubs, so I can see the events, I can see when they are, and if I click on one it will bring me to the website that it's relevant to. I just want to take a look here quickly at some of the code that comes along with this, or rather is used to deploy this. I mean, if you know Nomad, this is really nothing special: we're using Traefik as our ingress, or our reverse proxy rather, and we're using Connect to connect the backend and the database.
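For a sense of what that kind of job file looks like, here is a rough, hypothetical sketch along the lines Forrest describes: a Docker task fronted by Traefik via service tags, with Consul Connect wiring the web task to the database. The image name, hostname, and ports are made up for illustration; the real Merged job lives in the club's open-source repos.

```hcl
job "merged" {
  datacenters = ["dc1"]

  group "web" {
    network {
      mode = "bridge"
      port "http" { to = 8080 }
    }

    # Traefik picks this service up as an ingress route based on the tags.
    service {
      name = "merged-web"
      port = "http"
      tags = [
        "traefik.enable=true",
        "traefik.http.routers.merged.rule=Host(`merged.example.ca`)",
      ]

      # Consul Connect gives the web task a private, mTLS'd path to the database.
      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "merged-postgres"
              local_bind_port  = 5432
            }
          }
        }
      }
    }

    task "web" {
      driver = "docker"

      config {
        image = "ghcr.io/example/merged-web:latest" # hypothetical image
        ports = ["http"]
      }

      env {
        # Reaches the database through the local Connect upstream above.
        DATABASE_URL = "postgres://merged@127.0.0.1:5432/merged"
      }
    }
  }
}
```

To adapt it for a new project, a student mostly swaps the image and the Host() rule, which is the low-overhead path described next.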
What I want to bring attention to is that, for a student who doesn't know very much at all about the deployment world, if they were to take the template that is used for Merged, just duplicate it, and start swapping out the Docker image that's used and the hostname that Traefik is looking for, then there's really not that much overhead to getting something running on the cluster. Of course there are a few things that would be a little bit different, like this idea of a service mesh and connecting ports that are running across different clients; there's a little bit of stuff there, but in comparison to the tidal wave of information that I had to work through to configure some Kubernetes services, this definitely feels a lot easier to me and a lot more approachable, especially for students who are just entering this world and trying to teach themselves. All right, and so with this setup that we have, running Nomad at the university right now, this is giving us the ability to run projects that students are creating, and the question that comes up is, well, can any student use this? If any student makes their own project, are they able to make use of this technology themselves? We definitely hope that the answer is yes, and we really want this project to help improve cloud native literacy for students: when students are coming into a lot of these ideas, how are they approaching them, and what are they able to do to learn them? In the future we're hoping that we can really expand this out as an ecosystem, to teach students through workshops, to get more students helping to operate the cluster in a safe way, but then also to offer this as a service to other clubs and societies whose websites are being hosted in different places, and to keep it within Carleton. So that's it for my lightning talk. You can reach out to me on GitHub and Twitter; I also run a community Discord, separate from HashiCorp, where I try to encourage people to come and talk about projects that they're working on and help learn together. Please do reach out to me anywhere you'd like to ask more questions, and I do hope that I can come back for a longer-form talk sometime in the future with some cool stuff. Thank you very much.

So cool, just super cool to see. I love seeing this kind of stuff, especially coming from students. Is it scary that we were not at that level as students? Just moderately. But that being said, definitely a great lightning talk, and I'm looking forward to watching this one again when I have a little more time to focus on it. In the meantime, Forrest and many of the other speakers are on Twitter; we have a Twitter list for you, which you can follow by clicking the link that will get posted in the chat momentarily, as soon as we've figured out how copy-paste works, which is one of the biggest things in technology these days. In the meantime, we'll also get the Discord link up, because I realized that while we could click on the slides Forrest shared, you might not be able to. Definitely worth highlighting that community, and again, stop by and say hi for us. For our next speaker, we've got Cole Morrison joining us; he's a fellow DA colleague, and I'm very excited for him to talk about a general approach to automating your deployment process and the things he's learned along the way, in one hour. Really clean Friday. I know that we have infrastructure as code, but this is going to be infrastructure as Cole, if I do say so myself. I'm very excited to see this talk. Excellent use of words. Awesome,
let's kick it over let's do it see you later heidi thank you for the introduction and thank you for tuning in to my talk here a general approach to automating your deployment process so what is this about you know my fancy title here aside well in any given deployment whether for infrastructure code networking some combo of all the above or maybe even more our ideal so our end goal is often the automation of said deployment you know and in this journey we're going to generally see two camps here the first camp is so interested in the details and technology that they wind up missing the forest for the trees and by that i mean they'll lose sight of the end goal of automation why well because they're so caught up in the implementation and doing more faster right that the end result winds up being rushed or even worse confined to some local maximum that discounts the future or a variety of different edge cases but on the other hand the second camp will get so caught up in the higher level phases so planning planning planning more planning uh throwing some business jargon and buzz terms that they'll miss the trees for the forest and by that i mean they'll stretch out and overcomplicate the process and implementation so much the end result is arguably worse than just doing things manually suddenly their pipeline requires more engineering than the applications being deployed but wait what is the end goal of automating your deployment well i'm going to tell you it's not a continuous integration and deployment pipeline the end goal here is not the forest or the trees an automated pipeline for your deployments is just a tool if you will to achieve the desired end goal and that is to save time and reduce error yeah i know duh okay okay right um no that may seem obvious but when you've been in it long enough and you built enough of these pipelines and talked to others out there that have done the same you'll really start questioning whether or not that this is the end goal right you go to any meet up and ask anyone about their automation pipelines and be prepared because the majority of them are going to be either the most complicated long-winded piece of engineering that you've heard about or they'll be so simplistic you'll wonder why they even have one now obviously this isn't the rule yeah but honestly sometimes i wonder if it is so in this talk we're going to go through key points that make up a general approach that you can take to avoid of falling into either of these two camps both of which are forest on the forest and trees but i've often forgotten that we're actually just trying to get out of the dam forest so who am i well my name is cole morrison and i'm currently a developer advocate at hashicorp now i've spent the past 10 years in the industry primarily in software engineering and startups though the previous four have mainly been in devops both teaching and doing now if you followed me at all before taking any of my courses read any of my blogs well then you know i've spent a good amount of time talking about automation pipelines and building them well it turns out that when you're in startups and you're building things from scratch or you consult them you wind up building a lot of automation pipelines and so better for worse i've gotten pretty good at the process and as you can imagine i've got a lot to say on it so let's go ahead and get into it i'm going to show you all the things that we're going to cover here today so from the many years of going through this process these are a 
list of seven ideas slash techniques you know whatever you want to call them that i use for automating deployments now a couple of things here first these are in order you should do them in this order second some of them may seem obvious but i have some things to say around them that may not be so obvious and third just a terminology thing when i say pipeline i'm referring to our end result so a series of automated or manual processes that complete your deployment okay so first thing that is to do things manually first all right i know this may sound obvious and it is so let's go ahead and skip to the non-obvious part when we're seeking to create a product for users the first step is to what well it's to identify the pain points and problems that the users are looking to solve the product then becomes a solution however when setting up engineering solutions and pipelines all too often we experience the problem once twice thrice and then suddenly we decide that we're ready to build the whole pipeline [Music] how much do we really understand the ins and outs of what we're trying to solve if we've only experienced the pain a handful of times probably not much right and jumping the gun here is what leads to solutions that have an incredibly low or skewed local maximum so what i mean by that is that the solution you come up with may have a ceiling that it can't go beyond think of it like building a factory right a certain size and setup of a factory may be perfect for creating a thousand widgets a month right but the second you need to create 10 000 widgets a month i'll either be super slow or impractical because the factory wasn't built to do that because it didn't consider those things or that growth now this i know this may sound like oh this is only for teams that you know rush and build their pipeline right no because i have another curve on here so the red one that represents overthinking the pipeline so this would be like creating a factory that for building millions of widgets a month but you currently only produce a thousand while operating infrastructure designed for millions of widgets only to only produce a thousand is a lot of unnecessary overhead so the first thing to do here isn't just to do things manually first it's to do things manually many times first all right so you do this because it gives you a variety of experiences and perspectives through repetition so that you can truly understand the pain points and problems you're trying to solve now the more times you go through this process the more you'll experience edge cases unknowns and come up with ideas that you wouldn't see if you just hopped right into trying to build things there's a quote i love parroting from socrates and that's understanding a question is half the answer and that's what we're doing here okay fine cold you know but how long should i do this well the answer is going to vary for each team in project but my rule of thumb is until it becomes routine you and i both know what that means right it means your deployment is just another thing that you do you can't choose you can't of course choose to move on earlier or later depending upon your needs uh but the earlier you move on the less you're going to truly understand things and later you move on well the more you'll understand things but regardless of when you choose to move on from this step there is something you must produce and that is what i like to call a human automation script which is really just a thorough step by step and what do i mean by this 
what i mean is that before you leave this step you should produce a document that outlines every single thing that you do as a human to complete your deployment process so how detailed should it be well it should be so detailed that it could be a friday before you're going to be gone and unreachable for a week and you could hand this script to a junior ops or developer and be confident that they are going to do the deployment for themselves and it's going to be completely successful so step by steps edge cases uh consider things to look for and how to solve problems that come up human automation scripts and whether you know it or not from this script you've created this solid foundation upon which all of our next steps stand upon and so let's go ahead and move on to that next step and the next step is to define the ideal workflow so at this point we've got the devil you know and that's this manual process right and surely some of you have been on teams where the pro for your project were step one so having those manual steps is what you stopped at but it's frustrating time consuming and since manual good old fashioned human error can throw everything into chaos but before you go building things our next step is to figure out exactly how we'd like this process this new automated process to work from a human perspective and what i mean by this is to ask yourself how could i want to work with this pipeline we already know what the end result of the pipeline should be that's to complete your deployment but right now we're doing all these manual steps in that human automation script that we made so after the pipeline is complete what would that script look like then so for example when i was at field boom trying to define our team's ideal workflow was version control based i wanted to do this i want to make some changes to my infrastructure's code pull requests into our production branch check see these changes on a staging environment if they're good merge them into the production branch and then ultimately have that merge trigger the deployment automatically right so i wanted five steps all decided github all in code with everything else moved out of the way this was the ideal workflow that i wanted to create for our team and the above here more or less became my new human automation script and as you can see you know this was something i could hand off to anyone well-versed in github or get and by the way uh at the time i literally had never heard of the term get ops before when putting this together now when you're coming up with this workflow you want to be pretty detailed what you want out of it and you also don't want to judge it too hard so what i mean is is the whole like is this even possible like that question you can kind of silence that voice for now and let your imagination run wild uh because don't worry uh we'll crush those dreams later all right so we've got what we're doing so our manual step by step and we've also got our new ideal workflow that we want to move towards it's now time to move on to our third step and that is to research existing tools ideas and methodologies again this is not the time to begin building instead it's to see what's out there you see you don't want to start trying to construct the pipeline if your mental toolbox is limited now why well let me just throw up another favorite quote of mine which is if the only tool you have is a hammer well then you tend to see every problem as a nail this is the exact trap we are looking to escape from and it goes 
beyond just equipping yourself with better tooling but also getting up to speed with better ideas paradigms and also existing workflows that others have discovered you know this is so you don't wind up reinventing the wheel with sticks and stones which would be a massive waste of time the thing is unless you're building some one-of-a-kind pipeline i'm just gonna say it you more than likely not other people out there have walked a similar path and have experiences they can share other teams and companies have created projects with similar needs and created tooling and techniques to get it done and now is the time to begin researching right so why now as opposed to the beginning right because a lot of engineers will hop right into this step just as much as others will try to hop right into building and this is generally the area where our engineer alice will go down the rabbit hole and never be seen again well the difference is that if you follow the first two steps you have a compass and a star to follow you know exactly what you're looking for this makes the research far more productive because you know if you don't know where you're headed then any road is going to take you there so a cautionary tale of my own way back in 2015 with an earlier startup we had our product up on managed rackspace and we were more or less doing everything by hand in a few scripts i mean i guess we just set up on sima4ci so i guess there was that however the ideas of cloud computing as it is today infrastructure as code you know those weren't even on a radar those mental models weren't in our toolbox and so we didn't even consider that as an approach and thus the pipeline that we created for our deployment wasn't what it could have been so maybe infrastructure as code is now on your radar well if you have terraform in your mental toolbox then what you can build is far beyond uh what someone without it can and if you're still spitting up load balancers for every service in your stack oh maybe it's time to go and figure out what a service mesh is and use something like hashicorp console and oh hey ashu talks deploy here right we've had quite a few talks on kubernetes and if you're wondering hey can deploying this uh does to this beast be less of a labyrinth well maybe it's time to check out hashicorp waypoint the more tools and ideas you have the better now the biggest pitfall here is truly analysis paralysis we all know what that is and what it feels like and there really isn't a straightforward trick here because how much time you can dedicate to research is going to be different based on your situation your budget team deadlines all these factors will come into play but you must set a cutoff you must say something like i will spend x days or i will research y number of options and then i will move on and move on to what well our fourth step and that is to pick your tools and methodologies oh duh yeah yeah i know but never underestimate analysis paralysis combined with the paradox of choice because once you bring your head up from all the research you've done you're probably going to find yourself overwhelmed with you know the number of options available to you and decisions are hard after all if you pick the wrong tools uh as we've discussed in the beginning you can wind up with a very limiting or inappropriate local maximum i have all sorts of nightmare anecdotes here from going with the wrong choice because picking the right tools and technologies to invest your time and skills in is a much larger bet with 
much bigger consequences than we often talk about and why well because if we pick the wrong ones and we invest say a year of our time and the next year the tool is outdated unsupported or becomes the lesser option in the marketplace our value decreases and we wind up having to learn all over again and that's just from a personal standpoint from a business standpoint unless you've got enough clout to make whatever vendors you're going with stick around and keep up support you're likely going to have a painful migration in the future right i mean call me shortsighted but years ago i couldn't fathom of why infrastructure as a service would ever overtake platform as a service i really did think that things like heroku and all those other similar services out there were the future and obviously i was wrong and all of my knowledge around those failed pass options is now pretty much useless anyhow i have some criteria that if you keep up with my other courses and videos you may have seen before but there's five criteria that i use to pick technologies now the first one is how familiar i am or how familiar my team is with the tool or technology the reason this question is important to ask is because the more in-house experience you have with the tool already well the faster and better you'll be able to move however the more important side here is that if you have existing experience you have a much better view of what the thing can actually and truly do because as we know marketing and sales speak can be somewhat misleading now the second is super critical and that is documentation community knowledge and usage levels i don't care how flashy the tool is uh or the technique looks i don't care how cute their mascot is and i don't care how big their latest funding round is if the docs are terrible building it is going to be an uphill process if there is no community answering edge cases and tackling hard problems is going to be a lonely frustrating journey and if not a lot of people are using it you know how can you even be sure that it'll achieve the scale that you want now the thing is when choosing a tool you kind of have to ask yourself do i want to be the trailblazer for this thing or do i want to build what i set out to build do you want to be the one who discovers all of the bugs and oddities or do you want to build the thing that you wanted to build well probably the latter so if these elements are missing from an option it's probably not an option okay third and this is something i've made many a mistake in ignoring and that is the track record of the creators and maintainers so the other two elements are great but now we need to look at who's behind this option is it an individual or company are they active in maintaining and updating their product or is it a ghost town are they receptive to community ideas and feedback or are they set in their ways you have to understand that if you're going to pick a tool to serve as the foundation of some critical aspect of your stack or product well you better be sure it's going to be active you know back in 2017 when i was tasked with doing what we're doing what we're talking about here right so setting up a pipeline but also building on a cloud infrastructure well terraform was uh still in like 0.11 right and as cool as it looked i've been burned before by going with options based on what i want to call buzz factor right so i wound up in cloud formation and obviously i was quite wrong in my estimation of terraform uh however at the time what i was 
confident about was that aws would be around now i did wind up moving over to over to terraform about a year later but that's a different story for a different time and obviously now much of the world that's using infrastructures code has standardized in terraform so maybe this isn't the best anecdote for this point but you get the picture so next up functionality does it do what you need it to again super obvious like most good advice but it's not always followed we look at the different tooling and we see oh the fortune 500 use it oh all the fame companies use it and are human brains oh so susceptible to social proof i look around and say well then it must be good enough for me but since you have a very clear idea of what needs to be done you should be cross referencing both your ideal workflow and your manual step by steps you've made with the tools that you're evaluating and how do you evaluate that well if you've already vetted it for documentation community knowledge and usage levels it should be pretty easy to figure out if it can do what you need it to and if you're having trouble figuring that out well then you probably didn't evaluate the tool in context of number two on our list here so finally number five here what is more important and viable for you in this tool control or convenience what i mean by this is that most tools will generally fall on one of two sides which is doing a lot for you automatically or giving you all the pieces and expecting for you to set it up another way that this is generally worded is how low level or high level the tooling is uh so are you working with a bunch of primitives and piecing together the larger system or do you have a convenient binary for example that you just pass a few settings to and then move on now the right answer here is of course dependent upon your ideal workflow way back from the earlier step for some teams and people more control is going to be required and for others the auto magical black box is all that they need the trade-off here though is of course that with more control comes far more responsibility and therefore required time in resources however with more convenience you're often left with less optionality meaning that if your needs don't fit the main use cases of this tool you may spend many many many moons hacking up workarounds okay so that was a pretty long segway but this is a very critical step infrastructure and pipelines aside picking tools and technologies and methodologies are one of the most important things that we do in technology both personally and professionally what you choose to learn and invest your time into can either make you the next kubernetes operator or you can make you the next modulus dot io operator and if you don't know what that is it's because it's not around anymore okay so after all that we're going to zoom back up to our larger list of steps here to number five and that is to create a hypothesis and plan of implementation so you've got your tools and methodologies picked out now we need to draft an approach what do we think we need to do and how do we think we need to get all these pieces together to get our pipeline up and running and this is less about creating some scrumified workflow to get it all created you know sure that's a component but there's really two things you need to look out for here so the first is to resolve conflicts between your human automation script your ideal workflow and what's actually possible this is where we take all the dreams of your your ideal 
workflow and realize that a fraction of them or maybe even more aren't possible now my own experience was realizing how difficult multiple deploys were with confirmation when it came to race conditions updating only parts of the infrastructure and collaborating with multiple people on the project unfortunately i didn't pay enough attention to those aspects of it and our workflow at least for some time wound up with this whole stand in line and weight scenario whereby if you know one fix was going out then all the other activity needed to freeze the next thing is to get your team members and stakeholders input into what should be in the final pipeline again this should seem obvious and maybe everyone else got lucky but if you've never been in a workflow where one sre or lead engineer is the bottleneck for all of your deployments well then you're lucky because it's not fun especially if they're the one who built it and they're absolutely convinced that it's the only way again good advice is often self-evident but truly how many of us talk with our users regularly aggregate team feedback and dog food our own products yeah anyway the end thing you're putting together is a plan of implementation the more people involved well the more project management you'll likely have to impose but going into that aspect you know and how you might manage this is beyond the scope of this talk but you have the plan it's now time to hop to this step that many of us just jump right to and that is to build the automation pipeline now what else is there to say here at this point you're going to implement your plan and getting this and get this wonderful pipeline built however there is one main piece of advice that i hope you'll take to heart when doing this and it is this to build it simply the leave room for extensibility the biggest trap here is over engineering your pipeline into something you might need years from now instead of creating what you need for this year if we take another look at that chart i showed earlier again this is the equivalent of building a factory design to produce a million widgets when you're only producing a thousand right now however the opposite is just as bad which is why the second part of this statement is just as important and that is to leave room for extensibility if you're going to over-engineer somewhere it should be this aspect the aspect of extensibility so let's say you know you're building your infrastructure as code with terraform well it might be tempting to do the whole thing in a single project in the root module uh but it's gonna be problematic as the infrastructure grows whether because you know everything has to be maintained as one deployed as one and there's many reasons however a pattern that's going to leave you with extensibility while still keeping simplicity is going to be to just go ahead and start splitting things into smaller modules so you know maybe you've got one for your network one for your compute one for your data and so on uh although this is a bit tangent about terraform when constructing your pipelines for deploying your infrastructure being able to maintain the different aspects of it as modules goes a very long way it allows teams to work in parallel deployments that happen independently and of course more reusability however what's more important here is the thought process right if you're already building with the module mindset then you're already thinking about creating code that can easily integrate with other code this means you can 
start with a simple set of modules and you can add more as you need them versus keeping everything in one big blob and inadvertently playing infrastructure jenga but anyhow that is the one thing to take to heart here build it simply but leave room for extensibility okay so we are on our final point which is what comes afterwards well that has to evolve our automation pipeline through iteration and feedback so now we're actually back at the beginning of our list here you see what happens is there's still some manual component of our pipeline more than likely so i gave my ideal workflows earlier and maybe maybe this is what it winds up looking like after our whole process is said and done the first time well if we're trying to evolve this what should i do well if you're following our general approach here it's to do this process many times as i do this many times just as when i began i'm going to continue encountering pain points problems and oversights that maybe i didn't see before and as i familiarize myself with this new workflow maybe i define a new ideal workflow something that would come about by talking you know with the different stakeholders and aggregating all of our experiences into this and you probably see where i'm going here and that's what this approach is recursive when i hit step seven it's time to go all the way back to step one and begin again and this is how you move forward with continuous improvement this is how you know after defining our ideal workflow you know we do more research figure out what's changed and now we're going to be asking better questions now because we have more information and from there we'll pick our new tool and methodologies to further improve our pipeline form a new plan of implementation and build it and then the next thing you know we've completed the process again and our pipeline is even better i bring this up because once we've completed this process the first time we have that initial version of the pipeline it's very easy to become complacent i myself am guilty of this even though i complained about being uh dealing with lead sres and stuff that are bottlenecks i of course you know i've certainly played that role myself and it's never intentional and we stop here you know we get this up so we can move on to other things and our focus is just change but the question of when to do this next iteration let alone when to do this in general well is it an alignment with our original goals of automating our deployment process which is what again when it saves time and reduces error and how much when it's so much that the time that you save from building this pipeline outweighs the time it takes to build it so these are the steps for a general approach to automating your deployment process there are generally two camps that we fall into those of us that miss the forest for the trees because we're so zoomed in on implementation we're doing more faster and better and then those of us who don't even realize the force is made up of trees because we're so high level over thinking and over engineering it that it takes forever to build but neither are the end goal the end goal is to get out of the forest in our case save time and reduce error if your automation for your deployment isn't doing that well then what was the point of any of it thank you so much again my name is cole morrison and i hope you enjoyed the talk welcome back welcome back that was sobering words i'd have to say cream i think that really taking apart what it takes to build a 
deployment process or automate something i think coal is really able to sum up well um i i haven't thought of it like that before and uh usually it was kind of jumping to that sixth step and then just like where's the code you know let's get this rolling let's get this out there but really nice to take a more thoughtful approach to measuring twice or 800 times and then only cutting once i mean you know what it what it is right measure twice cut once measure once cut multiple times try to resolve it then i i think there were a lot of really really great learnings in this talk i think my personal favorite is probably to evolve your automation through iteration and feedback day one doesn't have to be perfect day one will never be perfect and hey you know what day 1037 will also not be perfect but that's okay as long as we keep improving that's all that matters with that i think it's time for our next talk we've got mike from the nomad team here uh yesterday mike talked about the nomad pack contest in the second iteration of that today we're getting a walk through on how to build nomad packs i'm excited for this one this this is a nice abstraction and simplification of workloads i agree and with that we're gonna pack on up and get out of here see y'all in a bit hey everybody my name is mike nomic and i'm the product manager for nomad at hashicorp today i want to talk to you about a new tool that we're really excited about which is called nomad pack it's a new package manager and templating tool for nomad in this talk i want to give you an overview of what nomad pack is i want to go over the various ways that you could use nomad pack within your organization and then i want to dive into the structure of what is in each nomad pack and then we'll go over the process of taking a nomad job file and turning it into a pack that could be consumed by anybody so by the end of the talk you should have a good sense of what the tool is how you can start using it and how you can contribute to the pack ecosystem alright so let's jump into it so what is nomad pack nomad pack is a cli tool that can be used either in your terminal or in a continuous integration pipeline and it sits between registries and a nomad cluster in these registries we have nomad packs which are groups of templatized job files and variable definitions nomad pack will pull down a pack from one of these repositories allow a user to input variables and then send these job specs to the nomad cluster to run them and manage them over time so in this example we have two github repos one could be public the other could be private and to get lab repo nomad pack will pull these packs down and then send jobs to nomad in this case running traffic jenkins and postgres so let's go quickly over what it looks like using nomad pack so right now i have a nomad cluster and i have nomad pack locally so if i run the cli command i can see the various commands that i can run and right now i'm going to list out the various packages that i could deploy so i use the nomad pack registry list command and so this shows me all the packages that i can deploy this right now is pulling from what's called the nomad pack community registry so this is a public registry that anybody can contribute to and we really encourage people to contribute to it uh so you can open a pull request and then your pack will be added to this list and then when people use nomad pack in the future by default they will have access to the pack that you've written so this is what we have right 
now as options to deploy to nomad so i look at this list and i say okay i can deploy some cool things i could deploy traffic i could deploy nginx i can deploy prometheus but for the purposes of this demo let's just deploy hello world so i'm going to copy that and the first thing i want to do is see okay what are the options that i could pass to the hello world pack so i will run the info command nomad pack info hello world i run this it gives me a brief description of what this pack does deploys a simple application with an associated console service and it gives me various variables that i can pass into the package so that when it's deployed to nomad i can customize it for my own needs in this case we'll just focus on the message variable so now the next thing i want to do is see okay what are the jobs that this would be deploying to nomad each pack can have one or many jobs that it deploys to nomad and so let's see what that looks like for that i'm going to use the render command so i'm going to say nomad pack render hello world and now this puts out what the nomad job spec that this pack defines looks like so i get a sense of what it's doing i say okay this is a server a service task it's a docker driver and it's using this mnomich hello world server with this environment variable hello world now the important thing about pax is that everybody can use them how they want by passing end variables so i saw that there was a message variable so now if i pass in the var message equals something uh let's say hola mundo now when i render this file oops there we go now when i render this file this message says ola mundo and now that's a very simple variable that we can pass in but as we go we'll see that we can get really complex with the logic that we insert into packs so now i have an idea of what this is doing i can now test what this looks like against a nomad cluster so i'm going to run this nomad pack plan command and this says that this will deploy a single job to nomad and it would be successfully allocated all right so it's time to actually deploy this so now i use the nomad pack run command i give it my hola mundo message i guess it doesn't like exclamation points all right now it says the pack is successfully deployed and now what i could do is i could check the status of this pack i can make sure it's up there this now lists all the packs that have been deployed to nomad the status command and if i want the status of a specific one i can check hello world and this says i'm running the latest deployment of hello world it has a single job called hello world and it is running so let's check that out in nomad all right so now if we go to nomad we see that the hello world job is running if we click into it we see that it's a pack and we see some pack details along with the standard job details so we see that this packs name is hello world it's coming from the default registry and it's the latest version of the pack and then if we jump into the actual uh server that was deployed at the port it's deployed on here we see that it's saying hola mundo so we properly took a variable from the cli we passed it to nomad pack that put it into the nomad job for us and then it deployed properly so that's a really simple uh version of running nomad pack with the simplest possible pack now you could do this with a ton of different jobs and really quickly with you know a few key strokes be deploying common applications like wordpress like nginx like aj proxy to nomad now let's jump into the details of 
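As a hedged sketch of the shape being shown here (the image name is a placeholder, and the exact variable accessor syntax differs between nomad pack versions, so treat the [[ ]] expressions as illustrative rather than the real pack contents):

# templates/hello_world.nomad.tpl (sketch)
job "hello-world" {
  group "app" {
    task "server" {
      driver = "docker"

      config {
        image = "example/hello-world-server:latest"   # placeholder image
      }

      env {
        MESSAGE = [[ .message | quote ]]              # rendered from the pack's message variable
      }
    }
  }
}

Rendering with a different message value substitutes it into this template before anything is sent to the cluster.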
what is actually in a pack and how do you make one okay so right now i have opened the hello world pack uh directory so this would be the directory that gets pushed up to uh the community registry so that other people can use this hello world pack so let's just jump into the contents one by one and we can explain what's in there as a brief overview we'll have a readme a metadata file a variables file then an optional outputs template and then also we'll have the actual template that makes up the job spec and potential helpers so first let's check out the readme so in the readme when you're writing a pack you're going to want to write a very simple description of what your pack is um and then how one would use it and then also call out any integrations or dependencies so here i say that there's an integration with console and so that someone who uses this will know oh you know i will need console up and running to potentially use parts of this and then i call it other integrations that this has so that people could really use it successfully i'd say in your readme just try to set up whoever the end user is for success and then also we have a list of variables that can be used and this gives people a general idea of how they could um customize this application for their needs so that's the readme file now let's jump into the metadata so this metadata defines uh some general information about the app you know what the url of the app itself is so if this were let's say you know wordpress this would point me to the wordpress homepage and the wordpress author and then this pack stanza says this is the name of the pack gives a general description and then a version and so if i were to um you know run nomad pack info it would be pulling this data from the metadata.hcl file also in the metadata file you can call out pac dependencies so we won't go into this uh in detail in this talk but packs can depend on other packs so you could have let's say reusable code to keep your packs dry in some sort of helper pack and then inherit from that in a variety of other packs so that's the metadata file now let's jump into the variables file so if you're used to terraform this will look really familiar we tried to model it based on the variables that you see in terraform and this is running with hcl as well so here we have there's a job name variable a region variable data centers variable account message whether or not to register a console service etc etc and now whenever i run this pack i could actually pass in different values for each of these variables so we saw earlier i passed in the d the message of ola mundo it overrode the default hello world and then that rendered that to the screen in this pack i see by looking at the variables file that i could pass in a different account so i could you know potentially run 10 of this pack and then i could also say whether or not to register a consul service and what that console services name is so it's really important when you write a pack to figure out what you want to create variables for how you want to parameterize the pack i would say that it's important to strike a good balance between parameterizing too much or too little if you parameterize too little it's really not that useful to pack you might as well be sending someone drop somebody a job file and having them figure it out from there because they can't customize it to their own needs if you parameterize too much um it's also not that helpful because they have like you know 50 variables to read and 
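A minimal variables.hcl along the lines being described might look something like this (the names mirror what is read out above, the defaults are guesses):

variable "job_name" {
  description = "The name to use for the job; falls back to the pack name when empty"
  type        = string
  default     = ""
}

variable "message" {
  description = "The message the server responds with"
  type        = string
  default     = "Hello World!"
}

variable "register_consul_service" {
  description = "Whether to register a Consul service for the app"
  type        = bool
  default     = true
}

variable "consul_service_name" {
  description = "The name of the Consul service to register"
  type        = string
  default     = "hello-world"
}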
try to figure out how to actually use the job so i try to strike a good balance there so that's the variables file and then lastly we have the actual templates themselves that define the nomad job specs so these are created in the templates directory and any job that you want to deploy to nomad as part of a pack gets a dot nomad.tpl extension at the end of the file name so if i had multiple jobs that this pack wanted to deploy i would say nameofjob.nomad.tpl let's jump into the hello world job so this job you can see looks roughly like a normal job spec except you'll see some delimiters the open brackets and the closed brackets we'll get into that in a second but you see that normally it looks like hcl except for that now go template is what's being used with these delimiters so whenever i open a double bracket it says that i'm using the go template templating language here this allows me to do things like inject variables in this case i'm injecting the message variable it allows me to do things like call helper functions so here i'm using the quote helper function that says i'm going to take this message and pass it into quote and then it's going to render that into the job spec so in this case it's going to take whatever my messages wrap it in quotes and we'll have message equals quote so that's a really simple use of go template but we have actually more complex uses so here we have some conditional logic so this is saying if this variable exists then render anything in this block to the hcl so now i can get a little more complex i don't just pass val variables in but i could actually you know completely omit or skip certain stanzas there's also the ability to use helper templates so here in go template we're using this region template and we'll see what that looks like here all right so this is the helpers file and what this does is it defines templates that can be used elsewhere in your code so for instance if you have let's say a console health check that you know knew that every job was going to be using you could define a console health check template and then insert it into various jobs in this case we have two very simple templates one that defines the job logic for a job name if the job name is equal to nothing give it the name of the pack and otherwise pass in the job name variable as the name of the job and here we have another one the region uh template that says if the region is not equal to empty string insert the region value here and so this logic gets inserted right here into the nomad job spec all right so really quickly let's see this in action let's use the register console service variable so if i am the author of this job i can use the render command to see what this job spec looks like so i'm going to give it uh the render command with the var register console service true and we can test out the go template logic so i do this and i see that hey yes this service is being rendered so everything in here was properly rendered now if i give it a false value instead we see that the console service was not rendered in the job and when this gets pushed up to nomad no console service will be registered so that's a really simple use of go template logic to parameterize values and pass in variables into packs so that you could customize them for various use cases all right so now that we know you know what the contents of a pack are how to use nomad pack generally let's go to go through the process of actually writing a nomad pack ourselves so the application that we're 
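To make the helper and conditional ideas concrete, here is a hedged sketch; the dotted variable accessors vary by nomad pack version, so read this as the shape of the files rather than exact syntax:

# templates/_helpers.tpl (sketch)
[[ define "region" -]]
[[ if ne .region "" ]]region = [[ .region | quote ]][[ end ]]
[[- end ]]

# templates/hello_world.nomad.tpl - pulling the helper in and conditionally registering a service
job "hello-world" {
  [[ template "region" . ]]

  group "app" {
    [[ if .register_consul_service ]]
    service {
      name = [[ .consul_service_name | quote ]]
      port = "http"
    }
    [[ end ]]
  }
}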
going to packify is something called nexloud which is a productivity suite it can kind of be thought of as an open source self-hosted google g suite or google drive we're going to take that and nomad job spec form and we're going to convert that to a nomad pack so at the end of all this what we want to do is push something up to the community such that anybody now can go install nomad pack and say nomad pack run next cloud and get this uh you know productivity suite up and running on their uh self-hosted cluster um this is something that's been really popular for people who you know run home labs or really into self-hosting but it's also something that's also often used by big enterprises if they want an alternative to g suite okay so let's get into it all right so here are the basic steps to writing a nomad pack first you want to start with a nomad job spec and we're going to assume that you know how to do that uh if you don't know how to do that there's some resources online you want to start with something that you can push up to nomad with a nomad run command yourself and then we'll work from there then you want to get some basic scaffolding we'll show you how to do that then you want to input the metadata and the readme information for future users and then you want to go through this process of identifying parameters so these are all the things that could be variables in your nomad pack and you want to figure out okay what are the things that i actually care about allowing people to customize in this pack then we are going to go through each of those parameters and parameterize the template we're going to make a variable and we're going to pass that variable through to the templates and conditionalize what we need to conditionalize input you know the values where we need to input them so we do that for each of these parameters then we're going to test our pack locally and then lastly we're going to contribute by pushing up to the nomad pack community registry all right so let's start with our nomad job and i'm coming into this with this job already lit written and uh let's go through it really quickly just to see what it contains so it's job next cloud has some basic information on region data centers namespace etc it has a constraint that says that it will only run on linux and then it has a task group and this task group has a couple of tasks within it they are connected via a bridge network so that means they share a network namespace we have a couple ports that are static and this is because i know that the virtual machine i'm running it on has these exposed so i want my http port to be exposed on on 4001 and my db to go to the postgres port that's conventionally 5432 then i have my application this is running nextcloud's latest image it mounts a uh a volume from this directory on my host to this directory within the task this is conventionally where nexcloud expects uh this information to be has a resource stanza and then it has some common environment variables now uh nextcloud can be run with a postgres database or a mysql database in this case we're running a postgres database so we have these environment variables exposed as you can see it's very secure and then we have the database tasks this so this is running postgres again uh i have kind of an older image being run because i know that that works someone else might want to use a different image we'll get into that in the future and then i'm mounting another volume here for the postgres data again on my machine i know this 
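A trimmed-down sketch of the kind of job spec being walked through here; the ports, image tags, host paths and credentials are placeholder assumptions rather than the exact file from the demo:

job "nextcloud" {
  datacenters = ["dc1"]

  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }

  group "nextcloud" {
    network {
      mode = "bridge"
      port "http" { static = 4001 }
      port "db" { static = 5432 }
    }

    task "nextcloud" {
      driver = "docker"

      config {
        image = "nextcloud:latest"
        ports = ["http"]

        mount {
          type   = "bind"
          source = "/opt/nextcloud/data"   # host path, placeholder
          target = "/var/www/html"
        }
      }

      env {
        POSTGRES_HOST     = "127.0.0.1"    # tasks share a network namespace in bridge mode
        POSTGRES_DB       = "nextcloud"
        POSTGRES_USER     = "nextcloud"
        POSTGRES_PASSWORD = "password"     # placeholder - the talk jokes about how "secure" this is
      }

      resources {
        cpu    = 500
        memory = 512
      }
    }

    task "postgres" {
      driver = "docker"

      config {
        image = "postgres:12"              # pinned, placeholder version
        ports = ["db"]

        mount {
          type   = "bind"
          source = "/opt/postgres/data"    # host path, placeholder
          target = "/var/lib/postgresql/data"
        }
      }

      resources {
        cpu    = 500
        memory = 512
      }
    }
  }
}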
volume exists here but on someone else's machine it might exist somewhere else and then more environment variables and more resources so it's a relatively simple job and i know that it works i've tested this before i'm able to run this job so let's just start from that assumption so we've written a nomad job step one is done let's get to step two and let's get some basic scaffolding so i'm going to pull down something uh here this example nomad pack registry so it's a revo we host within hashicorp and it could give you just a very basic registry to get started if you wanted to write your own packs this is where you could start you just clone this template and right now we see that the hello world pack that we're familiar with from earlier is in here so i'm going to clone this and we'll start using this as the basic scaffolding for our pack so we'll take this and we'll actually convert this from a hello world pack into a nexcloud pack okay so i'm back here and i've now taken the working nomad job i've plopped it into this pack that we got from that scaffolding and we see that it's a hello world pack let's see what we now need to change so first we need to go and change this hello world name to nextcloud we also have uh some variables that are referencing hello world variables um and then also we have uh this output that is specific to hello world we have this readme that's specific to hello world and we have variables so i don't want to you know make you sit through all this so what i'm going to do next is i'm going to change the names of all these references to next cloud i'll change the name of this file i'll delete any of the variables that we don't need and then in the readme i will get rid of everything that we don't need so with the power of video editing i'm going to do that right now all right so i removed all the information that was specific to the hello world job i've replaced next cloud i've renamed it next cloud where it needs to be renamed i've gotten rid of all the variables except a few and i've cleared out the readme so i think step two is all good all right now it's time to input metadata and the readme this should be pretty quick all right once again through the power of video editing this is done already so i just dropped in a brief disclos description of what next cloud is uh some caveats about how you would uh run it successfully i called out dependencies saying that it requires linux and it may or may not need console more on that later and then i left the variables blank because we haven't yet identified exactly what the variables will be variables will be i have some ideas that i spelled out here but we'll go into that soon so that's the readme and the metadata similarly quick drop a brief description a name and some links to attribute the author all right so that is step three we're done with that now it's time to move on to step four identifying parameters all right so i've taken the working job spec and i've dropped it into the template file but not much is parameterized right now so i think what's important is to go through this and see what assumptions are we making about our environment that might not be true for other peoples and then we'll make a list of those and those will be our parameters that will pass in his variables into this pack so the first thing that jumps out to me is this constraint this linux constraint so i might have only this constraint but other people might have more constraints you know they only want to place this job on you know 
certain nodes in their system so we should allow people to add their own constraints that's one the next thing that jumps out to me is the network stanza so i've assumed that i want to network stanza and i want these static ports these ports are static on my machine because they're opened up to the outside world but on other people's machines they might not want static ports and actually we advise again static ports on nomad so this is again something that we'll want people to be able to customize with the variable so now we have uh the network and the constraints as our two parameters let's keep going so now the next thing that jumps out to me is the image i'm using the latest image but a lot of people like pinning images so we might want to pass this in as a variable after that i realize that this mount is making some assumptions about the system too i have my data at this directory but other people might have their data at some other directory and so we should make this mount uh customizable same thing for the postgres task that has its own mount and also its own path for the source data so there's that and then next up i realize that these resources are custom for my system so i'm planning to run this in a home lab i can run this with you know relatively low memory and relatively low cpu other people might be running this for you know an entire enterprise and they might want much higher memory or cpu or they might want lower memory or cpu so that should be customizable as well and then lastly these environment variables i'm you know making some assumptions about the admin and the password um and also i'm hard coding in the postgres password which isn't good a lot of people want to use vault they'll want to pass in a stanza that says or a call to vault saying that they want to grab that password from vault um so that's something that should be parameterized as well and then lastly there's this question about whether this database task should exist in this pack or not now i want this database task because i'm running this on a single node i want this database running in the same name space but in the future you know i might be running a database let's say on rds or on on some managed database service and i won't want to um you don't have to deploy this database on my node myself i'd just rather point to it in the future so we can probably identify that some people will not want to run this task at all and they'll just want to run this application task so that's another parameter alright so now we've identified our parameters in step four let's take a look at the list okay so i've added these two variables next cloud image tag and postgres image tag i give people a description and a way to find all the options a link and i inject my defaults as what i was using previously now that these variables are here i can inject them using go template into the template so here for instance our old code looked like this it was using a specific image and our new code looks like this it's using the variable let's test this out really quickly so i go to the terminal and now i can render and pass in this image tag as latest when i do that we see it's using postgres latest when i don't use the variable it uses the default great now let's move on to resources and for this one we'll use a helper template all right and now we have two more variables we have this app resources variable and db resources this is a little bit more complex than our other one it's an object variable so it takes a cpu and memory 
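As a sketch of those two kinds of variables, a plain string for an image tag and an object for the resources, with defaults that are just illustrative values:

variable "nextcloud_image_tag" {
  description = "The nextcloud image tag to deploy"
  type        = string
  default     = "latest"
}

variable "db_resources" {
  description = "CPU and memory for the database task"
  type = object({
    cpu    = number
    memory = number
  })
  default = {
    cpu    = 500
    memory = 512
  }
}

# in templates/nextcloud.nomad.tpl the hard-coded values give way to the variables, e.g.
#   image = "nextcloud:[[ .nextcloud_image_tag ]]"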
value for each and you can see that these are the values that we were passing into four before as the default now when i use these i'm going to use them with this template and so our old code looked like this and our new code is calling this template and it's the same thing above in the application task so this template gets an argument which is the database resources variable and then you can see how it uses it it takes the argument it assigns it to this value resources and then it prints out the resource stanza and this is a nice dry way to write our code now next up volume mounts okay so this is our most complex variable so far so this uh postgres balance variable defines a list of objects each of those objects have a lot of keys type source target etc and then bind options has a list of objects itself and our default is right here and this is what we were using previously in our template so let's look at how this is used so in the template we're passing this variable into the mounts template the mounts helper template so let's look at that this mounts helper template defined here takes the argument it iterates over it and for each mount it creates a mount stanza and then it it injects the values from the variable into each of the keys in the stanza and iterates over each of the bind options to create a bind option stanza so that was volume outs and i'm actually just going to skip past the others as it's a very similar process you add variables you add logic you templatize et cetera et cetera so let's just go to the end now and the last step we have is to test this and then contribute this is the fun part now we have a fully parameterized job spec we have a constraints parameter passing into a constraints template we have uh the network being parameterized we have environment variables being parameterized as well optional console services being passed in and then also this conditional saying that if you include the database task you'll add a database task but if not don't render a database task and then lastly i've added this pre-start task that creates data directories on the host using raw exec this is optional again this is off by default but if you run this it'll make it easier for people to use the pack without doing pre-existing setup and that's really important so let's test this out let's go to our terminal let's render it once just to make sure it works okay that looks good we'll run a plan [Applause] okay that looks good too and then we'll do a run and this is now pushed up to nomad excellent we'll give it a second for the images to download and then we'll check back in all right the images have downloaded so we'll pop in here we'll check out this task group we have one allocation we have our pre-start task and then we have our application and database task this application if you remember was running on port 4001 so now if we go to port 4001 we see that we have this login and we gave it admin and password as the admin password and now when we log in it works and now we have an instance of next cloud up and running and now we just have one step left pushing this up to the nomad pack community registry once we push this up and it gets merged anybody will be able to run next cloud on their nomad cluster with just one command nomad pack run next cloud so here's my pull request i give a brief description of what next cloud is and i create the request and there we go so that's it we've gone from a nomad job to a nomad pack and i hope throughout all this you now understand what 
nomad pack is how it works internally and more importantly how you can contribute going forward i really cannot encourage you enough to contribute to the nomad pac community registry it's something we're really excited about we've already had a great response so far it's something we're going to continue investing in in the coming months and years so please stay tuned for more around nomad pack and its integrations with nomad have it going welcome back thank you this was amazing mike's explanations of how nomad pack works is are just beautiful and just as a reminder if you were here yesterday you know this but if you weren't nomad pack is not just software nomad pack is also this beautiful backpack that you can win by taking part in the nomad pack contest we'll have a link to the blog post and the contest up in the chat later to help you figure out if you want to contribute and of course the answer to that is yes because you should because it helps everyone it helps you to bring your workloads in a more useful way and helps everyone in our audience and everyone else using nomad pack taylor any comments or thoughts on this session yeah i'm really excited to see what people create with pack honestly i know that uh talking about how we package up our applications for our favorite container platforms and orchestration frameworks has been difficult you know there are a lot of options out there so being able to have something that you can instantiate with nomads specifically it's gonna be a lot of fun to take a look at and to see you know i know i've worked in the world of kubernetes for quite a while and so helm charts have been the way that i've done that and templating has been difficult in some aspects so really excited to kind of you know take a look at pack and and see what we can do there as well as use waypoint to get that deployed too so quite excited on all fronts there absolutely and i think chantano has a great call out here saying that we need graphs to see contributions before and after the talks by mike i i like this idea i think we we just need to keep track of that and see how this evolves with that i think back to you taylor and our next speaker awesome our next speaker uh is mark leblanc and his title is here for deployment time not a long time a waypoint and vault story always excited to hear when people can kind of fuse all of these uh different hashicorp tools together to create really interesting workflows and with that it's no secret that we'll be kicking off this next section this next session in this next section have a good one see you soon just point us to the way hey everyone thanks for stopping by to listen to my hatchy talk here for deployment time not a long time so does that make you think of a song it should because it's intentional um great song great band if you don't know it go look it up so who am i my name is mark leblanc i'm a delivery leader over at arctic work with a great team of consultants where we focus on automation ci cd micro services and the lot part of that job quite often finds the hash corp stack part of the discussion the talk today is around why ephemeral credentials at deployment time matters and it involves the two tools hashicorp vault and waypoint um waypoint's a little newer i've been using it for a little bit but um i think it's really interesting it has a good part of the story so let's um hide my face and get into the talk see if this works all right so what is the problem that we're talking about um i have this kind of lengthy worry 
description here but really what we're talking about is the fact that secrets management has become so complex the systems we're dealing with are so complex they're so sprawled um things are very dynamic in nature things are coming up scaling down they're scaling out and we have very known uh risks associated with poor secret management we all know you know we're not doing secret management properly there's the risk of making the news for a data breach for a database breach for an infrastructure breach someone's stealing your i your ip right so what are the risks with poor or like what what are these risks what what's going on here so anyone that's worked in operations or production or any kind of i.t role really we know some of these things that i've put on the screen here we know there's too many static passwords and secrets we know there's too many plain text secrets we know that secrets are being mishandled we know secrets are not being rotated i'm sure if this was a live in-person event and i said who here is aware of a password that has existed for more than a month i'd have some nods i bet if i said who here is aware of a password or a secret that's existed and not been rotated for more than a year i think there'd still be some nods it probably could go even beyond that the next one i have here is apps and developers are not ready so what i mean by that is when we talk about hashing for vault yes we all know it's a secret manager but how ready are our applications to consume these how ready or for those are those developer teams are they aware of vault and what it means to to integrate with that um quite often they're not the list goes on and on all of this puts your infrastructure your data your ip it's at risk these are the things that keep us up at night these are the things that we worry about and we hope we never have to deal with and if we do we wonder how we how are we going to handle that so that's kind of what we're talking about when we talk about secrets at deployment some of the benefits that i see when we talk about secrets at deployment time are the automated secret generation so i don't have to go to a sysadmin or a cloud admin or database admin and say hey can you can you generate me an api key or can you generate me database credentials it's all automated the other thing that's a really strong benefit is the automated secret rotation so when i think about the vault dynamic database at the secret engine for gcp for example um i can have it so that the password is rotated automatically the username is rotated automatically at whatever interval that i want so specifically in the demo we'll show you in a few minutes here i have it set to rotate every 30 seconds so that time to live for that credential is very short a very narrow window so even if someone did gain access to my system the chances of them being able to grab that credential and do anything with it it's much harder the other thing that i really like about this is the automated revocation of those credentials so if i want to pull back all my leases from vault i can do that and then that those secrets are done it's gone it doesn't even exist it'd poss be an attack vector so really strong set of benefits and those are just a few i'm sure there's a hundred more of you could think about them so the toolbox for this talk uh the demo is vault which is secrets management tool and waypoint now i kind of pulled a silly description of waypoint out of uh wikipedia definition which is very literal what is a waypoint but 
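The demo environment itself is not shown as code, but as a hedged sketch, wiring up that 30-second database credential rotation with the Terraform Vault provider might look roughly like this; the connection details, role names and SQL statements are all placeholders:

resource "vault_mount" "db" {
  path = "database"
  type = "database"
}

resource "vault_database_secret_backend_connection" "postgres" {
  backend       = vault_mount.db.path
  name          = "hello-creds-db"
  allowed_roles = ["my-role"]

  postgresql {
    connection_url = "postgresql://{{username}}:{{password}}@postgres.example.com:5432/app"
  }
}

resource "vault_database_secret_backend_role" "my_role" {
  backend             = vault_mount.db.path
  name                = "my-role"
  db_name             = vault_database_secret_backend_connection.postgres.name
  creation_statements = ["CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';"]
  default_ttl         = 30   # seconds - matches the 30-second rotation in the demo
  max_ttl             = 60
}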
waypoint with hashicorp is really a nice tool to improve developer experience for the sdlc so from it'll help you do your build it'll help you do your deployment and it'll help you do your release so specifically why i picked these two tools for this talk is there's integration you can do with waypoint involved there's a couple different ways to go about it i'm going to show both one is a very native integration where waypoint kind of handles everything and another one's where um you know i wouldn't call it a cheater mode but it kind of is where we're annotating uh container pod deployments with the vault uh agent sidecar both are fantastic and there's benefits to both and i'll explain them once we kind of show it so why are we doing this again think back to those benefits one it's ephemeral it's a short-lived secret we're gonna grab it for when we need it and then when we're done with that deployment it's gone it's automated i'm not gonna have to call up anybody i'm not going to have to put in a ticket this just happens and it's just there the automated generation the automated rotation and the automated revocation it's fantastic finally i keep talking about like what if someone gets in it's audible so because it's all token based by vault you can see when someone grabbed the token where they came in from what they did with that token and it lets you kind of narrow in on you know what was that blast radius okay so let's just talk quickly about the demo agenda uh i have a very simple go application it's vault unaware so it doesn't have any native calls to vault it's going to rely on waypoint to get that information into it it's going to make some assumptions that things are available and it just does its thing we're gonna do a deployment where it's broken so we haven't set up the vault integration then we're gonna in the second point we're gonna update that deployment so that it includes dynamic credentials and that'll go out to vault uh say hey give me some database credentials it'll spit them back it'll go to its thing the third thing i'll show you is very native waypoint and vault integration where uh the pods that the app is running on is not necessarily aware of the secrets it's being injected through uh an extra layer of waypoint called entry point that will get built right into your image and then the last thing we'll do is we'll just take a little poke around at some of the visibility gains that we get we'll just sort of check the audit trail that we can see and see some of the things happening see some of the calls back to vault we'll see when uh the username passwords get updated um and those sort of things all right so let me switch some screens here and we'll get right into it just want to hide this one pull up my vs code um okay so on the right we're not gonna look at the ui waypoint but i just wanted for awareness there is a ui you can log into uh i don't really use it too much i tend to be more on the cli um okay so just to lay out kind of what what this demo is so there's a google kubernetes engine i've pre-built that um we have vault installed on a namespace part of that installation on gk is we've installed the vault kubernetes auth method so what's going to happen is we're going to build this application i've called hello creds super simple when hello credits comes up it's going to be running as a specific service account the vault knows about and it's going to say hey yeah you have access to some stuff specifically we gave it access to uh a database engine and a kv pair 
with a little bit of host information okay so what are we looking at here i have on the screen a waypoint config file for an application and so i mentioned that it helps you manage your build your deploy and your release so i'm going to show quickly i want to point out one thing right now because we're going to make use of this in a moment this config section here for end this is for defining environment variables so right now this is just empty we're going to come back to that in a minute and this would be made available to the waypoint the entry point layer that runs with your application um next thing it does is it can do the build so i've already pre-built this image because i don't think we need to sit and kind of watch it build happening um but we can see that it's pushing up the gcr giving a tag um marking it as not local for the deployment we're going to say deploy to kubernetes into a namespace called hello creds uh we have some empty annotations same thing we're gonna come back and see that in a few minutes here's that service account i was talking about that it's gonna rely on when it boots up and we're gonna say run on port 80. okay so that was the build and the deploy the last phase of what it's going to do for us is it's going to do a release same thing we're going to say use kubernetes i'm going to do a release in the hello creds and what it's going to do is it's going to push out a kubernetes load balancer for us so let's just go ahead and kick that off i'll just pull this up so you can kind of follow along this can take a couple minutes sometimes it takes a second or two for it to grab a an external ip address from the load balancer so we can see that it's created the deployment successfully rolled out so the deployment phase is done so notice it's switched from deployment it says release now now we're waiting for um that kubernetes load balancer to come up be ready while that's doing that i'm just going to switch make sure i'm clicked into the right browser window because a couple things are going to happen here when this is done one of the neat things that waypoint offers you is it does offer you a proxy service so when it first build deploys your application there's a proxy service that you can kind of use as a preview window it's not meant for any kind of production level traffic but you can kind of go and you can go and see your application without having it released so we've done the release on as part of this but we could have just seen it here so here's the very dry vanilla version of the application we say we see that it's it's complaining about no database credentials and it's complaining about no google maps api key so let's just take a look for a second so cube cpl let's just check out our services so we see that it built us a load balancer grab that external ip address you can see that it's got our pod running there not too exciting okay let's go back and the first problem we're going to take care of and take first take care of this database credential uh error and how we're going to do that is on our deployment specifically here we're going to add some annotations and they've got them pre-cooked over here so for anyone that's done this before these annotations are going to set up the vault agent sidecar and set them up in a template so i'll show you really quickly here so it's going to go to a path and vault called database threads my role it's going to grab some database credentials and then it's going to create a template for the pod it's going to basically 
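Putting the pieces described so far into one hedged waypoint.hcl sketch; the registry path, namespace, role and secret paths are placeholders, and the annotations shown are the standard Vault agent injector ones rather than the exact set used in the demo:

# waypoint.hcl (sketch)
project = "hello-creds"

app "hello-creds" {
  config {
    env = {}   # dynamic vault values get added here later
  }

  build {
    use "docker" {}
    registry {
      use "docker" {
        image = "gcr.io/my-project/hello-creds"
        tag   = "latest"
        local = false
      }
    }
  }

  deploy {
    use "kubernetes" {
      namespace       = "hello-creds"
      service_account = "hello-creds"
      service_port    = 80

      annotations = {
        "vault.hashicorp.com/agent-inject"                 = "true"
        "vault.hashicorp.com/role"                         = "hello-creds"
        "vault.hashicorp.com/agent-inject-secret-db-creds" = "database/creds/my-role"
      }
    }
  }

  release {
    use "kubernetes" {
      namespace     = "hello-creds"
      load_balancer = true
      port          = 80
    }
  }
}

The demo also adds agent-inject-template annotations so the sidecar writes the username, password and host details into files in the shape the app expects.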
dump the username and password onto that the uh the path the other thing it's going to do is it's going to grab database host information same thing going to dump it onto using a template into a specific file so let's just save that and i want to just get rid of that old deployment what did we do wrong let's look try that again fun with demos and away we go so again it can take care of the destruction too so let me just go back a little quickly let's look we'll see there's no no pods in that name space and the load balancer takes a few minutes to come down but that's okay we don't really care about that at the moment so okay let's just review what we did we did one deployment it was broken we've added the annotations we're going to deploy it again and well that does that i'm going to make this app a window so we see this other one when it's ready deployment rolled out bidding waiting waiting we can actually show you that this proxy url is dead so remember after i did that first deployment we did the destroy this is gone just waiting for this become ready service is ready there it is so it's a new one specifically you'll notice that it's got a new version so there that's cool so we have a working call to a database now so you can see that it's got the working code over here we just added those annotations um and what's neat here is i want you to watch so right now i have these these credentials are going to rotate every 30 seconds so what we can actually do to see if we can do this okay check this out so every 30 seconds these you this username password is going to rotate so if i refresh this now we're going to see that it's changed but check out this log in from coming out of a vault so what i have on the left hand in my bs code here as i'm streaming the log file from vault um we can see that it had an expiration of revoked lease oh just refresh again but my point is here uh you can watch this very clearly you can see the um the request for the new credentials you can see that it's revoked you can see um the new ones being requested let's get rid of that okay so we've gone ahead we fixed the database error but we still have this other issue with the waypoint api being missing okay so the other way that i want to show how we can do this as i talked about earlier the very native integration with waypoint and vault so what we're going to do here is we're going to actually inject some information into the environment variables for waypoint this will be consumed by made available through entry point which isn't accessible from the pod itself um and we'll log in after and i'll show you kind of what i'm talking about there okay so for the dynamic values it's going to pull out of vault so i have this prepared here so kind of looks like this so take a look so it's going to create an environment variable called maps api key it's a dynamic config variable that i can get from vault the path is kv hello creds and the key it's looking for is map api key okay so let's just go ahead and destroy that again service deleted destroy delete successful great great and then let's just do a new deploy waiting for the service to come available again we can show you this is dead got two dead ones now so we see version 10 verse 11 they're dead just about there here it comes all right so this time around what i want to show you so we deployed we had it released with the load bouncer we're gonna go ahead and grab this ip address you see that's available oh but something didn't work isn't that interesting again fun 
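The env block from earlier then stops being empty; as a sketch (the dynamic helper has gone by different names across waypoint versions, and the path and key just mirror what is read out in the demo):

  config {
    env = {
      MAPS_API_KEY = dynamic("vault", {
        path = "kv/hello-creds"     # key/value path described in the demo
        key  = "maps_api_key"
      })
    }
  }

None of this resolves until the vault config sourcer itself has been configured with the kubernetes auth method, address and role, which is exactly the missing step the demo runs into next.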
with demos let's just check gotta make sure that it was actually not working did i put it maybe it's not there what are we looking for it's there well isn't that curious so what was supposed to happen here uh this application should have read this in it does work you can go check it out i did it at hashicomp uh a couple months back but i'm not sure what's going on here failed to read vault path i know it's wrong i know it's wrong okay we'll fix this we'll fix this so i just know because of this error right here let me explain what i did here hold on so waypoint can get logs it gets logs from that entry point uh later i was talking about when i check this log i know that this is not looking in the right vault spot so what i suspect has happened is that uh we haven't actually configured the vault um plugin let's fix that all right so in my readme i've got an example of how to fix up those credentials so this right here so waypoint has a plug-in for vault this is how it works so we have it set to use the kubernetes auth method um we tell it where the what the path for that auth method is we're going to define what the address is we're going to tell what role to be looking for and because this is a demo environment we're skipping the ssl so let's go ahead and just copy this okay and then i just can't recall if that's going to synchronize that uh yeah it did okay perfect we're going to remove it off token let's take a review of this cool so that was the issue i just didn't have the plugin to find let's just recap real quick so when i first showed it to you it was still broken because i didn't configure the vault plug-in configured the ball plug-in using this stuff right here and it was able to go and get the uh you can't see it here we can see that it did grab it and then if we do our let's see if we can see it here let's try this yep there okay cool so we can see that it's coming from here we see that we can see that it grabbed the maps api key using that token um okay clear so that's that um what i want to do really quickly the last thing i'll show you before we get out of here and uh stop talking and someone just get into this pod and show you [Laughter] okay so i've exact into the the uh the pod what i want to show you is this have bash it does great sorry i just forget where i've got the uh so if i look i just want to check i think we're done with that other window for now okay here so what i just did here is i just printed out you can see that it's mounting the db cred so remember the db creds credentials are being populated by the vault agent sidecar through the annotations the other thing i talked about was the fact that with the api key what we were doing is we were setting an environment variable what i want to show you is i look at the environment variable um on the pod if i log into the pod i do not see that map api key okay not there if i get out of here and i look at the waypoint environment okay it's there so remember i keep talking about entry point entry point is uh part of the image that gets built into your paw but it's not accessible from the pod waypoint is doing that communication to vault through entry point um makes it even more secure so if someone gets a hold of your uh pod they get into your pod they can't just go and print out these things that it got a hold of for example this api key the way that it's being injected um so let's just recap what we've shown you because i know we've kind of talked around in circles um we showed a very simple go application we showed 
it in a broken state we did a deployment by updating the annotations we showed it going and grabbing the dynamic database credentials that were rotated every 30 seconds we showed you um adding the environment variables let's just bring this down and show you again we set up the environment variables here so that would go and grab the uh some information from a key value path did that deployment got a bonus showed you how to configure the uh the waypoint vault plug-in that's really all i wanted to show you let's just um get rid of some stuff here bring me back up find where my demo is uh so that was the demo that's why i think um looking at how we can get ephemeral credentials at deployment time it's going to help with that security discussion it's helping push that security discussion left further down the chain it's letting people that are um aware of how to handle things like security and bake it right into the platform it's easier for a developer to move quickly push my application out and i know that's taken care of at the platform layer the other nice thing with that is we can build some constraints around it so that even if they didn't um think about that that people can't get in just by other things that are natively available um finally the last thing that's really cool is if someone did get in there's a cool audit trail that we can find out what they did how they did it and uh just identify that blast radius hope you enjoyed the talk um this is my info over over here that's me on twitter and that's me on github give me a follow if you want to have a chat love to hear from you thanks so much fantastic thank you so much for that talk i feel like it's always magical to watch vault in action to see people's credentials get rotated like that and just generated on the fly uh really really fantastic to see you know the other thing that's really fantastic to see is when people actually point to the right side of their screen in a mirrored situation ad hoc it always i always find this fascinating i know you can train this i know we can launch you know multi-cloud multi-uh cluster without any problems but this this is a really hard stuff also still a very amazing demo i love it this today is going very well i'm i'm enjoying everything that's happening so far we have next i'm excited about this one i agree geo geo targeting easy stage left ugh not a number um next up we have deploy federated multi-cloud kubernetes clusters by two uh who's actually number one in my heart so very excited to see this one get kicked off i know that federation of kubernetes that's also a hard thing to do so very excited to see what two has to show us on that front let's get him all keyed up here and we will get that kicked off faster than you can say cluster well maybe not uh cluster it's like here we go my name is tu and i'm an education engineer at hashicorp welcome to this talk where i'm gonna go through how to deploy a federated multi-cloud kubernetes cluster using console this talk is heavily inspired by a tutorial written by rita a fellow teammate i'm going to be referencing this tutorial throughout this talk so just so you're aware to get to this tutorial first go to our learn page and then type multicloud from there it should be the first search result with that said let's get started so let's assume that we're megacorp and we have a kubernetes cluster running on aws we're doing so well that we acquire a startup that also has a kubernetes cluster but it's running in azure now we're tasked to configure the 
kubernetes clusters in a way that services in one cluster can talk to and interact with services in another one way to do this is to spin up consul data centers in each cloud provider and then federate them consul provides a secure service mesh that facilitates encrypted communication between your services securing multi-cloud infrastructure because of terraform's large collection of providers we can use it to spin up everything from the consul data centers in each cloud provider to federating them to spinning up services in each kubernetes cluster after we federate them just to confirm that they can indeed talk to each other so with that said let's get started so this is the actual configuration and in this repository hashitalks deploy multicloud kubernetes we have the aks directory the consul directory and the eks and hashicups directories first i'm going to open up the eks directory so cd eks and in main.tf you'll see this is a pretty generic configuration to spin up a vpc and an eks cluster and then in outputs.tf you'll see that it should return the cluster id region and cluster name because i already spun this up right before this talk if i run a terraform output it returns the cluster id name and region same thing for the aks directory this spins up a resource group and an aks cluster and then the outputs return a resource group name and a cluster name so if i run terraform output it should return those two items and it does so now what i want to do is configure my kubeconfig to access and interact with these clusters using kubectl if you go to the tutorial and go to configure kubectl there is this nifty command where if you click on it and then paste it over in the eks directory this command will automatically update your kubeconfig with the eks information and store it in the eks context you can do this with the aks directory too so first navigate to aks and then paste this command and it saves all the information you need to access the aks cluster in the aks context i'm going to overwrite my existing one and we're good so now i can run kubectl get pods with the context set to eks and there's currently no pods running so that's right and then i can do the same thing for aks and that's also right so moving over to the consul directory this is where the interesting stuff happens i'm going to run a terraform apply with auto approve because this takes a while to spin up but the consul directory is where all the interesting stuff happens this is where we define the terraform configuration to spin up the primary data center in the eks cluster this is also where we add the kubernetes secret into the secondary data center and spin up that consul data center in the aks cluster and last but not least this is where we also create the proxy defaults i'm going to cover what that does as soon as we get to it but as you can see here it will add three new resources so first in the eks file at the top you will find a remote state and what this does is it pulls the state from the eks workspace that we provisioned our eks cluster from and with that it configures the aws provider and that allows us to pull the cluster information from the eks cluster that we provisioned and we use that information to configure both our kubernetes and our helm providers notice that both these providers have an alias of eks that's because we also have another set of providers for kubernetes and helm in the aks file but with an aks alias instead for the aks cluster
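A hedged sketch of what that remote state plus aliased provider wiring tends to look like; the backend settings, workspace and output names are placeholders, not the exact files from the repository:

# consul/eks.tf (sketch)
data "terraform_remote_state" "eks" {
  backend = "remote"
  config = {
    organization = "my-org"
    workspaces   = { name = "eks" }
  }
}

provider "aws" {
  region = data.terraform_remote_state.eks.outputs.region
}

data "aws_eks_cluster" "cluster" {
  name = data.terraform_remote_state.eks.outputs.cluster_name
}

data "aws_eks_cluster_auth" "cluster" {
  name = data.terraform_remote_state.eks.outputs.cluster_name
}

provider "kubernetes" {
  alias                  = "eks"
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}

provider "helm" {
  alias = "eks"
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.cluster.token
  }
}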
next up we have the helm release and what this does is it spins up a consul data center in the eks cluster and the consul data center configuration is pulled from this dc1 yaml file i'm going to open up this dc1 yaml file so we can go through it so the first thing you see here is this data center name is dc1 it has tls enabled because tls is required for mesh gateways which are required for consul data center federation next up this federation field is enabled and the create federation secret is set to true so if you want to federate multiple consul data centers you must designate one of the data centers as the primary and the primary one is the one that creates the federation secret within the consul federation secret it has information for the certificate authority which signs the certificates that consul uses for the inter data center traffic so that's what we have here since this is the primary data center this is why we have this set to true next up you'll see that acls are set to false which is completely fine for this demo and this talk but please please if you run this in production set this to true next up is the connect inject field which is set to true and it's enabled by default connect inject tells consul or rather tells the kubernetes cluster to automatically add sidecar proxies to each pod so this configures your cluster to automatically add sidecar proxies to the pods since this feature is enabled by default all pods will have sidecar proxies unless specified otherwise and i'm going to show you what this looks like when we spin up the actual microservices after we're done with federation the sidecars route traffic to upstream services located in other data centers using the mesh gateways speaking of mesh gateways last but not least we have the mesh gateway field which is enabled so now that we're done reviewing dc1 the next data source in our eks file is this eks federation secret like i mentioned earlier up here since create federation secret is true because this is the primary data center the consul helm chart will create this consul federation secret that we need to import into any secondary data centers so in this case it's the aks cluster also notice that this data source has an explicit depends on what this does is it tells terraform to wait for this helm release to completely finish before pulling this value because this helm chart creates that secret so you can't pull the secret if it doesn't exist all right now that we're done reviewing the eks file i'm going to jump over to the aks one so the aks file at the top is very similar to eks but instead it pulls the information from the aks workspace down here this data source pulls the resource group name and the kubernetes cluster information to authenticate both the kubernetes and the helm provider notice here that the alias is aks and not eks and then down here this is the kubernetes secret resource that imports that consul federation secret that we created in the primary data center the data here points to this data source and since it points to that data source there's an implicit dependency so terraform knows hey we should wait for this resource to finish creating then propagate this data source and then once this data source is propagated we can begin to create the kubernetes secret in the aks cluster next up after this kubernetes secret is created terraform will spin up the helm release for the secondary data center in the aks cluster
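In the demo these settings live in dc1.yaml and are handed to the helm_release through a values file; spelled out as set blocks here just to make them visible (the chart value names are from memory of the Consul Helm chart, so double-check them against the chart docs):

resource "helm_release" "consul_dc1" {
  provider   = helm.eks
  name       = "consul"
  chart      = "consul"
  repository = "https://helm.releases.hashicorp.com"

  set {
    name  = "global.datacenter"
    value = "dc1"
  }
  set {
    name  = "global.tls.enabled"
    value = "true"
  }
  set {
    name  = "global.federation.enabled"
    value = "true"
  }
  set {
    name  = "global.federation.createFederationSecret"
    value = "true"                 # only the primary data center creates the federation secret
  }
  set {
    name  = "global.acls.manageSystemACLs"
    value = "false"                # fine for a demo, turn this on in production
  }
  set {
    name  = "connectInject.enabled"
    value = "true"
  }
  set {
    name  = "meshGateway.enabled"
    value = "true"
  }
}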
That Helm release waits for the Kubernetes secret to finish creating in the AKS cluster before it creates anything, because the secondary datacenter requires that secret. The dc2.yaml file contains all the configuration for the secondary Consul datacenter. dc2 is very similar to dc1, but there are a couple of key differences. Notice that the datacenter name up here is now dc2 instead of dc1; if you plan on federating multiple Consul datacenters together, each datacenter name must be unique. Under TLS, this is also enabled, but unlike dc1 it uses the CA cert and key from the Consul federation Kubernetes secret that we imported from the primary datacenter. Down here, federation is enabled, and because this is the secondary Consul datacenter it doesn't have the createFederationSecret attribute. Everything else is pretty much the same as dc1: connect inject is enabled and true by default, and the mesh gateway is also enabled. Now that the Consul datacenters are created and federated through the AKS and EKS files, I'm going to go into the proxy defaults file and uncomment these two resources. What proxy defaults do is tell Consul how to configure the sidecar proxies and where to route their traffic. As you can see here, under spec, the meshGateway mode is set to local, which means the sidecar proxies will send traffic from the pod to the local mesh gateway before sending it on to the remote (secondary) datacenter's mesh gateway and then to its final destination. The other mode is remote: traffic still starts from the pod, but instead of going to the local mesh gateway first it goes directly to the remote mesh gateway before reaching its final destination, skipping that local mesh gateway hop. I'm going to run terraform apply with auto-approve, and the reason we have to apply this after the EKS and AKS configurations are spun up is that these two resources use the CRDs that get created during the initial Consul Helm chart install. Whenever you apply that Helm chart, it creates the Kubernetes ProxyDefaults CRDs, and if you don't run this second apply separately it will error out because those CRDs don't exist yet, so Terraform doesn't know what to do. Now that this is complete, everything should be set up. Just to confirm, back in the tutorial itself, let's go to this section and scroll down to verify cluster federation. The first thing we want to do is list the pods in the default namespace in the EKS cluster to confirm that the Consul pods are running, so I'm going to copy this command and paste it in, and I should see the Consul pods in the EKS cluster, and I do. I'll do the same thing for AKS, and I see the Consul pods running there too, so that looks good. Next I'm going to run this command, which gets the proxy defaults from the AKS cluster, and since this returns true, I've verified that Terraform did apply the proxy defaults and everything is working as expected. Last but not least, I'm going to run this command, which verifies that the clusters are federated by listing all the servers in Consul's wide area network. I'll copy the command and paste it over, and awesome, it looks like everything is working, so I'm going to move on. In this step I've spun up the Consul datacenters in both the EKS and the AKS regions and federated them; the EKS region is the primary Consul datacenter, while the AKS cluster is where the secondary datacenter, dc2, lives.
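As a rough illustration, the proxy defaults applied in this step would typically look something like the ProxyDefaults custom resource below; this is a sketch assuming the stock consul-k8s CRD rather than the demo's exact file.

```yaml
# Sketch of a ProxyDefaults resource (assumed form; the demo's file may differ)
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global            # ProxyDefaults is a cluster-wide singleton named "global"
spec:
  meshGateway:
    mode: local           # pods send cross-datacenter traffic to the local mesh gateway first;
                          # "remote" would skip the local gateway and go straight to the remote one
```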
Now that we've done that, I'm going to open up HashiCups, which is our demo app, and spin HashiCups up in both the EKS and the AKS regions to see the services interact with each other. Like before, I'll run terraform apply with auto-approve, and while this runs I'll explain what the configuration does. In HashiCups I'm going to open the backend file first, hashicups-be. In here, like before, I'm pulling the remote state from the AKS directory because I want to deploy the HashiCups backend into the AKS cluster; I'm retrieving more information about the cluster and using that to authenticate to the Helm provider, and notice it's using the aks alias, because the frontend is deployed to the EKS cluster. The first Helm release we have here is the product API DB, then the product API, the payments service, and last but not least the public API. I'm going to go through the spec for each of these Helm releases, because it shows how the dependencies work with Consul. First up is the product API DB, so I'll go into templates and then the product DB file. It defines a service and a deployment; the deployment is named postgres, and down here under annotations you see an annotation called consul.hashicorp.com/connect-inject set to true. I didn't actually need to include this, because, if you remember, both dc1 and dc2 have connect inject enabled and defaulted to true, but if for some reason you wanted a pod not to have a sidecar proxy, this is where you would flip it from true to false. We don't want that, so I'm keeping it as true. Next we have the product API, which uses the product API DB, so I'll open up its YAML file. It defines a service, a service account, a config map, and a deployment; the deployment is named products-api, and in annotations you'll see connect-inject set to true, but here you also see something new: connect-service-upstreams, which points to postgres and the port of the postgres service. Postgres is what the product API DB service is named, which is why we have postgres here. Next is the payments microservice, which is very similar to the DB one because payments doesn't have any dependencies; it defines a service, a service account, and a deployment, and under annotations only connect-inject is set to true. Last but not least is the public API, so I'll open that up. There's a service running on port 8080 and a deployment named public-api; down here connect-inject is true, and since the public API sends requests to both the product API and payments, connect-service-upstreams lists the products-api service running on port 9090 and payments running on port 8080. This is how Consul knows how to connect these services.
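To make the annotation pattern above concrete, here's a hedged sketch of the pod-template annotations on two of the backend deployments. The products-api and payments upstream ports (9090 and 8080) come from the talk; the postgres port (5432) is my assumption.

```yaml
# Sketch: pod-template annotations (spec.template.metadata.annotations) on the products-api deployment
annotations:
  consul.hashicorp.com/connect-inject: "true"                     # inject a sidecar proxy
  consul.hashicorp.com/connect-service-upstreams: "postgres:5432" # upstream to the database service (port assumed)
---
# Sketch: the same annotations on the public-api deployment, with its two upstreams
annotations:
  consul.hashicorp.com/connect-inject: "true"
  consul.hashicorp.com/connect-service-upstreams: "products-api:9090,payments:8080"
```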
Now I'm going to review the frontend. At the top, the remote state is very similar to the backend, but it pulls from EKS and uses that to authenticate to the Helm provider; notice that the alias here is eks. It only has one Helm release, for the frontend, and it uses a values file to configure it, so I'm going to pull up the frontend YAML file and review it. It defines a service, a config map, and a deployment; the deployment is named frontend. Like the others, connect-inject is set to true, but unlike the others the frontend interfaces directly with the public API, so connect-service-upstreams is set to public-api running on port 8080, and then I specify the second datacenter, dc2, because that's where the public API and the rest of the backend services are: they're in the AKS cluster, which corresponds to dc2, where our secondary Consul datacenter is set up. Now that everything is up and running, I'll run kubectl get deployments against the EKS cluster and I see the frontend service running; I can do the same thing for AKS and I see all the backend services running on the AKS cluster. Next I'll run kubectl port-forward on the frontend service, mapping port 8080 on my local machine to port 80 on the service, using the EKS context. Now if I go to localhost:8080 I should see the HashiCups app, and everything is running as expected. If I refresh, it takes a second, but if you look down here you'll see that it connected to the API and the API is returning a value, so everything is working: the services are interacting with each other. Within the AKS cluster all the backend microservices are talking to each other, the public API is talking to the product API, and that in turn serves the frontend, which is how we're seeing this, because everything is wired up through Consul itself. What I can do now is run another kubectl port-forward, this time against consul-server-0 on port 8501, using the EKS context, which lets me connect to the Consul dashboard. If we go to localhost:8501, click through the certificate warning, and go to the main dashboard, since we're in dc1 there's only one service we should be seeing, the frontend, and that's what we have here. I can also select dc2, and in dc2 we have payments, postgres, products-api, and public-api, as we expect. The neat thing is that since we have Consul running in both datacenters and we're using it as a service mesh, we can define things like intentions. If I tell Consul I want a new intention that blocks traffic from the frontend to the public API, so source frontend, destination public-api, action deny, and click save, that tells Consul to block all traffic from the frontend to the public API. If I go back, rerun the port-forward to HashiCups, and refresh, it takes a quick second, and as you can see down here the API is not responding, because under intentions we've blocked it. If I now go back into the Consul dashboard, set that intention to allow, click save, rerun the frontend port-forward, and reload the page, then since Consul is allowing traffic to flow from the frontend to the public API again, HashiCups is working as expected. Awesome, thank you so much, it was great to see that you had good intentions at heart and in your demo, so thank you very much on that front. These are definitely good intentions; the thing I really regret is not having access to an API that can get me the caffeinated drinks I need at any time of the day. The APIs might be open 24/7, but you're right, the stores might not be, and that's a very sad thing.
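Looping back to the demo for a moment, here's a rough sketch of the two pieces just shown: the frontend's cross-datacenter upstream annotation, and the deny intention expressed as a ServiceIntentions resource rather than through the UI. The talk created the intention in the Consul dashboard, so the CRD form and resource name below are assumptions.

```yaml
# Sketch: frontend pod-template annotations pointing at public-api in datacenter dc2
annotations:
  consul.hashicorp.com/connect-inject: "true"
  consul.hashicorp.com/connect-service-upstreams: "public-api:8080:dc2"  # service:port:datacenter
---
# Sketch: the deny intention from the demo, written as a ServiceIntentions resource (assumed name)
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: public-api
spec:
  destination:
    name: public-api
  sources:
    - name: frontend
      action: deny          # flip to "allow" to restore traffic, as shown in the demo
```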
So with this we're nearly at the end; we've got one more talk left, and this one is a little bit different, it's more about culture, and I'm excited about it. But before we kick that off, I have a question for you, Taylor. Of course. For a hundred bucks: which set of principles and practices hit 18 years of age earlier this year? Interesting. Well, I know it can't be DevOps, or Kubernetes, or Nomad. Most recruiters won't know this one. Is it SRE? It might just be. I think what we're going to do is switch over to Martin and let him talk about reliability as a product feature. That sounds like a pretty reliable thing to do. Okay, cool, with that we'll kick off. Thank you. Yeah, so I'm Martin Smith, I'm a site reliability engineer at HashiCorp, and I have this idea that I'd love to talk to you about: thinking about reliability as a product feature. It's really not a new idea; it goes back to the beginning of what SRE was supposed to be, but it's maybe something we haven't thought about as much lately, and I'd love to bring us back to it. Again, that's me, my pronouns are he/him, and you can find me on Twitter at martinv3. I've had lots of roles as a DevOps engineer and a site reliability engineer in the past; I've been a tech lead for an SRE team, and I've had many opportunities to define the role of SRE, what activities SREs do, and, more importantly, to try to figure out what the impact of an SRE team should be. I've worked on that a bunch, and hopefully some of these ideas will help you do that on your own team. First I just wanted to talk about what SRE is. SRE, at its most basic, is treating operations as a software problem and using software engineering techniques to solve it. I'd say this isn't surprising given the timeline: you had agile in the '80s and '90s being applied to software development, and it was basically time for people to start thinking about software engineering for other parts of the tech world aside from software development. Here's the timeline: 2003 is when Google started doing SRE, the first place that did; USENIX started SREcon in 2014; Google published the SRE book, which a lot of folks follow pretty closely, though I put a note there because even Google says not to follow it exactly but to adapt it for your organization; and by 2021, data shows that something like a fifth of organizations with technical roles have an SRE role. SRE is as much a philosophy as a set of skills, and we'll talk about that some too. Thinking about what SRE does, there is one downside to SRE that a lot of folks talk about, including the gentleman who originally founded the SRE team at Google, which is that if you let SRE continuously improve things, they're slowing down the firefighting that used to happen before SRE was a thing. That can be discouraging: it can look like SRE is just slowing things down, or SRE wants to write code instead of fixing the outage, and so on. A lot of books would say you can't really succeed with SRE without thinking about DevOps, because it's completely cultural, and you also can't succeed with SRE without thinking about things like cloud-native architectures; they go hand in hand. For example, you can't go back to some software that was deployed on Windows in 2003 and just put SREs on it: if they don't own the source code, that's not going to work.
So SRE has a lot that it has to do; it's not just a philosophy, it's a philosophy and a skill set, and there are a lot of skills in there. I wrote some of these down and looked up some definitions of SRE, and it's really an overwhelming set of things. If you look at just the list of what SRE is responsible for, for some organizations that's practically all they do: everything from emergency response, monitoring, efficiency, performance, latency, and so much more. Even just picking one item like observability, look how many tools I was able to find just by googling it. Holy moly, one SRE is not going to know all of those things. But I think we also know that SREs do a lot of things, and in our industry those things are this long recitation: incident management, retrospectives, all of these observability tools, chaos engineering, this whole set of practices that SRE does. If you look it up, I feel like we might have lost the forest for the trees, so to speak. Reliability, at the end of it and at the beginning of SRE, was about a product feature: making a product reliable. I've also found that when I'm trying to explain what SRE is to business leaders, I often go back to this definition too: if I can't explain how what SREs do affects customers directly, and the reliability that customers experience, it's usually a sign that I'm not working on the right things as an SRE. One caveat: sometimes the SRE role is about making people more efficient internally, because we know that makes our customers have a better experience, but ultimately you should ask yourself, can I tie this back to why the customers are having a good experience? Am I doing something that helps there? So, let's shift that focus back to thinking about reliability as a product feature. One thing this does is help organizations think about what reliability even is for their product. That could be resiliency, that could be fault tolerance, that could be scalability (can I throw a bunch of work at it as a customer and expect it to actually give me results?), observability (understanding the internal state of a system from its outputs), or security (do customers trust that the thing is doing what it's supposed to?). These are all product capabilities that often aren't well understood. I put a quote at the bottom of this slide because a friend of mine likes to say that reliability is a product feature whether you devote engineering time to it or not: your customers know how reliable your product is whether or not you've actually built any functionality to make it reliable. I'd also argue that it's really important to recognize that reliability benefits from product management support. Product managers, we love them, I wish I had more of them in every place I've worked, but they do things like communicate with stakeholders, build roadmaps, and help with prioritization. For example, thinking about reliability as a product feature: do you know who your internal stakeholders are for the scalability of your product?
Do you know the development teams that are throwing work at your product or adding features and then expecting it to actually run those for customers? What's on the roadmap for the observability of your product? I think that's a really popular discussion these days: what are we doing with logging, what are we doing next with logging, what does six months of that improvement look like, what does two years look like? And most importantly, what metrics are you collecting to be sure you're actually delivering on that roadmap? That's classic product management, and we often don't get it in SRE; sometimes technical folks end up wearing the product management hat. It's also important to think about how reliability aligns with other features and roadmaps. If you're planning to release some feature that doubles the amount of work your product does, well, guess what, observability matters, and performance and scalability matter. Even if you're developing a feature that's going to bring in 10x the number of customers, that's a huge thing, and I hope you're thinking about reliability when you're going to 10x your audience. Again, if you don't plan for reliability specifically, your customers are going to get what they get and make their own assumptions about your reliability. This happens all the time: you see sites or products out in the industry where people know they're unreliable, and the customers get frustrated, because customers know about the reliability of your product whether you do or not. So it starts to sound like any other product feature, right? I've already argued for a product manager and for internal and external stakeholders, and that's all by design: making it explicit as part of your organization's planning has many benefits, I think. I put this quote up from the ThoughtWorks Technology Radar; the most recent one came out in October, and they talk about the trend of internal teams having products and thinking about themselves as product teams just like externally facing teams. They cite the Team Topologies book, which you can check out for how to organize your teams, but when I've looked at what teams are doing around Team Topologies, I also keep finding this framework from Simon Wardley called Pioneer, Settler, Town Planner, or, as I'll call it, the PST framework. I want to tell you about that framework, because I think it's really helpful for organizing ourselves as SREs and thinking about what we do. Again, I think this is worth reiterating: there's no one-size-fits-all approach to reliability. As you've seen from all the things SREs do for reliability, it depends on the stage of the project, on the team, on the organization, all of these things. What I'm going to do today is break this down into beginning projects, growing projects, and established projects, and then show you how the PST framework ties them together. So here's the PST framework. Basically it says that at different maturity levels you need different kinds of behaviors for a project. Pioneers are doing things that are inherently uncertain: lots of experimentation, sometimes poorly defined work, maybe we're not sure how the product is going to fit into the rest of the work.
It's okay to fail at that stage; there's lots of experimentation, and lots of deferring things to do and accomplish later that you might not accomplish right now, and pioneers typically have to be pretty agile. The diagram here shows what that looks like, including the lifecycle from pioneer to settler to town planner and what their relationship looks like. Settlers are about growing and differentiation: starting to continuously improve beyond pure experimentation, and starting to listen to how people use a product or feature. Town planners are probably the least popular type if we apply this to SRE work: defining process, doing operational efficiency work, doing modeling; this is where chaos engineering probably falls. So that's the framework of pioneer, settler, and town planner. I encourage you to read through it, and look at the line at the bottom of the graph about how something becomes an idea, then a product, then a commodity over time, and how the roles relate to each other. Now what I'd like to do is talk about some engagement models that SRE can use. One example would be consulting: you might have SRE office hours, or be available for questions, or do something where there's a short engagement that's quick and well defined, say an hour. A lot of SRE teams funnel work into this model because otherwise they're overwhelmed by it: you say, gosh, this other team is asking for help constantly, so let's pick an hour every week to get together, and that's consultation. That works really well for some levels of the PST model. Process improvement is another way SRE can really impact the organization: looking for reliability or scalability patterns that show up and then implementing processes to improve them. This might be an incident management process or an incident action item follow-up process; it might be production readiness, where you build a process and framework for releasing new features; it could even be something like SLO and SLA programs, and what a repeatable program for those looks like. There's also project collaboration. This is usually really well scoped: we're going to deliver this product and we need an SRE to help us. For example, you might take a project to implement rate limiting, and SREs and other software developers might pair together to propose how to do it and then implement that functionality. There are lots of other examples; basically anything where you're pairing together I would call project collaboration. Then there are two others. One is embedding, which is being a member of the team that the SRE is joining; you see this mentioned in the SRE book a lot. I put a star there because I want to call out that the SRE book makes it really clear that when you embed, you're a member of that team. Embedding is not dropping in for a sprint; embedding is spending time to onboard. You might not be the most senior developer on that team, but you're definitely a contributing member: you've lived what that team has lived, you've walked in their shoes.
For an SRE in an embedding relationship, you're leveling up the team and being a multiplier on it. Once you're a member, you have some free time relative to the team's roadmap, but you know what the team is going through, so you can actually say, all right, I'm going to teach the team things they need to know, or help the team accomplish things they couldn't do by themselves. You want to be a multiplier; you're not just another warm body to throw roadmap items at. The last one I want to mention is ownership. An SRE could introduce components that improve the reliability of a system; this could be a Kubernetes operator to improve fault tolerance, or it could be that rate limiting project we talked about before, and at some places SRE might even own rate limiting. But when there's ownership, SRE and the teams that are affected should regularly review it to be sure it makes sense for SRE to own those things long term, because if SRE just keeps taking on ownership, that eventually becomes unsustainable: they either end up with a lot of toil or they can't do these other engagement models because they're too busy owning stuff. So, we've talked about some engagement models and we've talked about the PST framework; now I want to ask what a pioneer SRE would do and what engagement models they might use. If you think about pioneers, that's new projects: a small change early on could have a large benefit, and experimental work early on might be completely discarded, like a prototype you throw away. It's really important to have SREs in these early-stage projects because they ask questions about reliability early and can help teams make decisions about reliability early. This could be as simple as a product vendor choice: if the team didn't have SREs involved, they might not actually pick the most reliable vendor. SREs can also help with experimentation, agile decisions, and failing faster, while always thinking about reliability; you always want somebody involved at the beginning of a project who is thinking about reliability, and that can be SRE. Have you ever seen a project get close to being released without anyone thinking about reliability, or even just operational concerns like the work you have to do every week to maintain a product? Pioneer SREs are a great opportunity here. They should be part of the team, help with delivery, help evaluate vendors, build out proofs of concept with your other development teams, and talk about the architecture, what it means to make the changes for the project, and how it affects your larger architecture. At this stage, any work to cover reliability gaps can be identified by SREs, or you can make a totally different decision in direction based on concerns raised by the team. Of the engagement models, embedding is a great opportunity here. Brief consultation on reliability doesn't always work, because often the final output doesn't actually reflect those concerns about reliability or operability, so embedding is a really great opportunity for an early project with some pioneer SREs. How do you know whether it's working well for pioneers that are embedding? By how quickly
new products and features get launched, how quickly vendor implementations happen, and how quickly something moves from exploration to a concrete proposal; that's another way to measure how successful SREs in this model are doing. Probably the largest risk in this phase is having SREs end up as owners. This is the ownership we talked about a few slides ago: if SREs end up owning a whole project or product, they'll get treated like the ops team for that product, and that's all the toil that SREs don't want and the ownership that's unsustainable long term. If you can help it, do really well-scoped embedding engagements between SREs and products, and emphasize that they're a multiplier resource or a training resource, not coverage for the team once the product goes into production; that's probably the biggest risk I've seen with embedding. Now think about a product that's gotten off the ground: you're building it, customers are starting to use it, it could be in beta or alpha. It's great to think about SREs as settlers in the PST framework for a project in this middle phase. What could SREs do in a project like that? I think they can help build and mature things: they can look for new operational burdens, they can find toil and help automate it away, and you can continue to embed at this stage too. If you think about what engagement models SREs use at this stage, it's actually great to have an SRE involved in the launch of a product or feature, and consulting also works here, especially if it's a brand new team or a brand new service that's launching. SREs can really help ensure everything is production ready and meets the requirements for reliability as well as operational best practices; for example, SREs in this phase might encourage a team to do automated database migrations instead of manual ones on the command line. At this phase, having an idea of production readiness is super important. This drifts a little into the process type of engagement, but SREs thinking about operational readiness can create a minimum bar for a team: you have to at least be able to be on call, or be able to respond to something. SREs at this stage may even help build some of the automation, on-call tooling, or autoscaling, the kinds of things you need when you're productionizing something. The success of SREs in this stage could be measured by how many new services are going GA or taking customers, and by what observability looks like: is there effective logging, metrics, and monitoring? Success in this phase is also about establishing patterns that make projects successful: an SRE might say, we've seen ten other teams we've helped in this phase do X and therefore be Y successful on some particular dimensions. SREs in this phase quickly move toward the town planner phase, in my opinion, just because they're constantly looking for the processes and patterns that improve a project. Again, there are risks at this phase too: sometimes needs show up at the very last minute, which can be unfortunate, and coordination can be a little harder, especially if a team is racing to get something to production but an
SRE team isn't quite available; the timing might not be right there. Now, thinking about town planner SREs: this is probably the black sheep of SRE activities. A lot of SREs don't love this type of activity because it's really hard to see the impact you're having; process is such a hard thing to see the impact of. In this most mature phase, products and services are already GA, there are already customers, and town planners are starting to look at systemic issues: overall architectural problems, or developer tooling that impacts reliability. What can SREs influence here? They can influence continuous improvement overall; they can identify systemic issues like repeated incidents, poor SLO choices, or maybe the lack of an on-call process. Driving continuous improvement is one of the best ways SREs in the town planner role are effective. In addition, they can often find ways to eliminate operational pain or toil, especially large-scale toil, whether that's through implementing automation, which is the biggest tool in an SRE's toolbelt, or through things like architectural changes, or, and this is the part that's less exciting for a lot of engineers, by developing process, knowledge, skills, training, and tooling that help people learn. Those kinds of learnings, tooling, and process can be applied repeatedly, at large scale, across projects to make them successful. There can be a stigma attached to it, as I said, but I can't overstate the impact. SREs really act as a true multiplier in this phase, just like a town planner would, with an impact on every neighborhood in a town. One way to address the concern about these SREs not doing as much technical work is to pair them with technical program managers or product managers, again thinking about this like any other product. Pairing them with people who can help drive process and strategy is really helpful, because then SREs can focus on the technical aspects of what they're improving while those other roles, like a technical program manager, help with the organizational changes needed to improve something, implement a process, or execute on something. Measuring success is especially tricky for town planners, I would say, because simple metric improvements are often misleading: fewer incidents, reduced incident impact, reduced pages, or improved SLOs. SRE impact is more than that at this stage; qualitative impact is really important to figure out here. Talk to the SRE team's internal customers, the teams they're collaborating with, and understand qualitatively whether SREs at this stage are causing paradigm shifts: they help people think about on-call totally differently, or think about some incident process totally differently, or they help with things that are really big, mind-blowing kinds of changes for the organization. Often that's not even visible to the other SREs: an SRE might help a team do incident management differently and the other SREs on the team might not even know. So the town planner role is tricky, but all the engagement models do work here, I would say. Now, taking a step back from the pioneers, settlers, and town
planners model and the engagement models, I hope this is useful so that you can figure out what phase a project is at and then pick SRE activities that really work at that phase. I will say that taking that step back and looking at reliability as a product feature is not a magic bullet. It doesn't help an organization that doesn't understand where its product fits in the market or what kind of value the product delivers, and if product management isn't available, isn't helpful, or has a bad relationship with the development teams, throwing SRE in the middle and treating reliability like a product is not going to help. It could also be that if a team really can't design and deliver their features, adding SRE to the mix isn't going to help there either. This ties back to what I said at the beginning: there is no one-size-fits-all approach to SRE, so for each project you have to pick what it needs and use these tools. It might even be as simple as using some rules of thumb in addition to these; that's really popular with SREs. A lot of SRE teams do things like say no more than 50% of our time will be spent on toil, or every product feature that goes live must have some sort of reliability metric. Combine those rules of thumb with some of these engagement strategies at particular project stages, and I think that should help focus a team to make the biggest improvements they can to reliability. I also put some other things on this slide, because I've done some research on how different teams and organizations are doing SRE. Continuous improvement is a huge theme in all of them. SRE collaboration with other teams is a huge theme; that collaboration can make or break the effectiveness of an SRE team, and it can jeopardize a whole product launch. I also noted that embedding is very popular, and that having some of those best practices is very popular. Check out the How They SRE repo too, and if you're doing SRE, open a pull request there and submit how you're doing it; I'd love to hear how other teams are doing it and what works for their organization. Again, no one size fits all, but I do like this quote from Simon Wardley, who developed the PST framework, which says that if you take a highly effective company, the PST framework will push it towards being more continuously adaptive: looking at where projects are in their lifecycle and then doing the best things for those projects at that stage. So yeah, that is the end of my presentation. I hope that by combining these two themes, pioneers, settlers, and town planners on one hand and engagement models on the other, I've given you some ideas for your own SRE team, things you may not be doing today, or things like embedding that maybe you're not doing as well as you'd hoped, where maybe it's time to start doing a regular retro to figure out how to do better. In addition to giving this talk, I've started the Gainesville, Florida HashiCorp User Group; check that out on Meetup. There are also some other resources here if you want to start your own HashiCorp User Group, plus the docs site, the Learn site, and the Discuss site. If you want to post and talk about some of these cultural things on the Discuss site, please do,
because there are lots of folks there who are using HashiCorp products but are also going through organizational or cultural change. So yeah, thank you, thank you again very much. I again invite you to write about and share your own experiences, whether they're good or bad, with taking reliability and thinking about it as a first-class product feature at your organization, and thinking about it not just as all of these activities that SRE does, but in terms of the actual impact and outcome of them, which I think should be reliability as a product feature. So again, thank you so much; I'd love to hear more about what you're doing, open a pull request to that How They SRE repo I mentioned earlier, and thank you again. All right, wow, well, two days just breezed right by. It felt like a lot of time planning, getting these speakers coordinated, reaching out to them and working with them to get all these talks, but it's really fascinating. Thank you so much, Martin, for walking through all of the things that are worth considering when it comes to SRE; I know it's something that's very difficult to get right, difficult to get correct. Just like we talked about earlier with Cole, it helps to measure twice and only cut once, especially when you only have the budget for once: you can't redeploy everything we deployed over these two days. Thanks to y'all we had some really, really amazing sessions yesterday and today. All of this will be on YouTube; we'll have playlists for this event available, so follow the YouTube channel and click the like and subscribe button if you haven't already. We've got to say this just to make sure you can find the talks later, because people do email us asking where they can find this: it's on YouTube. Follow us on Twitter, we'll talk about it there. If you'd be interested in speaking at one of these, reach out to us as well; we've got new themes coming up next year and we're bringing back old themes. I think the next ones are on connectivity in the widest sense of the word, and security, with a strong focus on everything IAM, Vault, Boundary, things like that. If that's of interest to you and you have a story to tell or a learning to share, let us know. Absolutely, and if you have any fun use cases or anything like that you can think of, again please let us know; we're always happy to talk with you and figure out the fun solutions you've come up with to interesting problems. I think with that we're at the end of today, so if you're still here, just close the browser tab, we're gone, we're out of here. We really appreciate you coming by; we had a blast emceeing, producing, and working with you. A big shout out to all of the team for making this possible, and again, catch us on YouTube. We'll see you around the internet. Thanks everybody, bye everyone.
Info
Channel: HashiCorp
Views: 507
Id: iSHq6d38veQ
Length: 190min 58sec (11458 seconds)
Published: Wed Dec 08 2021