Terraform with Multiple State Files

Captions
Hey everyone, Mike here at The DevOps Lounge. You might be wondering why I'm now called The DevOps Lounge when the channel was previously called The Cloud Coach. The channel is being rebranded to The DevOps Lounge to match the Discord server I've set up, which was also originally The Cloud Coach. I'm rebranding so I can build a more open community that's about the community and less about me as an individual, so you'll see a lot of rebranding coming up soon as we switch over to The DevOps Lounge.

So what's today's video about? Today I'm going to talk about taking a Terraform state and splitting it up into multiple states, and there are good reasons why we'd want to do this.

The first reason is speed of execution. Terraform state files can be extremely large. I currently work for a client who has one environment where even a very basic version has over 300 resources: subnets, route tables, security groups, transit gateways, transit gateway attachments, and lots of little components that make the state file relatively big. It's good practice, for several reasons I'll go into, to split that state file up into several states that are isolated from each other. That means you'd have code in one location that produces one state file, and that code could handle the networking side of things. Then you'd have another bit of code producing a second, separate state file, and that could be the compute side: your EC2 instances, your ALBs, anything to do with the compute layer. And then you could have a third bit of code producing a third state, which could be for your data layer: your databases, your CloudFront distributions, and so on.

The idea is that you want to take these large state files and split them up, because if you have a state file with hundreds and hundreds of resources in it, every time you want to change one tiny part of it, for example adding a security group or changing an existing one, Terraform has to process the whole thing. It goes over to AWS or Google (or both), works out the current state of everything in the cloud provider, matches that against what's in the state file, and then builds the directed acyclic graph to work out what's changed in the code, and that takes longer. If you split it up, say networking makes up 100 of those 300 resources, compute takes up 50, data takes up 50, and another 100 is for security and IAM policies, you could have four states. If you just need to change the networking side of things, you go to the networking code, make that small change, and Terraform only has to refresh and understand the current state of 100 resources, not 300, so it executes faster.
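To make that split concrete, the layout I'm describing is one folder of code per state file, roughly along these lines (the folder names here are just illustrative, not the exact ones used later in the demo):

infrastructure/
  networking/    (VPC, subnets, route tables   -> networking state)
  compute/       (EC2 instances, ALBs          -> compute state)
  data/          (databases, CloudFront        -> data state)
  security/      (IAM policies                 -> security state)

Each folder is initialised and applied on its own, and each one writes its own state file.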
Another benefit you get from splitting up the state files is security. In most organizations, and especially very large ones, we have a networking team, a database team with DBAs, developers, operations, and usually a cyber security team as well. When you split up the state files, each state is controlled by a separate piece of code, and if you put those separate pieces of code into their own repositories (which you should), you can control access to those repositories individually. So anything to do with your IAM policies in AWS, for example, or your Active Directory policies, or your group policies within Azure AD, you can say that only senior engineers have access to those repositories because they relate to security, and even then only read-only access, with only the cyber security team actually able to make changes. You can use code reviews and pull requests to ensure that any code being written meets company expectations at the security level. The same applies to the database layer: you can say only the DBA team can make write changes to that repository, or even that no one can access it except the DBAs, and no one can access the networking side except the network administrators. It all depends how strict the policies are within your organization.

Splitting up our state files also means we split up our code, and that gets us more than just faster runs: it gets us separation of concerns, which is a programming concept of keeping code focused on the one job it needs to do. We can apply that to our Terraform code as well, so we end up with our security code, our networking code, our data code, and so on. Those are the benefits of splitting up a Terraform state file, and that's why you would do it.

But that obviously leaves the question of how you actually go about splitting up your Terraform state file. So what we're going to do now is look at some example code I have on screen. If I switch over to coding mode you can see I've got VS Code running with some code already in it. On the left we've got three folders: application, networking, and prerequisites. Let's start with the prerequisites, because it's very simple: we need an S3 bucket to store our individual state files, since they're going to be stored remotely. All this code does is declare a simple AWS provider and create an S3 bucket, nothing more. If we look at its state file, we can see it exists on disk; we're not storing this one remotely, and that's just because this is an example. It's a bit of a chicken-and-egg situation: we can't store our state in an S3 bucket that doesn't exist yet, so this prerequisites state just creates the S3 bucket for the other two, and that's all we care about at this point.
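The prerequisites code isn't shown line by line in the transcript, but it would look something like this minimal sketch (the bucket name is an assumption for illustration, not the exact value used on screen; the region follows the Sydney region mentioned in the video):

# prerequisites/main.tf
provider "aws" {
  region = "ap-southeast-2" # Sydney
}

# Bucket that will hold the networking and application state files
resource "aws_s3_bucket" "state" {
  bucket = "example-terraform-states"
}

This configuration keeps its own state locally on disk, which is fine here because it only ever manages a single bucket.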
If we go up a level to the networking folder, we can see we have our providers file with our AWS provider, and now we're configuring the networking code to use the S3 bucket that the prerequisites created for us. We're telling it to create a state object called networking.tf at the very root of that bucket, and we're doing that in Sydney; I live in Brisbane, Australia, but Sydney is my closest AWS region. That's the networking side of it.

Then we've got a very simple application side as well. In the application tier we've got our provider, and again we're configuring the exact same Terraform S3 backend, but this time with the application.tf key. Same bucket, same Sydney region, just a different file name for the state. That means the application state will be separate from the networking state, which is also separate from the prerequisites state. Straight away we've got these separate states stored in S3, which means we have to run multiple Terraform commands, multiple terraform inits and terraform applies, to get these things in place.

The prerequisites one already exists, so let's look at what we have on disk: exactly what you just saw in VS Code, the application, networking, and prerequisites folders. Before we create the application we have to create the networking first, and we can demonstrate that. If we go into the application folder, run terraform init to get our plugins and configure our backend, and then run terraform apply, it comes back with an error saying there is no state in that backend. It doesn't exist at this point in time, because the application state relies on the networking state existing, and I'll demonstrate why in just a second. So that proves the networking has to exist first.

If we pop into the networking folder and do a terraform apply (we don't need an init because this code is ready to go), it's going to add nine resources. It's a very simple architecture, and the architecture itself isn't really relevant to this video: a VPC, three subnets, route table associations, an internet gateway, and so on. So I'll go ahead and say yes, and we see those resources getting created.

And there we go: we've created a new state, and we can see we have outputs coming directly from the very root of that code. If we go over to the networking folder and look at those outputs, we can see we have the environment name, the VPC ID, and subnet_az_a, and we're taking those resources and outputting them. This is really important. We've created our networking infrastructure, its state file has been pushed up to S3, and what we're going to do from the application side of things, which hit that error before, is grab that networking state file from S3 and pull data from it.
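The backend configuration and those root-level outputs aren't shown verbatim in the transcript, but the networking side would look roughly like this sketch (the bucket name, variable name, and resource addresses are placeholders for things defined elsewhere in the networking code; the keys and region follow what's described in the video):

# networking/providers.tf
terraform {
  backend "s3" {
    bucket = "example-terraform-states" # bucket created by the prerequisites
    key    = "networking.tf"            # state object at the root of the bucket
    region = "ap-southeast-2"           # Sydney
  }
}

provider "aws" {
  region = "ap-southeast-2"
}

# networking/outputs.tf - these must sit at the root of the configuration
output "environment_name" {
  value = var.environment_name
}

output "vpc_id" {
  value = aws_vpc.main.id
}

output "subnet_az_a" {
  value = aws_subnet.az_a.id
}

output "subnet_az_b" {
  value = aws_subnet.az_b.id
}

The application's providers file would be identical apart from key = "application.tf".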
The way we do that is by relying on these outputs; we need them. To make a remote state file accessible to other state files, you have to have outputs at the very root of the configuration. You can see here they're at the very root; they're not inside a submodule or any other module. Just as we saw in the terminal, those outputs are now available to anything else that can access that state file.

To summarize: we want individual state files because we want that separation of concerns, splitting our large infrastructure into lots of small, manageable pieces. But if we've split it all up, when previously we had one big monolith of code where everything could reference everything else, how do we reference the VPC ID from the application state when there's a hard barrier between them? That's what these outputs are for. They allow anything that can access that state to pull the state down, read those values, and use them as if they were local resources.

So let's take a look at that now. With our networking in place, let's look at the application side of things. In the application providers file, which we looked at before, we have our AWS region and our Terraform backend configured to use application.tf, but we want to access networking.tf. How do we do that? In this data.tf file we've got a data block of type terraform_remote_state, which we're calling networking. We say the backend is S3, because it is: remember we configured an S3 backend for the networking code. Then we configure it with the bucket name (the exact same bucket), the key, which if you remember was networking.tf, and the region where that bucket is located. If you think about it as a path, it's effectively going to s3://<bucket>/networking.tf to fetch the state, although the configuration isn't literally written as a path like that. So the application state is going to go and fetch the networking state.
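A minimal sketch of that data.tf, again assuming the same illustrative bucket name:

# application/data.tf
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "example-terraform-states"
    key    = "networking.tf"
    region = "ap-southeast-2"
  }
}

When the application runs, Terraform pulls the networking state down from S3 and exposes its root-level outputs under data.terraform_remote_state.networking.outputs.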
But then how do we use it? If we look in the ec2.tf file we'll get an idea of how we take the outputs from the networking state and bring them into our application state in the code. As I said, the architecture is really simple: all we're doing is fetching an Ubuntu AMI and creating two EC2 instances; it's not difficult at all. It's the subnet IDs that we're dragging in from the networking state. You can see on line 21 that for the subnet we're using that data block we defined, of type terraform_remote_state, which we named networking, and we're fetching its outputs. Remember those outputs we created earlier in the networking state? We said that in order to access information inside that remote state, we have to have those outputs at the root level. Well, now we're accessing them, and this is the key part: inside our aws_instance we're saying we want to access the outputs of the networking remote state, specifically subnet_az_a, which you can see just above my head there. If I scroll down you can see the second instance as well, which pulls in the output called subnet_az_b from that same remote state called networking.

So the application state file becomes dependent on outputs from the networking state file, but once you've pulled that remote networking state in from the S3 bucket, you can access its outputs, and that lets you keep the separation of concerns without losing the ability to reference things across the different state files.

If we come back out into the application side of things and do our terraform apply, Terraform uses that data block to fetch the networking state from S3, and as you can see we don't get an error anymore: it's going to add two resources, just the two EC2 instances. This is a very simple example. Look at the subnet ID ending in e78e: that's the subnet ID for our web-azb instance, and there's a second, different subnet ID for web-aza. If we go back up to the networking outputs, there's e78e for subnet_az_b, and there's subnet_az_a, so you can see it really is getting those references from the other state file, which is great. If we say yes here, it creates those instances in those subnets using the values pulled from the other state. That state file has now been created and stored up in S3, and we can see our two EC2 instances have been created in the subnets that were created by the networking state.
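Pulling that together with the data block shown above, the application's ec2.tf would look roughly like this (the AMI filter, instance type, and resource names are illustrative, not copied from the video):

# application/ec2.tf
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }
}

resource "aws_instance" "web_az_a" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  # Subnet ID comes from the networking state's root-level output
  subnet_id     = data.terraform_remote_state.networking.outputs.subnet_az_a
}

resource "aws_instance" "web_az_b" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  subnet_id     = data.terraform_remote_state.networking.outputs.subnet_az_b
}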
If we come out a directory and look at that tree structure again, we've got small, tidy code bases that each manage separate things for us. If you imagine this on a much larger scale, it makes it much easier, first of all, to identify where you need to be. What do I need to fix? I need to rename the instances inside the dev environment, so I go to the dev environment; they're instances, so they'll be at the application level, so I go into the application folder; and I know they're EC2 instances, so I go into ec2.tf. Then when I eventually make the change and push it, and the CI pipeline runs (because obviously you're using CI/CD, right?), it will do the terraform apply and we get a much quicker, much faster execution. That is one of the key reasons we want to split up our Terraform state files.

Now of course this kind of approach is going to introduce some complexity, and I'll show you why. Let's go to the networking state and destroy everything. We do a terraform destroy, instructing Terraform that we want to delete everything in here. What do you think might happen? I'm going to hit yes, and it's going to try to destroy the nine resources we created earlier. Well, I'll tell you: first of all it successfully deletes a lot of these resources. As we can see, it's got rid of the route table associations, the internet access route table itself, and it's successfully deleted the primary internet gateway we had set up. But what it hasn't destroyed so far is the VPC, and it's still currently destroying subnets az-a and az-b. Do you reckon that might time out? Yes, it will time out, and the reason is that those resources are in use by the application state: the application created EC2 instances in az-a and az-b, and those are subnets. Terraform has issued a command to the AWS API saying delete these subnets, and the API comes back and says you can't do that, because there are resources relying on those subnets.

So how do we resolve that? Well, we built this in a three-stage process, didn't we? We did the prerequisites first, so we got our S3 bucket; then we did the networking, so we got the foundations we needed; and then we did the application, which plopped two EC2 instances on top of those foundations. So when we come to decommission an environment, we essentially go backwards: we get rid of the application layer first, then the foundations, the networking layer, and then the prerequisites if you want to delete that S3 bucket, which is unlikely; you tend to want to keep that stuff.

So let's unwind all of this. Let's go back into the application side of things and do our terraform destroy, demonstrating decommissioning this infrastructure by going backwards. It's obviously a very small, simple bit of infrastructure, so it's quick: we're just deleting those two EC2 instances, which shouldn't take long. Once that's done and we go back to the networking side, we'll see we don't have nine sets of resources to destroy anymore, because we did successfully destroy a few of them. Instead we'll have about three to destroy: the virtual private cloud, the VPC, and those two subnets, which become available to destroy once the two EC2 instances have been marked as destroyed, because of course nothing relies on the subnets anymore, so Terraform can actually release those resources.

That's now finished, so if we go back to our networking and do our terraform destroy again, it does a refresh of the state, goes and looks at the remote state of these things, and we can see we've now got three to destroy. If we hit yes, we see it destroy the VPC and our two subnets, and there we go: we've now decommissioned the entire environment. Of course the state files are still present in S3; they're just empty, they don't contain anything anymore. And that's what it looks like to take a Terraform state file and break it up.

Before we go, I just want to demonstrate briefly what this actually looks like visually. I've got a very simple diagram here, which will get a little complicated once we start seeing lines all over the place, but hopefully it should help.
What we've got here is the prerequisite state we talked about earlier, and what that creates for us is an S3 bucket up in AWS. As I said, for this one I just store the .tfstate file in the Git repository, and then you secure that Git repository. If we move over to the next stage, we introduce our networking environment, so we bring on the networking state, and this arrow clearly shows the dependency and which direction things go in. Our networking state creates VPCs, subnets, route tables, and an internet gateway; it builds those things for us inside AWS. But here's the difference: it stores its state file via a reference to that S3 bucket. That's why the lines are dashed, to say this isn't actually a second S3 bucket, it's a reference to the S3 bucket up here. We're not creating another bucket and storing the state in it; we're using the remote state backend configuration in S3, where we've got our bucket, our key, et cetera, and that configuration is what references this S3 bucket.

So what's the final stage? The final stage is our application stack. We've got our application state file here, which produces our EC2 instances, and it's actually doing two things. First of all, the application also stores its state via a reference to that same S3 bucket up here. But as we can see, we've also got a line drawn between the S3 bucket and this data block, because what we're doing with the application state is pulling the networking state out of the S3 bucket in a data block and then referencing some of it inside these EC2 instances; in this case it was the subnet IDs that we needed. Of course, once that's done we've got a dependency on the networking state, which is why, when we tried to delete the networking state, we got that timeout: it couldn't destroy the subnets because the EC2 instances already existed inside them. So what we do is reverse this: we destroy the application state, then we're left with just the networking, then we destroy the networking, and then we're back to square one again. That's the decommissioning process, going backwards.

Hopefully that illustration was useful and helps visualize the flow of what you need to do. Just to summarize very quickly: we create our state file inside an S3 bucket, configured here with the bucket name, the name of the key, and so on; then in another state file, another code base, we go and retrieve that state file from the S3 bucket; and then here in the EC2 code we reference the subnet ID using data.terraform_remote_state.
So we've got a data block; terraform_remote_state is the type, we called it networking, we access its outputs, and we want subnet_az_a. That right there is the code equivalent of the entire graph we saw, with everything in place: the reference going into the EC2 instance is essentially this code right here, and that's how we access the outputs from the remote state.

Hopefully that's clear and hopefully that's simple. If you've got any questions, we have The DevOps Lounge Discord; I'll put a link in the description and a little link here on the video as well. I hope this has been really useful for you, and I really hope you join us on the Discord; it'd be great to hear from you if you've got any questions or any advice you'd like to give, and of course feel free to join if you just want to chat with other professionals. Don't forget to like and subscribe to the channel, because that tells me you're interested in seeing more of this content, and feel free to join the Discord and tell me what type of content you'd like to see going forward. Okay, thanks very much, all the best, enjoy your day.
Info
Channel: Michael Crilly
Views: 4,420
Rating: 4.9775281 out of 5
Keywords: devops, thecloudcoach, training, devsecops, terraform, infrastructure as code, iac
Id: m8ZCmWokrgk
Length: 26min 14sec (1574 seconds)
Published: Sun Nov 08 2020