CloudInit: The Good Parts

Captions
[Music] Thanks, Melissa. Sorry to anybody that came expecting a talk about Vault. I'll make my GDPR joke about British Airways now, and then I'll be done with GDPR.

So, just so I have an idea of who the audience is here, because I know this conference attracts a lot of different backgrounds: who here runs cloud software on a day-to-day basis? Cool. And who's familiar with cloud-init? And who would call themselves an expert on cloud-init? Okay, cool — that's a great audience for this.

The genesis of this talk, and the reason that I changed it kind of at the last minute, is that last year I did a talk at HashiConf EU called "systemd: The Good Parts", and I got a bunch of good feedback about it. I was thinking: what other tools are there where everybody has passing familiarity, but no one really digs deep into them or necessarily takes the time to explore all of the options? So here we go: this is "cloud-init: The Good Parts", and it's a surprisingly long talk given the title.

So what is cloud-init? It's pretty much the de facto industry standard for early-stage initialization of virtual machines in the cloud. By early stage I mean on first boot, or on subsequent boots, of a particular virtual machine. It's used to specialize a generic operating system image — one that might be provided by one of the cloud vendors or one of the OS vendors — at runtime, to do whatever job the virtual machine is supposed to be doing in your infrastructure. It was originally developed by Canonical when EC2 was first announced, and back then Ubuntu was the only thing that ran on EC2, so that's all it had to work on. But it's now prevalent across every major cloud and pretty much every operating system. All of the Linux distributions' images ship with it by default, and FreeBSD and SmartOS ship with it these days too. Furthermore, almost every cloud vendor has really good support for cloud-init built into their provisioning plane.

So why do we want to do this — why is it necessary to runtime-specialize machine images? If you've heard me talk at a previous HashiConf, I've often talked about image-based workflows and package management as the solution for this. But that can be costly in terms of time: if you're trying to get software out quickly, booting an instance and building an image from it can take anywhere from five minutes to an hour, depending on the OS and the underlying platform. If you're working on rapidly evolving software and trying to release multiple times an hour, that might just be too much of a cost to pay — or it might not be; there are trade-offs to be made. Both approaches definitely have their place, and it's worth knowing about both. Correctly constrained runtime configuration can actually provide many of the same benefits as an image-based workflow, just a bit quicker. And finally, you can still use cloud-init in the image-based build process, so we'll take a look at that shortly.

We're going to look at this by example, and I'm going to go through a whole bunch of examples. One of them is just running scripts — the simplest thing that can possibly work. Then we're going to look at changing the default user configuration, installing packages, suppressing some of the default behavior which might be undesirable, writing arbitrary files, and configuring SSH keys. Then we're going to take a look at how you can find out more about what this thing can actually do, because it's kind of not discoverable.
So, to configure cloud-init: it's just installed on all of those images I listed earlier. On systemd-based Linux it's usually a service that runs on boot, before most other things, started with the init subcommand (there are some others as well). It runs a sequence of modules that specialize the machine in different ways. Configuration comes from two places. The cloud's provisioning plane supplies a bunch of metadata about the particular machine: its network configuration, its disks, all of that kind of stuff. Then there's also user-supplied configuration, which is what we're going to be talking about mostly today in our examples. Generally, whenever you provision a virtual machine — obviously everybody does that with Terraform and not using the portal, but if you are using the portal — there's usually a thing called "user data" (Azure calls it cloud-init directly), and every cloud has some different way of specifying a bunch of data that ultimately forms the configuration for cloud-init.

So let's look at the first example: the simplest thing that can possibly work for making cloud-init do anything other than what it does by default. We have a shell script, and all the shell script does is echo some text, including a datestamp, into a file when it's run. If we run an instance in EC2, we can specify this user data at the bottom, using a file reference to the file on disk. The API expects that to be base64-encoded data — not just plain text; we'll see why shortly — but the CLI takes care of that for you if you reference it this way, and that's true of most cloud vendors' CLIs.

Once we've run our instance, we SSH into it; we get the annoying message about the host keys not being trusted, but we accept that on blind faith and carry on, as everybody does. We get the message of the day, and then if we cat the location we expect, we can see that our script has run, at this time ending in 273, which is just a Unix timestamp. Actually, we can see it ran as root — so that's interesting: a shell script runs as root when we start up. If we reboot the machine, wait for it to come back, reconnect, and cat the file again, what's interesting is that it hasn't run again. This is a one-time thing that runs the first time you boot and not again. There are other modes that will allow scripts to run again, but by default, if you just provide a shell script, it's going to run once when you boot. So it's great for things like installing packages. On EC2 you may not need this, but in other clouds you might need to configure machines to act as routers or fill specific network roles, and these scripts are great for configuring that kind of thing. There's no need to get any more complex than that.

Some reminders on bash: writing correct bash is hard, and almost no one — myself included — writes bash which actually handles all of the error conditions properly. In lieu of doing that, put the strict-mode settings at the start of every script; that will catch most of the problems and prevent things from just blindly continuing in the face of failure. And ShellCheck: always use that, I'd say. It's a utility that will lint shell scripts and pick up almost every common misunderstanding of the syntax. There's no reason not to use it on everything.
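As a concrete sketch of what's being described here — the "strict mode" header is the usual set -euo pipefail rendering of the advice above, and the file names, AMI ID, and key pair name are placeholders rather than the ones from the talk:

    #!/usr/bin/env bash
    # "Unofficial strict mode": exit on error or unset variable,
    # and fail a pipeline if any stage of it fails.
    set -euo pipefail

    # Write a datestamped marker so we can see when (and whether) this ran.
    echo "provisioned at $(date +%s)" >> /var/tmp/provisioned.txt

And launching an instance with it:

    # The CLI base64-encodes the user data for you when you use file://
    aws ec2 run-instances \
      --image-id ami-0123456789abcdef0 \
      --instance-type t3.micro \
      --key-name my-keypair \
      --user-data file://userdata.sh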
Some other reminders on bash: not every OS has bash, or the latest version of it. macOS is the particular offender here — it ships bash 3. So if you write your scripts on a Mac, run them with the default bash, and then run them on Linux, you're not running even the same major version of bash. And actually Apple aren't going to upgrade it; they're going to ship zsh instead, because of licensing concerns. If you're on FreeBSD, for example, the root shell is different from the user shell, and neither of them is bash, though bash is available. So doing this isn't necessarily the most portable thing you could do.

The reason our script ran under bash was the shebang at the top, which told it to run under bash, and we can provide other things there as well. We could provide a Python script instead — calling Python "portable" is a bit of a stretch, I guess, given that you have to decide whether you're on Python 2 or 3 and then work out how it's installed on whatever operating system you're on, but for Ubuntu it works if you reference python3. If we do that, we can boot an instance the same way, SSH into it, and now we get the file written with Python, with that crazy date format. We had to specify python3 explicitly, because you have to specify the interpreter. So we're still not doing anything particularly portable: not every image is going to have Python installed anyway, and even if it does, you have no real way of knowing whether python3 is actually going to be the Python binary or you have some other crazy combination — and you have no way of installing packages either. So this doesn't really solve the problem of bash not being portable.

There's a whole bunch of things that are common things you want to do, and cloud-init has modules for them already built in and already configured. What we can do is just configure them, using a format called cloud-config, and then we don't have to write brittle bash scripts to do all this configuration: hopefully cloud-init does the correct thing and handles all the error cases, and in most cases it appears to.

Here's an example of cloud-config. It has a shebang-like thing at the top — #cloud-config — that identifies it as being a cloud-config file. I don't think that's strictly necessary, but it's good practice. In this particular example we're going to override some of the default configuration. Cloud-init does some work regardless of whether you configure it or supply any user data at all. Whenever you start an EC2 instance — and this is also true of most other clouds — the image doesn't have any users created other than root, and you probably know that if you start a different image you get a different username: ec2-user for all the CentOS derivatives, ubuntu for Ubuntu; I don't know what Debian's is offhand, but they're all different. It's actually cloud-init that creates that user, and then it pulls down the SSH keys you've configured in the key pair configuration into that user's home directory. The default configuration in the Ubuntu LTS image is to call that user ubuntu, and that's fine, I guess, if you like that, but most people want to specialize it: to something a bit more generic, or cross-operating-system if they're running more than one, or, you know, the company name — or they're going to create individual users.
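One of the things we can do is override that default user. A minimal sketch of the kind of cloud-config being described — the key names follow cloud-init's documented system_info/default_user schema, and the specific values here are illustrative:

    #cloud-config
    system_info:
      default_user:
        name: ops
        primary_group: ops
        shell: /bin/bash
        lock_passwd: true                  # no password logins
        sudo: ["ALL=(ALL) NOPASSWD:ALL"]   # passwordless sudo
        gecos: Ops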
So what we're going to do is say: instead of creating a user called ubuntu, we're going to create a user called ops. It's going to be in a group called ops, it's going to use bash as its shell, it's not going to be password-accessible, it's going to have passwordless sudo — and GECOS. I never knew what that stood for until I was writing this slide; when I looked it up, it stands for the General Electric Comprehensive Operating System, and it covers things like what room that user sits in in the General Electric building. So that's obviously useful to have on every cloud instance, uh-huh. Because it's YAML, you just kind of let the IDE indent for you and hope that it gets it right, because it's completely impossible to figure out what's a list and what's a map and what's anything else — but I tested all of these, and this one does work.

Once we've written the YAML file, we can provide it basically the same way: instead of providing a shell script here, we're providing the YAML file. Once we run our instance, we can SSH in — we still get the annoying host-key problem — and the ubuntu user clearly doesn't exist: permission denied (publickey). If we SSH in as root instead, we can see — along with an information leak, because it's told us what the default username is, which is probably not a great default — that the default user does appear to have changed to ops. If we SSH in as ops and get the directory entry for it, we can see that the information we set is all there. So that's worked.

Let's talk about the cloud-config schema. It's a YAML structure where what's valid is not even really defined: it's a combination of the version of cloud-init you're on and the modules that have been installed — and you can install custom modules if you want. So it's not very discoverable. The documentation is somewhat hit or miss. A lot of people think I'm Australian — I'm not, I'm English — and this is an example of English understatement: the documentation is not hit and miss, it's bad. Most of the information is kind of in the docs, if you read them right and sort of interpret them a bit funny; all of the information is in the code. Generally, if you're trying to write this and use all of the functionality, the only way to discover it is to go and read the Python code. There are usually multiple different ways of achieving any particular task you want — a feature we'll see in a bit. The problem with this is that writing the config files is going to be an iterative process: it's very rare to write one and have it work the first time. Ultimately what people end up doing is keeping a little library of snippets that do the particular things they want on a regular basis, and then cargo-culting those throughout all of their projects and every company they work at, and all that kind of thing.

Cloud-init does have some schema validation. We mentioned the init subcommand earlier; there's another subcommand called schema. We can take our YAML file and run cloud-init devel schema with the config file, and it tells us, great, we've got valid cloud-config. If we have, say, an extra quote on line three, it will tell us we don't have valid cloud-config.
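That looks something like this — the output text here is approximate, from memory, rather than verbatim:

    $ cloud-init devel schema --config-file user-data.yaml
    Valid cloud-config file user-data.yaml

    $ cloud-init devel schema --config-file broken.yaml
    Cloud config schema errors: format-l3.c1: File broken.yaml is not valid yaml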
Unfortunately, it doesn't actually go beyond being a YAML validator. When we have this clearly valid config — valid despite being clearly garbage — it passes. So it's not that useful as a tool to actually validate what you're doing: it's just a YAML validator, and every editor has one of those anyway. It took me a long time to get cloud-init installed on macOS to be able to run this — you basically have to compile it from source, because no package manager has it, because who's doing cloud-init on Macs? — and it turned out to be totally not worthwhile afterwards.

Let's look at one of the other things you commonly want to do on startup: installing some packages. We can write some YAML for that, and we can tell it three things. We want to install Docker, and Docker isn't in the Ubuntu 18 package repos — at least not under a name that I can find — so we're going to install it from Docker's own repository: they have a Debian repository that ships the Docker Community Edition as a package you can just install. But we need to configure that. Generally what you'd do there is write out a list file, with a bash script, to /etc/apt/sources.list.d or something like that, then apt-get update and apt-get install the thing you want. That's kind of error-prone, especially if you want to verify the keys as well. apt is a bit better at this, but yum is terrible at it: it will only really validate the keys on, like, the third day of some year that has a full moon on it, it will not validate the checksums, and it will tell you it has. Instead, we can configure the source using the apt sources section of the config, and then we can tell it to install docker-ce on boot, which it will do — after it's updated the package sources, so that you have access to everything you've configured. Then finally we need to restart the machine before Docker will come up — I'm not sure that's even true anymore; it was true at one point, so I'm going to do it anyway. We can tell it: after it's done everything else, before it runs the user scripts, reboot the machine. And then, rather than running all of the user scripts (which can be supplied alongside this) at first boot like we saw earlier, we're configuring it here to run them on second boot instead, so that we have an opportunity to preconfigure some things. At that point, if your script needs to use Docker, or whatever other package you've installed, it's going to be available and ready to go.

If we run an instance with that config and SSH into it after it's rebooted — it takes a little while; it's going to reboot twice — and we run systemctl status docker, we can see it's running, which is great, because it's not installed in the image normally. And if we run the Docker CLI, we'll see it's there on the path for users as well, which is also good. You can do this for a bunch of things. One of the common use cases: if your CI pipeline emits Debian packages for your software — or yum packages or whatever — you can install them as your machine comes up by adding an S3 bucket as a repository, and then using the HTTPS transport to make sure nobody can man-in-the-middle the packages on the way down.
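A sketch of what that cloud-config might look like — the key ID shown is Docker's published repository signing key fingerprint, but the exact layout is my reconstruction rather than the slide's, and the piece that defers user scripts to second boot isn't shown because I'm not certain of its exact mechanism:

    #cloud-config
    apt:
      sources:
        docker.list:
          source: "deb [arch=amd64] https://download.docker.com/linux/ubuntu $RELEASE stable"
          keyid: 9DC858229FC7DD38854AE2D88D81803C0EBFCD88

    # Installed after the package sources above have been updated.
    packages:
      - docker-ce

    # Reboot once cloud-init has finished everything else.
    power_state:
      mode: reboot
      message: Rebooting so Docker is ready before user scripts run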
Let's take a look at a slightly more complex example — one that ends up demonstrating the way there are lots of ways of doing anything and no clear best way to do any of them. Let's come back to this message we get when we SSH into a machine. We blindly accept it on faith that we're talking to the machine we asked to talk to, and not to MI5 or the NSA or something like that in the middle. And the only way we can really verify that — well, we can verify in future that we're talking to the same machine we were talking to before, but we can't really verify we're talking to the correct machine. If you're doing something like spinning up a bastion host that's going to access, say, your database, you probably want to know that you're talking to the correct machine. There are other ways to achieve this as well, but one thing we could do is set our own host keys.

If we say yes to this message, we see this kind of scary thing: "permanently added this IP to the list of known hosts". In the cloud world that's completely useless, because machines spin up and down like crazy. I emptied mine the other day, to test that this worked, and I had 140,000 entries in my known_hosts file from not that long — two years, maybe, something like that. So this can build up pretty heavily. (I also worked on a cloud provisioning plane for a long time, so that's why I have that many; it's not that common, but thousands is not uncommon.) And we're effectively accepting all of these at face value, which is not ideal.

One thing we could do, if we're on Amazon, is go to the instance console once we've booted: it actually prints out the SSH host key fingerprints there. We could go and compare those to the hash the message gives us, and that would be good. We can also get this via an API if we wanted, so we could probably parse it, work out what the host keys are, and make the entries automatically. That's fine as an approach, but instead, what we're going to do is set our own host keys, because comparing would be too easy and doesn't really involve cloud-init. Well — cloud-init does print this too, actually: if you look through the system log, you can see it as cloud-init runs. That's really useful for debugging things that affect your access, because if you're doing things with the SSH server, you quite frequently lock yourself out of the machine. The downside is that, certainly on Amazon, it takes a couple of minutes after the machine has booted for this log to actually appear, so you end up with a pretty big cycle time.

So let's look at setting our own SSH host keys. There's no built-in way of doing this* — I'm putting the star there because it turns out there is, it just wasn't called the right thing, or anything sensible, so I found it after I did it the hard way. We'll do it the hard way first, and then we'll look at the easy way. Breaking this down, we need to do a couple of different things: we need to generate some host keys and get them onto the virtual machine, and we need to move the keys into /etc/ssh before the SSH server starts — otherwise it will complain that it doesn't have any or, worse, generate some of its own. To do that, we need to know where in the cloud-init process we can hook in to get all this logic in place.

Cloud-init runs in three phases as you boot a machine. There's an init phase, which is before the SSH server comes up. There's a config phase, which is in theory supposed to be for stuff that doesn't affect boot — but actually everything affects boot, so the config phase is kind of useless and there are really only two phases. And the final phase is configuration you want to run after everything else: user scripts that might want to use packages or configuration installed by the init and config stages. The configuration for what runs in each phase lives — at least on Ubuntu — in /etc/cloud/cloud.cfg, and it's obviously YAML, because why wouldn't it be?
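For orientation, here's an abridged sketch of what that file looks like on an Ubuntu image — the module lists vary by distribution and cloud-init version, so treat this as illustrative rather than exact:

    # /etc/cloud/cloud.cfg (abridged)
    cloud_init_modules:       # runs before the SSH server is up
      - migrator
      - seed_random
      - bootcmd
      - write-files
      - growpart
      - resizefs
      - users-groups
      - ssh                   # configures sshd and host keys

    cloud_config_modules:     # the in-theory "doesn't affect boot" phase
      - runcmd

    cloud_final_modules:      # after everything else
      - package-update-upgrade-install
      - scripts-user
      - final-message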
One of the pieces of configuration in there is which module runs in which phase. If we take a look at this — I've snipped it around a bit to make it fit on the slide — we can see the modules that are going to run in the init phase, and one of them is this thing called write_files, which looks useful, since one of the tasks we identified was writing some files into /etc/ssh. If we look at the docs for it: it "writes arbitrary content to files, optionally specifying permissions" — which is good, because we need to do that too. (This doc format, in case you can't tell, is all generated from the Python source code, so it's Sphinx with a slightly nicer stylesheet than Sphinx normally has.) And then there's this ssh module down at the bottom, which configures the SSH daemon before it starts. So if we write the files before that runs, we should be good with our desired ordering. We can also verify that cloud-init runs before SSH: if you look at line 12 there, you can see it's ordered before the SSH daemon. So we're good to do this.

Some more about write_files: the files need to be provided embedded in the YAML config, which is annoying — if we could provide them from a remote source, we could verify checksums and download them from S3 or something like that. For each file we can set all of these different things. Here's an example: the top one has some base64-encoded content — it writes an SELinux config with, I don't know, root write and everybody else read — and the others are all configured in various different ways.

So let's actually generate the keys. One way we could do this is using the ssh-keygen program that's built into most Unix-likes, and that's kind of fine. What if you're on Windows, though? Does anybody deploy from Windows? Wow, no one. Okay. In that case you can't just do this, because Windows generally does not have ssh-keygen, and doing it any other way — there are a bunch of tools, and all of them are awful. Like, what the hell is the randomart? I will give, like, a hundred pounds to anybody who can explain to me the purpose of the randomart and why I'm supposed to use it — I'll be around for a few minutes afterwards.

Instead, what we can do is use Terraform for this. I used to work on Terraform, and actually, just before I dig into this, let me take a minute to congratulate the Terraform team on 0.12, which is a fantastic release and the most enormous lift I've ever seen in an open-source project while retaining compatibility — great work to everybody involved. I think this would not have been possible in 0.11 in such an easy way; this is all 0.12 config. We can generate some SSH keys using the TLS provider — it turns out a private key is a private key, regardless of what format you put it in — so we can generate one RSA key and one ECDSA key, the two commonly used key types, using the private key resource. One of the attributes on that resource is the OpenSSH-formatted public key. This is a slightly more portable way of generating these keys. If we apply that, there we go: we have the keys, formatted kind of as we expect, at the bottom. So that's good.
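A sketch of that key generation in 0.12 syntax — the resource names and the curve/bit-size choices are mine, not necessarily the slide's:

    # Host keys generated via the Terraform TLS provider.
    resource "tls_private_key" "rsa" {
      algorithm = "RSA"
      rsa_bits  = 4096
    }

    resource "tls_private_key" "ecdsa" {
      algorithm   = "ECDSA"
      ecdsa_curve = "P256"
    }

    output "host_public_keys" {
      value = {
        rsa   = tls_private_key.rsa.public_key_openssh
        ecdsa = tls_private_key.ecdsa.public_key_openssh
      }
    }

One thing worth noting about this design: the private keys end up in Terraform state, so the state file needs protecting accordingly.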
The next thing we need to do is get those into the write_files configuration. We have some Terraform up here, and we have a YAML template that's going to set the filename — the path — and the permissions for the public and private key of each host key type. So we need to somehow get the private key bits into a structure we can provide to this template, one that has the bits in the right places. We can do that using a for expression, which took me a few minutes to figure out but is actually pretty nice: for each private key, we provide an object in the keys parameter and pull out the public key, the private key, and the algorithm name, and that gives us everything we need. If we apply that, we get a write_files block in the correct format, which is nice, so we can use it directly as part of the user data when we go and provision a machine.

We need some additional configuration, though. By default — and this took me a long time to figure out — you write the keys, and then cloud-init deletes them, because obviously, why wouldn't it do that? And then, once it's deleted them, it creates new ones, so it just looks like your config isn't working at all. If you set ssh_deletekeys to false, it won't recreate them — and, furthermore, won't try to delete them in the first place.

But now we've got, like, three different things we want to get up there: the user config, the additional cloud-init config for the SSH initialization, and our write_files block. So we need some way of providing more than one file as user data, and the way cloud-init deals with this is somewhat baroque, but it works: multipart-encoded — effectively MIME-encoded, email-encoded — files. I wrote a blog post about this the last time I went looking for this information — if you look in the address bar, that was 2015 — and it basically hasn't changed since then; that's still the only documentation about how it works. I found myself reading it the other day, thinking "who wrote this?", and then realized it was me.

Terraform can generate this stuff — it can generate the multipart bits so you don't have to worry about that yourself. There's a data source called cloudinit_config, and you give it each part of the config you want, along with its content type: if it's cloud-config, you give it that; if it's a shell script, you call it a shell script, or something equivalent. We can render our write_files config directly into the template, and the data source has a property called rendered when it runs, which we can provide as the user data to an AWS instance. Then we can do two more things: output the public IP of the machine, and format the known-hosts entries that we need to put into a file so we can connect to the instance.

So let's run this. We apply; these are our outputs, so we have the public IP of the machine, and then we have two entries that we could go and put into a known_hosts file (truncated to make them fit on the slide). If we try to SSH straight to the public IP, we're told it doesn't trust the fingerprint, so we're no better off yet. But if we do this — I forget what the sed is for now... oh, it removes the newlines off the end of the keys, that's right — and output that into a known_hosts file, and then try to SSH — bearing in mind we never accepted the host key into a known_hosts file from SSH directly — then we trust those keys. So that's good: it's nice to go from not being able to trust the machine we're talking to, to being able to trust it, via this config.
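A sketch of the multipart assembly — this uses the cloudinit_config data source (at the time of the talk the equivalent was template_cloudinit_config in the template provider); the file name, AMI, and the local.write_files_yaml value standing in for the rendered template are all placeholders:

    data "cloudinit_config" "user_data" {
      gzip          = false
      base64_encode = false

      # User and SSH configuration as cloud-config.
      part {
        content_type = "text/cloud-config"
        content      = file("${path.module}/cloud-config.yaml")
      }

      # The rendered write_files block carrying the host keys.
      part {
        content_type = "text/cloud-config"
        content      = local.write_files_yaml
      }
    }

    resource "aws_instance" "bastion" {
      ami           = "ami-0123456789abcdef0"
      instance_type = "t3.micro"
      user_data     = data.cloudinit_config.user_data.rendered
    }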
So, about those docs. You have to go and read them in a slightly funny way: when the docs talk about SSH keys, you'd assume they mean user keys, right? No. And I probably should have worked that out, because the hint is that you provide both public and private keys, and for a user you obviously wouldn't do that. It turns out there is actually a built-in module for this, and instead of the template we rendered earlier, we can just render a template directly that has the key type, and then the private and public keys, under ssh_keys — and it will do all of that for us. We don't need the extra config, and we don't need to write the files directly ourselves or know about the paths. This is actually a bit more portable, too, because not every operating system keeps its host keys in /etc/ssh. If we run this, we basically do the same thing — we only need one of the cloud-init parts besides the template now — and we get the same result as before, where we can SSH in once we've written the known-hosts entries.
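A sketch of that built-in mechanism — the ssh_keys mapping takes a private/public pair per key type; the key material is elided here, and in the Terraform setup above it would be templated in from the tls_private_key resources:

    #cloud-config
    ssh_keys:
      rsa_private: |
        -----BEGIN RSA PRIVATE KEY-----
        ...
        -----END RSA PRIVATE KEY-----
      rsa_public: ssh-rsa AAAAB3... host

      ecdsa_private: |
        -----BEGIN EC PRIVATE KEY-----
        ...
        -----END EC PRIVATE KEY-----
      ecdsa_public: ecdsa-sha2-nistp256 AAAAE2... host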
Let me take a second to talk about debugging, a topic I became intimately familiar with while working on that previous example. This is the debugging experience: there's a log file that emits literally everything cloud-init does. This is line 600, and we're not even at the SSH service yet. It's detailed and it's exhaustive, and if you need to find out what it's done — down to the number of bytes it's written for things — then this is the place to go and look. And that's basically it. So: the debugging experience is not great, but it's not nothing, which is nice.

Some other use cases — I only went into three, because we're somewhat time-constrained, but here's a bunch of other stuff you could go and do, and actually sometimes instances do bits of this for you. If you attach a volume to an instance — or a virtual machine in another cloud — the operating system doesn't inherently know what to do with it, and it's not just luck that it ends up with a filesystem sized to fit the disk; you can change the size of the root volume as well, and that's clearly not in the image. It's actually cloud-init that's responsible for working out what disks are attached, from the cloud metadata, creating a filesystem of an appropriate size on each volume, and mounting them however you configure. You can configure a bunch of that stuff, down to the partition layout of each drive, which is useful. We looked at installing Docker; another thing you can do is pull down Chef configuration and just run it in solo mode when you boot the machine, which is quite a nice way of writing the config if you prefer that. If you use SSM Parameter Store in Amazon — the tree function is 95% done, and there's no way of specifying where in the tree a machine should look — maybe you could specify that through cloud-config: you write it into a file, and then whenever you read from Parameter Store, you prepend it to the tree, so you're looking in the right section for the environment you're in. I've also used it in the past to join Serf clusters. There's a whole bunch of stuff you could do.

So, in summary: there's a huge amount of functionality available that very few people will ever bother digging into, beyond sticking a shell script in user data. It's not very discoverable, which is a shame, because there are a lot of good bits in there. If you want to discover the entire range of options, the only thing you can do is go and read the source, because the docs don't contain it. In 99% of cases you can just copy and paste the config from somewhere else, like Stack Overflow or something, and it kind of works — just make sure you're not, you know, opening up random ports without any security, or anything like that. But it's generic enough and common enough that it's worth knowing at least the basics, so that if you don't want an image-based workflow, or you're just doing something quickly, then maybe this is actually a good alternative. It's also worth learning if you've worked with someone that liked cloud-init and you have to be responsible for maintaining whatever mess they made — another motivating reason.

With that, I think we have literally two minutes for any questions that have come up. If not — I have to run off fairly quickly after my talk, as I have to take the train back to London, but I'll be around for at least the next hour or two if you have any more questions. With that, thanks for listening. [Applause]
Info
Channel: HashiCorp
Views: 11,283
Keywords: CloudInit, Linux, HashiCorp, HashiCorp Vault, Vault, Cloud Computing
Id: 2_m6EUo6VOI
Length: 33min 17sec (1997 seconds)
Published: Fri Aug 16 2019