cloud-init: Building clouds one Linux box at a time

Captions
All right, it's 11 o'clock, so I'll go ahead and get started. Hello! I hope you had a good free day and enjoyed Montreal. My name is Josh Powers; I work at Canonical on the Ubuntu Server team, and as part of the Ubuntu Server team we maintain cloud-init, which is of course what I'm here to talk to you about today.

Before we begin, a quick show of hands: if you've only just heard of cloud-init, or maybe never heard of it before, never used it, or don't really know how to use it, throw up a hand. Okay, good. If you know something about cloud-init, like you've written a cloud-config and you know what user data is, raise your hand. Okay. And if you're a contributor, if you've contributed to cloud-init? All right, one person, two people, there we go. So hopefully my presentation will be helpful for everybody in the room. The first half I'm going to spend talking about what cloud-init is, how it works, and how you use it, and in the second part I'll talk more about recent developments in cloud-init and things that are coming in the future.

So, cloud-init: cloud instance initialization software. You can see where the name comes from. The point of cloud-init is that it attempts to automate customizing an image without rebooting. Think about all the times you've set up a system: you launch it, you apt-get update, apt-get upgrade, you install a bunch of packages, maybe you start mounting things and have to reboot to get them to mount properly, or you set up networking, get it all messed up, and it's a reboot again. The point of cloud-init is to try to do all of that in one shot: customizing an image, or, in our own marketing material's words, giving it a personality, and doing it all in one shot.
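For contrast, the by-hand routine described here looks roughly like this; the commands and values are illustrative stand-ins, assuming a Debian/Ubuntu host:

    # manual first-boot setup, the kind of thing cloud-init automates
    apt-get update && apt-get upgrade -y      # refresh and upgrade packages
    apt-get install -y htop                   # install whatever you need
    timedatectl set-timezone America/Toronto  # set the time zone
    hostnamectl set-hostname debconf          # set the hostname
    # ...then configure mounts and networking, and often reboot to be sure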
I also want to point out that even Debian's own cloud images use cloud-init; it is used all over the place, and we'll see more examples of this in a second.

So what does it look like in terms of a cloud? "The cloud is somebody else's computer," but what's actually going on when you launch an instance? You, the happy user, generally choose your image, or your AMI if you're on AWS, and you have this option to provide user data, if you've ever seen that before. Those two things are your inputs into creating an instance. Every instance in the major clouds runs cloud-init, and it eventually takes all four of these things, image, user data, metadata, and vendor data, and produces your instance for you. Images we know lots about: that's your operating system, whether it's stretch or jessie, with a set of predefined packages, hopefully small so it's fast and easy to move around. User data is what you give it; it's your input for customizing that image. Metadata and vendor data come from the cloud host, and we'll talk about those a little more in a second.

The beauty of cloud-init is that it's used across various cloud hosts and across all sorts of cloud images. No matter where you're launching images, whether it be AWS, Azure, or any other kind of cloud, or even something like LXD, which I'll demonstrate in a bit, you have the same kind of experience using cloud-init, and you can use the same cloud-config across those hosts, or even across devices. This slide lists a number of them, not all of them obviously, but it shows where cloud-init is used. Like I said, all the major clouds use it, LXD uses it, MAAS uses it. Across all the cloud platforms it's heavily used, so it has become kind of a de facto standard for getting a cloud instance up and running.

So, user data: let's talk about that. It comes from you, and here's an actual example of one type of user data. It's your way of customizing an image, and there are many different formats user data can come in. Probably the simplest is a shell script: something that starts with a shebang, and you go customize your instance however you want. You can pass it a whole bash script that says run these commands, mount these things, and it will do it all for you.

Another popular way is passing in a cloud-config. cloud-init has these different modules, and a cloud-config is just standard YAML syntax for saying, for example, I want to set my time zone to Toronto, since that's where we are right now, and I want its hostname to be debconf. cloud-init goes through these keys and values and, based on the keys, does the operation for you. On a Debian host it knows what command to run to set the time zone to America/Toronto, or how to set the hostname, and it does it for you, rather than you having to figure out all the commands yourself.

Going through this example: I mentioned timezone and hostname; we also have the ability to import SSH keys. In this case we're importing my SSH key directly from GitHub; we can do the same thing with Launchpad, or you can provide your public key right in the YAML and it will import it for you. Here I'm also setting up a custom apt repository pointing at a PPA, installing a couple of packages, and at the bottom specifying that I want to do an update and an upgrade of the system as well, so I can have a fully up-to-date running system, like I said, without doing a reboot.

This is a very simple example. You can also go as far as, like I alluded to earlier, setting up your disks: mounting them, formatting them, getting them set up in the specific manner you want. So on Azure, if you're adding multiple disks and you want one running ext4 and one running ext3, partitioned a particular way, the cloud-config can handle that sort of detail, as well as networking: setting up bridges, setting up VLANs, and so on.
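A minimal sketch of a cloud-config like the one on the slide; the GitHub username and PPA here are stand-ins, not the slide's exact values:

    #cloud-config
    timezone: America/Toronto
    hostname: debconf
    ssh_import_id:
      - gh:powersj                # pull an SSH key from GitHub (lp: works too)
    apt:
      sources:
        demo-ppa:
          source: "ppa:some-team/some-ppa"   # hypothetical PPA
    packages:
      - htop
    package_update: true          # apt-get update
    package_upgrade: true         # apt-get upgrade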
I quickly skipped over metadata and vendor data. This is an example of metadata taken from AWS a while back. Metadata is data that comes from the cloud host itself: the cloud provider giving the instance information about what it is. You can see there's an AMI ID, the ID of the image itself, a hostname, and the general things you would see from an AWS instance if you're familiar with using it: the instance size, its IP address, what zone it's in, and so on. That's metadata, again from the cloud provider.

Looping back around, there's vendor data. Vendor data is exactly the same thing as user data; it's just coming from the vendor itself. So then you start to wonder about precedence: who wins? User data always wins. Why would you want vendor data, then? Well, if a vendor wants to make sure that everyone's NTP is set up, rather than making the users deal with it, the vendor can say: we know our data center is in the Eastern time zone, so always set it to the Eastern time zone. Or maybe the user already has that set up in their preferences; the vendor can grab that preference and stick it in there for them, so they don't have to worry about it in the first place. MAAS, Metal as a Service, uses this; as I was alluding to with NTP, it uses vendor data to set that up, as an example of how vendor data can be used.

So let's try a demo. What I have here is some user data; it's an abbreviation of what I just showed you on the previous slide. In this case I'm setting the password for the default user and making sure it doesn't expire. Maybe you don't want to try cloud-init on an Amazon instance because you're out of credits and don't want to pay any money: you can do this with KVM. What I'm going to demonstrate here uses the Debian stretch cloud image and a command called cloud-localds. Basically, you generate a local datasource for cloud-init to look at, and the datasource is what provides the user data and the metadata. You can see the command right there; I showed you the user data, and for the metadata I just have a simple instance ID with a fake value. When I generate this disk, which will be used as the datasource, all it prints out is that it created a seed image filesystem. Then to boot it, I just run qemu-system with the Debian image attached and the seed image also attached, and we get it booting.

So stretch is booting up, and we start to see output from cloud-init scream by. Now you can see it's already starting to do an apt update; luckily this is a lot faster than the hotel Wi-Fi, so it won't take long. Again, it read the user data, the user data said do an update, and now it has started doing an upgrade, because there were some packages that needed to be upgraded in this image. It will grab those and then finish the rest of the user data. While this runs, let me log in; I only have the one console. If you remember, in the user data I set the password, and it accepted that password. We can see the hostname on the command line has been updated based on the user data, and we see we're in the Eastern time zone, while it keeps spitting stuff out: you can see it installing the packages now, and it just generated SSH keys. The last comment is that cloud-init finished, and it took 83 seconds to do an update, an upgrade, install some packages, and complete.
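Condensed, the demo is roughly these commands; the image filename and qemu flags are assumptions, and cloud-localds comes from the cloud-image-utils (or cloud-utils) package:

    # meta-data only needs a fake instance ID for the local datasource
    printf 'instance-id: debconf-demo-01\n' > meta-data
    cloud-localds seed.img user-data meta-data    # build the seed image

    # boot the cloud image with the seed disk attached
    qemu-system-x86_64 -enable-kvm -m 1024 \
        -hda debian-stretch-cloud.qcow2 -hdb seed.img -nographic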
What I wanted to show you is that in /var/run/cloud-init there are two important files. One is status.json; it's basically a JSON file, and it lists, obviously, the current status of cloud-init. So if you have a cloud-config that takes many minutes, where you're mounting lots of images, formatting them, installing lots of packages, this is a way of checking where cloud-init is at right now. There are four different stages, and I'll talk about what each of those means in a second, but for init and init-local you can see when they start and finish, and at the bottom it says what stage you're in at the current time. In a later demo I'll have this printing out repeatedly, so you can watch it go through the different stages.

The second important file is result.json. It gets written at the very end, once cloud-init is completely done, and any errors you ran across will be printed in this file. In this case, rather than an empty errors array, it actually hit something: ssh-import-id failed, and that's because the Debian image doesn't have ssh-import-id by default, so the command failed. So this is one place to come look for things.

Another important area is /var/log. There are two cloud-init files there. The first is cloud-init-output.log, which contains the generic output that cloud-init produces: you'll see all the SSH keys that were generated, any other errors, and all the stuff that was getting printed in the terminal, the apt updates and apt upgrades. In this one we see it running the ssh-import-id module and failing, but it doesn't give a whole lot of debugging detail, just what ran and what happened. The other file, cloud-init.log, is the actual debugging log, and that's where you'll find each module printing out when it starts, when it ends, results, and any stack traces. That's mostly for your awareness that those two files exist.
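For reference, the rough shape of those two files after a boot like this one; the paths are as described above, but all values are illustrative rather than captured from the demo:

    /var/run/cloud-init/status.json (updated as each stage runs):

    {"v1": {
        "stage": null,
        "datasource": "DataSourceNoCloud [seed=/dev/sdb]",
        "init-local":     {"start": 4.0,  "finished": 5.2,  "errors": []},
        "init":           {"start": 8.1,  "finished": 20.4, "errors": []},
        "modules-config": {"start": 21.0, "finished": 80.3, "errors": []},
        "modules-final":  {"start": 80.5, "finished": 83.0, "errors": []}}}

    /var/run/cloud-init/result.json (written once, at the very end):

    {"v1": {"datasource": "DataSourceNoCloud [seed=/dev/sdb]",
            "errors": ["ssh-import-id: command failed"]}}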
So we now have a Debian instance that's been customized with my user data, and I didn't have to pay for anybody's cloud to try this out.

The second demo I wanted to show is with LXD; hopefully you saw the LXD presentation by Stéphane yesterday. LXD is a really cool way to quickly get system containers up and running, and it has cloud-init. Again, if you want things to go quickly, and to iterate on a cloud-config or some user data without going to the cloud, you can do that with LXD. I have a little script here with some setup; it's going to use an Ubuntu image, because that has cloud-init already installed and ready to go. What it basically did was take the user data I had and inject it into the configuration for this container. There's an LXD config option for user data, user.user-data, and again you can dump in a shell script or a cloud-config. If cloud-init is installed, it will recognize: oh, I'm running in LXD, I'm using the LXD datasource, let me pull the user data from there and run with it. Just to show you the command, it's really simple: lxc config set, the name of your container, user.user-data, and then pass in the user data.

When we start it up, it starts the container, and, remember that status.json file? Now we're in the modules-config stage; we've already passed init and init-local. Every few seconds this should update as the status changes, so we can watch it go through the different stages.
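The whole LXD round trip is roughly the following; the image alias and container name are stand-ins:

    lxc init ubuntu:16.04 demo                          # create the container
    lxc config set demo user.user-data - < user-data    # inject the cloud-config
    lxc start demo

    # poll the status file to watch the stages change
    lxc exec demo -- cat /run/cloud-init/status.json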
So what are these different stages? init-local is the first one, and as soon as / comes up read-write, it blocks the boot. What we're doing there is setting up any networking: if you passed any networking in as part of your user data, it tries to set that up, and like I said, that could be VLANs or bridges, or just a standard eth0 with DHCP. Then, once that's done and networking is up, the second stage, init, kicks in: we go check whether there are any network datasources we need to get to, which is particularly important on the actual clouds, and set up any block devices or filesystems. So again, very early in boot we have at least networking going, and we haven't really configured any other part of the system yet, but we are trying to get the system set up without rebooting over and over and over again: in one boot, let's configure networking, storage, and all the other configuration items you're interested in.

modules-config and modules-final are the last two stages. You saw that modules-config is actually where a lot of the time was spent; that's because there was an apt update going on, which was going out to the network. Most of the modules actually run there, and modules-final is kind of like rc.local: it's the end of the line for cloud-init, and any final things that need to happen occur there. In this case you see it also printed out result.json, and there are no errors, just an empty array. So our LXD container is set up, everything's working, and cloud-init did its job.

Again, these are the commands I used in the first example with KVM, to create a seed datasource and boot it, so it's a cool way of testing things out; and this was the example with LXD containers. And this is the boot sequence for cloud-init that I just briefly went over: starting with init-local for networking, blocking boot; init, doing storage; modules-config, where most of the modules happen; and then modules-final, any final things that need to occur. I point this out because the words in brackets are what the log will show, so if you're going through a log and you see "modules-config", you know you're at that third stage.

All right, another demo. One of the things people have concerns with, obviously, is boot time. Nobody wants to sit there and wait; with advancements in hardware and very, very fast storage, the last thing you want is to be sitting there waiting for your system to boot. And if you pass a very large cloud-config to cloud-init, it's kind of interesting to see where it spends its time. One of the advancements over the last year, which one of the cloud-init developers, Ryan Harper, worked on, is cloud-init analyze. The goal, very similar to systemd-analyze blame, is to get an idea of where cloud-init is spending its time during boot, module by module. This is an example of its output; if you've ever seen systemd-analyze blame, it should look very similar: almost 30 seconds spent in apt configuration, and okay, that can make sense.
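In released versions this ships as the analyze subcommand; invoking it looks roughly like this, with illustrative module names and timings:

    $ cloud-init analyze blame
    -- Boot Record 01 --
         29.97100s (modules-config/config-apt-configure)
         01.12400s (init-network/config-ssh)
         00.00300s (modules-final/config-final-message)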
One of the nice things about this data: looking through it, we saw how locale generation was taking a long time, and after a while it was, you know, maybe we shouldn't be generating all these locales; maybe we only need to generate one, unless the user says please generate this one. So it helps shorten down those times.

I have another demo; I have a container set up already, it's been booted a couple of times, and I have a cloud-init analyze branch here. Running cloud-init analyze blame, here's an example using user data similar to last time: it took about fifty seconds to run, apt package update and upgrade took twenty-three seconds, and you can see everything else took less than a second, though it does add up after a while. We get this data not only for the first boot record; if we come down here to the second boot record, this time cloud-init took far less time, because everything has already been run: you already have all the packages, everything's already been updated. And yes, it ran a third time as well.

That was blame; there's another command, show. Where blame tells you which modules are taking the longest for each boot record, show gives you more of a timeline view. Again, I mentioned the four stages: in init-local, right away it quickly found that I'm getting my data from the NoCloud datasource, so it didn't have to check any other datasources; no networking configuration was required there, so it just moved on. In the init network stage it applies some of the initial data from the datasource, the user data and vendor data, and quickly goes through. So this is a timeline as you go through the different modules: here are all the config modules it ran, here are all the final modules it ran. And then it sat at the timestamps; this is a bug we're working on right now, because a timestamp skewed everything, so we need to fix how we record that. Again, the purpose of this is to gather data on the different clouds and different datasources and find opportunities and areas where cloud-init can make improvements to speed up your boot time even further, so that we're not slowing you down. You can see that by the third boot it's taking less than a second for cloud-init to run, because everything has already been done. This is blame again, top to bottom, and this is show.

The next improvement we've made over the last year is cloud identification. We have a number of datasources; you've heard me use this term a couple of times: it's cloud-init's method of determining where it's being run. I mentioned all those clouds out there using cloud-init; each of those clouds runs slightly differently: on Azure you might get this ephemeral disk, on AWS you might not. The cloud providers have worked with the cloud-init project to create their own datasource for each of them, so when you boot an instance running cloud-init, it has to figure out where it is running: should I be running the EC2 datasource, or should I be looking for a NoCloud datasource, or am I running on LXD? It needs to figure that out, and that can take time, especially if you're not running on any cloud: on KVM it might time out waiting for some of these things. Cloud identification gives us a way to positively identify what cloud we're running on, and the result is faster boot time: we're not going out trying to sniff network resources that aren't actually there or shouldn't be there, so you get a better overall user experience in terms of quickly booting, quickly finding out where I am, and moving on from there.

The other big area we've been working on over the last year is the integration test framework. Again, we're supported across all sorts of clouds, and we run across Debian, Ubuntu, Red Hat, SUSE; because cloud-init has been adopted by so many different datasources and operating systems, the test matrix is potentially huge. First off, on merge requests we now do smoke test runs, and nightly test runs every day using a daily image and master, so we're able to find things earlier and earlier. With the integration test framework, we've given ourselves the ability to handle the three very common scenarios we have to deal with in cloud-init. First is the smoke testing I alluded to: running the daily image with the latest version of cloud-init, which is important for gatekeeping. Second is development: if a developer is sitting there hacking on cloud-init, they want to be able to run something, and the integration test framework gives us an opportunity for that. And third, it gives us an opportunity for stable release updates and debugging.

We've been using LXD as a backend: LXD is fast, and it gives us big coverage of the cloud-config modules right away, so it was a really good way of starting off. We're working on KVM right now as a way of testing networking and storage configurations, and the remaining cloud-config module coverage. What we do, based on the backend, is start by downloading an image: with LXD, just downloading whatever that image is; with KVM, downloading a cloud image. We then customize it: at the very least we find out what cloud-init version is in the image, but we can also inject a new version of cloud-init, or build a version from a developer's tree and inject that. Then we take a snapshot; in LXD's case it is literally a snapshot, so that things are a little faster, while with KVM it's basically copying an image and saying we're going to use this copy. Then we go into a boot-and-collect loop: we boot an image with cloud-init, wait for that result file to come through and say everything is done, check to make sure there are no errors, and start collecting things. We go through and say: we expected the hostname to change, we expected this SSH key to be there, and we collect the output of all sorts of, usually, shell commands, gathering the output of things we expected to change and things we expected to be there. We do that for each of the tests: boot a configuration, collect the results, and go through this boot-and-collect loop. Once all that is done and we've got all these results, we go through and verify, and these are just Python unit tests: we go through, grab the collected output, and basically make sure that the output matches what we expect.
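A minimal sketch of what such a verification test can look like, assuming the collect phase saved each command's output under a collected/ directory (all names here are hypothetical):

    import unittest

    class TestTimezone(unittest.TestCase):
        def test_timezone_was_set(self):
            # collected/timezone holds the output of `cat /etc/timezone`
            # gathered from the instance during the collect phase
            with open("collected/timezone") as f:
                self.assertEqual(f.read().strip(), "America/Toronto")

    if __name__ == "__main__":
        unittest.main()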
This is how the integration test framework operates, and it has given us the ability to feel a lot better about where things are at with cloud-init in terms of merge requests and releases. Previously there was no CI officially run on merge requests; it was all done by hand, so our goal has been to automate: better and faster. Any time you submit a merge request to the cloud-init project now, it runs all the unit tests via tox, including some linting; it actually goes out and builds the package to make sure you haven't broken the build; and then it runs some of those integration tests I just mentioned to make sure nothing major has happened. This has improved our quality: instead of "oh shoot, I broke things last night and didn't realize it", which was reactionary, we catch it earlier.

We've also, as a team, tried to get a lot better at responding to merge requests. This graph is the active reviews, and the yellow line is the 28-day moving average; it's going in the right direction. A lot of things were just sitting there waiting for people to review, and admittedly there were times when we dropped the ball, so we're trying to give people faster reviews and faster feedback over time. The same goes for releases: the 0.7.8 release last year was the first release in a couple of years, I think; 0.7.9 came quickly after, and we're working on 0.7.10 right now. So we're really trying to do better in terms of participating with the community, getting people feedback faster, and being able to get things in more quickly.

If you're interested in cloud-init and want to get involved, or you have more questions, we're on Freenode in #cloud-init; the source is on Launchpad; there is documentation on Read the Docs, which is all generated from the source code; and there's the cloud-init website, kind of the marketing site, where you can learn a little bit and get links to all these things as well. If you have any questions I'll take them now, and I'll be here the rest of the day. Thank you. Any questions?

[Question] At the end of last year we had the Debian cloud sprint, and we had some problems with cloud-init. I think for Debian we want to use cloud-init just for the first boot: to generate the SSH keys, set the hostname, and so on. What I see now is that it's getting bigger and bigger; it's not just the initial part anymore, since cloud-config is now something like a configuration management tool. So what do you think: is there still a separation between the first-boot initial things that need to be done and the configuration management part?

[Answer] You can run cloud-init without using any of the configuration stuff. You can launch an instance without any user data, which I think most people do, and it will generate the SSH keys for you and be done; you can use it in that manner. I'm trying to highlight that there are other things it can do for you as well: rather than running shell scripts or bash scripts to do all this stuff, you can pass it to cloud-init and have it do it for you, and help automate your instance, going back to that first-boot customization. Customize everything as much as you want, but if you don't want to use all that, you're fine not to. Does that answer your question? Okay, any other questions?
Cool, thank you very much. [Applause]
Info
Channel: DebConf Videos
Views: 11,625
Keywords: debian, debconf, debconf17
Id: 1joQfUZQcPg
Length: 30min 37sec (1837 seconds)
Published: Tue Dec 05 2017