AWS re:Invent 2019: Security best practices for the Amazon EC2 instance metadata service (SEC310)

Captions
Good afternoon, everyone. My name is Mark Ryland; I'm the director of the Office of the CISO at AWS, and we're here this afternoon to talk about the instance metadata service. Welcome to the session. How are the energy levels out there? 4:00 p.m., late in the week... all right. I have a lot of energy: this is my finishing sprint, my last talk of the week. I have a couple more customer meetings and then a red-eye home tonight (I live on the East Coast), so I'm going to be high energy and I hope you are too. We're here to talk about a very interesting topic: the instance metadata service, a very important part of the EC2 service.

Here's our agenda. First we'll talk about the instance metadata service itself as a level set, so we're all on the same page; it's well known, but we'll make sure we all have the basics. Then we'll talk about some new capabilities we added just a few weeks ago, which we call Instance Metadata Service Version 2. Then we'll cover best practices, which apply to both version 1 and version 2: most of the things you want to do to take care of the data stored in the service are applicable whether you use the old approach or the new one. And then a call to action. Sprinkled throughout the talk I've got some screen-capture demos. I'm getting old and gray and I don't do live demos anymore; I've had too many where the demo gods were not smiling on me. But I've recorded some really exciting CLI demos, so we'll get to watch terminals doing things. It helps communicate the basics and drive home the points we want to make as we go through the topic.

So let's reintroduce the good old instance metadata service. The IMDS has been there from the early days of EC2. It's a special link-local address that you talk to: you make an HTTP request and you get back a lot of really valuable and interesting data. That network address is a "magical" address; it's really just a way to talk to the EC2 control plane. The implementation details have changed over time and will vary in the future, but the one thing you can count on as reliable semantics is that when you call that IP address, you're talking to EC2. You're not talking to any other node on the network or any other instance. I suppose you could set up some crazy local iptables rule to translate it before it hits our network and do something else, but in general this is a way to talk to EC2.

The HTTP request model is what I'll call a universal language binding: every language in the world has libraries that make HTTP requests easy, and whatever comes back, whether it's XML encoded or plain text as this service returns, you can do something with the data you got from that request. So it's a very general, easy way to access the data from any language, or from the command line, as I'll often do in the demos.

The first concept here is introspection. There's data in the metadata service about the instance where your code is running, or where you as a user are logged in if you happen to be working interactively on EC2. There's a ton of really interesting data there: information about the cryptographic signature of your environment, your IP addresses, the virtual MAC addresses of all the ENIs attached to your instance, public IP addresses, and more.
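For reference, those introspection lookups are plain HTTP GETs against the link-local address; a minimal IMDSv1-style sketch using paths from the public documentation:

```bash
# List the top-level metadata categories (IMDSv1 style: plain GET, no token).
curl http://169.254.169.254/latest/meta-data/

# A few of the introspection items mentioned here.
curl http://169.254.169.254/latest/meta-data/instance-id
curl http://169.254.169.254/latest/meta-data/placement/availability-zone
curl http://169.254.169.254/latest/meta-data/mac
```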
You can go through the documentation I've linked here and you'll find a ton of really useful information for writing software. We're going to demo this interactively, but that's not the normal use case; the normal use case is software running on the instance that needs to understand what's going on in its environment and customize its behavior accordingly.

My favorite is the launch index. It's a little obscure, but as you know, when you call the EC2 RunInstances API, the default count is one, but you can ask for 100 or 1,000 and you'll get a thousand instances with one API call. There's a concept called a reservation, which is the identifier for that call, and as the instances spin up they get numbers. So if you ask for a thousand instances in one API call, you get one reservation, and each of those instances gets a unique launch index from 0 to 999. You can reason about things like cluster building: eventually you may need something fancy like Paxos to decide who leads the cluster, but to initialize it you can just say, "whoever is index 0, you're elected leader initially." So there are some cool capabilities in there. That's use case one: introspection, data about your environment.

Use case two, a super important one, is user data. This is data you pass in when you call RunInstances, and it then shows up in a special place in the metadata service. Moreover, you can pass not only data but code: you can put shell scripts in there, or PowerShell on Windows, and that allows you to do completely arbitrary customization of the environment: run any code or script, download things. As a simple example, which we'll return to in my demo, I'll install an Apache server and PHP extensions, start the server, and download some files to initialize a simple PHP application. So that's the second really important use case: user data.
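As a concrete sketch, a user-data script along the lines of what the demo describes (install Apache and PHP, start the server, pull down the app) might look like this; it assumes Amazon Linux 2, and the S3 bucket name is hypothetical, not the script actually used in the talk:

```bash
#!/bin/bash
# User data runs as root at first boot; this mirrors the demo's description.
yum install -y httpd php                      # Apache plus PHP extensions
systemctl enable --now httpd                  # start the web server
# Download the simple PHP application (bucket name is hypothetical).
aws s3 cp s3://example-demo-bucket/app/index.php /var/www/html/index.php
```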
Thirdly, another really important use case is role credentials. If you deploy an EC2 role to be assumed by that instance so it can call APIs, the credentials for that role are also available through the metadata service. Now, there's a challenge here: you're trying to get code from an unauthenticated state to an authenticated state. You basically have to pass some kind of secret to that code in some kind of safe way, and role credentials allow you to do that. Look back prior to the launch of role credentials, in June of 2012. I'm an old-timer; I've been at AWS for eight years, so I started in August or September of 2011, and when I started working at AWS there was no good way to do this. There were a lot of fairly unsafe practices for deploying the credentials your software needed to do something interesting. One of the "best practices" at the time was to pass credentials through user data, because at least that way they weren't in your deployment pipelines or in your source code, which is the worst place for credentials. But it still wasn't great: user data is persistent over the lifetime of the instance, and you were passing long-term credentials that weren't being rotated, and so on. So a lot of bad things were happening, and even the best practice wasn't that great.

Role credentials really solved a fundamental problem, and since then you can think of them as a savior for the millions of times people would otherwise have made big mistakes getting credentials onto instances. They didn't have to, because they got credentials deployed securely: those credentials are temporary, they get rotated frequently, and they're accessible only from the local software on the instance.

So how do you keep this private data safe? Role credentials are obviously one of the important parts of the private data you want to keep safe, and a lot of the safety is built in. First of all, as I mentioned, it's a link-local address. Unlike a normal link-local address, which lets you talk to other hosts on the same subnet using essentially broadcast-based resolution, this one doesn't go anywhere on the subnet at all; it only lets you talk to EC2. As a little side note on implementation details (and this could easily change): on a Xen host you're talking to the dom0 and the hypervisor, and on a Nitro host you're talking to one of the Nitro controllers; they're the ones responding when you make this request. That could change, but what won't change is that you're talking to EC2. So the first protection is that the link-local address is only available locally.

Another protection that is really important, and I'll keep emphasizing it, is that when you do use a role (and again, the typical use case is software deployed on an instance using these credentials to call APIs), you've got to do least-privilege analysis. Figure out which APIs you actually need to call from this instance, and then limit the scope and privileges of the role to what that software needs. Frequently that can be very small and very limited; it doesn't need broad administrative powers. Don't put star-dot-star kinds of privileges on your instance roles. And finally, you can also use local protections: the local firewalls on Linux or Windows can be used to scope down which particular processes or local principals can access the metadata service. I'll show ways to do that in a demo, and you can see a bunch of examples in our EC2 documentation as well.

All that said, it has worked great. It's a very successful service, and we will continue to support version 1 essentially indefinitely. But it is possible for people to make mistakes and do things on the instance where the software they're running somehow exposes that private data from the instance metadata service, whether it's credentials or something else. There are misconfigurations that can occur. A great example is an open proxy: if you install a web listener that listens for requests and says, "hey, send me a web request and I'll just repeat it on down the line for you," that can be abused to access data coming from this link-local address. That's the HTTP level. There are various kinds of misconfigurations and problems, or you could call them vulnerabilities: typically some web app with what are usually called webhooks, a little API where you can pass a URL and say, "fetch that URL for me and send the result back." This can cause problems.
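To make the private data at stake here concrete: the role credentials live under a well-known IMDS path. A minimal sketch (the role name shown is hypothetical):

```bash
# Discover the role attached to the instance, then fetch its temporary credentials.
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
# Returns JSON with AccessKeyId, SecretAccessKey, Token, and an Expiration timestamp.
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/my-example-role
```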
The next layer down from layer 7, layer 4 (TCP/IP), can be misconfigured as well. You might have some kind of funky NAT configuration, or a router configuration, that somehow gets a packet to the instance, and the instance is configured to forward that packet to this link-local address and happily forward the reply back. Again, that can cause problems. So given that, and given the fact that people have made these mistakes, we've been thinking about how to improve this and make it safer.

One thing that has been suggested, and is actually implemented in some parts of the industry, is: just add a header to the requests, and if the header is some magic "hey, I'm a metadata request" header, then that protects me. The truth is that doesn't actually work. It works in some cases, but in many cases the open proxies or other misconfigured systems will take headers and forward them along down the line. A slightly sophisticated attacker who finds an open proxy will just send the header along with the request, and the proxy will happily send back data it shouldn't. So we want to protect against those cases as well.

So let me introduce Instance Metadata Service version 2, which, as I mentioned, came out a couple of weeks ago. We're going to go into gory detail on a lot of things here, but the fundamental idea is: let's make the access pattern sufficiently unusual that the typical misconfigurations just won't cause you trouble anymore. Notice I'm using careful language; I say it's going to defeat most known configuration mistakes. I'd like to make a slightly stronger claim: it defeats all the ones we've ever found and tested. When these problems occur and we work with customers on them, we keep a little library of them, and when we run this new metadata service interaction against those known problems, it blocks all of them. But it's still fundamentally a problem of getting software from an unauthenticated state to an authenticated state, so it is conceivable that there's software out there misconfigured in such a way that it will perform this new, more sophisticated way of acquiring the ability to talk to the service. I can't say it's impossible; in security you never say things are impossible. But the risk mitigations here are very strong, so we're confident this is a big improvement, and we're going forward with it. We're also going to recommend, and you'll see, that most of my best practices apply to both versions of the service. It doesn't matter, v1 or v2: there are a lot of things you need to think about to make sure you're safe and secure.

So what did we do, specifically? We turned this request/response model into a more session-oriented model. The first request is a special one: it's a PUT verb, which many, many proxies will not forward, with a special header you have to send, and you get back a token. There are also some limitations, which we'll get into, down at the TCP/IP level around that particular request. Once you have the token, you add it to the headers of subsequent requests and continue to use that session until it expires. Let's be very specific; a simple bash example pretty much tells the tale. We do a curl command with PUT as the HTTP verb, talking to a special URL at that link-local address, and we send a header that says how long we want this token to last, in seconds. The maximum is 21,600 seconds, which is six hours, roughly the longest possible lifetime of a role credential, so it matches that. Once you have the token (and we'll see in the demo what it looks like), you've got it in a little shell variable, and you can make GET requests that send the token back in a header; then the service works just as it did before. A token requested with 600 seconds would expire in 10 minutes.
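Written out in shell, that session flow looks like the following, using the documented header names and the 600-second example lifetime from the talk:

```bash
# Step 1: acquire a session token with a PUT; the TTL header says how long it lives.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 600")

# Step 2: present the token on ordinary GETs for as long as it's valid.
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/

# The same GET without the token header is treated as an IMDSv1 request.
```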
And this is a critical point: both version 1 and version 2 live at the same link-local address, with the same listeners, but the metadata service distinguishes whether you're talking v1 or v2 by the presence of these headers. Absence of the special headers: you're talking v1. Presence of the special headers: you're talking v2. Question: can I disable v1? Yes, and we'll talk about that. Another question: can I disable the metadata service entirely? Yes, that's a new feature. Sometimes you may not want one at all; why have something there that could expose data when that data doesn't even need to be used on the instance? Now you can turn it off. We're going to go into all of this in gory detail, believe me, because we've still got plenty of time, but that's the gist of it.

Some more details. You initialize the token, and it lasts for a certain number of seconds. The token is special: it's not usable on any other instance in the fleet. You're welcome to test this; I promise you that token only works on the instance it emerged from. Another useful protection: on the PUT request, we look for X-Forwarded-For headers, and we reject the PUT (the token-acquisition request) if that header is present. Why? Because even the proxies that are willing to forward a PUT verb, which is super rare, will almost always add that header, since the recipient wants to know the IP address of the original caller. That's just not an appropriate pattern for a metadata service, so we reject those calls.

If you go down to the TCP layer, or strictly speaking the IP layer, we also treat the PUT request a little differently than normal. When you say PUT, we know you're talking v2, so you have new software; you've been upgraded and you're aware of this new service. What we do is send you back an IP packet with a TTL of 1. That means if anybody else is listening in the middle and tries to forward that packet on to some other host, they're going to decrement the TTL and the packet will just die; it'll be black-holed (typically the forwarding software sends back an ICMP message saying "TTL exceeded, I killed that packet"). The point is that by default the response to that PUT will not leave the host. You can imagine crazy software that doesn't honor this, but in general it's a very safe way to make sure, at layers 3 and 4, that these PUT responses never traverse beyond the host where your software is calling the metadata service.

Now, this is configurable. Why? Because there are valid use cases, and the classic one is container networking, which often runs a little NAT-type scenario and decrements the TTL as it passes the packet on to the container. So you might need to use 2 or 3 as your hop limit. You've got to test it. The default is 1, because you have to have upgraded software anyway, and if you're upgrading your software you'd better be testing.
If your testing shows you need a hop limit of more than one, just launch the instance with a higher value, or change it with the API that lets you modify this setting. That's another important point: you can vary it from 1 to 64.

There's no practical limit to the number of tokens this system will emit to you, so you don't have to worry that you've called it too many times and have a hundred or five hundred tokens in play; it's going to be fine. That said, don't be wasteful. It's perfectly reasonable, and probably recommended, that if you create a token that lasts ten minutes, you reuse that same token for ten minutes; if you ever get an unauthorized error, get another one, or use your own timer to renew it every five minutes or so. The point is that you don't have to worry about overusing tokens; there are plenty of them. There are limits, and you can read the documentation on the IMDS itself, version 1 or 2: it does have some throttling to avoid abusive calling that could slow the whole system down, but those limits are pretty extreme and you typically don't need to worry about them.

All right, let's do the basic demo, and if we have time we'll do some slightly more sophisticated demos as we go along. Before that I want to make certain things clear, because if you're staring at a screen watching little terminal things flash by, it's not always super clear what's going on, so I want to give you a mental model for what you'll see. We're going to be logged into an instance, using the arguably-not-super-common scenario of a human being using the metadata service as opposed to software; we're driving software, but it's an interactive use case. And there are three very different call paths, or access patterns, that we'll effectively be using. Number one: sometimes we'll be calling the EC2 API, to launch instances, to modify instance metadata options (which is a new API), or something along those lines. Anybody who has access to the API and the right authorization, the right credentials, could make those calls. What's a little confusing, perhaps, is that because of the demo we'll be doing that from the instance. So we'll be using credentials that we got from the metadata service, via call pattern two, to sign the requests we're making to the EC2 API.

And by the way, I'll show by experiment that the AWS CLI calls the metadata service every single time you run a command; it doesn't cache credentials or tokens. You'll see: if I run aws sts get-caller-identity, I make an API call; then I disable the metadata service and try again, and it fails, because I have no credentials. So it really is calling each time. I don't know what the SDK defaults are; I never got around to looking it up in the source code, which is available, but I suspect the SDKs will cache these tokens for you rather than fetch them on every single API call. For the CLI the behavior makes sense, because in these interactive use cases that's exactly what you tend to be doing, experimenting and trying things, so the CLI refreshes each time.
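As a rough sketch of the reuse pattern recommended here (cache one token, renew it on a timer or when it stops working), something like the following would do; the helper name and the renew-early margin are illustrative, not from the talk:

```bash
# Cache one IMDSv2 token and reuse it, renewing a bit before it would expire.
IMDS=http://169.254.169.254
TOKEN=""
TOKEN_EXPIRES=0

imds_get() {                        # usage: imds_get latest/meta-data/instance-id
  local now=$(date +%s)
  if [ -z "$TOKEN" ] || [ "$now" -ge "$TOKEN_EXPIRES" ]; then
    TOKEN=$(curl -s -X PUT "$IMDS/latest/api/token" \
      -H "X-aws-ec2-metadata-token-ttl-seconds: 600")
    TOKEN_EXPIRES=$((now + 540))    # renew one minute early
  fi
  curl -s -H "X-aws-ec2-metadata-token: $TOKEN" "$IMDS/$1"
}

imds_get latest/meta-data/instance-id
```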
So that's the setup. Now let's go to my exciting visualization: two terminal windows. These happen to be logged into the same instance, and you'll see why in a second. Let's run the little demo.

The first thing we'll do is the thing we've all done a million times: curl against the metadata service and get back a listing. Everything's fine; that's standard version 1. Now we'll try the new mode. I do a PUT, with a header that says I want this token to last 60 seconds, and I get back a long string: a token representing this session. Now I use the token: copy and paste it into a GET request, in the metadata token header, hit enter, and I get the same result, but this time I talked version 2, whereas previously I was talking version 1.

Now let's try the token lifetime. This time I say the token should last 10 seconds, put it into a shell variable, and do a curl: seven, eight, nine seconds in, I get a result. I try again and I get unauthorized; my token expired. That was about 12 seconds, so it doesn't work anymore. Let's try another little variation: a 60-second token, and I get back my response, all normal for the PUT operation. Now I add an X-Forwarded-For header, and I get, essentially, access denied: an unauthorized response from the web server I'm talking to, the EC2 service.

Next we'll get a normal token that lasts an hour; we'll use it for the rest of the demo, with no more refreshing. I put the instance ID of my instance in a variable, and now I call the new modify API, ec2 modify-instance-metadata-options, with my instance ID. I see the defaults, because I haven't changed anything (ignore the "pending" state for a second). The HTTP endpoint is enabled, meaning the metadata service is there; "optional" means versions 1 and 2 are both working (if tokens were required, it would say "required"); and the hop limit is shown as well.

On the same instance I now run tcpdump, watching the traffic between my CLI and the metadata service: packets on port 80, either source or destination. I do the call, and you can see the TTL of a normal version 1 call is the standard 255. Now I do a version 2 GET: same thing, because there's nothing special about GET operations; it's only the PUT operation that's affected by this new setting. Now I do a PUT, and you can see on the right side a TTL of 1 on the packet coming back to the CLI. Next I modify the hop limit: I call the modify API with a hop limit of 3, it responds that it will do that, I do the same call, and on the right side the TTL that tcpdump reports is now 3 hops.

So we kill tcpdump and do one more little demo, which is turning off version 1. I ask the metadata service to require version 2, that is, require the token. Now I try version 1 with a simple curl; the way curl works, it doesn't show me why something went wrong, it just returns no results. So I turn on the verbose option, try again, and now I can see I got an unauthorized response.
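For reference, the settings flipped in that demo map to real CLI commands; a sketch with a placeholder instance ID and an assumed primary interface name for tcpdump:

```bash
INSTANCE_ID=i-0123456789abcdef0   # placeholder

# Raise the PUT-response hop limit (e.g., for containers) and require IMDSv2 tokens.
aws ec2 modify-instance-metadata-options --instance-id "$INSTANCE_ID" \
  --http-put-response-hop-limit 3 --http-tokens required --http-endpoint enabled

# Watch the TTL on the wire while exercising the service (interface name may differ).
sudo tcpdump -n -v -i eth0 host 169.254.169.254 and port 80
```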
Basically, there's a security problem with my request, because version 1 has been disabled. And then finally, to close the loop logically, we try version 2 and it works fine. So the instance is in version-2 mode: version 1 doesn't work locally anymore, and so forth. To be clear, it's the software on that instance, generally available software, that can call either the v1 or v2 interface; but once I'm in the required mode, only version 2 works and version 1 does not.

I'll also emphasize, several times, that this is a breaking change for a lot of software. Don't go around willy-nilly disabling version 1 without very careful testing, because you're going to break things, I promise you. You've got to be careful, do testing, and upgrade to the newest SDKs; you may have to talk to partners; and even Amazon Web Services software, frankly, will break right now, because we need to upgrade some of our agents. So things like the Systems Manager agent and the Inspector agent will get upgraded quickly, but right now they're not using the new version. Caution and care are required, but it's a journey we're going to go on, and we're going to give you a bunch of tools to make the transition; we'll talk a lot about that as we go on.

So what are some of the ways we support the transition? First of all, as you saw in the demo, there's a new API that lets you modify these properties at runtime; you don't have to restart instances, you just call the API. Now, it's a loosely consistent API, and that's why it showed the "pending" state at the top; every time I call modify it says pending, because it could take one to ten seconds. In practice it seems to be nearly instantaneous, but you can't count on that, because this is a big distributed system; after a little while the change you made to a running instance will be applied and the behavior will change. Equally importantly, when you start an instance you can specify which of these parameter values you want. Actually, let me pause myself for a second: this is a summary slide, which I'll go over quickly because I'm going into each item in more detail: new APIs, new condition keys, CloudWatch metrics, software updates, and a few other things coming soon.

Now the gory details. First of all, RunInstances has new parameters. The defaults give you the standard old behavior: version 1 is available, the service is enabled, and the hop limit is 1 (again, that only matters when you're doing the v2 PUT, so it can be the most restrictive value without causing problems, and remember, you're not going to use this without testing). You can also do describe calls, and I'll show that in a minute: you can do a describe call, filter on these options, and create little reports about the state of your instances, like which ones have the service disabled or which require tokens; that's very easy to do with a describe call.
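Concretely, those launch-time parameters and the describe-based report look something like this; the AMI ID is a placeholder:

```bash
# Launch with IMDSv2 required and the most restrictive hop limit.
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 --instance-type t3.micro \
  --metadata-options "HttpEndpoint=enabled,HttpTokens=required,HttpPutResponseHopLimit=1"

# A little report of where the fleet stands.
aws ec2 describe-instances --output table \
  --query 'Reservations[].Instances[].[InstanceId,
           MetadataOptions.HttpEndpoint,
           MetadataOptions.HttpTokens,
           MetadataOptions.HttpPutResponseHopLimit]'
```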
And then finally there's the new modify-instance-metadata-options API (oops, there's a typo on the slide; it should say API). If you call this API with no parameters, it just returns the current state, so it acts like a little micro-describe for that instance; and of course if you call it with parameters, you can change the state.

You also have new condition keys for your IAM policies and SCPs, so you can do access control, basically controlling the behavior. We're going to demo this, so I won't belabor it now, but you can write condition keys that say: if the caller of this API doesn't use this parameter, fail the call with access denied. That gives you control, at some point in the future as you migrate, to lock down and eliminate settings you don't want present in your fleet. That's the arrow coming into the EC2 API. I also showed arrow three going out from the instance: I've got credentials on the instance, I sign a request, I call out, and in that call there's a new property called role delivery. It lets me write a condition in any AWS API, EC2 or otherwise, that distinguishes calls signed with credentials delivered by version 1 from calls signed with credentials delivered by version 2. It's very much belt-and-suspenders: as you get near the end of the migration, if you choose to migrate (you don't have to), you can begin to lock down and deny calls signed with credentials that came from the older version of the metadata service. We'll do some demos on this to make it clearer. Here are our access patterns again: the first set of IAM conditions applies to calling the EC2 API (the little gray arrow controlling the behavior of the metadata service), and the other IAM condition applies to calls made with credentials that come from the metadata service. Hopefully that's clear.

In addition to those controls, you also have a really handy CloudWatch metric: for every instance we now emit a counter of how many times someone calls version 1. The name of the metric is MetadataNoToken. I haven't shown you the screenshot yet, so let me fast-forward to it for a second: this is what it looks like for a given instance. I can look at this EC2 metric, MetadataNoToken, and see the number of calls per minute, and I can even alarm on it. If I'm at a point in my migration where I've decided everybody should use version 2, I can set alarms, and if any software out there is still calling version 1, I can get an SNS notification, or use any type of alarm or metric action you like in CloudWatch. So that's very useful.
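For reference, a sketch of pulling that per-instance metric and alarming on it; the instance ID and SNS topic ARN are placeholders:

```bash
# Count IMDSv1 (token-less) calls over the last hour, per minute.
aws cloudwatch get-metric-statistics --namespace AWS/EC2 \
  --metric-name MetadataNoToken \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistics Sum --period 60 \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time   "$(date -u +%Y-%m-%dT%H:%M:%SZ)"

# Alarm (via SNS) if any v1 calls show up once you expect none.
aws cloudwatch put-metric-alarm --alarm-name imdsv1-still-in-use \
  --namespace AWS/EC2 --metric-name MetadataNoToken \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistic Sum --period 300 --evaluation-periods 1 \
  --threshold 0 --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:111122223333:example-topic
```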
We've also updated the CLIs and SDKs; we've started the process of all the software upgrades that need to happen. That's the core, obviously, because you need to be able to start using this right away with the core tools. Another really important launch that's coming very soon, hopefully within the next two to three weeks, is launch template support. Why is this so important? Because launch templates are an important indirection for this whole ecosystem of APIs, so they don't all have to be revised every time EC2 launches something. If you're doing Auto Scaling, CloudFormation, or various other things that launch EC2 instances, you ought to be using launch templates, and in this case you will have to: launch templates will be the indirection you use to get this new set of features. Another thing coming soon (I can't say exactly when) is CloudTrail support: we already log the fact of an EC2 instance role assumption, and the instance ID is already there in CloudTrail when EC2 roles call things, but we'll add the role delivery version as a property, so in CloudTrail you can also track whether version 1 or version 2 was used. We've already seen that screen.

All right, let's move to best practices. Again, most of these apply whether you're on version 1 or version 2. I'll start with the bottom line up front, one of those horrible business acronyms, BLUF; it's not logically in order, the rest are, but it matters most. Obviously the greatest risk for the metadata service is on instances that are broadly reachable, like on the internet. You've got to be careful; there are bad people out there trying to harm you. The metadata service, like any software on public-facing systems, needs to be managed carefully. So if you're concerned about this risk, focus on your servers that have broad accessibility: limit the software, limit the processes, apply the other protections, and start there.

OK, bracket that, and now a logical ordering of best practices. First: does any software actually need access to the metadata service? Sometimes nothing does. Disable the darn thing; why even have it running? That's step one. If something does need access, fine, enable it, but then think about local protections. I can limit it so that only a single process on the host can reach the metadata service; maybe it's just the one process running my software that has to reach Secrets Manager and fetch a database password. Lock it down locally, and then if some other crazy software on the host ever gets compromised, it doesn't matter; it won't be able to reach the metadata service. We'll show examples of that. The other thing you can do is a broker model: have some more trusted software whose job is to access the metadata service, take the data you choose, and put it somewhere like a text file where other software can pick it up. I'll do a demo that does exactly that; it's a very useful pattern, and it works especially well with user data. That's essentially the demo: when I initialize the instance, I grab some data out of the metadata service, put it in text files, and then I turn the metadata service off or block it with a firewall rule; from then on I'm very safe on that instance.

Next question: does any software need to call APIs? That's when you need a role. If you're not calling any AWS APIs, don't assign a role to the instance. Just don't do it; it only creates risk. Now, there are some interesting special cases. You might say, "Well, I've got to call a database." In that case, fine, you might need a role, because Secrets Manager is probably the best way to manage that secret. So you have this indirection: you have a role, and you scope it down so the only thing it can do is get one secret from Secrets Manager, which you then use for the database call. Scope it right down to that one secret: you can use resource ARNs in your policies so that no other secrets are accessible, no other APIs can be called, and so on.
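A sketch of what that scoped-down instance role policy could look like; the secret ARN, role name, and policy name are hypothetical:

```bash
# Least-privilege inline policy: this instance role may read exactly one secret.
cat > one-secret-only.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "secretsmanager:GetSecretValue",
    "Resource": "arn:aws:secretsmanager:us-east-1:111122223333:secret:db-password-AbCdEf"
  }]
}
EOF
aws iam put-role-policy --role-name example-db-client-role \
  --policy-name one-secret-only --policy-document file://one-secret-only.json
```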
So think about whether you need a role at all and what that role is going to do. The most important general advice for role management is least-privilege access: think about the minimum things this software needs to do, and write your policies so it can only do those things. If I have multiple processes that need credentials or need access to something, I can use that broker model. These are the logical steps for taking care in managing the service. And of course, keep your software up to date; I shouldn't say it like it's a joke, but if we want to be secure, we want to patch our software.

All right, let's do some more demos. This is a bit of an indirection, but I'm going to demo a slightly cheesy piece of software I wrote, which uses the metadata service quite a bit, and then I'm going to modify it and show safer ways to do it, now that I'm preaching, as I should be, the gospel of limited access to the metadata service. I have basically a JavaScript application. You can see it's very out of date, because I'm missing a bunch of regions; this was the global infrastructure as of about two years ago, and I apologize for not updating the demo. The idea is that the audience is, say, an average user you're trying to show what the cloud is and what its power is. So you load this URL in your browser, and then I go over to a little shell script and start launching instances around the world. I say run four instances in every region, the shell script goes off and does that, and the JavaScript, which is polling the EC2 API every two seconds, starts showing the instances launching globally. They're initially in a startup state, shown in blue, and then they turn green as they reach the running state according to the API; and again, this is all JavaScript calling the API from the browser. So now I've launched, I don't know, 16 or so regions times four, something like 64 servers around the globe, and I can hover my mouse over them and the EC2 API tells me various things: their instance ID, their IP address, and so on. Now I stop two of the four in each region with another command, and you can see them go into the shutting-down state and turn brown. But there's one other thing I wanted to add to make it more real: if you click on one of these, you actually talk to the server I just launched. That was added as feature two of the little demo. You'll notice if I click, I get a little pop-up; I go over to Germany, click the pop-up, click the public IP address of that server, and I actually go to that host and run a little PHP application, and the application tells me: where am I, what's my availability zone, what's my IP address, and so on.
So it's a cool little demo: we can go to Mumbai, we can go to Singapore, and in each case I'm actually talking to an EC2 server I just launched, running this simple PHP application that tells me where it is and what it's doing. Now, what's the point of that? It's like the old joke: I told you that story to tell you this one. We're going to modify this little demo so that these PHP servers, and PHP is not exactly the safest technology in the world, don't have access to the metadata service. In the old version of the app, you can see I was sending user data saying "download this PHP app," so every instance I launch downloads the same code, but then it goes to the metadata service: my little PHP app calls the metadata service every single time it gets a web request. Let me pause this real quick so I can catch up and not get too confusing. What you've seen is an app that calls the metadata service all the time. That's not great; why not be safer? This was a real app, so what can I do about it?

All right, let's finish the demo. First I'm going to prove to myself that I can control access to the metadata service using local firewall rules. I create an iptables rule that rejects calls from the user ec2-user, which is me, logged in with a shell. I run the iptables list command and see the little REJECT rule down there. It uses the owner module, which lets you use the process owner, the principal, in a rule about who can call an IP address. Now when I try curl, you see it just times out; I've effectively blocked myself from talking to the metadata service, and version 1 versus version 2 doesn't matter, because I'm blocking myself at the IP level. Now I delete that rule, the rule's gone, I try the same call again, and it should work; hopefully... well, I recorded this demo, so it does work.

Now let's try it on the little application. I'm running the application on the very same server I'm logged into with the shell. It shows the time when I recorded it (not that long ago), the IP address, the instance ID, and so on; and looking at the source code for the PHP page, it calls the metadata service every single time it's refreshed. But I don't want to do that anymore; I'm going to follow my own best practices, even for my simple little demo app. So I create an iptables rule that blocks the apache principal, which is the default principal for Apache when you install it; be careful, though, because someone might have changed that. I block Apache from accessing the metadata service, and if you look at the spinning circle of death up there in the region demo, my PHP app is now hung, because it's trying to reach the metadata service and it can't.

So how do I make this cool little app (I think it's cool) work again? Here's what I do. During the initialization of the instance, in the shell script that runs, I block Apache from talking to the metadata service, but first I grab the data I need one time and stick it in text files in the directory where the PHP app runs. You can see I'm essentially curling that data into files, and then my PHP app just picks it up from the local text files.
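For reference, a sketch of those two steps, grab the data once at initialization and then block the web server's principal, in the style of the examples in the EC2 documentation (the web root is the Apache default):

```bash
# At instance initialization (e.g., in user data, running as root):
# copy what the app needs once...
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id > /var/www/html/instance-id.txt
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/availability-zone > /var/www/html/az.txt

# ...then block the web server's principal from ever reaching the IMDS itself.
iptables -A OUTPUT -d 169.254.169.254 -m owner --uid-owner apache -j REJECT
```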
It's a very simple code modification: instead of the GET from the metadata service, I'm just reading a text file. This version of the app, on the same host with the metadata service blocked, runs fine. When one version doesn't run, the other version does.

A few notes now on firewalls. First of all, I just violated my own advice: you probably want to allow-list principals rather than block them. Why? Because with an allow list, people can modify the system later and it won't affect your security decision: you've decided there's a process or principal you trust, and it can access the service; if something gets added later, it can't. If instead you use a deny rule, say you deny this one principal or these three, then something could be changed later that you're not thinking about, that no one is conscious of, and it could access the service, violating your intent. So it's better to think in terms of who you're going to allow and block everybody else, rather than blocking the dangerous things and leaving the rest. Again, I violated that principle in my demo just now, but that's the right way to think about it.

There's another funky limitation, and I think there are even pull requests out there to fix it. The current, broadly available implementation of the iptables owner module (you could potentially use process IDs or other attributes to do this blocking, but principals seem like a good idea because it requires root access to change principals and decide which principal a process runs as) only matches on the primary group. So you might expect you can create a permission on a group and say every process in this group can access the service; it doesn't work quite the way you'd think. They would all have to share the same primary group, which changes other behavior on Linux that you may not want. So it's an issue; hopefully that module gets updated, or maybe one bright person in the audience will go update it for us all, and then it will actually inspect all the groups and do the right thing.
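Put together, an allow-list version of that iptables approach could look like the following, using a single trusted user (a hypothetical account name) rather than a group, given the primary-group limitation just mentioned:

```bash
# Allow exactly one local user to reach the IMDS; reject everything else.
sudo iptables -A OUTPUT -d 169.254.169.254 -m owner --uid-owner metadata-broker -j ACCEPT
sudo iptables -A OUTPUT -d 169.254.169.254 -j REJECT
```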
Now, the Windows Firewall actually does the right thing in this case, although it's a bit more complicated: compare one line of iptables with the PowerShell you need here. I'm very proud of this; it took me about three hours to figure out, because there's a Windows standard called SDDL, the Security Descriptor Definition Language, which is almost completely undocumented, but unfortunately you need it to create these PowerShell firewall rules. Basically, I create a user, and then, the other weird thing about Windows Firewall is that there's no implicit deny: if you block one principal, you're not implicitly allowing the opposite, or vice versa; you have to declare both. So in this case I say I've got a principal I'm going to block, and I use "Everyone" for that, which creates a sort of default deny, and then I create another principal, a "trusted users" group, which is a regular Windows group. In this case group membership actually works properly; it doesn't matter whether it has one member, ten, or none at all, because Windows, or at least the Windows API, doesn't have a native concept of primary group membership. Then I create the firewall rule, and it does just what you'd want it to do.

OK, now let's talk finally about the access control capabilities: the IAM conditions you can use to control the use of the metadata service, whether on the RunInstances call and so on. We can also control DescribeInstances, we have some control over the modify API (I'll talk about that; there are a few limitations which we'll get rid of, but they exist right now), and then there's the ability to condition on whether role delivery was version 1 or version 2; that's also worth a look. So now we'll do an access control demo.

Here we go. We have three terminal windows, on three different hosts. Red is a host where the CLI has not been upgraded, so it's a naive, old-fashioned client; it will do what software has always done since the beginning of time as far as the metadata service is concerned, and it knows nothing about version 2, which is useful for demoing certain aspects of this change. The blue and gray terminals are on upgraded hosts, so they know about version 2. That's the state of the software vis-à-vis the metadata service. As for their privileges, the principals they run as: red and blue run as the same principal, the same EC2 role with the same privileges, and gray is a different one; we'll talk in a minute about why.

Let's start, and I'll walk you through this privilege demo. First of all, who are these principals? You can see the role here is "admin on EC2" for red; it's the same for blue; and in the gray terminal, talking to a third host, we have a slightly different role called "EC2 IMDS admins," and you'll see why in a bit. I've also created variables so I can refer to red, blue, and gray easily.

Now here's a describe-instances query, like I mentioned earlier, that creates a nice little report: for all my EC2 instances in this region I can see whether the metadata service is optional or required, enabled or not, and how many hops it allows on the PUT response. There's an instance showing optional, enabled, and five hops; I modify it to required, enabled, and fifteen hops. This is live: I do the modify call, do the describe call right afterwards, and it shows me the new state. As I mentioned, the state seems to change super fast, but you shouldn't count on that.

Now I go to another terminal and disable the endpoint, so there's no more metadata service, and then I call the same API I called earlier. What happened? I've blocked access to the metadata service altogether, so when that instance tried to sign a request, it failed, and I can't re-enable it from there because I can't call any APIs anymore. So I go to the other instance, re-enable the metadata service on the gray instance, and everything is back to normal; I can call APIs again. There you can really see that the CLI calls the metadata service every single time. So now I can do standard API calls again, and I can do modify calls because I've upgraded my CLI. Actually, one of them fails; let me pause, because I'm getting ahead of myself.
Red just failed because this is a new API and that CLI doesn't know anything about it; it just gives me a list saying, in effect, "you must have meant something else; I don't know what API you're talking about." Let's continue. I can run instances: that's an old API, and nothing has changed in my permissions, so I can run instances with the new CLI, the blue one, or with the red one. But I'm going to change that. I go to the IAM console and take a policy that says: deny the request to this API (and I'm scoping it down to resources, for reasons I can explain afterwards) if the call doesn't say tokens are required, or if I try to set the hop limit to more than three. So I've scoped down the permissions for this role, and I attach that policy to it. Now I go back to my instances and try the same API call. I get access denied from red; with red there's really no hope, because the CLI hasn't been upgraded, so it's always going to fail. Blue fails as well, because I haven't specified the new parameters. Let's go to red and try to specify the new parameters, and I get a different error message, which is essentially "what are you talking about? I'm an old CLI; I don't know what those new parameters are." I go to blue and do the same call with the new parameters, limiting the hop limit to one and requiring metadata service version 2 on the instance, and it works; now I can launch an instance.

Let's try some variations on that theme. First I say hop limit four, which is more than what the policy allows, and the call fails. Then I try the appropriate hop limit but say tokens are "optional," which is not allowed, and as you'd expect given the structure of this demo, that fails too. Then I go back and prove to myself that all is right with the world: I both require metadata service version 2 (that's what the token requirement means) and keep my hop limit at three or below, and now I can launch an instance again. So we're done with that part of the demo.
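The policy attached in that part of the demo is described but not shown on screen; here's a hedged reconstruction using the documented condition keys (the statement IDs, file name, and role name are illustrative):

```bash
# Deny RunInstances unless IMDSv2 is required and the hop limit stays at 3 or below.
cat > require-imdsv2-at-launch.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyLaunchUnlessTokensRequired",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {"StringNotEquals": {"ec2:MetadataHttpTokens": "required"}}
    },
    {
      "Sid": "DenyLaunchWithLargeHopLimit",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {"NumericGreaterThan": {"ec2:MetadataHttpPutResponseHopLimit": "3"}}
    }
  ]
}
EOF
aws iam put-role-policy --role-name admin-on-ec2 \
  --policy-name require-imdsv2-at-launch \
  --policy-document file://require-imdsv2-at-launch.json
```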
Now we're going to talk about the other piece, role delivery: what happens to the calls I make with the credentials I get from the metadata service. Again, the red terminal doesn't know about version 2. First, I make the metadata service required on red, and when it tries another API call it fails, because it doesn't know how to talk the version 2 protocol and so can't get the necessary credentials; in this case the CLI responds locally and says, sorry, I don't know where your credentials are. Let's re-enable version 1 on that server, make tokens optional again, try again, and all is well, because it can talk version 1.

Now the outbound scenario. In this case I'm going to say the call must be signed with credentials that came from metadata service version 2. To do that, I add a policy to my role with this structure: I limit calls using a NumericLessThan condition on the role delivery version; in other words, it has to be at least 2.0. This is future-proofed: if there were ever a 3.0, those calls would still be allowed. I attach that policy to the same role, and then when I make a call from the red terminal, which a minute ago could launch instances, that call fails even though it meets the other restrictions, because it wasn't signed with the right kind of credentials; I get another type of access denied. And if I go to blue... actually, let me pause for a second to explain where I am: red was blocked because I've required version 2.

Now, the last part of this demo; let me explain something here. The modify API does not yet have fine-grained access control, so you can't yet limit ModifyInstanceMetadataOptions the way you can limit RunInstances. We're going to fix that; I'm just telling you the current state of the world. You can't, for example, say "only allow modify if the hop limit is less than some value"; you can't do that yet. Give us a few weeks; it's coming soon. But if you're in the middle of upgrading and you want to restrict access to the modify API, that's a very reasonable thing to do, so what I've done in this case is have a special role that does have the power to modify the instance metadata options, while other powerful roles don't. Let's continue. The blue terminal, which has a pretty powerful role, is able to call that API. But now I add a policy that says you have to be in this IMDS admins role in order to call the modify API, and I attach it both to the role of the blue terminal and to the role of the gray terminal. They have the exact same policy attached, but only one of them is the role that's permitted. So what happens is what you'd expect: you cannot call this API from blue, you get an error message, but you can call it from gray, because gray is the privileged role that's allowed to modify the metadata service settings.

All right, I think that's the end of my demos. I should say in passing that every single one of the policies I just showed you is in our documentation, which went live yesterday. They're all deny policies, which means you can turn them into service control policies: if you're using AWS Organizations, you can literally copy and paste every one of those policies into an SCP and it will enforce this across the environment. But beware: it will also block the console in that case, because the console doesn't yet have the ability to set these new parameters; again, that's coming soon.
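For reference, the role-delivery deny policy follows the pattern published in the EC2 documentation; the policy and role names here are placeholders:

```bash
# Deny any call signed with credentials served by IMDSv1 (ec2:RoleDelivery < 2.0);
# a hypothetical future 3.0 delivery version would still be allowed.
cat > require-imdsv2-credentials.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {"NumericLessThan": {"ec2:RoleDelivery": "2.0"}}
  }]
}
EOF
aws iam put-role-policy --role-name admin-on-ec2 \
  --policy-name require-imdsv2-credentials \
  --policy-document file://require-imdsv2-credentials.json
```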
So let's skip to the conclusion and call to action. You can probably see where this is going, because it's really just a repeat of my best practices. Start with your systems that are very broadly accessible; that's where the greatest risk is for whatever private data is in the metadata service, and even if you're not using roles, there's still data in there you probably don't want exposed. Then decide about migrating to version 2. Some customers are perfectly happy with version 1 and it works great; make your own risk assessment, and you don't have to migrate. If you decide you want to, you can. So: start with your risky systems; disable the IMDS if it's not necessary; don't use roles unless you have to, and apply that least-privilege kind of thinking; and if you do go to version 2, assess the risk, decide about migration, and then start the migration, for example by upgrading software. You've got to start upgrading software, then watch the CloudWatch metric to see your count of software using the old version go down to zero, and then eventually you can begin to lock down version 1 so it just doesn't work anymore, using the controls around launching new instances, modifying them, and using credentials that come from version 1. So thank you very much for your time; I appreciate it. Have the rest of the week, only one more day, have a great week. Thank you very much. [Applause]
Info
Channel: AWS Events
Views: 3,696
Rating: 4.909091 out of 5
Keywords: re:Invent 2019, Amazon, AWS re:Invent, SEC310, Security, Compliance, and Identity, Not Applicable
Id: 2B5bhZzayjI
Length: 59min 45sec (3585 seconds)
Published: Mon Dec 09 2019