AWS re:Invent 2018: [REPEAT 1] Deep Dive on Amazon Elastic Block Storage (Amazon EBS) (STG310-R1)

Captions
All right, good morning and welcome to the EBS deep dive session here at re:Invent. By way of introductions, my name is Ashish Palekar. I'm on the product management team with Elastic Block Store, which means part of my job is listening to feedback from you, understanding why we build the things we build, and making sure we listen to our customers and do the right set of things. And with me — I'm Mark Olson, I'm an engineer on EBS, I've been around EBS for a little over seven years, and I get to build all the things that you come up with, so I get to have a lot of fun.

We have a ton of stuff to cover, and this session is interesting in a couple of ways, because we try to cover some basics as well as go deeper into the best practices and learnings we've had from our customers. Just by show of hands, how many of you are relatively new to EBS? A bunch of you — okay. So the way this is structured is that I'll walk through some of the basics, and then Mark is going to walk through best practices. There's a wide array of things we expect to cover today, so I'll try to go at a quick clip. After the session Mark and I will hang around — do stop by and ask us questions, we'd love to hear from you.

So where does EBS fit in? If you think about the AWS storage portfolio, we have offerings in block, file, and object, and around those you have capabilities for data movement as well as for security and management. Elastic Block Store's focus is the block storage portion, and that's what we're going to dive into today. When you think about block storage offerings within AWS, there are really two categories. One is our instance storage family, which has SSD-based options as well as HDD-based options. Then on the EBS side you also have SSD-backed and HDD-backed options: the SSD-backed options are our General Purpose (gp2) volume type and our Provisioned IOPS (io1) volume type, and the HDD-backed ones are Throughput Optimized (st1) and Cold HDD (sc1). We'll go into what those are and when it's appropriate to use them in a little while.

Let's start with what EC2 instance store is. Think of instance store as being local to the instance. It is non-persistent in the sense that if you stop and start your instance, the storage goes away. A point of confusion we're often asked about is what happens if you reboot the instance — if you reboot, the storage still stays. The non-persistence really refers to what happens on a stop and start; the important point is that the lifecycle of instance storage corresponds to the lifecycle of the instance. The data on the instance store is not replicated by default, and you don't get capabilities like snapshots, but you can put applications on top that do replication for you and build your own backup capabilities. Again, it has SSD as well as HDD options within the product family.

Okay, with that, what then is EBS? Think of EBS as block storage, but delivered as a service.
The key thing is that the lifecycle of the storage is separate from the lifecycle of the instance, and the service is accessed over the network: you can create, attach, detach, and delete volumes. A common point of confusion is whether it is a disk, and really the answer is that it's not. The way to think about EBS is as storage that is distributed over a number of physical machines, and that's what allows us to give EBS the availability and durability characteristics that Mark will go into.

I mentioned earlier that the lifecycle of the EBS volume is separate from that of an EC2 instance. What that means is that if you stop and start an instance, the storage can still persist, and that has some nice properties. If you change your mind, or your workload changes, you can change the instance type attached to your EBS volume — you can take instance types up or down — and the way you do that is by detaching the EBS volume and reattaching it to another instance within the same Availability Zone. The same is true if your instance fails: you can simply spin up another instance (later Mark will cover what we do with auto recovery) and attach it to the same EBS volume. The key thing to remember is that a given EBS volume attaches to a single EC2 instance; however, you can have multiple volumes attached to a single instance. Our recommendation is to split your boot volumes from your data volumes, because that gives you some nice properties in how you use those boot and data volumes for other capabilities.

Okay, with that, let's dive into the EBS volume types. As I mentioned, there are really two families of volume types: SSD-backed and HDD-backed. Within the SSD family you have gp2 and io1 — General Purpose and Provisioned IOPS, respectively — and on the HDD side you have Throughput Optimized and Cold HDD, st1 and sc1 respectively. The key thing to remember is that the SSD volume types align with the properties solid-state devices provide: they can be random-access devices, and that lends itself to certain types of workloads. The HDD ones, on the other hand, lend themselves to throughput-oriented workloads, and that is what they are designed for. Between that entire family you can cover a wide swath of workloads: relational databases (think MySQL, SQL Server, SAP, Oracle), NoSQL databases (think Cassandra, CouchDB), big data analytics workloads (Kafka, Splunk, Hadoop, data warehousing), all the way up to streaming media workloads like transcoding, encoding, and rendering, and file system services. There is a wide spread of workloads you can deploy given the volume types I just described.

The very logical question that follows is: how do I choose my volume? It all starts with a two-pronged question — what is more important to your workload, IOPS or throughput? Let's say IOPS is more important. In that case the question to ask yourself is whether you need more than 80,000 IOPS, or 80,000 or less. Why 80,000? Every EC2 instance gets a connection to EBS — we refer to it as EBS-optimized — with a specific amount of bandwidth, throughput, and IOPS associated with it.
The maximum EBS-optimized IOPS a single EC2 instance can do is up to 80,000, and that is the max across all the EBS volumes attached to it. So if you need more than 80,000 IOPS, you should use an SSD-based instance storage family. If it's 80,000 or less, the next question is what latency profile your application needs: is it okay with a single-digit-millisecond latency profile, or does it need less than one millisecond? If it needs less than one millisecond, the answer again is an SSD-backed instance storage family. If the application is okay with single-digit-millisecond latency, then you gauge between cost and performance. If the answer is cost, the volume type is gp2.

So what does gp2 look like? In case you missed it, we announced this on Monday: prior to Monday, the baseline for gp2 volumes went from 100 IOPS up to 10,000 IOPS, increasing at 3 IOPS per GB. What we announced on Monday is that you now get up to 16,000 IOPS — 60 percent more IOPS per EBS volume — and your throughput goes from 160 megabytes per second to 250 megabytes per second. This is the core volume type, and it's great for boot volumes, low-latency applications, and bursty databases.

Going back to the decision tree: what if performance is more important? In that case your answer is likely io1, our Provisioned IOPS volume type. What does that look like? Up until Monday we supported a maximum of 32,000 IOPS and up to 500 megabytes per second of throughput, scaling at a 50:1 IOPS-to-GB ratio, so a volume of 640 GB could get up to the full 32,000 IOPS. The nice thing is that on Monday we launched a doubling of that maximum: you can now go from 32,000 IOPS up to 64,000 IOPS, and up to 1,000 megabytes per second of throughput. To get the full performance of our Provisioned IOPS volumes you will need a Nitro-based EC2 instance — our C5 and M5 families and beyond — but that now allows you to go even higher in performance for your workloads. The performance chart I showed you now looks like this: you can go all the way up to 1,280 GB with the same 50:1 IOPS-to-GB ratio and hit the maximum of 64,000 IOPS. We're very excited about bringing this to our customers. So that's the io1 profile: starting at 100 IOPS and going all the way up to 64,000 IOPS, throughput going up to 1,000 megabytes per second, and capacity ranging from 4 GB to 16 TB. If your application needs critical, sustained IOPS, Provisioned IOPS is likely the right offering for your workload.

Okay, let's flip over to the throughput side of the house. What if your workload is throughput intensive? It's going to look very similar. The question to ask yourself is whether it's small random I/O-based throughput or large sequential I/O. If it's small random, that again biases you toward the SSD side of the house. If it's large sequential, the next question is whether you need more or less than 1,750 megabytes per second. Why that number? Because that's the maximum throughput from a single EC2 instance to the set of EBS volumes attached to it. So if you need more than 1,750, that biases you toward our HDD-based instance storage family (the D2 class of instances).
If, on the other hand, you need less than 1,750 megabytes per second, then ask yourself what you care about more: cost or performance. If it's performance, the answer is our Throughput Optimized family, st1. Let's go into that. st1, our Throughput Optimized HDD family, starts with a baseline of 40 megabytes per second per terabyte, going all the way up to 500 megabytes per second, with a burst going from 250 megabytes per second per terabyte up to 500 megabytes per second. The key thing to remember is that the minimum starting size for the st1 volume type is 500 GB, going all the way up to 16 TB. The other key difference you'll notice from the SSD families, which were IOPS-based choices, is that here what you're choosing is throughput based on the capacity you have. If you have workloads doing large-block sequential I/O that need high throughput, that's what st1 is going to be useful for.

Okay, what if cost is more important? Then the answer is Cold HDD, or sc1. What does Cold HDD look like? sc1 starts with a baseline of 12 megabytes per second per terabyte, going all the way up to 192 megabytes per second, with a burst going from 80 up to 250 megabytes per second. Again, the entry capacity is 500 GB, going up to 16 TB. This one is also throughput oriented, but not as throughput intensive as st1, so think of use cases like logging, or a secondary copy of your data — those are more appropriate workloads for sc1.

So that's the taxonomy, going from instance storage, to our SSD-backed EBS volume types, to our HDD-backed EBS volume types. What if you don't know your workload — what if you don't know the profile and how it fits? Our recommendation is to start with gp2. The nice thing is that we have a capability, which we'll go into in a bit, called Elastic Volumes: you can start there, observe your workload, and then adjust accordingly, and that gives you a good branching point into our other volume types. By the way, you can mix and match these volume types within your infrastructure and take advantage of the different cost points and cost profiles. An example of this is Zendesk: they had an ELK stack that was using the i2 instance family, and after moving over to EBS — using gp2 in addition to one of our throughput-oriented volume types — they ended up saving over 60 percent on the overall cost of the solution. It's pretty exciting to see how customers mix and match our volume types to get cost savings for their workloads.

Just so you have a sense of the cost points: gp2 is 10 cents per GB-month; Provisioned IOPS is 12.5 cents per GB-month plus 6.5 cents per provisioned IOPS; st1 is 4.5 cents per GB-month; sc1 is 2.5 cents per GB-month; and snapshots — a capability Mark will cover a little later, essentially our equivalent of taking a backup — are 5 cents per GB-month.
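As a rough illustration, requesting these volume types from the CLI might look like this (the Availability Zone, sizes, and IOPS values are placeholder assumptions, not sizing recommendations):

# General Purpose SSD (gp2): performance scales with size at 3 IOPS per GB
aws ec2 create-volume --availability-zone us-east-1a --volume-type gp2 --size 500

# Provisioned IOPS SSD (io1): capacity plus an explicit IOPS setting, up to a 50:1 IOPS-to-GB ratio
aws ec2 create-volume --availability-zone us-east-1a --volume-type io1 --size 1280 --iops 64000

# Throughput Optimized HDD (st1): minimum size 500 GB, throughput scales with capacity
aws ec2 create-volume --availability-zone us-east-1a --volume-type st1 --size 500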
So I touched on Elastic Volumes. What if you start on gp2 and then decide to change your volume type? That's really what our Elastic Volumes capability is about: it allows you to increase your volume size, change your volume type, and increase or decrease your provisioned IOPS, completely non-disruptively to your application. Why is this important? Let's say you're running an application with a database on an EBS volume and you decide you're running short of IOPS. With Elastic Volumes you can increase the amount of IOPS allocated to your EBS volume and get that improved performance.

So how do we modify? We recommend four steps: snapshot your volume in case things go wrong, modify your volume, monitor your modification, and then extend your file system. Let's cover each of those four. Snapshotting the volume is pretty standard — you create a snapshot using the console or the CLI. The next step is modifying your volume; in this case we're going from a gp2 volume type to an io1 volume type. One of the questions customers ask is whether they have to make one change at a time, and the answer is no — you can change multiple things. In this particular case we're changing the type, changing the allocated IOPS, and also changing the size, so you can combine different changes within one modify-volume command. What you then do is monitor your volume, and the way you monitor is with the describe-volumes-modifications command. The thing to look at is the modification state — in this case it's "optimizing" — and that gives you the overall progress of the Elastic Volumes operation. In the UI it's a similar thing: you can watch the volume go from in-use to modifying to optimizing and then done. The last phase is getting your file system to recognize the additional capacity. You list the file systems — we've chosen XFS as an example — and then use resize2fs (for ext file systems) or xfs_growfs (for XFS) to take advantage of the new capacity. On Windows you go into Disk Manager and extend the volume; the operating system will see the new capacity and you can take advantage of that additional space. You can also use PowerShell commands to resize your disks.

Some tips and tricks — I'll cover a few that have come up recently in customer conversations. First, your modification must fit within the volume specs. As I mentioned earlier, st1 has a minimum size of 500 GB, so if you're converting from a 1 GB gp2 volume you can't convert to a 1 GB st1 volume — you need a minimum of 500 GB assigned. Second, you can modify a volume once every six hours. On current-generation instances you do not need a stop/start or a detach/attach; volumes created prior to November 2016 may require a stop/start or a detach/attach. One last thing that comes up: customers ask what kind of performance they get during the change. The general guidance is that you'll get at least the performance of the lower-performing of the two configurations while the volume is in the modification state. So let's say you're migrating from gp2 to io1, and the gp2 configuration is the lower-performance one — you would get at least that performance as the volume migrates from gp2 to io1, which is why it's non-disruptive to your application.
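A minimal CLI sketch of that four-step flow (the volume ID, target settings, and mount point are placeholder assumptions, and the last step assumes an XFS data filesystem):

# 1. Snapshot the volume in case anything goes wrong
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "pre-modify backup"

# 2. Modify type, IOPS, and size in a single call
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --volume-type io1 --iops 10000 --size 200

# 3. Monitor the modification state (modifying -> optimizing -> completed)
aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0

# 4. Extend the filesystem so the OS sees the new capacity
sudo xfs_growfs /data          # or: sudo resize2fs /dev/xvdf for ext4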
It doesn't stop there. We've seen customers look at this and say, hmm, I can now start to automate with Elastic Volumes — combine it with CloudWatch alarms, events, and Lambda to start doing things in a more automated fashion. What are examples of this? Well, you can right-size in an automated way: publish a free-space metric to CloudWatch, use Lambda to extend the volume, resize the file system, and take advantage of that new capacity. You can run a CloudWatch alarm to monitor your IOPS consumption on a volume, decide whether you're running out of IOPS and need to extend the volume, and then automatically initiate a workflow to add the additional IOPS. With that — over to Mark.

Thank you. Now the fun part. How many people was that a review for? Show of hands — most of you, that's great. So what I'm going to go through is a few things on how to use EBS, some tips and tricks. In order to use EBS, what do we need to do? We need to attach it to an instance. This is pretty straightforward: we've got an API, attach-volume, which you can call through the CLI or the console. You provide an instance ID, your volume ID, and a device name, and on our Xen-based instances that device name is how the volume is going to show up in your instance. So when we attach it to an instance and run lsblk, we can see that xvdf is the volume I just attached. It's a nice giant one-gigabyte volume — I probably wasn't doing anything with it other than this demo, because you can't do a whole lot with one gigabyte these days, can you?

How many people have launched Nitro instances? A few of you. Of those that just raised their hands, how many are confused by how EBS volumes appear on Nitro instances? When we built Nitro — we went through the story last year, but I'll give you an overview — we moved all of the device emulation onto our own custom hardware called the Nitro card. One of the things that allowed us to do was give more of the virtual machine's resources back to your instance. That card looks like a PCI device to your instance, and since most operating systems these days have support for NVMe — it's a pretty good programming model, easy for us to implement and easy to iterate on in the driver community — we decided to expose it as an NVMe device. Now, the downside is that in Linux, for example, we have to rely on how the PCI bus is enumerated, so Linux gives you these nvme-prefixed names depending on the order in which the EBS volumes are attached and enumerated. This can change across reboots and across stop/starts: if you do attaches and detaches and then reboot, you might end up with a different order. So I don't recommend relying on those nvme0 and nvme1 names.

One of the things NVMe did give us the opportunity to do was add more detailed information that we had a hard time providing with Xen. As part of the NVMe identification data, the serial number is your volume ID — we had to take out the dash because NVMe only gives us 20 characters, but you've got the volume ID there. The model name is Amazon Elastic Block Store, which differentiates it from our new instance storage volumes, which are also NVMe and show up as Amazon EC2 instance storage. And that device name you specified — we still try to carry it through to give you some way to identify the volume programmatically. NVMe gives us the ability to add vendor-specific data, so we took the first 32 bytes of that and put the device name you specified into it, and you can use the nvme CLI to actually get that data (there are a few other ways you can do it too).
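For example, a rough sketch of pulling that identity information on a Nitro instance (device names are placeholders, and the nvme-cli package may need to be installed first):

# List block devices with their serial numbers -- the serial is the EBS volume ID without the dash
lsblk -o NAME,SERIAL,SIZE,MOUNTPOINT

# Inspect the NVMe controller: the model reads "Amazon Elastic Block Store" and the
# vendor-specific bytes carry the device name specified at attach time
sudo nvme id-ctrl -v /dev/nvme1n1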
If you're using Amazon Linux, we made it even easier. We wrote a little Python script that parses out the volume ID and gives you back the one with the dash that you would expect, so if you've got scripts that rely on that volume ID you can use it, and the block device mapping comes out of there as well. In addition, we created rules for the udev system, so that when your volume is attached and /dev/nvme1n1 gets created, an event notification is sent to udev; we parse the device for the device name you specified and create a symbolic link to that /dev/nvme1n1, so you can still reference it through these udev rules using the xvdf name you specified on the console. You can also specify a label when you create the file system, which is something that sticks with the EBS volume for its life, so you can always reference it by that — and Linux by default has udev rules that create symbolic links for that label as well. So I highly recommend that you use one of these more stable methods to identify your block devices and your EBS volumes.

All right, some more best practices, starting with security — security is one of the most important things. How many people have used EBS encrypted volumes? Cool. I like to ask this question every year, and I like to see that more and more hands are getting raised. We've got EBS encryption where you can just check a box to encrypt your EBS volume. It uses your default key, so there's no extra management, no maintenance overhead — we'll take care of all the encryption. One of the benefits of our Nitro cards is that this is all offloaded into hardware, so there's no performance penalty for using encrypted volumes. All you do is check the box, create the volume, and you're good to go. Now, the one thing the default key doesn't give you is the ability to do finer-grained access controls and a few other things. If those are important to you, we recommend that you create a custom KMS master key (CMK). Creating your own key gives you a key rotation policy that can be different from ours, CloudTrail auditing, and the access controls I talked about. Then, when you create the encrypted volume, you just change the key you're going to use — instead of the default key, you use the new EBS master key you just created.
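A minimal sketch of that from the CLI, assuming a customer managed key already exists under the placeholder alias shown (AZ, size, and alias are assumptions):

# Create an encrypted volume using a customer managed CMK instead of the default EBS key
aws ec2 create-volume --availability-zone us-east-1a --volume-type gp2 --size 100 \
    --encrypted --kms-key-id alias/my-ebs-key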
Now, for those of you using encrypted volumes — do you do a lot of snapshot sharing and things like that? One of the things you've had to do is this: you can share encrypted snapshots, but you have to create a snapshot and then copy it over into the new account in order to create a new image from it. What we're coming out with soon, in preview, is the ability to do this in one click: you can take an AMI that is either unencrypted or encrypted with a custom CMK that's shared with you and, with one click, create an instance from it — so you no longer need that copy-image step or the time it takes. Excuse me just a second — four days in Vegas really dries you out.

All right. Because EBS volumes persist beyond the life of the EC2 instance, we really need to think about the reliability of EBS separately from your EC2 instance, and we think about volume reliability in two ways. The first is availability, which we define as the ability to access your data. This accounts for our software, the network between your instance and the EBS storage servers, and the uptime of our service. From this aspect we're designed for five nines of availability; we track this continuously, we've got aggressive alarms, and we constantly monitor it so that we know if we're even getting close — our alarms are set well above that five-nines design point. The second is durability: once we get to the storage server that's hosting your data, can we get it off the media? Here EBS is designed for an annual failure rate of 0.1 to 0.2 percent. The way to think about this is that, given a statistically significant sample of volumes, you can expect one to two volumes out of a thousand to fail every year. That's the design point, but you can reduce the probability of data loss when a volume fails by taking snapshots.

EBS snapshots are a point-in-time, crash-consistent backup of the modified blocks of your volume. These snapshots are stored in S3, a service with eleven nines of durability. Since it's only the modified blocks, we're able to do incremental snapshots, so you only pay for the new data in a new snapshot. What this also means is that we keep track, for the entire lifetime of a snapshot, of which data is unique to it — so when you delete a snapshot, you don't have to worry about any future snapshots that rely on it; we delete only the data that's unique to that snapshot.

When you take snapshots, we also recommend that you tag them. Tagging is an important feature that allows you to better manage your AWS resources — in this case snapshots, and volumes as well. Tags let you assign a simple key/value pair so you know what those snapshots are for, and they allow you to visualize, analyze, and manage those resources. You can also activate these tags as cost allocation tags, which I'll talk about in a minute. Earlier this year we made it a little easier to create tags on your snapshots: you can now create tags when you create the snapshot, so you no longer have to wait for the snapshot IDs and then add tags to them. That lets you keep track of those snapshots much more conveniently, and you can do it with the CLI or the console. I mentioned Cost Explorer — Cost Explorer is an interesting thing we brought out last year. It allows you to designate tags as cost allocation tags so that you can look at the storage usage and costs attributed to specific tags. Maybe you want to track your dev/test versus prod workloads; you can see how they compare and what your snapshot costs are for each.
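Tagging at snapshot creation time, as mentioned above, might look roughly like this from the CLI (the volume ID and key/value pairs are placeholder assumptions):

# Create a snapshot and apply tags in the same call -- no need to wait for the snapshot ID
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "nightly backup" \
    --tag-specifications 'ResourceType=snapshot,Tags=[{Key=environment,Value=prod},{Key=costcenter,Value=analytics}]'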
Now, I mentioned that EBS snapshots are crash-consistent. Most applications can deal with that — it's like pulling the power plug out of your computer: there might be a log, the log will replay, and it'll fix itself. But if you need application-consistent snapshots, you can do that on Windows through EC2 Systems Manager, where we integrate with VSS. All you have to do is use the policy generator to create an IAM policy that allows DescribeInstances, CreateTags, and CreateSnapshot, and attach that to your Windows instance; this has been available since last fall. Earlier this year we also announced finer-grained, resource-level permissions on snapshots, so you can use tags to limit the actions that specific IAM users can take. One really interesting use case I heard about a few days ago was taking snapshots, copying them across regions, and then using resource-level permissions to restrict snapshot deletion to just the root account — one way to build disaster recovery that's separate from the existing production account and environment.

So we've got all these primitives that let you take snapshots, tag snapshots, and manage snapshots. Back in July we announced Data Lifecycle Manager, which allows you to create policies — again via tags — to enforce regular backup schedules, retain those backups (whether for compliance or just for disaster recovery purposes), and control costs by deleting old snapshots. This is controllable through IAM as well, and there's no additional cost.

All right, so what can we do with a snapshot? EBS volumes are zonal, and what we mean by that is that your EBS volume lives entirely within an EC2 Availability Zone and can attach to instances within that Availability Zone. So what if you want to take your volume and put it in another Availability Zone? You can use the snapshot to bootstrap your application: create a snapshot, launch another volume in the other Availability Zone, and if your application has some sort of catch-up replication, it can just catch up that differential. The other thing you can do is take a snapshot and copy it to a different AWS region — perhaps you want to go from us-east to us-west, maybe you want to launch in one of the new regions we're announcing, maybe you just need a disaster recovery copy — and then create volumes in Availability Zones in that region as well.

All right, what about EC2 instance failures? Since the lifecycle is separate from your EC2 instance, when the instance dies — if there's a failure of any sort — the EBS volume will stick around. All you have to do is create a new EC2 instance and attach that EBS volume to it. That's great, but again, it's a primitive and a lot of manual work. With an EBS-only EC2 instance — one with no instance storage attached — you can use EC2 instance recovery, which lets you create a CloudWatch alarm that, when triggered, will actually replace your instance. The nice thing about us doing the replacement is that it retains all the metadata — your IP, your volume attachments, the instance ID — so it just looks like the instance went through a reboot. If the underlying hardware failed we'll put it on a different server; maybe we'll reboot it in place — we'll look at the health of where your instance is and decide where to move it. As I mentioned, this is available on our newer instances — C3, which isn't really new anymore, but that's when we started launching it — as long as it's an EBS-only instance.
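A minimal sketch of such a recovery alarm with the CLI (the instance ID, region, and thresholds are placeholder assumptions):

# Recover the instance automatically when the EC2 system status check fails
aws cloudwatch put-metric-alarm --alarm-name recover-web-1 \
    --namespace AWS/EC2 --metric-name StatusCheckFailed_System \
    --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
    --statistic Maximum --period 60 --evaluation-periods 2 \
    --threshold 1 --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions arn:aws:automate:us-east-1:ec2:recover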
A little bit of housekeeping: by default, boot volumes are set to delete when you terminate your instance, and data volumes are not. With delete-on-termination set to false, when you actively terminate your instance we'll detach that volume and it'll just sit there. Maybe that's what you want — if so, go ahead and tag that volume so you know what's on it, so that a year from now you don't have to mount it and try to figure out what it is and why you kept it around; you can then attach it to any other instance. If that's not what you want — maybe it's a dev/test workload and you don't really care about the data — you can set all of your data volumes to delete-on-termination equals true, and then when you call the terminate API we'll also delete the volume for you and clean up, saving a little bit of cost.

All right, performance. One of the first things I get asked when I'm talking to somebody about performance is: do I need to initialize my volume? You know, I found a blog post from seven years ago that said you had to do a whole lot of things. If it's a brand-new EBS volume, all you have to do is attach it — it's ready to go and you'll get full performance on it. If you create that volume from a snapshot, what we will do is load that data from S3 in the background. If you do a read I/O to a segment of the volume that we don't have yet, we'll fetch that data from S3 at the head of the line — in front of everything else we're loading in the background — so it's just-in-time, on-demand loading. We've made some performance improvements there, but if that's not good enough for you, you can do what we call initializing the volume, which is doing a random read across the whole volume. I say random because of how we load it: we're a distributed system, and you're going to take advantage of all of our back-end throughput to S3 that way.

Here's an example with fio. The fun part about this fio command: when you create a volume from a snapshot, attach it, and run this, fio is going to report that it will take three and a half days to load all your data. That's because it looks at how long the very first request took and extrapolates that out. As we load more and more data, your volume is going to speed up, so you don't actually have to let it finish — we're going to load more data in the background than this will actually read. Usually when it gets to about 20 percent — you'll have to monitor your performance — we've loaded enough data that your volume will get full performance from that point on. So just keep an eye on it; once you get full performance, you can cancel it.
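The initialization run described here might look something like the following fio invocation (the device name and job parameters are illustrative assumptions, not the exact command from the slide):

# Random-read the whole device once to pull all of its blocks down from S3
sudo fio --filename=/dev/nvme1n1 --rw=randread --bs=128k \
    --iodepth=32 --ioengine=libaio --direct=1 --name=volume-initialize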
All right, how do we count I/O? For our gp2 and io1 volumes — our SSD-backed volumes, designed for random workloads — we'll opportunistically merge I/Os into 256 KB chunks, at up to 500 megabytes per second; for st1 and sc1 we'll merge into 1 MB chunks. Those are hard-drive products really designed for throughput, to take advantage of the mechanics of a spinning disk, where the seek time is the expensive part — once we get that head parked, we can read data all day long. This merging helps you maximize the burst capability of your volume and potentially minimizes the number of IOPS you need to provision.

Let's go through a few examples of what this looks like. If you're doing four random I/Os, each one 64 KB, we're going to take this train, load up the boxcars — one boxcar per I/O — ship that off to EBS, and count it as four I/Os. If you're doing sequential I/O — take an SSD-backed product as the first example — each 64 KB I/O gets packed into the same boxcar. The important thing to remember here is that this is logical merging: we don't hang on to your I/O waiting for some deadline to approach or for the next I/O to come in. We just keep track of the I/Os for you, let each one go through, and if we notice that the next one is adjacent, we don't take anything out of the bucket for it and send it along as if it were part of the same I/O. So that 256 KB of I/O is counted as one I/O. It's a similar thing with our streaming products: with those 64 KB I/Os we keep adding into the bucket until you get to 1 MB, then we close that bucket and start the next one. If it's a large I/O, it's the flip side — we'll split it apart. A 1 MB I/O on our SSD-backed products gets split into four 256 KB segments and counts as four I/Os.

Most workloads aren't purely random or purely sequential — they're a mix — and this is really where you need to think about how to get the best benefit out of a volume type and how to choose it. If this is an st1 or sc1 volume and we're doing larger sequential I/Os mixed with smaller random I/Os, we'll merge them together as much as we can, but these six I/Os are going to count as four — we merged a little, it's not six, but it counts as four megabytes of burst even though your application only got one and a half megabytes of data. So there's a bit of a loss when you're doing random I/O on our streaming products.

All right, gp2, st1, and sc1 all operate with a burst bucket model. I'm going to go through a really quick explanation of how this works — I used to spend about 20 minutes on this with confusing charts and graphs, but I'll show you a little trick in a few moments. All of our burst volumes operate on a token bucket model. For gp2 the bucket is sized at 5.4 million credits, you accumulate credits at 3 IOPS per GB per second, and you're allowed to spend them at 3,000 credits per second — or at your 3:1 baseline rate if that's greater than 3,000. Here's what it looks like on a graph with our new 16,000 IOPS limit: if you have a 5.3 TB volume, you'll get the full 16,000 IOPS. All gp2 volumes have a minimum of 100 IOPS, and a 300 GB volume gets a baseline of 900 IOPS with bursts up to 3,000 IOPS. But how long does 5.4 million credits last? It depends on your volume size, because of that fill rate. Our smallest volumes are designed to burst for 30 minutes — that's where the 5.4 million comes from — but as the volume gets bigger, it can burst for longer and longer. That 300 GB volume can burst for 43 minutes; at 500 GB it's an hour of burst; and as you approach a thousand gigabytes you get a very long burst period — 10 hours, getting really close to infinity — and above that you're into the non-burst regime.

Here's the easy way to track it: we now have a BurstBalance metric, expressed as a percentage of your burst bucket, so you can figure out how much you're using.
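Those burst durations fall out of a back-of-the-envelope calculation that isn't spelled out on the slide but follows from the bucket description above: the bucket drains at the burst rate minus the fill rate. A rough sanity check in shell arithmetic, assuming the 5.4-million-credit bucket and the 3 IOPS/GB/second fill rate:

# seconds of burst = bucket / (burst rate - baseline fill rate)
echo $(( 5400000 / (3000 - 3*1) ))     # smallest volumes -> ~1801 s  (~30 min)
echo $(( 5400000 / (3000 - 3*300) ))   # 300 GB gp2       -> ~2571 s  (~43 min)
echo $(( 5400000 / (3000 - 3*500) ))   # 500 GB gp2       ->  3600 s  (1 hour)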
In this example I used a benchmark. Benchmarks are great because they show you maximum performance; benchmarks are terrible because they don't represent your workload. I ran fio for an hour on this 500 GB gp2 volume, so I've got the 3,000 IOPS burst and then a 1,500 IOPS baseline. You can see that for that one-hour period my burst bucket is depleting while I'm getting that 3,000 IOPS — the right-hand side of the graph is the number of operations summed over the graph's period, which is why it reads 180,000. If we zoom out on the graph, after I stop my little test you can see my burst bucket refilling — we refill as much as we can, as quickly as we can, whenever you're not doing that I/O. So when you see spikes in your application, the quiet period between those spikes is actually refilling your bucket. It's important to remember that gp2 volumes are designed for most workloads, and we find that only a small fraction of volumes ever run out of their burst bucket — so most workloads will benefit from gp2.

Our throughput volumes operate on a similar principle. The biggest difference is that the bucket accumulates based on throughput rather than IOPS, and the bucket is sized based on the size of your volume. That 12 megabytes per second per terabyte you see for sc1 is an interesting number: if you do the math, it allows you to do a full scan of your sc1 volume every single day — 12 megabytes per second times 86,400 seconds gets you to about a terabyte. That's the design point of sc1; st1 gives you a little more, about three and a half full scans of your volume a day. Here's what st1 looks like on a graph: at 8 TB you get 320 megabytes per second of baseline with the 500 megabytes per second burst, and right around 12 TB is where the burst and base curves meet. sc1 is a little colder — again, designed for one scan a day instead of three — and you'll notice that the base and burst curves never meet, so if you do need that 250 megabytes per second, you'll have to have some downtime in there; that's part of the design point of the volume.

From a data transfer standpoint, I showed this in the way we merge, but here's a comparison of purely sequential versus purely random workloads on an st1 volume. Doing large 1 MB sequential reads and letting that run for three hours, I get almost five and a half terabytes of data off that st1 volume; doing 16 KB random reads, I'm only going to get 87 gigabytes. Pretty big difference, and that's part of the design point of the volume.

So how do we know what our workload is actually doing? For a lot of workloads you don't actually get the opportunity to choose, so you have to monitor. On Linux you can use iostat, which gives you the average request size. There's some averaging and rounding in here, so it's not going to make it quite all the way to 1 MB, but in this example you can see I've got 2,046 sectors — those are 512-byte sectors, so divided by two we get to about 1,024 KB — probably a 1 MB workload. On Windows, perfmon gives a similar view of what the workload is doing. We can also use CloudWatch; in CloudWatch what you'll see is 128 KB, and part of the reason for that is how we split I/O on the back end and how we report it — this is pre-merge, so it doesn't show you the post-merge view. If you want to understand how you're merging, you can compare this to your BurstBalance.
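On Linux, the iostat check described above is a one-liner; note that the exact column name varies by sysstat version:

# Extended device stats every 5 seconds; look at your EBS device's row.
# Older sysstat reports avgrq-sz in 512-byte sectors (so ~2048 means ~1 MB I/Os);
# newer versions report areq-sz in KB instead.
iostat -x 5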
If your burst balance isn't going down, we're probably merging for you. If you're seeing less than about 64 KB, or around that neighborhood, you're probably doing some random I/O in there even though you think you're doing sequential.

Here's a performance tuning tip: if you are doing large sequential reads on our streaming products — and only reads, and only large sequential — we recommend that you increase your read-ahead. This is a per-volume configuration and is not persistent across boots. The default for Linux is 128 KB, which means Linux will expand every I/O to at least 128 KB. If you want to take a look at it, it's expressed in sectors, and there's a simple command to increase it to 1 MB. The caveat is that if you're doing small random I/O, this will degrade your performance even more: submit a 4 KB I/O with a 1 MB read-ahead and Linux will expand it all the way to a 1 MB I/O, so you'll have to wait for that to finish before some of the next I/Os — and if you're doing a lot of them, they can stack up.

EBS-optimized gives us dedicated network bandwidth, and it's enabled by default on all of our current-generation instances — that's all the instance types that have the Nitro card, whether they're on the Xen hypervisor or the Nitro hypervisor; we started using the Nitro card with our C4 family. For older instance types you can enable it via a stop/start: stop your instance, modify the attribute, and start the instance. I highly recommend this — it separates out network throughput specific to EBS to give you more consistent, more understandable performance that your application traffic won't conflict with.

It is important to select the right instance size: every instance has a specified amount of EBS-optimized throughput associated with it. If we take a c4.large and attach a 2 TB gp2 volume expecting to get 6,000 IOPS and 250 megabytes per second of throughput, the c4.large only has 62.5 megabytes per second of EBS-optimized throughput, which equates to about 4,000 IOPS — your volume is bigger than that EBS throughput allows, so you're not going to get the full performance. What you want to do is go to a bigger instance size; in this case I'm doing an IOPS workload, so we'll go up to a c4.2xlarge, which gives us 8,000 IOPS, and with that same 6,000 IOPS volume we'll be able to get the full performance out of it. When we were building our Nitro system instances, we gathered a whole lot of data on how customers were using EBS-optimized, and similar to gp2, st1, and sc1, we noticed that most workloads are spiky and don't really have sustained traffic. So we enabled the ability to burst your EBS-optimized throughput on our smaller instance sizes. On a c5.large, for example, you have 30 minutes of burst, which lets you burst up to a larger instance size's amount of throughput for that period — so with a bursty workload you can probably accomplish it on a smaller instance size. And again, this is our burst bucket model where we're constantly refilling, so you'll get that 30 minutes over the period of a day.

Now, when do you want to RAID? I hear a lot of questions here. If you need more than 16 TB of storage, more than 1,000 megabytes per second of throughput, or more than 64,000 IOPS — that's when you want to RAID, and you want to do a RAID 0.
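A bare-bones sketch of striping two volumes with mdadm (device names, mount point, and the two-volume count are placeholder assumptions; the same idea extends to more volumes):

# Stripe two EBS volumes into one logical device for more capacity, throughput, and IOPS
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1
sudo mkfs.xfs /dev/md0
sudo mkdir -p /data
sudo mount /dev/md0 /data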
One thing I do recommend against is doing RAID for redundancy purposes. EBS is already replicated through our distributed system, and if you use RAID 1, RAID 5 or 6, or any sort of erasure coding, you're going to reduce the amount of EBS-optimized bandwidth available to your workload.

A little bit on CloudWatch: they helped us out earlier this year when they announced metric math. One of the things we've been able to do with that is look at all the volumes attached to an instance, aggregate them together, and provide an instance-level view. Here I did a ten-disk RAID and created two metrics to show me both IOPS and throughput across all those volumes in one simple graph, and I can see that in this test I was able to get the full 1,750 megabytes per second of throughput, which is the maximum that any EC2 instance can deliver to EBS.

All right, so in summary: select the right volume for your workload — if you don't know, start with gp2 and use Elastic Volumes to change it; you can do that up to four times a day. Select the right instance for your workload — make sure you've got the right size, the right compute ratio, the right memory ratio. Take snapshots, tag your snapshots, and use our tools to automate that for you. And use encryption, whether you need it or not. Thank you.
[Applause]
Info
Channel: Amazon Web Services
Views: 12,227
Rating: 4.6326532 out of 5
Keywords: re:Invent 2018, Amazon, AWS re:Invent, Storage, STG310-R1
Id: BuJa6Vl8cn8
Length: 55min 30sec (3330 seconds)
Published: Fri Nov 30 2018