AWS re:Invent 2017: Deep Dive on Amazon Elastic File System (Amazon EFS) (STG307)

Alright, let's get started. Welcome, everyone, and thanks for joining this deep dive on Amazon EFS. My name is Ed Naim and I lead the product team for EFS, and joining me is Darryl Osborne, a Solutions Architect at AWS who leads solutions architecture for EFS. I'm really excited to be here. We have lots of content planned for this session, including a new feature we released last week that we're going to give you some details on and demo as well.

The session is structured around the four phases people commonly go through when choosing and adopting a storage solution. The first phase is choosing the right storage solution: what actually makes sense for your workload and your application. The second is testing that solution, both functionally and for performance, and potentially optimizing your application to take advantage of the performance attributes of the storage you're looking at. The third phase is ingest: moving your application data, your production data, onto the solution. And finally there's running it, operating it in production. We'll walk through each of these phases during this session.

So, phase 1: choosing the right storage solution. There are three main things people think about here. The first is which storage type to choose: different types of storage offer different interfaces, different semantics, different permission models, and different features, so it's really about choosing what type of storage makes sense for the application you're building or running. The second is features and performance: which aspects of a solution are important for your workload, and how well the solutions you're considering meet those needs. And finally there's economics: what's the TCO, how much does it cost to operate? Let's talk about each of those in turn.

Generally there are three types of storage: file storage, block storage, and object storage. What's file storage? Data is stored as files, there's a directory hierarchy that the files are part of, and there's metadata associated with each file; in other words, your data is stored natively in a file system. When we talk about file in the enterprise context, and when we talk about it at AWS, we're really talking about shared file: data that can be accessed by multiple servers concurrently for workloads where you need to share data. You can think of it almost as network attached storage, a concept that's common in the on-premises world. With block storage, you have a disk or a set of disks attached to a single computer, and data is stored in chunks called blocks; you can think of that as locally attached disks. And finally there's object storage, where data is stored in a container called an object, each object is identified by a unique key, and objects are stored in a flat structure, so there's no hierarchy behind it. Object storage provides a very simple API for putting and getting data, and commonly that API is called over the Internet.

So why is file storage so popular? Three reasons. First, it works natively with operating systems: operating systems are designed around file storage, file systems are the abstraction used for presenting data to end users, and operating systems provide APIs that let you work with file data from your applications. Second, it provides shared access while providing consistency guarantees and locking functionality, both of which are super useful for distributed applications where you have multiple readers and writers interacting with the same set of data and you want guarantees around consistency. And third, it provides a hierarchical namespace, which is a helpful and natural way to organize data.

Another aspect to think about when choosing a storage type is the performance you get with each. On this graph, the x-axis shows latency and the y-axis shows throughput. Block storage is the lowest latency of the three; this is a general statement about these types of storage. But block doesn't scale to the throughput you'd get from file or object storage, so block is very commonly used for database applications or boot volumes, where you need access from a single server and want the lowest possible latency, because both of those are latency-sensitive use cases. File is generally designed for consistent low latency, which matters for a lot of applications, but it's also designed to scale to tens of gigabytes per second of throughput, so it has a good balance of latency and throughput. Object provides the highest throughput scale but isn't designed for latency-sensitive applications.

You're all here because I assume you're at least thinking about file storage for one or more of your workloads. We launched EFS about a year and a half ago, and before EFS, if you wanted to run shared file storage on AWS, you basically needed to do it yourself. What did that setup typically look like? Typically you'd have a mirrored setup across at least two AZs: a primary AZ running the primary workload, and a backup AZ that's there for availability and data durability reasons. In each of these mirrored AZs you'd have a file server, like an NFS server running on an EC2 instance, and storage volumes actually storing the data, typically EBS volumes, with traffic moving across the AZ boundary to replicate to your backup, where you'd have file servers and volumes as well. You had to manage all of this: operate the file servers and disks, patch the file servers, be responsible for replicating the data, and manage failover.

That was the before. Now we have Amazon EFS, our native, fully managed file service. When we designed EFS, our core tenets were to make it simple, elastic, and scalable, on top of a foundation of high availability and high durability. What makes it so simple? It's fully managed, so there's no hardware to manage, no network to think about, no file layer to manage, and it's so simple that you can create a scalable file system, one that can grow to petabyte scale, in a matter of seconds. It also has very simple pricing: you simply pay for the volume of bytes being stored, and there are no fees beyond that; we'll get into a little more detail on pricing shortly. In terms of elasticity, file systems grow and shrink automatically as you add and remove data; there's no need to provision capacity or performance, and as I mentioned, you pay only for what you actually use. And EFS is designed to be super scalable: file systems can grow to petabytes of capacity, throughput scales automatically as file systems grow, and you get consistently low latencies even as a file system scales. EFS is also designed to support thousands of concurrent connections, so you can have thousands of servers connected to the same file system.

It's also designed to be highly available and durable: every file system object is stored redundantly across multiple availability zones, with a strong form of consistency. One thing that allows is that if an AZ is offline, you're still able to access your full set of data from the other AZs in the region, and that makes EFS appropriate for production and tier-0 applications. We designed EFS to serve the vast majority of file workloads, covering a wide spectrum of performance needs: from big data applications that are massively parallelized and require the highest possible throughput, where you might have tens, hundreds, or thousands of EC2 instances accessing the same set of data, each one driving as much throughput as possible, down to single-threaded, latency-sensitive workloads. We really wanted to cover the spectrum.

This is probably my favorite slide in the presentation, because we launched EFS about 18 months ago and it's just so exciting to see how customers are using it today for a diverse spectrum of workloads. I won't go through all these logos, but I'll talk about a few. First, database backups: you have companies like J.D. Power and Cisco backing up databases that live on EBS volumes, and they love that EFS is completely elastic. They don't need to worry about running out of backup space or about provisioning volumes and dealing with that management side of it; they can just write their backups and the file system grows, and as they delete backups it shrinks. In media and entertainment workflows, EFS has become really popular for things like transcoding and processing, and you have companies like Zynga using EFS with their online games. For enterprise applications, you have Atlassian using it for JIRA storage in their own internal JIRA deployment,
and they also recommend that customers who want to deploy JIRA on AWS do it on top of EFS. We have Cardinal Health, a global healthcare services company, using EFS with IBM InfoSphere. In development tools, GE is using EFS for version control software, Zend is using it for PHP applications, and Adobe is using EFS for development environments as well. Then home directories: companies like Coursera, the online education company. For container storage we have Netflix, and we have HERE; there's actually an interesting presentation HERE did yesterday, it's online so you can see the video, where they talked about a different use case: they're using EFS to store development environment artifacts, 1.2 million artifacts accessed by a thousand users, so it's an interesting one if you're curious about that use case. In web serving and content management, Thomson Reuters is doing a session later this week about how they use EFS for their websites, which serve 220 million views per month, and FINRA is an organization using it for websites as well. Another customer here this week is talking about EFS powering their vast media archive. And then we have big data and analytics workloads: Monsanto has a video online about how they use EFS to power a large geospatial analytics platform; Toast, a point-of-sale company targeting the restaurant industry, uses EFS as a low-cost replacement for HDFS; and Custora, who do retail marketing analytics, are using EFS for data ingestion and analytics platforms. It's really a very wide variety of use cases, which is exciting.

One aspect of features to think about when considering a storage solution is the security model and security features you get. With EFS you control network traffic using Amazon VPC security groups and network ACLs, so you can control which instances within your VPC have access to which file systems. You control file and directory access using standard POSIX permissions. You control administrative access, the ability to create and delete file systems, using IAM. And in August we announced the ability to encrypt your data at rest using keys managed in AWS KMS, the Key Management Service. You can access EFS from various environments: of course from EC2 instances in your VPC; from servers in your corporate data center over Direct Connect connections, which we'll talk about a little later in the session; and, as we announced this morning, you can now access EFS from your VMware Cloud on AWS software-defined data centers, so you have full access to EFS resources from those environments.

Now about the economics. As I mentioned, you pay one flat rate for EFS; in U.S. regions that's 30 cents per gigabyte per month, based on just the amount of data you're storing. Again, it's not a provisioned model, it's a fully elastic model. One question people often ask me is how to think about that 30 cents: is it a good deal? One way to think about it is to go back to what I was talking about earlier: if you did this on your own, what would the cost be? In the do-it-yourself setup we looked at before, you have a mirrored setup across two AZs with your file servers and storage volumes, so you're paying for the EC2 instances powering your file servers in both your primary and backup AZs, you're paying for two sets of EBS volumes, and you're paying for inter-AZ data transfer, the replication traffic moving between AZs. Let's look at some specific numbers with that example. If you're storing 500 gigabytes of data on EFS, your cost is $150 a month: 500 times 30 cents. If you're doing it yourself, you'd probably provision around 600 gigabytes of EBS, because you're not going to provision exactly as much as you need and completely fill your volumes, so your EBS storage cost would be $120 per month; your EC2 instance cost, using two m4.xlarge instances, would be $290 a month; and the inter-AZ traffic for a pretty standard type of workload would be $130 a month. That comes to $540 a month, so that's one way to think about EFS's TCO and put it into context.

EFS is available in six regions today, and we're actively working to bring it to many more regions soon. EFS is part of a broader set of data building blocks within AWS: among our core storage services, EFS is file, EBS is block, S3 is object, and Glacier, also considered object storage, is archive, and those are supported by a whole set of data movement, data security, and management tools and services. What we announced last week, which I alluded to earlier, is a feature called EFS File Sync. EFS File Sync lets you easily and quickly move data to your EFS file systems from existing on-premises or in-cloud file systems. It does it around five times faster than traditional Linux copy tools, it does it in a highly secure way, and it's super simple. Darryl is going to talk about it a little later in this presentation, give some more details, and do a walkthrough. So let's go to the next phase: testing and optimizing. Hopefully at this point you've made your selection of storage type, and hopefully you've selected EFS.
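Before we dive into testing, the cost comparison we just walked through can be sketched as quick shell arithmetic (all figures are the rough monthly numbers quoted in the talk, not exact AWS pricing):

```shell
# Monthly cost of 500 GB on EFS at $0.30/GB-month (US regions, 2017 pricing)
efs_gb=500
efs_cost=$(( efs_gb * 30 / 100 ))   # dollars: 500 * $0.30 = $150

# Do-it-yourself mirrored setup across two AZs (figures from the example)
ebs_cost=120    # ~600 GB of EBS provisioned in each of two AZs
ec2_cost=290    # two m4.xlarge file servers
xfer_cost=130   # inter-AZ replication traffic for a typical workload
diy_cost=$(( ebs_cost + ec2_cost + xfer_cost ))

echo "EFS: \$${efs_cost}/month, DIY: \$${diy_cost}/month"
# prints: EFS: $150/month, DIY: $540/month
```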
Several things to think about in this phase: the first is understanding what your architecture is going to be, what it will look like deployed; the second is conducting functional testing; the third is conducting performance testing; and, if you like, optimizing your application to take advantage of EFS's performance attributes. In terms of architecture and deployment model, here's what an EFS setup looks like. When you create a file system, you create endpoints in your VPC that we call mount targets. A mount target exposes an IP address and a DNS name, which you use in the mount command on your Linux instances to mount the file system. So you have these mount targets in each of the AZs within your VPC, you have your NFS clients, your EC2 instances, spread across AZs within that VPC, and all of them have the EFS file system mounted and are accessing it concurrently. You manage file systems through the EFS API; there's actually not a whole lot to do in terms of management, not a lot of levers to pull or dials to turn, and that was part of our design. There are operations like creating a file system, creating and managing mount targets, and deleting a file system, all exposed through the API, and of course the CLI, the SDKs, and the console. For functional testing, the good news is that things should just work: EFS is standards-compliant, POSIX-compliant, and NFSv4.0 and 4.1 compliant, and most enterprise Linux applications are designed to work with those standards. Of course you should always test with your actual application and your actual workload, but there shouldn't be a lot of headaches on the functional testing front. With that, I'll turn it over to Darryl to talk about performance and dive deep on that.

Right, thanks, Ed. There are two different performance modes available for Amazon EFS file systems when you create them. The first is General Purpose: this is the default performance mode, and it's really ideal for most applications out there. It has the lowest latency of the two performance modes we provide and is the best choice for most applications and workloads. The other performance mode is Max I/O, and that's really designed for large-scale, scale-out architectures and workloads where you need to scale out to hundreds or even thousands of instances; a virtually unlimited amount of storage, throughput, and IOPS is available with Max I/O file systems. But there is a trade-off between the two. With General Purpose file systems, there's a limit of 7,000 file system operations per second, so if your application requires low-latency access and can live within 7,000 file system operations per second, a General Purpose file system is really ideal for you. If you need to scale out to hundreds or thousands of instances and your needs are going to exceed 7,000 file system operations per second, then look at Max I/O. The trade-off is that Max I/O has slightly higher per-operation latency, so, as Ed mentioned, you really need to test your application and see which mode works best for you. If you do select a General Purpose file system, we expose a CloudWatch metric, PercentIOLimit, that tells you how close you are to that 7,000 operations per second as a percentage. You can monitor it to see where your workload sits within that threshold, and whether you should stay with General Purpose or migrate your data over to a Max I/O file system.

EFS has a distributed data storage design: a file system is stored across an unconstrained number of storage servers, across multiple availability zones within the region where you create your file system. That avoids some of the bottlenecks that are typical of a network attached storage device, just because of the nature of this distributed design, and as part of this you also benefit from high availability and high durability, because your data is strongly consistent across availability zones. I often get asked what the difference is between EFS and EBS. I'm not going to go through this entire chart, but I'll highlight a couple of areas. One is throughput scale: as Ed mentioned earlier, we're able to get gigabytes per second of throughput with a file system like Amazon EFS, because the distributed data storage design from the previous slide lets us continue to scale out and achieve very high levels of throughput and IOPS. Another is access: EFS can be mounted by one or thousands of NFS clients, EC2 instances or even on-premises servers over, say, a Direct Connect connection, while an EBS volume can be mounted by a single EC2 instance. Keep in mind, though, that due to the per-operation latency of a distributed storage design like EFS, overall throughput generally increases as the average I/O size increases, since the overhead is amortized over a larger amount of data. By I/O size I mean the amount of data you're reading or writing in a single operation. EFS is designed to process high volumes of concurrent operations on a file system, and you can send multiple operations in parallel to EFS using multiple threads and multiple instances. In this graph we held I/O size constant with ten instances and increased the number of threads per instance, and as you can see, as we increased the number of threads, our IOPS also increased. Again, that's the distributed data design of EFS letting us scale out to higher levels of throughput and IOPS.
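The PercentIOLimit check described above can be scripted. A sketch, assuming the AWS CLI is installed and configured, and using `fs-12345678` as a placeholder file system ID:

```shell
# Report how close a General Purpose file system got to the
# 7,000 ops/sec ceiling over the last hour, in 5-minute windows.
# fs-12345678 is a placeholder; substitute your own file system ID.
aws cloudwatch get-metric-statistics \
  --namespace AWS/EFS \
  --metric-name PercentIOLimit \
  --dimensions Name=FileSystemId,Value=fs-12345678 \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time   "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Maximum
```

Sustained values approaching 100 suggest the workload may be a candidate for a Max I/O file system.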
There are a number of mount options you can use when mounting NFS file systems, and we recommend the default mount options when mounting EFS to an EC2 instance. We recommend Linux kernel 4 and above, and we recommend using an NFSv4.1 client. One of the options we do call out is the one-megabyte read and write buffer size, but that's really one of the defaults, and all of these mount options are described in our documentation. Because file system workloads are typically spiky in nature, driving high levels of throughput for a short period of time and lower levels for the remainder, we built a burst model into EFS. Throughput is based on the size of the file system: the baseline throughput is 50 megabytes per second per terabyte of data stored, and that's the throughput you should be able to achieve continuously. You're able to burst above that, up to 100 megabytes per second per terabyte of data stored, when you're bursting above your baseline. So now let's talk about how we ingest data into an EFS file system. What do you think about first? Where is the data coming from: what is the source data set, is it coming from on-premises or from within the cloud? Then you want to transfer it as efficiently as possible: what tools can we use to make that transfer? Generally we can categorize the source into four categories. First, a corporate data center: typically, if you have data on-premises, you're going to be transferring it from a corporate data center. Second, within the cloud from an EBS volume. Third, other cloud storage services, maybe a third-party appliance or an instance store, something within the cloud. And fourth, object storage, Amazon S3. To ingest data from on-premises, you can mount an EFS file system over a Direct Connect connection.
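As a concrete example of the recommended options just discussed, mounting an EFS file system from a Linux instance looks something like this (the file system DNS name is a placeholder; substitute your own):

```shell
# Mount an EFS file system over NFSv4.1 with 1 MiB read/write buffers
# (rsize/wsize), per the recommended defaults discussed above.
# fs-12345678.efs.us-east-1.amazonaws.com is a placeholder DNS name.
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 \
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
  fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
```

The `hard` option makes the client retry indefinitely on a lost server response rather than returning an error to the application, which is what you generally want for shared application data.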
What that means is you can have a server sitting in your on-premises data center with Direct Connect, issue a mount command against an EFS file system, and mount that file system on that Linux server, which allows you to ingest data using the standard Linux copy tools you're used to. Another way is a third-party VPN solution: setting up, for example, an IPsec tunnel between your on-premises data center and your VPC, then transferring your data within that tunnel, again mounting your file system from on-premises using the IP addresses of the mount targets within your VPC, and again using standard Linux copy tools to copy that data into Amazon EFS. You may think these types of connections are just for migrating data into EFS, but besides migrating an entire data set, there are also scenarios where you burst into the cloud temporarily: moving a data set to EFS, accessing it from applications running on EC2 instances, and then, once your application is done processing, easily moving the data back into your on-premises data center. You can also use EFS as a backup target, backing up your data using native file system commands, so it's no longer only on-premises and is securely stored in the cloud on a highly available, highly durable file system.

Once you've identified the source data set and established connectivity between the source and EFS, whether through Direct Connect, VPN, or just within the cloud itself, maybe from an EBS volume, you need to look at which tools are most efficient for transferring that data into EFS. I think most of us are familiar with rsync; rsync is a very common tool for transferring files from a source to a target. One thing I've found, and I've done a lot of testing over the last year, is that rsync is not the best tool for transferring data into EFS. rsync is single-threaded, and remember, we've been talking today about this distributed data design: a single-threaded application really doesn't take advantage of it. rsync is also very chatty; I know there are options to try to reduce that, but it's still very chatty, and in many of my tests rsync really had the poorest performance of the tools I evaluated. cp, the typical Linux copy command, actually performs a little better than rsync; it's not as chatty, but cp is also single-threaded, so again you're not taking advantage of the distributed data storage design. It's only when we start looking at multi-threaded tools that we're really able to get higher levels of throughput and IOPS into EFS. One example is fpsync: fpsync is a tool included with an open-source utility called fpart, and it's a multi-threaded rsync, so if you're familiar with rsync and its commands you can use fpsync very easily, and you can get much higher throughput with fpsync than with rsync. Another tool I've used is mcp: mcp is a drop-in replacement for the standard cp command, developed by engineers at NASA, and it's open source, so you can download it and get access to it today. But I've really found the highest levels of throughput by combining the standard cp command with a utility called GNU Parallel. GNU Parallel is a shell tool that lets you run commands in parallel, really allowing you to take a serial operation and run it using multiple threads, and by using GNU Parallel with cp we get substantially better performance than with the standard cp command. If you use cpio plus fpart plus GNU Parallel, you can get even higher levels of throughput: fpart indexes the files you want to copy and divides the entire data set up into ranges, each range is given to its own thread via GNU Parallel, and cpio does the copying, so we're able to get very high levels of throughput into an EFS file system.

What does that look like? In some of my tests, running on a c4.2xlarge, I had 5 gigabytes of data in 5,000 files, so averaging around 1 megabyte each, though some files were tens of kilobytes and some were multiple megabytes in size. I transferred the exact same data set to the exact same file system from the same instance using these six tools, and these are the results I got: from 11.7 megabytes per second using rsync, all the way up to 93 megabytes per second using fpart, cpio, and GNU Parallel.

As Ed mentioned, last week we introduced a great new utility called EFS File Sync. EFS File Sync is a simple, fully managed solution for transferring files into Amazon EFS using secure, high-performance parallel data transfer. EFS File Sync provides encryption of data in transit from your IT environment to AWS, and we see up to five times the throughput of standard Linux copy tools, so it's a very simple, fast, and secure method of efficiently migrating data from a source to a destination EFS file system. There are a number of different use cases: you can transfer from on-premises to EFS; from one EFS file system to another, maybe from one region to another or from one account to another; and you can even transfer from other in-cloud solutions to EFS.
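For reference, here's roughly what the six approaches from those tests look like on the command line. This is an illustrative sketch: paths are placeholders, fpart, fpsync, mcp, and GNU Parallel must be installed separately, and exact flags can vary by tool version.

```shell
SRC=/mnt/src DST=/mnt/efs   # placeholder source and destination paths

# Single-threaded baselines
rsync -a "$SRC"/ "$DST"/
cp -r "$SRC"/. "$DST"/

# fpsync: the multi-threaded rsync wrapper shipped with fpart
# (8 concurrent sync workers here)
fpsync -n 8 "$SRC"/ "$DST"/

# mcp: NASA's multi-threaded drop-in replacement for cp
mcp -r "$SRC"/. "$DST"/

# cp driven by GNU Parallel: one cp per file, many at a time
find "$SRC" -type f | parallel cp {} "$DST"/

# fpart + cpio + GNU Parallel: fpart partitions the file list into
# 8 chunks, and each chunk is copied by its own cpio process
fpart -n 8 -o /tmp/batch "$SRC"
ls /tmp/batch.* | parallel "cpio -pdm $DST < {}"
```

The last two variants are where the parallelism pays off against EFS's distributed design, since many operations are in flight at once.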
a number of different tools or recommendations to transfer tools and transfer data into EFS when do you want to use the tool so with when you're within your own DPC and you need to transfer data one option is using the FS file sync another option you can do it yourself install these F parts and other tools new parallel and now new parallel we did include in the Amazon Linux repo so now if you're running Amazon Linux you can do a very easy you know yum install parallel and you'll have parallel installed on your Amazon Linux so it's a great way to get up and running very quickly if you're outside of your V PC we really recommend using EFS file sync because it really allows for a very simple setup no more logging on to you know SSH into instances trying to figure it out at installing tools it's all done through the console you setup your source your target and you basically sync the data so let's go ahead and take a look at what this looks like so what are you going to see here is recording that we did on a typical EFS file sync so you're going to see that we're going to actually install an EFS file sync agent in this example I don't have my own data center so I can't run it on my data center so I'm actually using AWS so I'm going to be transferring data from an EFS file system to another EFS file system so what I do is I actually install the the Aged and it's you're actually running an ami and we'll go ahead and launch that within hey WS once we get that we will then activate it through the service and then we'll set up the sync set so it goes rather quickly so let's go ahead and actually get started let's go back and as we can see here we're going to go ahead and select ec2 we're going to launch this within a V PC very quick it's already set up now that that instance is is up and running what we want to do is grab the external ID public IP address and we want to put it into the configuration so that the we're able to connect to the agent once we do that we want to 
activate that agent; we give it a name, and it's activated. Now that we have that agent activated, what we want to do is create a sync set, so we identify the source, this is going to be from one EFS file system, and then we identify the destination, we give the sync set a name, and we create it. Once the sync set is created, we want to start it; once we start it, it's going to be pulling data from the source and copying it over to the destination. As you can see here, the status is starting; in a few seconds we're going to see this start to increase. I sped it up because it does take a few minutes for the sync set to actually be created, so I sped through that in this video just for time. Now that the status has changed, we're actually copying that data in real time, and we can see that so far we've already copied 1.7 gigabytes. We'll see this continuing to increase in real time, and at the end we're going to see that we actually copied 21.5 gigabytes of data from one file system to another file system in about 4 minutes and 49 seconds, so roughly 91 megabytes per second, I think, was the throughput that we received. Again, very easy to set up, very secure; this is just one example of how you would use EFS File Sync. So we've already looked at transferring data from on-prem or within a VPC; now what about from Amazon S3? We can definitely move objects from Amazon S3 over to EFS by just using an NFS client that has access to that bucket, very simple, very easy. One thing you want to do, though, is take advantage of parallelism when you're doing this. While the `aws s3 cp` command is multi-threaded by default, if you actually use GNU Parallel and `aws s3 cp` combined, you will get higher levels of throughput; I've seen in some of my examples 30 to 50 percent higher throughput using GNU Parallel and `aws s3 cp` than just using `aws s3 cp` by itself.
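The GNU Parallel + `aws s3 cp` combination described above boils down to sharding the key listing across concurrent workers. A minimal sketch, assuming you supply the per-object copy yourself; `fetch_one` is a hypothetical stand-in for a boto3 download or an `aws s3 cp` subprocess call, so the sketch stays self-contained:

```python
from concurrent.futures import ThreadPoolExecutor

def shard_keys(keys, workers):
    """Round-robin the object keys into one shard per worker, the same
    kind of split GNU Parallel applies to a piped-in key listing."""
    return [keys[i::workers] for i in range(workers)]

def parallel_fetch(keys, fetch_one, workers=8):
    """Fetch every key, with the shards running concurrently.
    fetch_one(key) performs the single-object copy."""
    shards = shard_keys(keys, workers)

    def drain(shard):
        for key in shard:
            fetch_one(key)
        return len(shard)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(drain, shards))
```

The speaker's 30 to 50 percent improvement comes from exactly this: many objects in flight at once instead of one command working through the list serially.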
Now for the run phase, I'm going to go ahead and turn this over to you, Ed. Sure thing. Alright, so thanks Darrel. So when you're running your file system, there's not a whole lot that you need to think about or worry about, because it is a fully managed service. What we do provide are a couple of things, though, that are important. First is file system metrics: you can view CloudWatch metrics on file system performance, so you can take a look at the level of throughput that you're driving, you can take a look at the operations per second that you're getting, and you can take a look at what level of performance you're getting in General Purpose mode and whether you should use General Purpose mode or Max I/O mode as a result. So a bunch of useful metrics to get a feel for how the file system is running. We also integrate with CloudTrail, so all of your API calls can be logged to CloudTrail if you enable that. And on the other side there's also performing backups, and that's a pretty straightforward process using some standard tools and some white papers that we have available. So these are some of the CloudWatch metrics that I was referring to, and the CloudTrail API access logs that you can enable as well; again, that's for your API calls. And do you want to talk about... Yeah, let me talk about some of the reference architectures that we've created for some very common applications out there. So WordPress, Drupal, and Magento: we have reference architectures that take advantage of Amazon EFS as the file storage for these environments. Now, while you can run some of these in a single-server configuration, it is highly recommended, if you need to scale out, to actually move the file system data, say your WordPress installation directory, off of the instance itself and over to an Amazon EFS file system. That really allows us to have a stateless web tier: there's no data being stored on those web instances and those web servers.
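Returning to the CloudWatch metrics mentioned a moment ago, two derived numbers are often useful: average throughput computed from the Sum statistic of the `TotalIOBytes` metric, and a General Purpose vs. Max I/O sanity check based on `PercentIOLimit`. A small sketch of the arithmetic; the 95 percent threshold is an illustrative choice, not an AWS recommendation:

```python
def avg_throughput_mib_s(total_io_bytes_sum, period_seconds):
    """Average throughput over a CloudWatch period, derived from the
    Sum statistic of the EFS TotalIOBytes metric."""
    return total_io_bytes_sum / period_seconds / (1024 * 1024)

def should_consider_max_io(percent_io_limit, threshold=95.0):
    """PercentIOLimit near 100 on a General Purpose file system means
    the workload is pushing the ops-per-second ceiling, one signal that
    Max I/O mode may be a better fit."""
    return percent_io_limit >= threshold
```

For example, a Sum of about 63 MB of `TotalIOBytes` over a 60-second period works out to roughly 1 MiB/s of average throughput.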
That allows us to take advantage of, say, Auto Scaling groups, where you can now elastically scale your web tier up and down without worrying about losing data. We also have to remember session information: where is session information for WordPress stored, and cookies? We know that that's already off the web tier, so it really lends itself to moving that file system over to EFS. Moving the database data over to, say, an Amazon Aurora database instance or cluster then allows you to take advantage of this stateless web tier, where it can automatically scale out and scale back in when you need to. We have that design for WordPress, for Drupal, and for Magento; these are available on our GitHub repos, where you can download what is really a nested CloudFormation template. You can select the different parameters and deploy, say, an ElastiCache cluster sitting in front of your database tier, or a CloudFront distribution where you select the different origins and where you want to serve content from, so that some of your data is actually sitting in some of our 100-plus edge locations around the globe. We also launched a backup solution a few months ago. This is an EFS-to-EFS backup solution that allows you to automate the backup process of an EFS file system. We offer an easy-to-deploy CloudFormation template to get this environment up and running, so you are able to back up your file system and then still have native access to the backups, because they're just sitting in a different EFS file system. Alright, I'll go ahead and turn it back to Ed to wrap up our session here. So, a couple of things: we looked at some reference architectures, and those are available online, along with feature blogs, white papers, a TCO calculator, a bunch of 10-minute tutorials, and documentation, so a lot of really good resources available online; just go to aws.amazon.com/efs.
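The EFS-to-EFS backup solution mentioned above drives the copy with standard Linux tools inside its CloudFormation template. As an illustration of the incremental idea behind it, here is a minimal mtime-based pass in Python; `incremental_backup` is a hypothetical helper for illustration, not part of the AWS solution, and in the real setup both roots would be mounted EFS file systems:

```python
import os
import shutil

def incremental_backup(src_root, dst_root):
    """Copy only files that are new, or whose mtime is newer than the
    copy already sitting in the backup tree. Returns the number of
    files copied; a second run with no changes copies nothing."""
    copied = 0
    for dirpath, _, names in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        out_dir = os.path.join(dst_root, rel_dir)
        os.makedirs(out_dir, exist_ok=True)
        for name in names:
            src = os.path.join(dirpath, name)
            dst = os.path.join(out_dir, name)
            if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
                shutil.copy2(src, dst)  # copy2 preserves the mtime
                copied += 1
    return copied
```

Because the backup lands on a plain file system, restoring is just a copy in the other direction, which is the "native access" point made in the talk.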
And I'm happy to announce that we have a new digital training course, or actually a series of classes on storage, that's available starting now. There are deep dives on S3, EFS, EBS, and hybrid storage scenarios; you can use these sessions to learn, of course, but also to make progress towards AWS certifications. There's also lots going on at re:Invent: we have hands-on labs, we have an EFS spotlight lab on Thursday at 3 p.m. at the Venetian, we have EFS storage experts at the AWS booth in the expo all week, and we have a bunch of sessions where companies like SiriusXM, Thomson, and Celgene are all talking about their EFS use cases. It's use cases like web serving, middleware applications, hybrid architectures using EFS, and dev environments, so a good mix of different use cases; you can hear from the customers directly about their experience with EFS and best practices. And I know there's a bunch of sessions you all probably want to go to across re:Invent, so for the ones you can't attend, these EFS sessions are all going to be available by video within a couple of days of the actual session happening. So, Ed, sorry, one more thing I wanted to mention that I forgot when talking about our WordPress architecture: when you're using EFS as the file system, basically when you've copied your WordPress installation directory over to EFS, you want to install OPcache. OPcache is a bytecode caching mechanism you can install on the WordPress servers that allows you to cache some of those PHP files; PHP scripts are executed and compiled each time, so if you're able to cache that on the instance, you will actually have a better experience, and we highly recommend that. It is included as part of the reference architecture and is actually automatically installed in the CloudFormation templates, and we have a status page per instance to make sure that, one, it is installed correctly, and two, that it's actually enabled, and to see what your cache hits and cache misses look like for each instance. Okay, alright,
great. So we have some time for Q&A. I think there are some mics; yeah, looks like there are some mics over there, so make your way to one of the mics in the aisles so that we can hear you. Can I go? Yes, please. I've got two questions about futures: number one, do you plan on supporting Kerberos mount options in the future, and number two, do you plan on adding snapshots as a feature? So we don't generally talk about roadmap; those are both features that customers have asked us for, and we're honestly balancing across a number of requests, so that's all I can really say. For backups, I would say take a look at the CloudFormation templates and the white paper that Darrel was referring to; it does make it pretty simple, so hopefully that helps. Thanks. You can also use native tools; it is a file system, so you can use the native backup solutions and applications you run today with EFS. Let's go to the mic over here. Is there any way to back it up to S3? Yeah, so there are a number of third-party backup solutions today that back up natively to Amazon S3, so yes, you can, using third-party solutions. Okay, AWS sync doesn't do it? It does not, no. So when will you be supporting encryption in transit? So again, we generally don't comment on roadmap; today we support encryption of data at rest, and encryption in transit is something that we're looking at along with other features. Is it in your roadmap or not? Sorry? Is it in your roadmap or not at all? It's something that we're looking at; that's what I can say. Okay, thanks. Yes, please. So, SMB, any support for that? Can you answer in general terms? We know that Windows support is important, so I will leave it at that. My developers won't be happy with that. Do you have a schedule for the Tokyo region? Well, so we're in six regions now; we're not in Tokyo. Again, I know these are a lot of questions about when and what, but a lot of these features are things that
we are looking at, and we're hearing the feedback, so I can't comment specifically on that, but we do plan to be in almost all regions as quickly as possible. Thank you. Thanks. Yes, please. So we do a lot of VPC peering, and the mount endpoint, the way it is implemented in EFS, is very different from the way RDS is accessible. From our on-prem we have Direct Connect and other connectivity technologies; the issue is the round robin, and we have to mount using IP addresses at times. What is the correct way of doing this type of implementation? Does it support VPC peering or not? VPC peering does not work with EFS. Maybe we can talk offline about the specifics of what you're trying to do; we don't support VPC peering, but there might be other things that you could do. I'd need a little more detail on what you're trying to do, so if you're free afterwards, please feel free to stop by. Yes, please. So, I think in the demo you showed that there is an option of setting up the file sync using the console, the AWS console. Is there a CLI version of it that can be used? Yeah, so today it's available only through the console, so you would have to launch the agent from the console, or at least download the ESXi image from the console to your on-prem, go ahead and get that set up, create the sync set, and start it, all through the console. Whether we want to go to an API and CLI is something that we're evaluating. So, also, I know that from EBS we can copy to EFS, but is it possible to copy from EFS to EBS using that tool? Not using that tool, no; with EFS File Sync the destination will need to be an EFS file system, it is a file sync to ingest data into EFS. Okay, thanks. Please. So one of the early slides showed a couple of companies that were using this for home folders; how does that work with, I
mean, does that work with, say, Windows 7 and Windows 10 desktops, or are those Linux environments? Those are all Linux environments; currently you can't mount from Windows, so Linux, yeah. A quick one: I thought I heard that we can mount S3 volumes on Linux systems, or am I getting confused with EFS? No, so S3 is accessed through the S3 API; EFS allows you to mount it as a file system. Okay, so S3 volumes cannot be mounted like EFS? That's correct, yeah. So I'm wondering about the new feature where you allow encryption with a key in KMS: can I bring my own key, a key that was actually created by me, as opposed to the service default key? Yes, so we do support custom CMKs. It is all run through the AWS KMS service, but you can import a key into KMS and then allow that CMK to be used by Amazon EFS file systems to encrypt your data. Yeah, so I got that part, but the second part is, what are the implications when you have to do a backup or sync to a different region? The key might not be available anymore if the file was synced or copied to a different region. So what you would do is, if you have a file system in another region, that would be encrypted with a different key; when you sync, your data will get decrypted, and when you copy it over, it's automatically re-encrypted under the different key. Yeah, thanks. Sure. Yes, please. Just a short question on that throughput graph that you had: where does it end? You went up to a specific number; what's the highest number of IOPS that you have been generating with EFS? Yeah, so there's actually no upper bound; it really depends on how you're distributing your requests. If you're distributing them across tens, hundreds, thousands of instances, each of those is able to drive a certain amount of throughput; there's really no cap to it. There's a soft limit of three gigabytes per second, but
that's something that we can raise, so beyond that there's no limit. Okay, that's very interesting, thanks. Sure. Yeah, EFS has certain limits in terms of how many files we can open per instance and per process; are those hard limits, or are there approaches where we can work around them? Those are hard limits. Okay. It would be great to maybe talk afterwards to understand what limit you're running into, because it would be good to hear that, but those are hard limits. So, you're not talking about the roadmap, but do you get demand for EFS access from Lambda functions? It is something we've had customers asking us for, yes. So I will plus-one that. Thanks. Could you explain the correlation between the 7K ops versus the throughput? There's the file system operations limit and then there's the total throughput you can have. Yes, so in General Purpose mode the cap is the operations per second, so the level of throughput you can drive really depends on the I/O size that you're driving. There's no real throughput limit, I would say, as a result, but you do have this operations-per-second limit, so in practical terms there will be a limit; it just depends on the I/O size that you're using. Okay, so for large file systems which have a lot of metadata operations, which count against the 7K ops, that's your limit? That's correct, for General Purpose mode file systems. You do have the option of creating a file system as Max I/O, and then you don't have that sort of upper boundary; it lets you extend beyond 7,000 file system operations per second. I see. So is the typical throughput limit the instance type, the client side? Well, not really, because you can access it from multiple instances as well; there will be a per-instance limit they run into, but in aggregate you can have thousands of instances, and each of those can drive throughput independently, so your aggregate throughput can just keep growing.
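The relationship described in this answer, the ops-per-second cap times the I/O size, makes the practical throughput ceiling for General Purpose mode easy to estimate. A back-of-the-envelope sketch; illustrative only, since real workloads mix metadata and data operations and won't hit the cap with data ops alone:

```python
def gp_mode_max_throughput_mib_s(io_size_kib, ops_limit=7000):
    """Rough throughput ceiling for a General Purpose file system:
    the 7,000 file-system-operations-per-second cap times the I/O
    size per operation. An upper bound, not a promise."""
    return ops_limit * io_size_kib / 1024.0
```

So at 4 KiB per operation the ceiling is only about 27 MiB/s, while at 1 MiB per operation the same 7,000 ops/s would allow several GiB/s, which is why the speaker says the real limit "just depends on the I/O size".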
I see, okay, thanks. Sure. So we use EFS as shared storage for web servers, but typically the amount of storage is quite small, and the only way we can get enough performance out of EFS is to create a load of dummy data to increase the amount of storage and therefore get better throughput. I just wondered, are you ever going to separate out throughput from the amount of storage, so there are options for improving throughput without increasing storage? Yeah, so it's a feature that we've heard some customers asking us for. For most workloads the amount of throughput you get is adequate, but there are ones where you might have a very small amount of data, or you might just need a really high level of throughput. So today the recommendation is to pad your file system, but we continue to get feedback about whether there are other options, and we're looking at those things. I just want to echo his comments, because we do the same thing; we generate the dummy data to make sure we can host the code right, so we'd really like to see some solution for this. Okay, understood, thanks. So one thing that I've done to help customers in this situation is a CloudFormation template that will create a file system, and you can identify how much dummy data you want to put in, so it's something that you don't have to manage yourself. You launch the CloudFormation template, you can either create the file system or identify an existing file system, then identify how much dummy data you want to add, and it will automatically spin up an Auto Scaling group, dd that data into a directory, and then it's done. So while we know it's not ideal, it's something to help you overcome that burden. Any other questions? We'll also be available up here for a couple of minutes if people have other questions, but thank you all, thanks a lot. [Applause]
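The padding workaround discussed in this last question is simple arithmetic against EFS's documented baseline rate of 50 MiB/s of throughput per TiB of data stored. A small sketch of that calculation; the function name is illustrative:

```python
def padding_needed_gib(current_size_gib, target_baseline_mib_s,
                       baseline_mib_s_per_tib=50.0):
    """How much dummy data (in GiB) to add so the file system's
    baseline throughput (50 MiB/s per TiB stored, per the EFS
    performance docs) reaches a target. Returns 0.0 if the current
    size already provides the target baseline."""
    required_gib = target_baseline_mib_s / baseline_mib_s_per_tib * 1024.0
    return max(0.0, required_gib - current_size_gib)
```

For example, a near-empty file system needs about 1 TiB of padding to reach a 50 MiB/s baseline, which is exactly the dummy-data trick the questioner and the CloudFormation template automate.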
Info
Channel: Amazon Web Services
Views: 4,868
Keywords: AWS re:Invent 2017, Amazon, Storage, STG307, AI, Security
Id: VffbHp34UzQ
Length: 59min 48sec (3588 seconds)
Published: Wed Nov 29 2017