AWS re:Invent 2021 - Keynote with Peter DeSantis

Captions
Please welcome the Senior Vice President of Utility Computing at AWS, Peter DeSantis.

Hello. Okay, can I just say how great it is to be back? I was a little skeptical of coming to Vegas, big crowds, but the energy is so amazing. It's so nice to be back here with all of you. For those of you who were not able to join us and are watching from afar, we look forward to seeing you in the future. You may have heard that we're celebrating a few anniversaries this year. This is the 10th re:Invent, which is just amazing. It's even more amazing to think that it's been 15 years since we launched our first AWS services. Think about it, 15 years. What were you doing 15 years ago? Maybe some of you were still in elementary school, or maybe you were buying your first iPod Shuffle. Me, I was in Cape Town, South Africa, working with a small 10-person team on a thing that we called the Amazon Execution Service, or AES. Now, even with the great three-letter acronym, AES did not make it through the naming meetings, and we ended up launching the Elastic Compute Cloud.

When we launched EC2, the cloud wasn't yet broadly in the tech vernacular. In fact, most people described EC2 as a grid computing service. Of course, the word cloud caught on pretty quickly, and if you recall, pretty soon everything was the cloud. But the word in the name that we spent the most time talking about before we launched was actually elastic. When we launched EC2 we had one instance type, we had one region, we hadn't yet exposed Availability Zones, and all the block storage was ephemeral; that was a term we used to indicate that the storage lasted as long as the instance. It would have been easy to make the case that we should have held up the launch and waited on more functionality, but we knew that we had one really important feature: elasticity.

I actually spent a fair bit of time when I was preparing for this talk looking for a photo of the early team and the early office. Seven of the ten folks on the original EC2 team are still here at AWS. Now, they're not all in EC2, they're working on various different things, but I figured one of them would have a photo. No luck. So I guess we know now why it's so important to back your photos up to the cloud. I did, however, when I was looking, find an email from a couple of days after launch from Jeff Bezos. Now, you might assume he was asking me about the adoption of the service or the press pickup we were getting, but he was asking a much more important question: how's your performance and operational stability? Now, this wasn't surprising, because these were the features we had talked about that would be so important to early AWS customers. They're still important today: security, availability, elasticity, performance, cost, sustainability. We don't usually talk about these things as features, and in fact in many ways they're table stakes, but at AWS we think of them as key ways we can innovate to differentiate our products. You need to continuously invest in these things; you don't just get to check them off your to-do list. And today I want to give you a deeper look at some of the big investments we're making in these areas to differentiate AWS services, and of course we're going to hear from a few customers about how they're using AWS to innovate for their customers.

I want to start off with a look at storage. What's so exciting about storage at AWS is that we're building at a scale that's just never been done before. Building at this scale allows our engineers to approach problems in a way that just can't be done at smaller scale.
If we're going to talk about storage at scale, the most sensible place for us to start is with S3. S3 was AWS's first storage service, actually launching about six months before EC2, not that anyone was keeping score. In aggregate, S3 holds more than 100 trillion objects. 100 trillion. That's more than 10,000 objects for every man, woman, and child on Earth. When we launched S3, we launched with a single storage class that allowed customers to read and write objects of any size and access them quickly, with tens of milliseconds of latency. This initial offering was unique, and customers found many compelling use cases for it. Some customers used it to store and retrieve content. Some customers also found it a really convenient way to back up data. Other customers started putting data in S3 and using it as a foundation as they built large-scale, data-intensive applications; today we call these data lakes.

And of course, as the number of use cases grew, the S3 product team innovated to even better meet customers' needs, and one of the main ways we did this was by adding S3 storage classes, which enable customers to better optimize the cost and performance of their storage on AWS. For customers that like the ease of use of S3 but are primarily looking to back up their data, S3 introduced S3 Glacier and S3 Glacier Deep Archive, which enable customers to get a low storage cost in exchange for higher-latency retrieval. For customers using S3 for archival storage, this allowed them to achieve significantly lower cost while still getting numerous benefits over a traditional offline tape solution. And for customers with less frequently accessed data who cannot accept those longer access times, S3 introduced Infrequent Access. Since we launched Infrequent Access, many customers have told us that they want us to find a storage class somewhere between Infrequent Access and Glacier: an even lower cost offering for data sets like medical images, where an individual image is almost never accessed but must be retained for a very long period of time, yet when you need access to that image, you need it immediately. Yesterday we announced S3 Glacier Instant Retrieval, which provides exactly this.

Now, today I want to look at how scale and innovation make it possible for us to offer these unique storage offerings, and to understand this we're going to have to understand a bit about a pretty old piece of infrastructure: the hard drive. The first hard drive dates back to 1956, and hard drives have been the workhorses of data storage for about the past four decades. And while SSDs have replaced hard drives in a number of places in the cloud, hard drives are still the king of big data. Hard drives are mechanical devices. Internally, they look a lot like record players. They have a spindle and a set of platters, and a motor that spins the platters at between five thousand and fifteen thousand revolutions per minute. They have an arm with an actuator on one end and heads on the other end, and the actuator moves the arm to place the heads on the track so data can be read from the hard drive. The physical engineering behind these drives is incredible. It's hard for us to appreciate it because everything is happening at such a small scale, but if we scaled up the hard drive head to the size of a 747 and said that the platter was the surface of the Earth, the airplane would be flying at hundreds of thousands of miles an hour, and as it's flying, the pilot would need to count every blade of grass as it passed. So just think about how tremendous the technical engineering is.
But as remarkable as the mechanical engineering is, the mechanical aspects of hard drives haven't improved in decades. And as a result, if you're doing random reads and writes with a hard drive, you're doing about 120 operations per second. That's the random access number today, that's the random access number 10 years ago, and that's the random access number 20 years ago. But that doesn't mean that hard drives haven't improved. They've improved a lot. Drive manufacturers have been able to methodically increase the density of the magnetic coating material on the drive platters, and that means that hard drive costs on a per-terabyte basis continue to improve, and this is why hard drives remain the best way to store large amounts of data when you need the ability to access it immediately. But when we combine the fact that drives are getting denser with the fact that random access isn't improving, what's happening is that on a per-terabyte basis, hard drives are actually slowing down. So this is bad news if you want to use hard drives for those use cases we were talking about a moment ago, and we're going to need innovation to figure out how to use them.

But first, let's take a look at some of these workloads and the problems we would have. I won't say this is a typical big data workload, they're all a little different, but a lot of them look like this, where things are relatively idle most of the time and then they really go when the data is being accessed. This particular bucket is 3.7 petabytes, and at the peaks, when data is being accessed, we're doing about 2.3 million requests per second. So let's do some quick math to see how we would service this workload with hard drives, and we'll start with storage. A modern hard drive stores about 20 terabytes of data, so, quick math, if all we cared about was storage, we would need 185 hard drives to store this data set. Well, that's not so bad. But what about I/O? We know that each of these hard drives gives us 120 operations per second, and for the purposes of our quick math we'll assume that every read can be serviced with one operation; there are probably a bunch of reasons why you'd need more than one, but we'll keep it simple for this conversation. So if we've got 185 hard drives and each can do 120 operations per second, we're going to get about 22 thousand reads per second at peak. That's nowhere near enough to service the 2.3 million requests per second this workload needs at peak. So we're going to need more hard drives, a lot more. In fact, we're going to need 19,000 hard drives to run this workload, and because we only need 185 of those for storage, we're going to be wasting a lot of storage space on those hard drives.

Of course, we can have the opposite problem as well. This is a much bigger bucket, with 28 petabytes of data, but the I/O peaks here are only 8,500 requests per second. So let's do the math for this workload. We're going to need a lot more hard drives for storage; in fact, we're going to need 1,400 hard drives to store the data. But as you probably guessed, those hard drives are going to provide a lot more I/O than we need. Those hard drives are going to provide enough I/O to do 168,000 IOPS, and we only need 8,500. So in this case we're going to have a lot of very full but very idle hard drives.
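A quick sketch of the drive-count arithmetic above. The 20 TB per drive and 120 random IOPS per drive figures are the round numbers quoted in the talk; a fleet has to be sized for whichever constraint, capacity or I/O, is larger.

```python
# Back-of-the-envelope math from the talk: a hard drive fleet is sized by
# the larger of its capacity requirement and its IOPS requirement.
import math

DRIVE_TB = 20       # approximate capacity of a modern hard drive
DRIVE_IOPS = 120    # random operations per second, unchanged for decades

def drives_needed(dataset_tb, peak_iops):
    for_capacity = math.ceil(dataset_tb / DRIVE_TB)
    for_io = math.ceil(peak_iops / DRIVE_IOPS)
    return for_capacity, for_io, max(for_capacity, for_io)

# Bucket 1: 3.7 PB, ~2.3 million requests/sec at peak -> I/O-bound
print(drives_needed(3_700, 2_300_000))   # (185, 19167, 19167)

# Bucket 2: 28 PB, ~8,500 requests/sec at peak -> capacity-bound
print(drives_needed(28_000, 8_500))      # (1400, 71, 1400)
```

The first bucket is I/O-bound (roughly 19,000 drives for only 185 drives' worth of data); the second is capacity-bound (1,400 drives that could serve 168,000 IOPS against a peak of 8,500).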
So now you understand why it's so challenging to use hard drives for big data workloads. But scale offers us an enticing opportunity: the ability to aggregate hundreds and thousands and millions of workloads. By aggregating a massive number of workloads, we get a much smoother and more predictable aggregate demand. This is called workload decorrelation, and it's a huge benefit of scale. Now, while we can see the benefits of this at massive scale, how do we make sure that we avoid creating hot spots at the individual drive level? Part of how we do this is a technique called erasure coding. When you erasure code an object, you first start by splitting the object into a bunch of chunks. Then you use an algorithm that generates an additional set of parity chunks. With erasure coding, you can now recreate your original object with only a subset of these chunks, and you have lots of flexibility in which shards you use. We started using erasure coding to more effectively achieve the durability goals that we had for S3, because with erasure coding you can lose hard drives, or even entire Availability Zones, and still have the object. But as it turns out, this is one of those situations where you can feed two birds with one scone. Erasure coding also lets us balance heat over all the hard drives in our fleet at a very fine level. When customers put objects into S3, we erasure code those objects and store the shards on a diverse set of drives, and there are two big benefits to this. First, it means that any individual customer's data only occupies a very tiny amount of every drive, and so no one workload can create a hotspot on any drive. But second, it means that any workload is able to burst to the request rate of a very large population of hard drives. So how much do we spread out customer workloads? Well, a lot. In fact, today we have tens of thousands of customers with S3 buckets that are running on at least a million drives. So that means there are probably at least a couple million drives represented here in the audience with me today. And it's also how we continue to differentiate the cost and performance for all those workloads that we talked about earlier.
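A toy sketch of the erasure coding idea described above, using a single XOR parity chunk so the object survives the loss of any one shard. This is only an illustration: the talk does not describe S3's actual scheme beyond spreading many shards across many drives and Availability Zones, and production erasure codes (Reed-Solomon style) tolerate the loss of multiple shards.

```python
# Toy erasure coding: split an object into k data chunks plus one XOR
# parity chunk; any single lost chunk (data or parity) can be rebuilt.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(obj: bytes, k: int):
    """Split obj into k equal data chunks plus one parity chunk."""
    chunk_len = -(-len(obj) // k)                      # ceiling division
    padded = obj.ljust(k * chunk_len, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    parity = chunks[0]
    for c in chunks[1:]:
        parity = xor_bytes(parity, c)
    return chunks + [parity]

def decode(shards, lost_index, k, original_len):
    """Rebuild the object even though one shard is unavailable."""
    if lost_index < k:                                 # a data chunk was lost
        present = [s for i, s in enumerate(shards) if i != lost_index]
        rebuilt = present[0]
        for s in present[1:]:
            rebuilt = xor_bytes(rebuilt, s)            # XOR of the rest recovers it
        data = shards[:lost_index] + [rebuilt] + shards[lost_index + 1:k]
    else:                                              # only parity was lost
        data = shards[:k]
    return b"".join(data)[:original_len]

obj = b"hello, s3 erasure coding"
shards = encode(obj, k=4)
assert decode(shards, lost_index=2, k=4, original_len=len(obj)) == obj
```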
So, speaking of scale, one of the things we invest in heavily at AWS is how we can make it easier for hundreds of developers working on massive services to move quickly without making mistakes. Now, let me say that again, because it bears repeating: if you want to innovate at scale, you need to move quickly and you need to do it safely. When you talk to most software developers about getting things right, the first thing that comes to mind is testing. But how do you test a massive distributed system like S3? Sure, you can test that your APIs work, but the challenges that afflict large-scale distributed systems like S3 are far harder to test. Storage services are huge, and they have to run highly efficient multi-threaded code, so you need to make sure that you don't have subtle race conditions, because at scale even the most rare things are super likely to happen. And you need to ensure that the system can recover correctly in the face of things like crashes and node errors, because at scale you're going to see a lot of those as well. And completely testing all the possible input and output states of a system as large and complex as S3 simply isn't possible; it would take millions of years, even if we used all the computing capacity in the world. As you can imagine, that sort of test cycle is going to really slow down innovation. So how can we do better?

At AWS we've been using a set of techniques referred to as automated reasoning to help us prove our software works the way we need it to. Automated reasoning is an area of computer science that focuses on enabling computers to reason about problems and come to conclusions. When humans do this, we call the work proving, or creating a proof. At AWS we have one of the largest teams of automated reasoning experts in the world, and this team is helping us transform how we approach security, availability, and durability. One of the ways we've been using automated reasoning the longest is for proving the correctness of distributed systems algorithms. These algorithms are complex and they have a bunch of corner cases, so we use special languages to write exact specifications of these algorithms. You can see a few examples of the languages here. They look like other programming languages, perhaps a bit more mathy. When we describe an algorithm in one of these languages, we can use a specialized set of tools to reason about the algorithm and prove it doesn't have subtle design bugs, and then we can use additional tooling to verify that our code matches the specification. By doing this, we can have high confidence that we got things right. But this approach requires a lot of specialized expertise and experience. So while we love the benefits of formal methods, and we use them extensively in areas like encryption, network protocols, authorization, virtualization, and durability, we find we are often faced with an "or". What do we mean by that? On one hand, we can build a system with the formal tools I just described. This offers the benefit of achieving a really high bar for correctness, but it slows down development significantly, and it limits the number of people who have the background and skills we need to work on the system. Or we can forego formal methods and use the modern agile software development techniques that we all love. When faced with this choice, the vast majority of software systems get built without formal methods.

Now, we started this section with the need for an "and", and so we've been asking ourselves: how can we turn this "or" into an "and"? To do this, our teams have been using a new approach that allows us to combine formal methods with more traditional approaches. These techniques are called lightweight formal methods. The goal here is to maintain the agility of traditional software development while applying formal methods to critical components of the system. This approach also emphasizes designing the system from the get-go so that formal methods can be applied iteratively to more parts of the system over time. So how does all this work? Here's a place where we're taking inspiration from other engineering fields. The idea of lightweight formal methods is to create a model of the system you're building right alongside the production system, just like you would if you were designing a car or an airplane. Ideally, the model is written and maintained in the same language as the core system, so that developers working on the system can maintain the model right alongside the production system. Now, the model is an exact replica of the production system, except it leaves out all the complexities around scale, efficiency, and recovery from errors, and if you've built these sorts of systems, you'll know that that's where the errors typically happen.
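A minimal sketch of the "model alongside the production system" idea, assuming a toy key-value component; the names and the checking harness here are illustrative, not the actual S3 code. The simple model is treated as the source of truth, and randomly generated operation sequences are replayed against both implementations.

```python
# Lightweight formal methods, sketched: keep a simple executable model next
# to the production component and check them against each other with large
# numbers of generated operation sequences.
import random

class ModelStore:
    """The model: obviously correct, ignores scale/efficiency/recovery."""
    def __init__(self):
        self.data = {}
    def put(self, key, value):
        self.data[key] = value
    def get(self, key):
        return self.data.get(key)

class ProductionStore:
    """Stand-in for the real implementation (sharding, caching, etc.)."""
    def __init__(self, num_shards=4):
        self.shards = [{} for _ in range(num_shards)]
    def _shard(self, key):
        return self.shards[hash(key) % len(self.shards)]
    def put(self, key, value):
        self._shard(key)[key] = value
    def get(self, key):
        return self._shard(key).get(key)

def check_against_model(num_runs=1_000, ops_per_run=50, seed=0):
    rng = random.Random(seed)
    for _ in range(num_runs):
        model, prod = ModelStore(), ProductionStore()
        for _ in range(ops_per_run):
            key = rng.choice("abcdef")
            if rng.random() < 0.5:
                value = rng.randint(0, 99)
                model.put(key, value)
                prod.put(key, value)
            else:
                assert prod.get(key) == model.get(key), "implementations diverge"

check_against_model()
```

The system described in the talk applies this at far larger scale, using the model to generate billions of test cases.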
The power of the model is that it provides the information necessary to establish the correctness of the system's design, and it also enables something called model-based testing. Every time developers make a change to the code, the model is used by a testing system to generate billions of tests that comprehensively look for errors. And this doesn't just test for correctness; it helps the developers find and fix bugs while they're coding, which means developers are actually more productive with this technique. And over time, the model sets the team up to invest in applying more formal verification techniques to more parts of the system. So lightweight formal methods really do allow us to take that "or" and turn it into an "and". The S3 team recently published a paper on how they used this approach to rebuild the system that manages the S3 hard drive fleet, the system we've been talking about here this morning, and as you might imagine, this is a critical system that has to perform correctly. The paper was published at SOSP, a well-regarded conference for computer science research, and if you're interested in learning more, I would suggest you look into the paper.

But next up, I'd like you to hear from a customer who shares our obsession for improving customer experience. They've been leveraging advancements we've been making in AWS storage services to unlock new possibilities for creative professionals everywhere. Please join me in welcoming Brandon Pulsipher, Adobe Cloud Operations.

Thanks, Peter. It's great to be here. I'm excited to share a little bit about our journey and collaboration with AWS today. Adobe's mission to change the world through digital experiences has never been more relevant as we seek new ways to communicate, to collaborate, to learn, and to interact virtually. Digital has become the primary way that we connect: the way we shop, the way we work and learn, even the way we're entertained. At Adobe, we've helped pioneer and create three massive categories: creativity, digital documents, and customer experience management. On any given day, customers and people around the world interact with Adobe digitally trillions of times, creating, collaborating, and sharing content. But as our customers innovate in these new and creative ways, they're confronted with an age-old problem: storage. In a world where we're creating more and more content, and we're creating it faster and faster, we needed a storage class that can keep up with our needs to preserve and access both short-term and long-term data.

So let me take you on a journey that I think will resonate with all of us a little bit. In the old days, the progression was pretty simple. We developed a roll of photos, threw out the blurry photos, stacked up the good ones, probably sorted them into a shoebox, and eventually the shoeboxes filled up and went into the attic. I was reminded of this recently as we went about locating, organizing, and digitizing my 80-year-old father's boxes of photos and carousels of 35-millimeter slides, for those who remember those days, and it's a lot of work. Today we have a similar challenge, but at a much larger scale. Think for a minute about the professional or even amateur photographer. They shoot weddings, family portraits, company events, homes for sale, and every one of these photo shoots creates hundreds of photos and hours of video. I lived this recently as well. During the pandemic, my daughter got married, and after a lot of rescheduling and resizing of the wedding, and a lot of tears, our small COVID-style wedding at the end of the day still generated hundreds of gigabytes of photos and hours of video that we all put in the cloud.
But our little wedding is no edge case. Next year it's projected that we will create one and a half trillion photos, just in 2022, and that number climbs and compounds every single year. So our photos-in-the-attic problem is similar, but now it's not just one attic, it's an entire neighborhood full of attics. And as our libraries get larger, images get bigger, and phones and editing get more sophisticated, it's not a stretch to say we'll soon outgrow the neighborhood and we'll have an entire city's worth of attics stuffed with content. And let's face it, the challenge then is how do you find and retrieve and share the content that you haven't seen for a while?

Historically, cloud storage has focused on the two ends of the spectrum. There are good high-latency solutions like archival storage on one end, and there are low-latency, quick-retrieval options on the other end, typically at a higher cost. But that sweet spot in the middle was missing, and we wanted to provide our Creative Cloud customers with a way to store their assets and get back to anything they needed quickly, no matter when they created it. We needed a solution with a pricing model closer to archive storage, but with performance, reliability, and latency closer to your hard drive. So we brought this problem to AWS, and we did that for a lot of reasons, but the primary one is our shared obsession with customer experience. You see, we knew that our customers want to create a massive, even unlimited, amount of content. They want to store it in the cloud, and store it safely, but then be able to find it quickly and share it and use it, even years from now. So AWS engaged with us. They met with our engineers, they came to Hamburg, Germany, where our digital imaging team sits, and together they worked and collaborated to detail what our customers needed. And that leads us to today, with Amazon S3 Glacier Instant Retrieval. It's an AWS product, but it matters to Adobe and it matters to our customers. Fast and durable storage is more critical than ever before, and this new offering allows Adobe and our customers to upload anything and everything that you want and come back to it quickly.

But honestly, it's looking forward that really excites me. The ability to put more content in the cloud gives our customers a creative and a competitive advantage, because it unlocks the AI and ML capabilities of Adobe Sensei. Sensei's content intelligence can look through, if you want it to, your hundreds of gigabytes of photos and hours and hours of video and help you find what you're looking for. So maybe next time we can help Peter find the image that he needs for his presentation. Why does this all matter? Exactly for that reason. I recently celebrated my 26th wedding anniversary, and as it grew closer, I spent time looking through 25 years of photos, trying to find events and experiences that my wife and I shared together. But I spent hours scrolling. Even when I filtered by facial recognition, it was still difficult to find the images that I had. We all have thousands of photos, and they're typically stored across our hard drives and our mobile devices, and some are in the cloud, and it took time. How much more powerful would it be if I could simply say into my app, "find me photos of my wife and I at the Grand Canyon about 10 years ago," and instantly have what I'm looking for? That's the power of Adobe Sensei.
It enables you to focus on what matters most: creating, collaborating, and sharing your content. And customers can now do things that were never before possible. So let's come back to our problem statement. The impact of our collaboration with AWS is the ability to solve this problem. Glacier Instant Retrieval hits that sweet spot we need as a storage class, and it unlocks an entire new set of capabilities. When Glacier Instant Retrieval is paired with Adobe Sensei and Adobe Creative Cloud, we eliminate the filled-attic problem, and we can enable and empower our customers, many of you, to do things that were never before possible. You can access and use and quickly find and retrieve the content that you're looking for, anytime you want, and that's truly amazing. Thank you.

Thank you, Brandon. It's great to see how Adobe is taking advantage of AWS storage capabilities to innovate for your customers. Now let's look at another type of storage that's really important for computing: block storage. AWS offers a number of different block storage options, including a variety of EC2 instances with locally attached storage. We also offer Elastic Block Store, or EBS, which is a highly available off-instance block store, or SAN. We still offer a few instance options with locally attached hard drives, but the vast majority of block storage these days is done on a different type of storage device: the SSD. The "SS" stands for solid state, a direct reference to the fact that SSDs use media that's not mechanical like the hard drive, but is rather solid state and built from silicon chips. By removing the mechanical constraints we talked about earlier, modern SSDs can do about a thousand times more random IOPS than hard drives.

But while SSDs remove the complexities of the mechanical aspects, they introduce their own challenges. Flash is the storage media used inside of SSDs. As you can see here, flash is an intricate 3D assembly of storage cells. These storage cells can be toggled between two states by applying voltage, and each state is used to store a binary bit. And while you toggle them with voltage, you can remove the voltage and they remain stable, so this makes flash ideal for solid-state storage. But there are challenges. When we write to flash, we do it at the page level. A page of flash is typically thousands of cells, or bits. Now, this is pretty typical; it's the same way we write to hard drives. But once a page is written, it can't be updated without resetting all the cells, and resetting storage cells requires significantly higher voltage, and this higher voltage requires more wires, bigger wires, and special circuitry. So typically flash needs to be reset in much bigger chunks, typically thousands of pages at one time. And finally, writing a storage cell in flash is a destructive process, so you can only write it a certain number of times before it stops working permanently.

Now, you might be thinking that you've been using SSDs for a long time now and you haven't run into any of these complexities. And that's probably true, because SSDs provide a sophisticated layer of abstraction called the flash translation layer, or FTL, and the FTL makes the flash media look to the system like a simple random-access storage device. The FTL maintains a mapping between logical addresses and physical addresses on the NAND storage, and it maps reads and writes to the right location, transparently moving data around to maximize storage efficiency. The FTL also maximizes the lifetime of the SSD through a process called wear leveling.
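A toy flash translation layer illustrating the logical-to-physical mapping and out-of-place writes described above; real FTLs also handle garbage collection, wear-leveling policy, and crash consistency, so treat this purely as a sketch.

```python
# Toy FTL: logical pages are written out of place to fresh physical pages,
# a map tracks where each logical page currently lives, and per-block erase
# counts approximate wear.

PAGES_PER_BLOCK = 4

class ToyFTL:
    def __init__(self, num_blocks=8):
        self.num_pages = num_blocks * PAGES_PER_BLOCK
        self.mapping = {}                      # logical page -> physical page
        self.flash = [None] * self.num_pages   # physical pages (None = erased)
        self.erase_counts = [0] * num_blocks   # wear per erase block
        self.next_free = 0

    def write(self, logical_page, data):
        # Flash pages can't be updated in place, so write to a fresh page and
        # remap; the old physical page becomes garbage to collect later.
        phys = self.next_free
        self.flash[phys] = data
        self.mapping[logical_page] = phys
        self.next_free += 1

    def read(self, logical_page):
        phys = self.mapping.get(logical_page)
        return None if phys is None else self.flash[phys]

    def erase_block(self, block):
        # Erasing works only on whole blocks and wears the cells out.
        start = block * PAGES_PER_BLOCK
        for p in range(start, start + PAGES_PER_BLOCK):
            self.flash[p] = None
        self.erase_counts[block] += 1

ftl = ToyFTL()
ftl.write(0, b"v1")
ftl.write(0, b"v2")          # an overwrite lands on a new physical page
assert ftl.read(0) == b"v2"
```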
Now, this is no trivial set of tasks, and in fact the FTL is a complicated piece of embedded software. You can really think about it as a full-fledged database with some really specific optimization logic, and it needs internal transactions to make sure that it's both consistent and low latency. So it's a super complicated piece of software, and here's where things get complicated at scale. Each flash manufacturer produces their own SSDs, and these SSDs have their own FTL implementations, and some manufacturers have multiple FTLs on different SSD models. And just like regular databases, each of these FTL implementations behaves a bit differently from the next. They all provide generally the same API, and they all do a good job for the average case, but our experience over the years is that each one has unpredictable and idiosyncratic behaviors. For example, garbage collection can kick in at an unexpected time and cause I/O requests to stall. These sorts of unexpected behaviors can make it really difficult to provide consistent performance, and they make it really hard to run certain workloads, like databases, that need consistent latency. And we run millions of these workloads for ourselves and our customers. So how can we get the performance consistency we need from SSDs?

For those of you that have been to this keynote before, you may recall me talking about Nitro. Nitro is the reason that AWS got started in building its own chips, and it remains one of the most important reasons why EC2 provides the best performance and security in the cloud. We use a specially designed AWS chip called the Nitro chip to create something we call the Nitro controller. Every EC2 server has a Nitro controller. The Nitro controller runs all the AWS code that turns that server into an EC2 instance, and there are a number of benefits to this approach. First, by running all the AWS code on the Nitro controller, we can dedicate all the system resources of the EC2 server to customer workloads, and this provides the highest performance for customers and enables things like bare metal instances. Second, Nitro helps us secure our EC2 instances and provide unique security capabilities like Nitro Enclaves. Third, Nitro makes it easy for us to turn any type of server into an EC2 instance. This is why we're able to support Intel chips, Arm chips, AMD chips, Graviton chips, even Mac hardware. By doing all our network and storage virtualization in the Nitro controller, we also reduce variability and avoid interfering with customers' workloads, which improves performance.

And these last two benefits, supporting any type of hardware and improving performance, sound an awful lot like our problems with SSDs. So it probably won't surprise you to hear that we built a Nitro SSD. Now, here we're zoomed way in, and we're looking at the NAND flash storage under a couple of heat sinks, but if we zoom out a little bit, you'll see an Annapurna Nitro chip. Our AWS FTL is implemented on that Annapurna Nitro chip. So far we've deployed a worldwide fleet of over half a million Nitro SSDs, built using flash media from multiple flash partners. Nitro SSD has enabled us to innovate on the performance and the features of SSDs at the same speed that we innovate on other AWS services. Nitro SSD is used to power the new I4 instance families, which are our latest-generation I/O-optimized EC2 instances. Here you can see a Graviton-based I4 server. These instances, the I4s, provide 60 percent lower average I/O latency and more than 75 percent lower tail latency.
And it's that variability of latency that's so important to running things like databases. It's not just EC2 instances that are using Nitro SSD. Earlier this year we launched a high-performance version of EBS called Block Express. The EBS io2 Block Express volumes are built using Nitro SSDs and are the highest-performance volume type in the cloud. io2 volumes offer 256,000 IOPS with consistent sub-millisecond latency. io2 is a great option for running databases. Here you can see PostgreSQL runs significantly better with io2 volumes: latency is reduced by 30 percent and throughput is increased by 140 percent. And it's not just PostgreSQL that runs great on io2; io2 works great with SQL Server as well. Latency is reduced by 83 percent and throughput increased by 400 percent. This makes io2 one of the best ways to run SQL Server.

Now, we're really excited about the performance benefits of Nitro SSD, but if we're going to talk about the big investments we're making in performance for workloads like databases, we have to talk about Graviton. We started building Graviton in 2016 with a deep conviction that modern processors were not well optimized for modern workloads. We talked last year about how processors had developed a number of features over the years to help different types of workloads, but how these features can actually slow down modern scale-out, memory-intensive workloads. Now, I really love the El Camino. It's a really awesome vehicle. But if we're being honest, it's not the best passenger car, nor the best pickup truck. We knew that if we built a processor that was optimized specifically for modern workloads, we could dramatically improve the performance, reduce the cost, and increase the efficiency of the vast majority of workloads in the cloud. And that's what we did with Graviton.

A lot has happened over the last year, so let me get you quickly caught up. We released Graviton-optimized versions of our most popular AWS managed services. And for those of you saying, "as exciting as Graviton sounds, I'm all in on serverless": well, we also released Graviton support for Fargate and Lambda, extending the benefits of Graviton to serverless computing. You can now get 34 percent better price performance with AWS Lambda powered by Graviton. There are also several new Graviton-based instance types. These instances provide significant cost and performance benefits to a number of workloads, including databases, caching services, and high-performance computing. Leveraging Nitro SSDs, as we mentioned, the I4s offer improved cost and performance for workloads that need high-performance local SSD storage. And the Graviton G5g instances combine a Graviton processor with an NVIDIA GPU for graphics acceleration; they're ideal for Android game streaming. But most exciting of all is that we've had thousands of customers adopt Graviton in the last year. Many large enterprises and born-in-the-cloud natives have gone from proofs of concept to full-blown deployments. Customers tell us they find Graviton easy to adopt, and they're delighted by the performance, cost, and elasticity that Graviton offers. So where do we go from here? How do we build on the success of Graviton2?
You probably heard Adam announce yesterday that we're previewing the Graviton3 processor. Graviton3 will provide at least 25 percent improved performance for most workloads. And remember, Graviton2, which was released less than 18 months ago, already provides the best performance for many workloads, so this is another big jump. So how did we accomplish that? Well, here are the sticker stats. They may look impressive, and they are, but as I mentioned last year, and this bears repeating, the most important thing we're doing with Graviton is staying laser focused on the performance of real workloads, your workloads. When you're designing a new chip, it can be tempting to optimize the chip for these sticker stats, like processor frequency or core count, and while these things are important, they're not the end goal. The end goal is the best performance and the lowest cost for real workloads, and I'm going to show you how we do that with Graviton3. I'm also going to show you how, if you focus on these sticker stats, you can actually be led astray.

When you look to make a processor faster, the first thing that probably comes to mind is to increase the processor frequency. For many years we were spoiled, because each new generation of processor ran at a higher frequency than the previous generation, and higher frequency means the processor runs faster, and that's delightful because magically everything just runs faster. The problem is, when you increase the frequency of a processor, you need to increase the amount of power that you're sending to the chip. Up until about 15 years ago, every new generation of silicon technology allowed transistors to be operated at lower and lower voltages. This was a property called Dennard scaling. Dennard scaling made processors more power efficient and enabled processor frequencies to be increased without raising the power of the overall processor. But Dennard scaling has slowed down as we've approached the minimum voltage threshold of a functional transistor in silicon. So now, if we want to keep increasing processor frequency, we need to increase the power on the chip. Maybe you've heard about, or even tried, overclocking a CPU. To overclock a CPU, you need to feed a lot more power into the server, and that means you get a lot more waste heat, so you need to find a way to cool the processor. Now, most people don't use ice cubes, they use fancy heatsinks, but you get the idea. While this might be a fun project if you're a gamer, it's not a great idea in a data center. Higher power means higher cost, more heat, and lower efficiency.

So how do we increase the performance of Graviton without reducing power efficiency? The answer is, we make the core wider. A wider core is able to do more work per cycle. So instead of increasing the number of cycles per second, we increase the amount of work that can be done in each cycle. With Graviton3, we've increased the width of the core in a number of ways. One example is that we've increased the number of instructions that each core can work on concurrently from five to eight instructions per cycle. This is called instruction execution parallelism. Let's look at how this translates to performance. How well each application does with this additional core width will vary, and it's dependent on really clever compilation, but our testing tells us that most workloads will see at least 25 percent faster performance, and some workloads, like NGINX, are seeing even more performance improvement.
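Rough arithmetic behind "wider, not faster": at an unchanged clock frequency, going from 5-wide to 8-wide issue raises peak instruction throughput by 60 percent, while the real-workload gains quoted in the talk are at least 25 percent, since real code rarely fills every issue slot. The frequency below is an assumed placeholder, not a published spec.

```python
# Peak instruction throughput = frequency x instructions per cycle.
freq_ghz = 2.6                    # assumed, for illustration only
old_width, new_width = 5, 8       # instructions per cycle, from the talk

old_peak = freq_ghz * 1e9 * old_width
new_peak = freq_ghz * 1e9 * new_width
print(f"peak uplift: {new_peak / old_peak - 1:.0%}")   # 60%
```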
Higher instruction execution parallelism is not the only way to increase performance by making things wider. You can also increase the width of the data that you're processing. A great example of this is doing vector operations. Graviton3 doubles the size of the vectors that can be operated on in a single cycle with one of these vector operations, and this will have a significant impact on workloads like video encoding and encryption. Okay, we talked about how optimizing for processor frequency is one trap when you're focused on sticker stats, and how width is actually a better solution. Now let's look at another potential trap: adding too many cores to a processor. Adding more cores is an effective way of improving processor performance, and generally you want as many cores as you can fit, but you need to be careful here as well, because there are trade-offs that impact the performance of real workloads. When we looked closely at real workloads running on Graviton2, what we saw is that most workloads could actually run more efficiently if they had more memory bandwidth and lower-latency access to memory. Now, that isn't surprising. Modern cloud workloads are using more memory and becoming more sensitive to memory latency. So rather than using our extra transistors to pack more cores onto Graviton3, we decided to use our transistors to improve memory performance. You can see here that Graviton2 instances already had a lot of memory bandwidth per vCPU, but we decided to add even more to Graviton3. Each Graviton3 core has 50 percent more memory bandwidth than Graviton2. To enable this, the C7g, powered by Graviton3, is the first cloud instance to support the new DDR5 memory standard, which also improves memory performance. Now, this is all great, but what really matters is how these innovations come together to help customers get better performance on their workloads. We have some really promising early data from the customers we've been working with in the early beta. On workloads as diverse as web applications and high-performance computing, customers are reporting significant performance improvements.

Okay, now let's hear from a customer with a unique role in the financial services industry that's utilizing the breadth of AWS computing services to tackle one of the most intractable problems in the housing market. Please welcome Kimberly Johnson, Executive Vice President and Chief Operating Officer, Fannie Mae.

Thank you, Peter, and good afternoon. Now, my guess is that everyone here has heard of Fannie Mae. In fact, odds are that hundreds, if not thousands, of you actually live in a house or apartment that's financed with a loan backed by Fannie Mae. But you probably don't know exactly what we do, so I'll start with a few facts. Fannie Mae is one of the largest financial institutions in the world. We have a balance sheet of four trillion dollars. In fact, last year one of every four single-family homes in the US was purchased or refinanced with Fannie Mae. We're also one of the largest providers of financing in the multi-family rental market. So how does it work? Well, Fannie Mae operates in the secondary mortgage market. We don't make loans directly to consumers; rather, we buy loans from lenders, we package them into securities, and we sell them off to investors. Today we're working to solve the biggest challenges in housing. We're always looking for ways to make the whole mortgage process simpler, safer, and less expensive. Now, the mortgage industry has been rapidly evolving from the days of paper and fax machines, and we're moving to a faster, easier, more digital process that our consumers have come to expect.
We're also really committed to advancing greater equity in housing. We want a housing finance system where all people, including those of modest means, have quality, affordable housing options. To tackle these challenges, we really have one tool: smart risk management. Now, risk management isn't part of our business, it is our business. To manage risk effectively, we need to understand the creditworthiness of borrowers, we need to understand the value of the properties, and we need to do it all continuously for millions and millions of loans. We need to understand home price dynamics and macroeconomic trends, we need to process large volumes of data, and we need to handle really sensitive personal information, and we need to do it efficiently and securely. To do all this, we rely on technology providers like AWS. Several years ago, for example, we built our serverless high-performance computing workload with AWS Lambda to run Monte Carlo simulations on 20 million mortgages. Most recently, we completed a proof of concept running Amazon RDS on Graviton2, and I have to tell you, the early results look great. We're seeing performance improvements of 54 percent and cost improvements of 11 percent.

Working with AWS has helped us make a big difference in two more areas that I want to highlight. First, let's talk for a minute about COVID and housing. Do you remember how scary the economic picture was at the start of the pandemic? In just two months' time, from February to April of 2020, unemployment swelled by 25 million jobs. 25 million. We had no historical reference point to suggest how borrowers or mortgages might perform under conditions like that. So we needed more information. We needed to discover new data sources, we needed to understand them, and we needed to develop solutions for troubled homeowners. In the past it would have taken us months, maybe even a year, to do this, but we had to move much faster than that. Thankfully, we were already using Amazon Kinesis, our streaming data platform, and that allowed us to ingest new data in real time. Tools like Amazon S3 took storage constraints right out of the equation. SageMaker gave us the analytics and the insights we needed. Together, that all allowed us to quickly roll out new solutions for homeowners. In the end, the results were nothing short of amazing. We provided forbearance plans on 1.4 million single-family loans, and to date, 1.1 million of those loans have exited forbearance successfully. That helps put homeowners back on their feet.

Data analytics are also helping us responsibly expand access to credit for historically underserved populations. Here's a really good example. Credit history is a key element in qualifying for a mortgage, and most ways to establish credit are basic things like student loans or credit cards, maybe even having a parental cosigner. But people of color are statistically much less likely to use these forms of credit. So notice what wasn't on the list: rent payments. Well, it seems kind of obvious that if someone can make regular, timely rent payments, they could also make a really similar mortgage payment, but credit reporting agencies don't take that into account. For more than 20 years we've been perfecting our automated underwriting engine. It's called Desktop Underwriter, or DU. It uses technology and data and analytics, and it helps us understand whether a loan application actually meets our eligibility requirements. So we asked ourselves: could DU actually look at a sea of cash flow data and identify timely rent payments?
It's a little trickier than it sounds. There are so many ways to pay your rent. You could pay by check or by Zelle, you could make an electronic transfer, you could pay your roommate who then pays the landlord, or maybe you even take your monthly payment and break it down into smaller installments. So we divided this challenge into two parts. First, we had to leverage the new data source, the loan applicant's bank statements, and we relied on AWS tools like S3 and Redshift to store all that new data. We're also turning unstructured data into structured data through tools like Amazon Elastic MapReduce. Second, we had to take these new data sources and make them usable for production. So we used machine learning to develop algorithms that read the bank statements and identify the rental payments, and we use Amazon SageMaker to create features, combinations of these data elements, that we can feed into our underwriting system. In September we actually began using rental payments as part of our underwriting, and now thousands of people who would have been denied before will actually become homeowners, based on leveraging the power of a single data element.

These are two examples that show digital innovation is changing the landscape of housing. There's no going back; in fact, we are leaning forward, with our sights set on the future. Let's talk about climate change for a minute. We need technology and tools that are up to the challenge, and for housing, climate is now. So we're thinking a lot about housing resiliency in the face of climate change. For example, we need to understand the risk, the likelihood: which homes are most likely to suffer from flooding, wildfires, or hurricanes, which homes would benefit from flood insurance, which ones need retrofitting, and how do we make that retrofitting a little more affordable? With better insights about climate risk, we can better protect homes and communities across the country.

Clearly, our relationship with AWS helps us generate breakthrough innovations across a wide range of areas. First, climate is all about location, and we leverage Amazon EMR auto scaling and GeoSpark to support our location intelligence; we also use Amazon RDS with GIS extensions for quick spatial processing. Second, to handle really large amounts of data, we use a serverless ETL workflow that leverages Lambda, Step Functions, and EMR to give us speed and transparency. And third, as we move forward, AWS Marketplace will be a huge asset. It's going to allow all of us to simplify our data architecture and to share insights for the greater good. This is a global challenge and a global responsibility. Making US housing greener and more resilient will have a huge environmental impact; in fact, it could have the same scale as reducing the carbon footprint of a medium-sized country. So helping with the pandemic, making housing fair and accessible, addressing climate change: for Fannie Mae, all of these challenges are within our responsibility. They require maximum effort, our best minds, our best technologies, and partners in and out of housing who are strong, capable, and committed. Thank you.

Thank you, Kimberly. It's really exciting to see how you and your team are transforming the housing market. Now I want to turn to a much more specialized area of computing: machine learning. Machine learning was once a technology only accessible to a handful of researchers and very large technology companies, but today more than 100,000 customers are using AWS for machine learning.
According to a study by IDC, more customers run machine learning workloads on AWS than anywhere else. No matter the size of the company or the industry, customers are using machine learning to transform their businesses, innovate on their customer experience, optimize their operations, and improve their products. And the massive interest and investment in machine learning is also making life really interesting for infrastructure.

You can think about machine learning as having two distinct components. The first component is training. This is where you build your model by iterating through your training data. You can think about a model as a math formula with lots and lots of variables, and all the math is generally done on very large matrices of floating point numbers. The training uses statistics to find optimal coefficients for all those variables, and those coefficients are called parameters. The second component is inference, and inference is where you take the model you trained and use it to make predictions on new inputs. You need very different infrastructure for training and inference, but one thing is true for both: new approaches and new algorithms are coming along almost daily from industry and academia. So it's important for us to give scientists and practitioners access to the broadest range of tools to do their science and optimize their workloads, and that's why AWS is investing in supporting the most machine learning infrastructure options, while also making deep investments in purpose-built AWS machine learning processors. No matter what domain you work in, whether it's language, speech, predictive analytics, or computer vision, you want options. Some workloads are going to run really well on NVIDIA GPUs, some might run really well on Intel Habana, some are going to run best on AWS Trainium or AWS Inferentia, and which one works best might change as compilers, frameworks, and models change and improve over time.

Let's start by looking at inference. We targeted optimizing inference with our first AWS ML chip, AWS Inferentia, because for most at-scale workloads, the cost of inference is the vast majority of the cost. That's because while you might train your models a few times a month, you're doing inference all the time. Think about something like Amazon Alexa. Every time a user asks a question, Alexa is doing inference, and not just one inference. There's speech-to-text, then natural language processing to understand the text, then inference to figure out how to respond, and then you have to construct a response and turn it back into speech. Lots and lots of inference happening. And for most applications, Inferentia continues to be the lowest-cost way to do inference. Some inference workloads are optimized using NVIDIA libraries like CUDA, and for those workloads, EC2 G4 and G5 instances provide great inference performance and low cost. But as machine learning becomes more pervasive, more and more inference is being done on CPUs, and this makes sense, because doing inference locally provides the lowest latency at no additional cost. As an example, in many of our database workloads we use inference to optimize query responses, and we simply don't have time to go to another box to do that inference, so it needs to be done locally. So we're innovating to improve inference on general-purpose processors. For example, Graviton3 will be the first AWS processor that supports bfloat16. The bfloat16 standard is a way of representing floating point numbers in 16 bits while maintaining the same range as a standard 32-bit floating point number. What you trade off with bfloat16 is precision; you basically don't get as many significant digits. But what's interesting is that for many machine learning workloads, this lower precision works just as well, and because you're using half as many bits, if your processor supports bfloat16, you can do twice as much math, running faster and more efficiently.
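A small sketch of what bfloat16 trades away: it keeps float32's 8-bit exponent (so the same range) but only 7 mantissa bits (so less precision). One simple way to see this, shown below, is to truncate a float32 to its top 16 bits; real hardware conversion typically rounds rather than truncates, so this is illustrative only.

```python
# bfloat16 as "the top 16 bits of a float32": same exponent, shorter mantissa.
import struct

def to_bfloat16_bits(x: float) -> int:
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16        # keep sign + 8 exponent bits + top 7 mantissa bits

def from_bfloat16_bits(b: int) -> float:
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

for v in (3.14159265, 1e30, 1e-30):
    approx = from_bfloat16_bits(to_bfloat16_bits(v))
    print(v, "->", approx)   # range is preserved, precision is reduced
```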
Graviton3 benefits from the data width increase we discussed earlier, but also from support for bfloat16, enabling it to perform significantly faster on many inference workloads.

Of course, the exciting news this week is Trainium, and as the name implies, it's focused on training. You can see the sticker stats here, and they look pretty exciting, but as we know by now, it's not the sticker stats that matter, it's real-world performance. So let's look at how training workloads are evolving and how Trainium is designed to optimize for these workloads. I said earlier that you can think of machine learning models as statistically generated math models, and while this might sound simple in concept, models are becoming bigger and more complex, because bigger, more complex models are demonstrating better results in a number of important problem domains. Model complexity, or size, is measured by the number of parameters in the model, and as you can see, models are getting massive. In recent years, model growth has averaged close to 10x per year. And what about machine learning hardware? That's getting faster too, right? Of course it is. Training hardware performance is doubling or maybe tripling every two years, and this is actually faster than general-purpose computing hardware is scaling these days. But the problem is, it's not nearly fast enough to keep up with the complexity growth of models. The only way we're going to be able to scale is to use multiple processors, and this has led machine learning scientists to develop distributed training techniques.

The simplest way to perform distributed training is called data parallelism. With data parallelism, you use multiple training processors, and each processor has a complete copy of the model in its memory. The training data is partitioned, and each processor processes a subset of the training data. With data parallelism, the processors do occasionally have to exchange some information, essentially updating their model parameters with the other processors' parameters as they converge towards a common solution, but the demands on the network are pretty minimal. You can think about data parallelism as a math party, back in the in-person days. You know, as is normal in a math party, you get a few friends together, you divide up the math problems, you all work independently, and then after some time you merge your results. Okay, you guys don't do math parties? You're missing out. Okay, let's look at a real-world example. BERT-Large is a popular language model that appeared about three years ago. It has 340 million parameters, and as the name implies, when it was released it was a large model. But even back then, on a single GPU, the fastest available GPU, you would train a BERT-Large model in about six weeks, which was an awfully long time. Now, here you're looking at the number of servers, each with eight GPUs, that we would need to speed up our training run from six weeks to a single hour. And as you can see, as we move from the P3dn to the P4d, we're able to get our training job done with about one-third the number of instances.
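A minimal numpy sketch of the data parallelism just described: every worker holds a full copy of the parameters, computes gradients on its own shard of the data, and the gradients are averaged (the "merge your results" step). The linear model and synthetic data are illustrative, not anything from the talk.

```python
# Data parallelism, sketched: shard the data, compute local gradients,
# average them across workers, and apply the same update everywhere.
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1024, 8)), rng.normal(size=1024)
params = np.zeros(8)
num_workers, lr = 4, 0.01

def local_gradient(params, X_shard, y_shard):
    # Gradient of mean squared error for a simple linear model.
    err = X_shard @ params - y_shard
    return 2 * X_shard.T @ err / len(y_shard)

for step in range(100):
    X_shards = np.array_split(X, num_workers)
    y_shards = np.array_split(y, num_workers)
    grads = [local_gradient(params, xs, ys) for xs, ys in zip(X_shards, y_shards)]
    params -= lr * np.mean(grads, axis=0)   # the "all-reduce": average and apply
```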
And this is really primarily because of the more powerful GPUs on the P4d. But this isn't only a matter of increasing the performance of the processors; we also had to scale the network. On these instances, networking increased from 100 gigabits per second on the P3dns to 400 gigabits per second on the P4ds. What's interesting is that you can see here the estimated time each of these instances spends on networking overhead. This time is essentially not being used for useful work, it's overhead, and you can see the P4ds are actually spending less of their time on networking and doing more work.

Now, while BERT was considered a large model a few years ago, the machine learning community is now training much larger models, like we discussed. An example is GPT-3, which has 175 billion parameters. That's 500 times bigger than BERT-Large. And I heard last week there's a new model going around that has 10 trillion parameters, so model growth is going to continue. When you understand this trend, the first thing that comes to mind is that you have to increase the amount of available memory on the training processor. The more memory you have, the larger the model you can keep in memory. But you're going to need terabytes of memory to store a model as large as GPT-3, and even if that did fit in memory, it would take decades to train on a single P4d. So what can you do to train these really large models? Well, we need to actually break the model itself up over multiple processors, and this is called model parallelism. There are a couple of ways you can go about this: one is called pipeline parallelism, and a second is called tensor parallelism. In pipeline parallelism, you factor your model into layers, and then you put one or more layers on each machine, and you hand off results from one server to the next. This works well when you can fit one or more layers of your model in memory on a processor. But if you can't do that, you need to use tensor parallelism, and tensor parallelism factors a single layer onto multiple machines. Now, for our purposes here today, we don't need to understand much more about how this works, but we do need to understand that this sort of parallelism puts increased demands on the network. If we go back to our math party analogy, model parallelism is like having your math party virtually. It's like working on the math problems over the phone, where each of you can only see a subset of the equations. Not only is this going to be more difficult and less fun, it's going to put real demands on the latency and throughput of your network connection. This is why everyone was so happy when video conferencing came along: virtual math parties became more fun.

The P3dn was launched in 2018. It provides 256 gigabytes of memory and 100 gigabits of networking. This was a big training instance three years ago, but the world is moving fast. The P4d was launched last year and is currently the most powerful AWS training instance. The P4d provides 25 percent more memory and four times the networking of the P3dn. We also collaborated with NVIDIA to enable ultra-low-latency networking between GPUs on different servers. To accomplish this, we used Elastic Fabric Adapter, or EFA. EFA is a specially optimized EC2 networking stack that was designed for high-performance computing, and we've applied the same techniques to machine learning. The resulting reduction in latency can speed up training when training servers need to coordinate on these tightly coupled model-parallel tasks, making AWS the best place for training on GPUs.
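A minimal sketch of the pipeline parallelism described above: the model's layers are split into stages and activations are handed from one stage to the next. Here the "devices" are just Python lists; a real implementation overlaps micro-batches to keep every device busy and relies on fast interconnect for the hand-offs.

```python
# Pipeline parallelism, sketched: stage 0 holds the first half of the layers,
# stage 1 holds the second half, and activations flow between them.
import numpy as np

rng = np.random.default_rng(0)
layers = [rng.normal(size=(16, 16)) for _ in range(8)]   # 8 weight matrices

stages = [layers[:4], layers[4:]]          # two "devices", four layers each

def run_stage(stage_layers, activations):
    for w in stage_layers:
        activations = np.tanh(activations @ w)
    return activations

x = rng.normal(size=(4, 16))               # a micro-batch of inputs
for stage_layers in stages:                # the device-to-device hand-off happens here
    x = run_stage(stage_layers, x)
print(x.shape)                             # (4, 16)
```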
okay let's look at a real world example bert large is a popular language model that appeared about three years ago it has 340 million parameters and as the name implies when it was released it was a large model but even back then on a single gpu the fastest available gpu you would train a bert large model in about six weeks which was an awfully long time now here you're looking at the number of servers each with eight gpus that we would need to speed up our training run from six weeks to a single hour and as you can see as we move from the p3dn to the p4d we're able to get our training job done with about one-third the number of instances and this is primarily because of the more powerful gpus on the p4d but this isn't only a matter of increasing the performance of the processors we also had to scale the network on these instances networking increased from 100 gigabits per second on the p3dns to 400 gigabits per second on the p4ds but what's interesting is you can see here the estimated time that each of these instances spends on networking overhead this time is essentially not being used for useful work it's overhead and you can see the p4ds are actually spending less of their time on networking and doing more work now while bert was considered a large model a few years ago the machine learning community is now training much larger models like we discussed an example is gpt-3 which has 175 billion parameters that's 500 times bigger than bert large and i heard last week there's a new model going around that has 10 trillion parameters so model growth is going to continue and when you understand this trend the first thing that comes to mind is you have to increase the amount of available memory on the training processor the more memory you have the larger model you can keep in memory but you're going to need terabytes of memory to store a model as large as gpt-3 and even if that does fit in memory it would take decades to train on a single p4d so what can you do to train these really large models well we need to actually break the model itself up over multiple processors and this is called model parallelism and there's a couple of ways you can go about this one is called pipeline parallelism and a second is called tensor parallelism in pipeline parallelism you factor your model into layers and then you put one or more layers on each machine and you hand off results from one server to the next this works well when you can fit one or more layers of your model in memory on a processor but if you can't do that you need to use tensor parallelism which factors a single layer onto multiple machines now for our purposes here today we don't need to understand much more about how this works but we do need to understand that this sort of parallelism is going to put increased demands on the network if we go back to our math party analogy model parallelism is like having your math party virtually it's like working on the math problems over the phone where each of you can only see a subset of the equations not only is this going to be more difficult and less fun it's going to put real demands on the latency and throughput of your networking connection this is why everyone was happy when we moved to video conferencing virtual math parties became more fun
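Here is a minimal sketch of the pipeline parallelism idea described above, assuming a toy two-stage model in PyTorch; in a real cluster each stage would sit on a different accelerator or server and the activations handed from stage to stage would cross the network, which is where the extra latency and bandwidth demands come from. The layer sizes are made up for the example.

```python
import torch
import torch.nn as nn

# two "stages", each holding a slice of the model's layers; in a real
# deployment each stage would live on a different device or server
stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU())   # first layers
stage1 = nn.Sequential(nn.Linear(4096, 1024), nn.ReLU())   # remaining layers

def pipelined_forward(x: torch.Tensor) -> torch.Tensor:
    activations = stage0(x)   # stage 0 finishes its layers...
    # ...then hands the intermediate activations to the next stage, like one
    # server passing results to the next; over a network this hand-off is
    # what drives the latency and bandwidth requirements
    return stage1(activations)

out = pipelined_forward(torch.randn(8, 1024))
print(out.shape)  # torch.Size([8, 1024])
```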
the p3dn was launched in 2018 it provides 256 gigabytes of memory and 100 gigabits of networking this was a big training instance three years ago but the world is moving fast the p4d was launched last year and is currently the most powerful aws training instance the p4d provides 25 percent more memory and four times the networking of the p3dn we also collaborated with nvidia to enable ultra low latency networking between gpus on different servers to accomplish this we used elastic fabric adapter or efa efa is a specially optimized ec2 networking stack that was designed for high performance computing and we've applied the same techniques to machine learning the resulting reduction in latency can speed up training when training servers need to coordinate on these tightly coupled model parallel tasks making aws the best place for training on gpus trainium-based trn1 instances are going to have 60 percent more memory than the p4ds and they're going to have 2x the networking bandwidth now this is an impressive jump on both dimensions but we're working with customers who want to train even larger models so we'll also be offering a network optimized version of the trn1 which will provide a whopping 1.6 terabits per second of networking now these trainium-based instances were built with the same efa accelerations as the p4d and we actually added specific capabilities to the trainium processor to further reduce the latency and cost of exchanging data between training processors let's look at how all these innovations help us train very large models here we're looking at how many servers we would need to train that gpt-3 model in under two weeks right two weeks that's fast when you're talking about these big models and as you can see we significantly reduced the number of servers needed to train the large model from 600 p3s to 200 p4s to 130 trainium instances but let's look at the estimated time that each instance spends on networking as expected it's much more significant here and the work we did to optimize networking on the p4d and trainium instances really improves the efficiency of the instances it also means you can scale up to larger clusters and further reduce your training time and we're not done working with nvidia on further optimizing the performance of their gpu instances either we expect to keep delivering performance by adding more memory and more networking throughput on our gpu based instances in the coming year now we did not just design trainium to be more efficient at running large distributed training applications trainium is specifically designed for machine learning training trainium uses an approach called a systolic array this approach eliminates the need to move data between the registers and main memory and instead efficiently hands intermediate mathematical results off inside the processor this makes it more efficient at performing the matrix multiplication that i talked about that underlies most training
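To illustrate the general systolic array technique, here is a toy cycle-by-cycle NumPy simulation of an output-stationary systolic array computing a matrix multiplication; it is a conceptual sketch only and is not a description of how Trainium actually implements this in hardware.

```python
import numpy as np

def systolic_matmul(A, B):
    """Toy cycle-by-cycle simulation of an output-stationary systolic array.

    Each processing element (i, j) keeps a running sum C[i, j]; every cycle it
    multiplies the A operand flowing in from the left by the B operand flowing
    in from above, adds the product to its local sum, and passes the operands
    on to its neighbours. Intermediate results stay inside the grid instead of
    round-tripping through registers and main memory.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))       # one accumulator per processing element
    a_reg = np.zeros((n, m))   # A operand currently held by each PE
    b_reg = np.zeros((n, m))   # B operand currently held by each PE
    for t in range(n + m + k - 2):           # enough cycles to drain the array
        a_reg = np.roll(a_reg, 1, axis=1)    # A values move one PE to the right
        b_reg = np.roll(b_reg, 1, axis=0)    # B values move one PE down
        for i in range(n):                   # feed a skewed column of A at the left edge
            a_reg[i, 0] = A[i, t - i] if 0 <= t - i < k else 0.0
        for j in range(m):                   # feed a skewed row of B at the top edge
            b_reg[0, j] = B[t - j, j] if 0 <= t - j < k else 0.0
        C += a_reg * b_reg                   # every PE multiplies and accumulates locally
    return C

A, B = np.arange(12.0).reshape(3, 4), np.arange(8.0).reshape(4, 2)
print(np.allclose(systolic_matmul(A, B), A @ B))  # True
```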
now the systolic array is actually an old technique but one of the problems with processors that try to use this approach is that they typically don't allow the same flexibility you can get with gpus one of the reasons people love gpus is they provide a high degree of flexibility which lets practitioners and scientists experiment many specialized processors for ml forego this flexibility in exchange for efficiency but in a space that's moving as fast as machine learning this sort of premature optimization is a bad idea with trainium we knew that we needed to provide efficiency and flexibility there's that word again and it's one of our favorite words at amazon trainium provides 16 fully programmable in-line data processors essentially allowing data scientists to program their own machine learning operators directly into the trainium processor achieving both flexibility and efficiency and to make things easy these operators are programmed in standard c and c++ so there's no need to learn new languages or tools and while we're really excited about how trainium can help customers scale more efficiently with their current applications we're also excited that trainium is going to enable new techniques let me give you an example training requires doing a massive amount of floating point calculations and in various steps along the way you need to do rounding now we've all been taught that 0.3 rounds to zero and 0.6 rounds to one all processors do this math very efficiently however when training a model this kind of rounding might not be optimal scientists have proposed using stochastic rounding stochastic rounding is probabilistic 0.3 no longer always rounds to zero instead it rounds down to zero seventy percent of the time and it rounds up to one thirty percent of the time and machine learning scientists are finding that this type of rounding can enable practitioners to use lower precision floating point numbers and achieve comparable model accuracy and as we talked about earlier using lower precision floating point numbers helps you achieve better performance at lower cost the problem is that stochastic rounding is a lot more work every time you want to round a number you need to generate a random number and use it to round the input and current processors don't do this efficiently trainium is going to provide native support for stochastic rounding and while scientists have demonstrated that stochastic rounding can be helpful no one is training large models with stochastic rounding today and that's because it's simply too expensive so we look forward to seeing real results as customers use trainium to try stochastic rounding on some of their very largest machine learning models
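Here is a minimal sketch of stochastic rounding in NumPy, rounding to integers for simplicity to mirror the 0.3 example above; hardware implementations round to the nearest representable low-precision values, but the mechanism is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, rng):
    # round down with probability (1 - frac) and up with probability frac,
    # so 0.3 goes to 0 about 70% of the time and to 1 about 30% of the time
    floor = np.floor(x)
    frac = x - floor
    return floor + (rng.random(np.shape(x)) < frac)

samples = stochastic_round(np.full(100_000, 0.3), rng)
print(samples.mean())  # ~0.3: the expected value is preserved, whereas
                       # round-to-nearest would always return 0.0
```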
now if all this sounds a bit overwhelming don't worry just like inferentia our aws neuron software development kit allows machine learning practitioners to use trainium as a target for all their popular machine learning frameworks including tensorflow pytorch and mxnet with aws neuron you can take advantage of the cost and performance improvements of trainium with little or no change to your machine learning code all while maintaining support for other machine learning processors we're looking forward to seeing how trainium helps customers lower their cost and achieve unprecedented results with their machine learning training workloads but now i want to wrap up with one of our most important areas of infrastructure investment scientists tell us that we have a limited window to make unprecedented headway on global warming and keep temperature rise to 1.5 degrees celsius to drive collective cross-sector action on the climate crisis amazon co-founded the climate pledge with global optimism the climate pledge commits amazon and other signatories to achieving net zero carbon by 2040 10 years ahead of the paris agreement last year at this time the climate pledge had 31 signatories today i'm excited to share that well over 200 companies have joined the climate pledge [Applause] we'll look at amazon's progress towards 100 percent renewable energy in a second but aws has always been focused on improving efficiency and reducing the energy we need to deliver services to our customers we focus on efficiency across all aspects of our infrastructure from the design of our data centers to modeling and tracking the performance of our operations to ensuring we're continually identifying innovations to increase efficiency and one of the most visible ways we've been using innovation to improve efficiency is our investment in aws chips graviton remains our most power efficient general purpose processor and is about 60 percent more efficient for most workloads and while we expect trainium to be our most power efficient training processor i don't yet have production data to share but we do know that inferentia is our most power efficient inference processor and this is really important because machine learning is becoming more and more pervasive and the amount of power required for those workloads continues to increase rapidly when we co-founded the climate pledge amazon committed to powering our global operations with 100 percent renewable energy by 2030 and with the progress we've made since then we're now on a path to achieve this by 2025 last year i shared our progress through 2020 and we announced that amazon had become the largest corporate purchaser of renewable energy in the world since then amazon has announced another 3.6 gigawatts of new wind and solar projects and today we announced another 18 renewable projects totaling 2 gigawatts with projects in the us and europe in total amazon has now enabled 12 gigawatts of renewable capacity and i'm happy to confirm that amazon remains the world's largest corporate purchaser of renewable energy it's hard to visualize the impact that these investments will have but let's try when all the renewable energy projects amazon has enabled come online an estimated 13.7 million metric tons of carbon emissions will be avoided and to put that in context that's the equivalent of the annual emissions of nearly 3 million us cars and it's not just the volume of projects that excites me we're deeply investing to enable renewable projects in new locations and new geographies these investments are difficult but they're necessary to help with our commitment and they'll hopefully help others as they work towards a net zero carbon future with us for example this year we delivered our first project in south africa and we announced our first renewable project in japan and both of these are the first corporate initiated projects in those countries amazon also announced two new offshore wind projects in europe including our largest renewable project to date offshore wind projects have a number of benefits when compared to wind farms on land offshore wind currently produces only a tiny fraction of global power generation but it has the potential to generate 18 times our current total worldwide electricity usage so we're excited to be part of enabling more offshore wind projects the power grid needs clean energy around the clock even when the sun isn't shining and the wind isn't blowing and we're working hard on cost effectively solving this problem for ourselves and others this year we announced our first two utility scale storage projects these storage projects will store solar energy during the day for use in the hours when the sun isn't shining they are our first step in matching our load with 24/7 carbon free energy we're also working to optimize our renewable projects we're using aws services to monitor manage and optimize them to ensure we maximize their power output by 2025 we expect these optimizations will deliver the same energy as building an entirely new 200 megawatt wind farm and while we're always excited to share our progress on these goals with you customers have asked us for visibility into how these efforts are helping them achieve their own carbon reduction goals and so today i'm really excited to announce that aws will provide customers with the carbon footprint of their use of aws services not only will it show your current usage but it's going to give you a forecast that shows how our sustainability investments are going to lower the carbon intensity of your workloads in the coming years as we continue our march towards net zero carbon i could spend a lot more time this afternoon talking about sustainability but when they gave me the afternoon slot i promised not to keep anybody into dinner so instead i'll just make a pitch for the additional sustainability content there's a large amount of content on sustainability at this re:invent and i would encourage you to check it out here or later online i hope you enjoyed this little peek into our innovation at scale and i want to thank you for joining us here at re:invent this week enjoy the rest of your time [Music]
Info
Channel: AWS Events
Views: 127,341
Keywords: AWS, Events, Webinars, Amazon Web Services, AWS Cloud, Amazon Cloud, AWS re:Invent, AWS Summit, AWS re:Inforce, AWSome Day Online, aws tutorial, aws demo, aws webinar
Id: 9NEQbFLtDmg
Length: 80min 35sec (4835 seconds)
Published: Thu Dec 02 2021