AWS re:Invent 2018: Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336)

Captions
Hello everyone, welcome to re:Invent. My name is Chandra Maurice; I'm a senior engineering manager with AWS. Joining me in this presentation is Josh Eichorn, the CTO of Pagely. We're here today to talk about Aurora Serverless — thank you for joining me.

Before we get into Aurora Serverless, I want to call your attention to a few related breakout sessions you may want to attend if you want to look at Aurora more broadly, outside the scope of serverless. On Tuesday — tomorrow — we have "What's New in Amazon Aurora" at 1 p.m. here in the Venetian, and at 6:15 p.m. "Deep Dive on Amazon Aurora with MySQL Compatibility," which you may want to check out as well. Then on Thursday we have "Amazon Aurora Storage Demystified," also here in the Venetian. Please check these out if you're available; I think you'll find them very interesting.

Let's start with what we're covering today. We'll begin by briefly catching everybody up on RDS and Aurora fundamentals. Then we'll talk about Aurora Serverless: what it is and how it works. Closer to the end we'll cover what's new in Amazon Aurora Serverless — some features we've just recently launched that I think you're going to find really useful — and then I'll hand it over to Josh, who will talk about the way they use Aurora Serverless at Pagely.

With that, let's take a quick look at what RDS is. RDS stands for Relational Database Service: a platform that provides management around a whole variety of relational database engines, including Amazon Aurora, our homegrown cloud-native relational database engine. We also of course support MySQL, MariaDB, and PostgreSQL — the open-source engines — as well as SQL Server and Oracle, the commercial engines. As part of RDS you get all of these capabilities out of the box without having to do any extra work.

So what is Amazon Aurora, before we get to Aurora Serverless? As I mentioned, Amazon Aurora is our homegrown cloud-native relational database engine. We made it generally available back in 2015, and we've been really excited by its adoption. Aurora gives you the speed and availability of high-end commercial database engines — many of you have found that with traditional databases you don't get the kind of availability and performance you might want; Aurora gives you that the way a commercial engine might — but with the simplicity and cost-effectiveness of an open-source engine. It is fully drop-in compatible with MySQL and PostgreSQL, so you can migrate in and out without having to change your application, and it has simple pay-as-you-go pricing: no vendor lock-in, no licensing involved — we have to earn your business every day. It's delivered as part of RDS, as a fully managed service.

Now let's take a quick look at the architecture of Amazon Aurora. The key innovation in Aurora is our purpose-built, log-structured, distributed storage layer. We've taken the whole storage layer of a relational database and pulled it out into a multi-tenant service, giving you a highly elastic data volume. Behind the scenes we break your data down into ten-gigabyte chunks and store six copies of each, two per Availability Zone.
That gives you the ability to survive an entire AZ outage without any service disruption, without any performance degradation, and without any data loss. On top of this, you can create your master database plus up to 15 replicas, which are useful both for serving your read traffic and as failover targets — because they're attached to the same storage volume, you can fail over without any data loss.

Now let's talk about why we need serverless in the context of Aurora. One restriction you still have with provisioned Aurora is that you need to provision your database server: you have to decide up front what size server you want to run on. The industry-standard thing to do is to provision for your peak workload. A lot of the time this works; sometimes it doesn't work so well from a cost-efficiency standpoint. So one thing we find our customers often doing is continuously monitoring and manually scaling their database server up and down as their workload fluctuates.

This is where we get to Aurora Serverless. To solve this problem, we started by looking at offering a solution where you don't have to monitor your database server anymore: you simply run your workload, and the database automatically adapts as things change. Aurora Serverless automatically scales up and down based on the capacity your workload consumes, and you don't have to think about it. When your database becomes idle, we automatically shut the database server down; when your workload resumes, we automatically spin it back up. This is the core of what we're talking about today. We announced Aurora Serverless at re:Invent last year and made it generally available this past August, three months ago.

To make the case a little stronger, we took a bunch of workloads from our production fleet. Here's one example, a dev/test workload — we're looking at write volume and write IOPS; there are plenty of other metrics you could look at. What was immediately obvious is that if you drew a line across this graph and said, "this is what you provisioned for," you'd quickly notice that for the vast majority of the time you have resources provisioned that you're never actually using. This is what inspired us to go build Aurora Serverless.

Let's look at a couple of other sample workloads before we move on. This is an even more typical dev/test pattern — we see a lot of these in our production fleet today — where customers provision for peak even on dev/test instances but never actually use it. That creates a huge opportunity for cost savings. Now some production workloads: this is a sample spiky gaming workload, with a daily peak coming and going; even in this production workload, if you drew a line at what you'd provision for, you'd quickly find you're paying for a lot more than you're using the majority of the time. And here, zoomed in, is an e-commerce production workload — I promise this is the last one, but it's really the same story.

So in terms of provisioning your database cluster, you have a few choices. You can provision for peak — this is the industry standard, and what you typically do for any production workload you actually care about.
Or you can under-provision. That's probably not what you want for a mission-critical workload, because you're going to have end-user impact if your database server isn't large enough to handle the load. Or, like I said earlier, you can continuously monitor and scale up and down — but this creates risk to your business: every time you scale there's the potential for downtime, there's a lot of expertise involved, and there's the risk of data loss. I'm going to get into the details of what that looks like, and then talk about how Amazon Aurora Serverless solves it.

Let's pretend you're going to manually monitor your database fleet and continuously scale it up and down. You might have an architecture that looks a little like this — you'd probably have more than one application server, but we'll show one for simplicity. A typical scaling operation starts with provisioning a larger replica, the scale target. This is easy with Amazon Aurora, because replication is all done for you down at the storage layer; with a traditional database you'd have to do it yourself. Then you cut over: you've failed over to the new master, and you're running on a larger box. The problem is that cutting over from the smaller database server to the larger one can often involve downtime — and even after the cutover, your performance is compromised for some period of time, because you need to warm up your buffer pool, load up your caches, and do a lot of other work to get back to the performance level you were at before.

What we see a lot of customers doing to reduce this problem is adding a proxy layer in the middle, which makes the cutover a lot more seamless: application servers don't have to worry about rerouting their traffic. A cutover then looks something like this — notice the application server hasn't had to redirect anything. But there are still a couple of problems left to solve. One is that you've now introduced a new single point of failure into your service. Yes, you can run multiple proxy servers — there's a lot you can do — but it really adds to the complexity of your operation. The proxy solution also doesn't fix the fact that after cutover there's a period when your performance is compromised, because you're starting from a cold buffer pool.

So let's talk about how Aurora Serverless solves these problems. As I mentioned, Aurora Serverless was designed to automatically respond to your application workload; it scales you up and down without any downtime. We have a multi-tenant proxy layer, built so that no single point of failure is introduced — you don't have to manage it, it's all done for you. The scale target has a warm buffer pool, which means your performance does not degrade after cutting over: you keep running at least as fast, and probably faster, because you're on a larger box immediately after cutover. And when Aurora Serverless is not in use, the database simply goes to sleep. This is really ideal for dev/test workloads, and you can also use it for heavier workloads where your traffic is highly variable.
When we announced Aurora Serverless last year, we got a very strong positive reaction from our customers. This is one I was really happy to share — our friends over at Netflix, with a reference to one of my favorite shows, Stranger Things; check it out if you haven't already. A lot of customers like you were really excited about it.

What I want to get into next is how it actually works behind the scenes — how it solves the problems I've described. We start with something similar to what I showed before: your application server fleet and your database. The first thing we did was add the multi-tenant proxy layer, with thousands of proxy servers that load-balance horizontally without introducing any single point of failure. The next thing we did was add a warm pool of database servers, so that when we decide to make a scaling operation we can swap hosts in and out very quickly. We also added a monitoring service that checks the entire fleet, looking at how the different database servers are performing: is anybody starting to exhaust their capacity? Might we need to start thinking about scaling them up or down?

When you connect to your Aurora Serverless database, you connect through the proxy layer. You can land on any one of the multi-tenant proxies — this is handled by a network-layer load balancer you don't even need to think about; you connect the way you would to any other database. Your workload can be distributed across multiple proxy servers, and again, no single point of failure has been introduced.

In terms of making scaling decisions, the monitoring service looks for any database server with more than 70% of its CPU capacity utilized, or more than 90% of its maximum connections consumed. If it finds a database server living in that state for more than five minutes, it decides to scale up or down. You then grab a fresh database server out of the warm pool, so scaling operations execute very quickly without waiting to provision a new server; we transfer the buffer pool to the new cutover target, so you get fast performance immediately after cutover; and finally, we look for what we call a safe scale point — a point where your production workload is up and running but there's no long mutating transaction that we can't easily transfer to the new box. That way none of your in-flight transactions are disrupted when we cut over: we find the safest point to scale, so your application doesn't feel any downtime.
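To make the trigger concrete, here is a minimal illustrative sketch of that decision rule in Python — just the thresholds stated above (70% CPU or 90% of max connections, sustained for five minutes) restated as code. This is not the actual monitoring service; the names and sampling model are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List

CPU_THRESHOLD = 0.70       # scale when CPU utilization exceeds 70%...
CONN_THRESHOLD = 0.90      # ...or when >90% of max connections are in use
SUSTAIN_SECONDS = 5 * 60   # the condition must hold for five minutes

@dataclass
class Sample:
    timestamp: float        # epoch seconds
    cpu_utilization: float  # 0.0 .. 1.0
    connections_used: int
    max_connections: int

def over_threshold(s: Sample) -> bool:
    return (s.cpu_utilization > CPU_THRESHOLD
            or s.connections_used > CONN_THRESHOLD * s.max_connections)

def should_scale(samples: List[Sample]) -> bool:
    """True if the samples show sustained pressure across the full window.
    Assumes `samples` is time-ordered and covers at least five minutes."""
    if not samples:
        return False
    window_start = samples[-1].timestamp - SUSTAIN_SECONDS
    window = [s for s in samples if s.timestamp >= window_start]
    return all(over_threshold(s) for s in window)
```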
When we do one of these cutover operations, we momentarily freeze your application workload. This doesn't drop connections — it basically parks the transactions at the proxy layer — and during that brief period we take all of the in-flight transactions and all of the sessions and move them over to the new scale target. When we resume your workload, it looks as though nothing happened: all your application experienced was a very brief spike in latency, and because of the warm buffer pool, you won't see any performance degradation following the cutover.

When the workload becomes idle — when your application stops sending traffic to your database server — we shut the database down and save you a little money. When your workload comes back, we spin the database up, take a server out of the warm pool, and keep running as if nothing happened. And as I mentioned, the proxy layer is designed so that we haven't introduced any single point of failure: we horizontally distribute your connections across the proxy fleet, and a single server failing won't take your workload offline.

Now let's look at what this actually looks like in practice — having gone through all this, it sounds great, doesn't it? This is an actual workload we created and ran against a serverless database. We're looking at transactions per second over time — basically throughput — down at one-second granularity, so you can really get a feel for what this looks like to an application server. We start running our synthetic workload and push the box to its maximum capacity, and what we see over time are drops in transactions per second while a scale operation is performed; when it comes back, it's up and running at the next level.

Now let's see our nifty animation... okay, it's thinking about it... there we go. What we have here is our transactions per second overlaid with the actual capacity provisioned behind the scenes, so you can see how the Aurora Serverless database reacts to the synthetic workload. Each time a scale operation happens you see a momentary drop in throughput, but no connections are dropped — from your application's standpoint there's a very brief slowdown, and things are back up and running in a small number of seconds. Midway through, we slow the synthetic workload down, and you can see Aurora Serverless gradually scaling its capacity back down. Finally, at the end, we shut the workload off and you can see the Aurora Serverless database going to sleep.

Here's a similar overlay tracking latency instead of transactions per second. Each time a scale operation happens there's a very brief jump in latency, corresponding to the increase or decrease in serverless database capacity, but it's not very noticeable to your application because it lasts a small number of seconds — and, as I keep mentioning, there are no connection drops. Just to round out the picture, here's throughput mapped against latency: transactions per second alongside latency in milliseconds, so you can see how they correspond.

We built all this, made it generally available in August, and have been really excited by the response so far. But one thing our customers have said to us is that Aurora Serverless is really just auto scaling for databases — and that's actually true of a lot of what we initially offered: it certainly gives you automatic scaling, in a very seamless way.
One thing we had not necessarily done was make it really simple to integrate with other AWS serverless technologies. That brings us to the what's-new part of this presentation.

The first thing I'm really happy to share with you is what we call the RDS Data API. The Data API lets you connect to your database without any dedicated JDBC or ODBC connection and without any custom-configured driver: you can send both queries and mutating operations to your database through a simple web-service protocol. These are packaged as HTTPS requests — you don't need to worry about connection pooling anymore, and you can access your database from anywhere. We've made it really easy to access your database from AWS Lambda and AWS AppSync — I'll describe how that was made possible in just a moment — and you can also use it from the AWS SDK or the command-line interface.

The next thing I'm thrilled to share is the RDS console query editor. Many of you have probably used the RDS console before and are quite familiar with it. Well, now you can run queries right in the console itself: you don't have to spin up a dedicated database client, and you don't have to create an EC2 server to connect to your database anymore.

I'm also really happy to share that, as of last week, we've opened a preview supporting all of the capabilities I've described not only with MySQL compatibility but also with PostgreSQL compatibility. I hope you'll sign up for the Aurora Serverless PostgreSQL preview — it's available today. Sorry for the long URL on the slide; you can also find the same page with a web search. And we've recently added Aurora Serverless support in many more regions — I won't read off the full list, but we wanted to make this a truly globally available service, and as of two weeks ago it's all available.

Now, to tie these pieces together, I want to run a little live demo, and before I do, I'll describe what's going to happen. This requires some audience participation: I'm going to take an audience poll of where everybody is from, and then show you the results in the RDS console query editor. The way it works is you all take out your phone and send a text message — I'll share the phone number in just a moment — and in the body of the text message all you need to do is give the city you're from; don't worry about country or anything like that. We have a Lambda function that picks up those text messages and sends them down into an Aurora Serverless database, and then I'll load that database up and show you the results right in front of your eyes. So when you're ready, send the city you're from to the number on the screen.

I'll give it a moment while everybody sends their text message in... okay, I hope you all got that. I got some requests to switch back to the number — happy to give it more time, there's no rush. Okay, I hope that was enough time for everyone, so I'm going to switch over to the RDS console that many of you are probably familiar with.
One of the things we've added is the query editor, which I'll show you right now. We go to the query editor, and it has us choose the database we want to query against — pardon me, folks... there we go, got to choose the cluster. All right: we're now logged into the database that's storing the results of the text messages you just sent. Let's do a SHOW TABLES and see what we get — sure enough, there's a table called cities. Now let's look at what's in it; the table contains the aggregated results of what you all just sent me (and by the way, I'll share all of this code with you after the demo). Let's select city and count from cities... let me just fix this... AS count... GROUP BY — it's hard to type in front of a big audience. Okay, let's see if I typed it correctly — there we go: SELECT city, COUNT(*) AS count FROM cities GROUP BY city. We have a lot of people from Seattle with us today, and a pretty good group from Boston, San Francisco, Dallas, Los Angeles, and Denver.

What I'm trying to showcase is that it's now really simple to build these dynamic, scalable applications and access your database using Aurora Serverless. Just to give you a quick sample of what the code looks like: this is the kind of code you'd write to access your Aurora Serverless database from your Lambda application. We've shared the complete code for this demo at the GitHub link on this slide — you're welcome to download it, make it your own, and do anything with it that you'd like.
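As a rough, hedged illustration of the pattern — not the actual demo source, which lives at the GitHub link on the slide — here is a minimal sketch of a Lambda handler that writes an incoming city into an Aurora Serverless database through the Data API. The ARNs, database and table names, and event shape are placeholders, and this uses the Data API's ExecuteStatement call as exposed by boto3.

```python
import boto3

rds_data = boto3.client("rds-data")  # Data API client: plain HTTPS, no DB driver

# Placeholder ARNs -- in the real demo these would point at the demo cluster
# and the Secrets Manager secret holding its credentials.
CLUSTER_ARN = "arn:aws:rds:us-west-2:123456789012:cluster:demo-cluster"
SECRET_ARN = "arn:aws:secretsmanager:us-west-2:123456789012:secret:demo-secret"

def handler(event, context):
    # Assume the SMS body (the sender's city) has been extracted upstream and
    # arrives on the event; the demo wires this up from the inbound texts.
    city = event["city"].strip().title()

    # A parameterized INSERT sent as a web-service call: no JDBC/ODBC
    # connection, no pooling, no VPC requirement for the Lambda function.
    rds_data.execute_statement(
        resourceArn=CLUSTER_ARN,
        secretArn=SECRET_ARN,
        database="demo",
        sql="INSERT INTO cities (city) VALUES (:city)",
        parameters=[{"name": "city", "value": {"stringValue": city}}],
    )
    return {"statusCode": 200}
```

The same call shape is available from the CLI as `aws rds-data execute-statement`, which is what makes it possible to reach the database "from anywhere" without managing connections.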
At this point in the presentation I'd like to hand it over to Josh, who's going to talk about how they leverage Aurora Serverless at Pagely. Thank you.

[Josh Eichorn] Thank you. I'm Josh Eichorn, CTO at Pagely, a managed WordPress hosting company. Hopefully you've all heard of WordPress — last time I looked, it powers 32% of the web, and while you might think of it as just a CMS or blogging platform, a huge number of those sites are also e-commerce sites and fully custom applications. What Pagely brings to the picture is everything you'd need to run WordPress on Amazon at scale without worrying about any of it. That doesn't just mean we set up the platform for you; we're also the DevOps engineers who get paged when something inevitably goes wrong. Think of us as bringing peace of mind to running WordPress at scale.

We've been in the WordPress market for about a decade now, and we've seen it change and grow. Right now we're really focused on the larger end of the market — enterprise customers, people doing really interesting things with WordPress. But as we talk to our customers, we find our current offering doesn't always meet their needs, because it's focused on high-touch, white-glove service where we take care of everything, and sometimes a customer just wants a platform to run things on. That's why we've been building a new product called NorthStack. It takes the same serverless concepts you might get from Lambda, but instead of applying them at the infrastructure level, it lets you run a full-blown application like WordPress on the platform. You don't have to worry about any of the servers; you get an auto-scaling platform, billed in a metered way just like any other Amazon product, without doing all the work of shoehorning an application written almost 20 years ago onto the platform.

Besides the platform itself, we do have to change a little about how you work to make that work, so NorthStack also introduces the workflow that's needed — it gives you the concept of doing releases — because the only way to make these legacy applications work in a fully serverless way is to give up a little bit of how you work: you give us some flexibility, and you gain the platform.

This is the high-level architecture of NorthStack. It's built on a bunch of Amazon services; the biggest ones are ECS and Aurora Serverless, but we also use EFS where we need shared storage among a bunch of different containers, and S3 for all of the bulk storage. An interesting thing is that we run several different sorts of applications on WordPress, and one class of them is the standard CMS use case. The goal there is to use only a very small part of the platform the majority of the time: the first layer — the application gateway ECS containers — runs our custom nginx caching layer, and for CMS applications you want the vast majority of your traffic served there, because that's the cheapest way, touching the PHP layer running on ECS and the database only when needed. So you want those pieces to be able to stop. That's really easy with Docker — you can manage the lifecycle, stop the service when it's not used, and have something start it back up — but it's much harder to do in the standard database world. The way we work, each production application gets its own Aurora Serverless instance, and the non-production applications are in a multi-tenant setup.

When we heard the announcement for Aurora Serverless last year, we were very excited, because we'd been mulling how to solve this problem. Having a database that easily scales and can stop is a really hard problem. The classical approach might be to have a master and scale out read replicas with auto scaling, but WordPress makes this really hard: the standard pattern in WordPress is to do a database write and then immediately do a read to fetch the data and any fields you might be missing around your write. That makes it very hard to get more than twenty or thirty percent of your traffic onto a read replica, which means replicas don't really help your scaling problems — so you need to look at the options for scaling masters.

We do both of the following options right now and have lots of experience with them. There's the single-tenant option: say you're running on over-provisioned Aurora; you add a read replica of the new size and then fail over to it. That works, but it has its problems — you generally get around 30 seconds of outage around the scaling operation, which is hard to get away from, and it ends up being very high operational overhead. The other option, more common for us, is multi-tenant: you use very large instances and have different customers share them, so there's a bunch of spare capacity and everybody shares it, which makes the over-provisioning less expensive. The downside is that you have to be very careful about noisy neighbors, because once somebody uses all that spare capacity, it's not available for anybody else. If you find somebody who's bursting all the time, they have to be moved off the multi-tenant setup to a single-tenant solution. So again: very high operational cost.
As I mentioned, we were really excited when we heard about Aurora Serverless, because it's the perfect fit for us: it scales the master, we don't have to deal with downtime, we don't have all that operational overhead, and it has an answer to pausing — in the standard CMS use case the database isn't needed unless you're adding new content, so the solution's built right in.

Over the last year we've been building out NorthStack, and we're in our early testing phases right now with real customers. We've had a lot of great success, but we've also found some things you need to watch out for when you're using Aurora Serverless, so I'll go through each of those in detail. The first is that if you're using pause — and that's really the best way to see real money savings in a lot of our use cases — the resume time is inconsistent, so you need to plan around it. Second, as mentioned earlier, Aurora has to pick a safe place to scale, but sometimes it's your slow queries that are causing the load and preventing a safe scale point, so you have to have an answer to that. And third, at least in the PHP case, we noticed more "MySQL server has gone away" errors, so you need to take some steps to manage those.

In our architecture the goal is to stop the whole second half of the design when it's not needed. That means when a new request comes in, first we wake up a PHP worker, which then connects to Aurora, and then Aurora un-pauses, and only then can it run queries. Starting a new task on ECS might take 10–20 seconds, and un-pausing the Aurora instance — well, sometimes it's really quick, but often we'll see 30 or 40 seconds. When you do those two things back to back, you might be looking at 40 seconds to a minute, and that starts to get to the point where it's unacceptable. So we had to look at ways to reduce that time. Obviously we talked to the Aurora team, and hopefully in the future they'll improve it, but we can also improve it in our own design. As with many things, you might not normally connect to the database until you need it — but if that database is paused, you want to connect as high up in the process as possible to start waking it up. In our case we have a dedicated service whose only job is this: when a connection is sent from our load-balancing layer to our PHP layer, it fires a connection request at the database, so the un-pause starts as early as possible.
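To illustrate the idea — this is a sketch of the concept, not Pagely's actual service — the trick is simply to open a throwaway database connection the moment a request enters the stack, in parallel with container startup, so the un-pause clock starts as early as possible. A minimal sketch, assuming PyMySQL and placeholder connection details:

```python
import threading
import pymysql

DB_HOST = "demo-cluster.cluster-xxxx.us-west-2.rds.amazonaws.com"  # placeholder

def kick_database_awake():
    """Open (and immediately close) a connection purely to trigger resume."""
    try:
        conn = pymysql.connect(
            host=DB_HOST, user="app", password="...", database="wordpress",
            connect_timeout=60,  # resume can take tens of seconds
        )
        conn.close()
    except pymysql.MySQLError:
        pass  # best effort: the real request path will connect (and retry) anyway

def on_incoming_request():
    # Fire-and-forget: start waking the database while the PHP/ECS task is
    # still being scheduled, so the two startup delays overlap instead of stacking.
    threading.Thread(target=kick_database_awake, daemon=True).start()
```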
All right, the second case — the thing you really need to worry about — is that you've got to give Aurora Serverless a place to scale. Let me tell you a little story about WordPress. As I mentioned, it's a relatively old application, designed almost 20 years ago, not with everything it's used for today in mind — but its design did have a lot of flexibility, and the biggest part of that flexibility is a table called post meta. A post can be many types, but the posts table only has five or six default fields, and any time you want to add optional data — maybe describing an order, if you're using WordPress for an e-commerce site, or different sorts of custom content used to render the page — it all gets stored in this post meta table.

If you look at the schema design you might say, "well, as long as I only access things by post_id or meta_id, I should be in pretty good shape," and the most common access pattern is post_id. But you often get cases where the data in there is used for filtering, so instead of querying against those keys you end up querying against either that large varchar or, even more painfully, the meta_value. And because of the way MySQL handles blobs, any query against this table ends up using a temporary table. So it's really easy to run on a small instance and everything's fine — until you hit that one query that scans this table looking for a specific meta_value. At 2 capacity units, the smallest Aurora Serverless size, that query might take ten minutes to complete, and because it's using that temp table, there's no scale point until it completes or is canceled. So for ten minutes your database is maxed out but can't scale — whereas if you'd been at 8 ACUs during that time, the query would have completed in under two seconds.

So you need to figure out how you're going to manage those runaway cases. Aurora Serverless gives you great results if your queries are small and your load comes from an increasing number of queries, but you have to have an answer if part of your workload could be queries that take a long time. The obvious easy answer is to start setting timeouts. As I was researching options for this I was really excited, because you look in the documentation and MySQL has support for max_execution_time — unfortunately that's MySQL 5.7, and Aurora Serverless doesn't support it yet, though hopefully it will soon. So long-term there's a really nice answer built right into MySQL — per session or globally, you can set the timeout — but until then you need to set the timeout at your application layer.

If you have an application you can't change anything in, there's a nice open-source product called ProxySQL: it lets you set a query timeout, and it manages killing the query when it goes past the timeout and returning an error to the client. But if you don't want to deal with the problems of a proxy — single points of failure, managing it — most clients have some sort of socket or network timeout you can use as your timeout mechanism; in PHP, that's the option you set. In our case we set a 30-second timeout for most of our workloads, with the ability for specific customers to override it for specific sites. It's also very important, when you're thinking about those timeouts, to remember that you often have a reporting query or a database import that legitimately needs a much higher timeout than normal — a query you don't run very often but that is going to take a long time. So you have to have some application-specific awareness about how you apply those timeouts; there's no global solution where 20 seconds will always work.

Then the final issue we ran into is an increased number of MySQL "server has gone away" errors. It's not entirely clear to me why we see this; one theory is that you're connecting against the proxy layer, which has shorter inactivity timeouts than you get connecting to your own RDS instance, and the MySQL client doesn't manage that well on its own. I think if you're using any language that uses the MySQL C library, or follows the same patterns, you're going to run into the same increased number of "server has gone away" errors, due to how it manages timeouts on sockets. The nice thing is you just catch the error and reconnect, and everything's fine — but it's something you want to test around and be aware of.
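Until max_execution_time is available, the client-side approach described above can be sketched like this — shown in Python with PyMySQL rather than PHP, with placeholder connection details. A socket-level read timeout stands in for a 30-second query timeout, and "server has gone away" (MySQL error 2006) is answered with a reconnect and a single retry. Note the caveat: unlike ProxySQL, a client socket timeout bounds how long the client waits, but does not kill the query on the server.

```python
import pymysql

def connect():
    return pymysql.connect(
        host="demo-cluster.cluster-xxxx.us-west-2.rds.amazonaws.com",  # placeholder
        user="app", password="...", database="wordpress",
        read_timeout=30,   # socket-level stand-in for a per-query timeout
        write_timeout=30,
    )

conn = connect()

def run_query(sql, args=None, retries=1):
    """Run a query, reconnecting once on 'gone away' / lost-connection errors."""
    global conn
    try:
        with conn.cursor() as cur:
            cur.execute(sql, args)
            return cur.fetchall()
    except pymysql.err.OperationalError as exc:
        # 2006 = "MySQL server has gone away", 2013 = "Lost connection" --
        # seen more often through the serverless proxy layer's idle timeouts.
        if exc.args and exc.args[0] in (2006, 2013) and retries > 0:
            conn = connect()
            return run_query(sql, args, retries - 1)
        raise

# The dangerous wp_postmeta scan from the story: with the read timeout it
# errors out after ~30 seconds instead of pinning the instance for 10 minutes.
rows = run_query("SELECT post_id FROM wp_postmeta WHERE meta_value = %s",
                 ("some-filter-value",))
```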
All right, that's everything — thank you.

[Chandra] Thank you, Josh; I appreciate you coming on stage and sharing your experience using Aurora Serverless. We'd like to open the floor to some questions and answers — we have microphones set up in each of the aisles, so step up and ask any question you'd like; happy to try to answer it, or any question for Josh, for that matter. And just as a reminder, before we get to questions, please make sure to fill out the survey in your mobile application. It looks like we have a question right over here — yes, sir.

Q: When you downscale an instance, what happens to the buffer pool? What's the strategy for downscaling the buffer pool?
A: It's the same strategy, in that we do pre-warm the buffer pool even when scaling down. However, we're scaling down to a smaller machine, so you have to assume part of the buffer pool is going to get evicted. What we try to do is optimize for loading the parts of it that are most used — but that's a harder problem, of course, as you can imagine.

Q: What version of PostgreSQL is supported on the serverless platform?
A: I'll need to check to confirm, but I believe it's 10.3.

Q: Two questions — one's quick. The thing where it scales off 70% CPU or 90% of max connections: is that configurable, or is it hard-set?
A: Today we don't make it configurable, but we've been hearing that request a lot — customers want to configure it themselves — so it's something we're taking a very hard look at. It's not a difficult thing to make configurable, so you can expect us to be working on that real soon.

Q (for Josh): When you have a query that takes ten minutes at 2 capacity units — I know you can set a timeout to prevent issues, but what do you do about actually making that query run in a reasonable amount of time? You don't want your instance at 8 ACUs all the time, but if you set it to 2, the query takes ten minutes. How did you end up fixing that?
A (Josh): In our case we're just allowing the auto scaling to happen — there's enough traffic against the database that it pushes over that 70%. We do have some edge cases we haven't solved yet, where a query happens, say, once or twice a day and may never force the scaling event; we're looking at that, and we may need some application-specific logging to detect those cases and use the API to force the scale. But in all the cases we've run into so far, the query pushes up the CPU load enough to force the scale.

Q: I've used DynamoDB, which also has an auto-scaling feature, but one big issue I've seen there is that it throws provisioned-throughput exceptions for a couple of minutes — all the client requests fail, and only later does it scale and work fine — which is unacceptable in my case. Do you have similar issues with Aurora Serverless? Suppose my request volume is very low, and then suddenly there's a big spike?
A: It can be challenging to handle a very sudden burst, especially if there are a lot of long-running transactions, because we may not be able to find a safe scaling point. But what you are able to do today with Aurora Serverless is explicitly set your capacity — say, "I want to be at 8, or 32, or 64 right now" — if you know a large burst is coming, especially if you're going to run any sort of heavy workload against your application and you know it ahead of time.
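For reference, that explicit capacity change is exposed through the RDS API (ModifyCurrentDBClusterCapacity). A minimal boto3 sketch — the cluster identifier and values are placeholders:

```python
import boto3

rds = boto3.client("rds")

# Jump straight to 64 ACUs ahead of a known burst, rather than waiting for
# the monitoring service to walk capacity up in steps.
rds.modify_current_db_cluster_capacity(
    DBClusterIdentifier="demo-cluster",      # placeholder
    Capacity=64,
    SecondsBeforeTimeout=300,                # how long to hunt for a safe scale point
    TimeoutAction="RollbackCapacityChange",  # or "ForceApplyCapacityChange"
)
```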
Q: In my case it's okay if it takes time — latency-wise, instead of one second I'm ready to wait a minute — but I don't want the requests to fail. That's my only criterion. Will it handle that, or will requests fail?
A: We won't typically fail the request — we do our best to service every transaction that comes our way. We want it to look as much as possible like you're talking to a traditional relational database engine. That's why we look for a safe scale point: if we can't find one, we won't scale up, but we won't just outright reject the transaction.

Q: Have you noticed any difference in IOPS between RDS MySQL and Aurora MySQL? For a similar workload between RDS PostgreSQL and Aurora PostgreSQL we noticed increased IOPS on Aurora, and since we're charged per IOPS in Aurora, I wanted to understand the story with MySQL.
A: There is a pretty significant difference in how storage volumes work with Aurora, which goes to what I spoke about earlier. For some specific workloads, you can see higher IOPS at the volume layer. What you can often get away with is running on a smaller server, because Aurora at the database-server level has much higher performance to offset that cost difference — but it's a very workload-specific question, so I can't answer it in general terms beyond saying that for certain workloads you can see higher volume IOPS than you would otherwise.

Q: For auto scaling — sorry, in terms of downsizing — can it be configured? Can I say, if CPU utilization is at 30% or something, scale down?
A: The question, more generally, is whether you can configure your auto-scaling policies, which is not a capability we give you today with Aurora Serverless — but it's something we're taking a very hard look at; it's an obvious feature that a lot of customers ask about. We made the service generally available just three months ago and have announced a whole lot of new things, so we're working pretty hard at doing everything our customers ask, and you can expect that to be coming.

Q: If you're using the SDK from a Lambda function to call a serverless database, are you still required to have the Lambda function within a VPC?
A: You are not required to have the Lambda function in a VPC — it can run from anywhere. What the Data API gives you is a publicly accessible endpoint: you could call it from your desktop, from absolutely anywhere. You don't have to run Lambda in a VPC anymore.

Q: And does it use IAM roles for access?
A: Yes, it does use IAM roles to authenticate. We also use AWS Secrets Manager to provide your database credentials.
Sorry, was there a follow-up? — It doesn't natively support IAM database authentication the way Aurora MySQL does today, meaning you can't use that alone to authenticate, but that's something we're going to be looking at.

Q: When do you expect Aurora Serverless PostgreSQL to be generally available?
A: Unfortunately that's not a date I can share with you right now — I wish I could. We've only just opened the preview today, and we'll be taking a lot of feedback and building our engineering roadmap based on what we hear. If I could give you a date I would, but I'm afraid I can't.

Q: And how about having this available across regions — cross-region replication between Aurora Serverless clusters?
A: That's also something we've heard requests for. It's not supported today, but it's an obvious feature gap that, as you can imagine, we're going to take a hard look at supporting.

Q: My understanding is that query performance depends on the history and statistics inside the database. When you auto-scale, does it also carry over the captured statistics, the history, the cache — all those internals?
A: Everything that's in the buffer pool does get carried across.

Q (for Josh): You said there was a forty-second start, and that it was "approaching unacceptable" — from my experience that's definitely unacceptable. What production environment are you using where a forty-second startup time is acceptable?
A (Josh): In this case the vast majority of the traffic is served from the caching layer, so that slow start would only be seen by an admin logging in — that's the only case where they'd see it.
Q: So more behind-the-scenes, or a dev environment? In production you're saying it's a WordPress admin logging in to modify their site?
A: That's right. And if you're regularly talking to the database, it would never be pausing, so you wouldn't see it at all.
Q: That's my next question — is the pause bringing costs down far enough to make this cost-effective?
A: It's all relative. The lowest cost runs, before your IOPS, in the $40-a-month range. For some use cases yes, for others no — if this is a low-volume site and you're trying to host hundreds of them, it might not be the answer. We're also looking at other options, like an instance per customer instead of per site, where it would essentially never pause but you'd be sharing the load. There are multiple ways to address the price problem, and there will be cases where a reserved T-class RDS instance is cheaper; you just have to find the right model for your use case.

Q: How good is the CloudWatch support for Aurora Serverless? Will we be able to monitor these spikes — the auto-scale activity — using CloudWatch metric alarms?
A: Yes, you will. You get all of the same CloudWatch metrics with Aurora Serverless as you do with provisioned Aurora, so you get the same visibility through CloudWatch either way.
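Capacity itself is among those metrics — Aurora Serverless publishes it to CloudWatch as ServerlessDatabaseCapacity. A small sketch of pulling it at one-minute granularity with boto3, with a placeholder cluster identifier:

```python
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch")

# Average provisioned capacity (in ACUs) over the last hour, per minute.
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="ServerlessDatabaseCapacity",
    Dimensions=[{"Name": "DBClusterIdentifier", "Value": "demo-cluster"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=60,
    Statistics=["Average"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```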
Q: But monitoring those spikes would be great to have.
A: You can see the spikes; the only caveat is that CloudWatch will sometimes restrict you to one-minute granularity for certain metrics, so you can't necessarily see them at that resolution. Most of the metrics I was showing are taken from the client side, where you can set an arbitrary granularity — that's probably the best way to measure throughput and latency in that particular context.

Q: I have a few questions about the new-instance warming process. As I understand it, it's done in parallel, so it doesn't stall the original?
A: Correct.
Q: Does it involve reading data from storage, or is the data pushed from one instance to another?
A: It's actually read directly between database servers, and it's done asynchronously so it won't slow down the primary workload. If you think about it, it doesn't really matter whether that warm-up takes one minute or ten, because you're not down during that time — we're just waiting to find a scale point while warming up the buffer pool. But it is directly between database servers, to answer your question. Typically it takes anywhere from no time at all up to maybe five to ten minutes in the worst case, for a particularly large database.

Q: Two questions. To expand on that last one — these standby instances: are they free? How does that work?
A: You don't pay for them. We manage the warm pool on your behalf and absorb the cost of the standby while it's warming up; all we charge you for is the amount of capacity you have provisioned at any moment. So you're not paying extra for the warm pool or for the hot standby while it warms.
Q: So those machines are just prepared and ready to go when the time comes?
A: That's right.
Q: And the second question is about the cutover point when you have stacking slow queries. Usually with an undersized server, the slow queries start to pile up. The scenario Josh mentioned was a single slow query, and I understand you can set a session timeout — bang, no problem, there's your cutoff point. But with stacking slow queries you don't know when that's going to happen, so you never get that cutover point. In that scenario, are we out of luck?
A: If you have a large number of very long-running transactions, in that particular case we're probably not going to be able to find a safe scale point, so we're pretty much going to lock you onto the server size you're currently on. "Out of luck" might be a strong word — we won't be able to scale you up or down, but you have to accept that limitation. We make that compromise because we don't want to start rejecting transactions or disrupting your workload. We treat it as a shared responsibility: you'll need to tune your workload so you don't have a lot of long-running transactions, in order to get the most out of Aurora Serverless.
Q: Thanks, man.
A: Thank you — you're welcome.

We only have five minutes left in the session, so I'm going to answer a couple more questions, and then you're welcome to approach me afterward and I'll answer your questions one on one.
Yes, one over here.
Q: One question about auto scaling in Aurora...
A: Currently Aurora Serverless only supports vertical auto scaling. However, if you're using provisioned Aurora, what we do offer today is an auto-scaling capability to grow or shrink your read-replica fleet, so you can already get that with provisioned Aurora.
Q: So it's for when we don't have read/write separation — am I correct?
A: If you don't have read/write separation, then it's a good use case. If you specifically want to separate your read traffic, it's probably not the best fit for you right now.
Q: And do you plan to add a layer like that?
A: That's something we've heard a lot of requests for, and we're going to look at it for our roadmap. Unfortunately I'm not able to share any specifics around when we'd launch it.

Okay, last question, and then I'll take everything in person. Right over here — yes?
Q: Can we go back to the slide with the preview link you posted today?
A: Yeah, let me see if I can find it. Would you like the source code, is that—
Q: No, no — the preview page.
A: All of these slides will be published online, so you don't have to — though you're welcome to take pictures of them, of course. We will be publishing these slides and making them available to all of you.

We only have three minutes left in the session, and I'm going to get in trouble if I continue — I'm sorry to cut off the questions, but I will be right over here to answer any questions that you have. Thank you very, very much. [Applause]
Info
Channel: Amazon Web Services
Views: 13,730
Rating: 4.9379845 out of 5
Keywords: re:Invent 2018, Amazon, AWS re:Invent, Databases, DAT336, Amazon Aurora
Id: 4DqNk7ZTYjA
Length: 55min 5sec (3305 seconds)
Published: Tue Nov 27 2018