AWS re:Invent 2018: Deep Dive on MySQL Databases on Amazon RDS (DAT322)

Captions
I run engineering and product for RDS MySQL and RDS MariaDB, and I'm joined by my colleague Tryon, who's the lead product manager for both of these services. Welcome to re:Invent. It's such an exciting time of the year for me, for all of us: it's a great opportunity for us to connect with our customers and partners in person, and it's also a chance for us to share what we've been up to through the entire year.

Before I get started, perhaps a quick show of hands: how many of you use Amazon RDS? OK, good number. And how many of you use either RDS MySQL or RDS MariaDB? Terrific — probably a little more than half the audience. For the benefit of those who are less familiar with RDS or just getting started on the AWS journey: Amazon RDS is the portfolio of managed relational databases that we offer to our customers. We're very strong believers in customer choice, and so we offer you a range of options, all the way from commercial databases created by other vendors — you see that on the right of the screen — through to the most popular open-source databases, MySQL, MariaDB, and PostgreSQL, and also our own cloud-native database, which we call Amazon Aurora, which is MySQL- and PostgreSQL-compatible. There are several talks this week that cover all aspects of the service; I highly encourage you to go to as many of them as you can. But in this particular session we're going to focus on two of these engines, MySQL and MariaDB, which together represent the open-source family of MySQL engines that Amazon RDS supports.

I'm going to start with something fairly basic, which is: why run MySQL? Now, I imagine most of you who signed up — also judging from the question earlier — already use RDS or use MySQL, so it must seem a little strange that for this audience I'm starting there. The reason I'm doing it is that there's been a ton of innovation, both in MySQL and in MariaDB, in the last year, and it continues to be a very strong choice for customers, so I want to put a little bit of a spotlight on why customers choose MySQL. Next, my colleague Tryon will talk about the benefits of running MySQL or MariaDB on RDS versus, say, on premises, if you have any workloads running on premises, or even self-managing it yourself on EC2. And we'll close with some of the tips and tricks that we have picked up, both from running amongst the largest MySQL fleets in the world and from looking at our customers, who are amongst the largest MySQL users in the world.

So why MySQL? It's really three things. First, MySQL is the most popular database in the world, and MariaDB is not that far behind it. Second, there's been a ton of innovation: if you look at the two new major releases that came out this year, MySQL 8.0 and MariaDB 10.3, there's a lot to be excited about — lots of cool stuff happening in the engines and in the ecosystem around them. And third, they're open.

So, popular. That's kind of a weird thing, right? Why do I care about popular? When I hear the word popular, I'm reminded of my favorite musical, Wicked. There's a song called "Popular," where the good witch is trying to make Elphaba popular, and it goes something like: "It's not about aptitude, it's the way you're viewed, it's very shrewd to be very very popular." Great lyrics, fantastic song, great insight into human psychology — but I mean, is that a good reason to pick a database? Well, it is popular: MySQL is the number one used database according to Stack Overflow — no surprises here — and MariaDB is not that far behind. What's interesting about MariaDB is that it's actually been climbing the charts year over year; it's now ahead of Oracle, at number eight on the list. And if you look at just professional developers, the results are basically the same. The reason this matters — that popularity matters — is that you have the assurance that a ton of people are pounding on the code, looking at it, using it in every possible workload that you may want to use it in. And as we all know, the best disinfectant is sunshine: when you have a lot of people looking at the code and exercising all parts of it, you end up with highly exercised, stable code. So that's the first benefit of using MySQL or MariaDB.

The second benefit is that there's a large ecosystem of tools, implementation providers, and support providers that has grown up around MySQL. Just to use a few examples: a lot of our customers use the Percona Toolkit, for example for online schema changes. A lot of our customers use monitoring tools — we offer something called Performance Insights, but you can also use Datadog or VividCortex, and Percona has its own. For most of the things you want to do with the database, there's a really neat ecosystem available. And the third benefit is that there's a large community of free resources you can reach out to if you need help, and there's also a large talent pool. RDS is a managed database, but it's still very helpful to have people in your organization who understand databases — databases have a very large surface area — so being able to access that talent pool is also very helpful. So that's popular. It's the first reason many customers continue to use MySQL, and it's a sort of self-feeding, virtuous cycle.
But popularity doesn't really answer the more fundamental question: are these databases any good, or are these just emperors with no clothes? These are good databases. Like I mentioned, there's been a lot of innovation in the last few years, and MySQL and MariaDB spur each other to keep driving more and more innovation. If you just look at the last year — MySQL 8.0, MariaDB 10.3 — they both have a lot of very interesting features, and I'm going to dig into them.

So let's start with MySQL. This is somewhat old news — MySQL 5.7 has been around for a while — but a lot of our customers are still migrating from 5.6 to 5.7, or from 5.6 to 8.0, which we just announced support for last month. Some of the big drivers for going from 5.6 to 5.7 are JSON — that's probably the biggest driver for using MySQL 5.7 — but there are also things like spatial indexes; it's more of a niche feature, but for the folks that need it, super critical. When you look at 8.0, the cupboard is a lot more full: there's a lot of exciting functionality introduced in 8.0, and a lot of interesting availability, performance, manageability, and security changes.

With MySQL 8.0, just looking at the functionality, five features stand out, at least to me and also when I talk to my customers. Very briefly: common table expressions and window functions make it simpler to write more complex queries, and in many ways they bridge, or narrow, the gap to ANSI SQL and to Postgres. JSON does the same: it was introduced in 5.7, the team got a lot of feedback, and there are a lot of interesting new improvements to JSON functionality in MySQL 8. Beyond this, when you look at spatial indexes, which were again introduced in MySQL 5.7, there's a lot of new stuff under the covers: more than 5,100 spatial reference systems. The way to think of a spatial reference system is that it's a geometry, and you have built-in functions, for example to compute the distance between two points, so it makes it much, much easier to be a developer and use the database.
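The slide's exact SQL isn't in the transcript, but a minimal sketch of the kind of distance query this enables in MySQL 8.0 looks like this; the city coordinates are made up for the example:

```sql
-- Two points in SRID 4326 (WGS 84), one of MySQL 8.0's built-in
-- geographic spatial reference systems (axis order: latitude, longitude)
SET @seattle  = ST_GeomFromText('POINT(47.6062 -122.3321)', 4326);
SET @portland = ST_GeomFromText('POINT(45.5152 -122.6784)', 4326);

-- With geographic SRIDs, ST_Distance returns the distance in meters
SELECT ST_Distance(@seattle, @portland) AS distance_in_meters;
```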
And the last feature might seem like a small thing, but when I talk to customers in Asia and other places where Latin is not the default character set, it's a big deal: multibyte character support by default.

I'm going to dig into the first three, because these are things where a lot of customers have historically looked at MySQL and said, "hey, I'm missing this functionality," and now there's a lot of excitement around using MySQL because it's now possible. I'll start with common table expressions. So what are common table expressions? For those of you who are familiar with Amazon Redshift, the data warehousing solution, or with Postgres, which Redshift itself is built on: a common table expression is the WITH clause. It allows you to name a subquery and then use that name again in other parts of the query. It's very convenient: it makes queries more readable and easier to write, you can chain them, and you can refer to them multiple times. But perhaps most interesting, it actually improves query performance, and it also makes things like recursive queries much simpler. I'm going to dig into these last two points — and you can see the example here with the WITH clause.

So why does it improve query performance? This is query 15 from the TPC-H benchmark — nothing new here. You have a view, revenue0; you define it up front and then you use it twice in the query, once in the FROM clause and once in the WHERE clause. This is standard MySQL before CTEs (CTE, for common table expression). With common table expressions, the query looks largely the same — this is not one of those cases where readability is fundamentally different: you put a WITH clause, you define revenue0, and then you use it twice in the main query. The difference is in how it's implemented. Without CTEs, what happens is that you materialize revenue0, the view, twice: once for the FROM clause and once for the WHERE clause. That means you're doubling the storage, you're consuming more time to generate those views, and you're increasing the contention on the table that the view is built on, lineitem — so not the most optimal way to build this view. When you use CTEs, you only materialize the view once and then reuse it, however many times it actually appears in the query. Folks have run benchmarks on this exact query, and there's a 2x improvement in query time. So CTEs are not just convenient in terms of helping you express more complex queries; they can actually drive query performance.
The other part, of course, is that CTEs let you express complex queries: recursive queries are really hard to write pre-CTEs, and this is one example of why you'd want a recursive query. Think of an example where you have sales data, and each row is how much you sold on a given date, and let's say I want to roll this up by date into a report where every date has some entry. The problem with a grouping query is that if I have no sales on, in this example, October 2nd and October 6th — the ones with the pink arrows — those dates would effectively not show up in a GROUP BY query: there's no entry for those dates, so there's no group to show up. You'd have to do some sort of join with another table to recover those data items, which is complex. With common table expressions, you can easily do a recursion where you generate all the dates in an interval, do a left join with the sales table, and get the results you want. That's one example of where you'd use a recursive query, and CTEs make it very easy to do that.

Another example, perhaps a more classic one, is that you want to get the full management chain in an organization. Here again you can use CTEs, with the WITH RECURSIVE clause: you define a table where you start with the base case — Max here has no manager, so he's the top dog — and then you recurse through the table to get all the other reports, working your way through the organization. The query itself is very simple: you just select from the table and order by path, and that gives you the results you see on the right.
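The slide queries aren't reproduced in the transcript, but here is a minimal sketch of both ideas — a named, reusable subquery in the spirit of TPC-H query 15, and a recursive date generator joined against a hypothetical sales(sale_date, amount) table so that gap dates still appear:

```sql
-- A CTE in the spirit of TPC-H query 15: revenue0 is defined once,
-- referenced twice, and materialized only once
WITH revenue0 AS (
  SELECT l_suppkey AS supplier_no,
         SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
  FROM lineitem
  WHERE l_shipdate >= '1996-01-01' AND l_shipdate < '1996-04-01'
  GROUP BY l_suppkey
)
SELECT s_suppkey, s_name, total_revenue
FROM supplier
JOIN revenue0 ON s_suppkey = supplier_no
WHERE total_revenue = (SELECT MAX(total_revenue) FROM revenue0);

-- A recursive CTE: generate every date in an interval, then LEFT JOIN
-- the sales table so dates with no sales still appear (with a 0 total)
WITH RECURSIVE dates (d) AS (
  SELECT DATE '2018-10-01'
  UNION ALL
  SELECT d + INTERVAL 1 DAY FROM dates WHERE d < '2018-10-07'
)
SELECT dates.d, COALESCE(SUM(sales.amount), 0) AS daily_total
FROM dates
LEFT JOIN sales ON sales.sale_date = dates.d
GROUP BY dates.d
ORDER BY dates.d;
```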
So common table expressions close the gap to ANSI SQL, close the gap to Postgres, are more performant, and are very helpful in writing recursive queries and simplifying query design.

I'll switch to the other big syntactic, functional capability change in MySQL 8, and this is window functions. The use case for window functions is that you want to run an aggregation, but in a way that you don't lose record-level data. Let's look at the table at the top: it's again a sales data table, and each record has the employee, the date of the sale, and how much they sold, populated with some data for two employees, Otis and Max. If you want to find out what the total sales were by employee, you can of course do a grouping, but in this case you lose the record-level data: you know Otis sold six hundred dollars of product in the period involved and Max sold a thousand, but it doesn't actually tell you what the individual sales were. If you want that finer granularity of detail, you'd again have to do a self-join and go back and recover the original raw data — for example, joining on the employee name — but that, again, is complex and not very performant. This is where something like a window function comes in very handy. What you see in purple is the OVER and PARTITION BY clause — for those of you familiar with Postgres, this will be very familiar syntax. It simplifies the way you can do aggregates while retaining each row, and you see the results on the bottom right: we have preserved every single row of the original table, but the rightmost column tells you what the aggregation is.

That's a bit of a dummy example. Something a little more advanced is a running total: in this case we order by date and then we sum from the beginning of the table all the way through to the current row. Now, when you look at the result table, you've got the transactions — the records — ordered by date, plus a running total of the total amount of sales by both of those employees. An even fancier example: moving averages. In this one we order and group by month, and then for any given month we use the PRECEDING and FOLLOWING clauses to give you a moving average over a three-month window.
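Again as a sketch rather than the slide's exact SQL, the three variants described above might look like this against a hypothetical sales(employee, sale_date, amount) table:

```sql
SELECT employee, sale_date, amount,
       -- per-employee total, while keeping every individual row
       SUM(amount) OVER (PARTITION BY employee) AS total_by_employee,
       -- running total: from the start of the window through the current row
       SUM(amount) OVER (ORDER BY sale_date)    AS running_total,
       -- moving average over the previous, current, and next row
       AVG(amount) OVER (ORDER BY sale_date
                         ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS moving_avg
FROM sales
ORDER BY sale_date;
```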
So again, very powerful stuff — both CTEs and window functions — in terms of the kinds of queries you can now execute with MySQL 8.

Let's go back to JSON. JSON was the big news in 5.7, and there's been a tremendous amount of improvement in 8.0 versus what was previously available. The first one I'd call out is JSON_TABLE: a very powerful function to take a JSON object and convert it to a relational table, and when you do this you can use that table anywhere else in your query, just like any other table. Very powerful. The next set of functions are the aggregation functions. In some ways this is the inverse of the JSON_TABLE function: you can take a table, collapse it, and create JSON objects for some fields. I'll give you a very specific example. Think of a table with two columns: an employee, and as the second attribute, a direct report. If I want a result that groups this by employee and gives me a list of the direct reports, I can use either the array aggregation function or the object aggregation function, and what it does is basically collapse all the employees and give you a list of the people who report to them. There are other functions — I won't go through the full list — functions to merge objects, functions to simplify query syntax, and also utility functions, for example for pretty-printing. So again, a lot of changes with JSON.
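A minimal sketch of both directions, using made-up data; JSON_TABLE and JSON_ARRAYAGG are the actual MySQL 8.0 function names, while the org(manager, report) table is hypothetical:

```sql
-- JSON -> relational: unpack a JSON array into rows and typed columns
SELECT t.name, t.sales
FROM JSON_TABLE(
  '[{"name": "otis", "sales": 600}, {"name": "max", "sales": 1000}]',
  '$[*]' COLUMNS (
    name  VARCHAR(20) PATH '$.name',
    sales INT         PATH '$.sales'
  )
) AS t;

-- relational -> JSON: collapse one row per direct report into a
-- JSON array per manager
SELECT manager, JSON_ARRAYAGG(report) AS direct_reports
FROM org
GROUP BY manager;
```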
Those were some examples of the very powerful new capabilities that MySQL 8 has from a functionality perspective — something that makes your life as a developer much, much easier. But what if you're more on the operational side — DevOps, or you're a database administrator? Does 8.0 actually add to the equation in terms of improved availability? Because in many ways, when you use a database for your production workload, perhaps the thing you care about the most is that it never goes down — or if it does, that it comes back quickly. And again, MySQL 8.0 has a lot of interesting new features here. The three I'd call out are instant ADD COLUMN; the unified data dictionary — now, this may seem a little in the weeds, but I'll go into why it's important — and, also very important, atomic and crash-safe DDLs. I'll talk to each of these in order.

The use case for instant ADD COLUMN: if you have large databases — millions of rows, billions of rows — you're used to DDLs taking forever. For many of these DDLs you've got to allocate temp space, you've got to take locks on the table, and they can run for hours depending on the size of your database and the size of your table. That's really painful: it degrades performance, it creates contention, and, as we'll see in a bit, if you have a crash you're sort of out of luck. Instant ADD COLUMN is just a fundamentally new way of looking at this. Rather than going and changing the data, it's basically just a metadata change: you don't need to take a material lock in the storage engine, and you don't go update the data itself, so it's basically instantaneous. There are a number of different types of DDLs — which you see in the middle of the slide — that you can now apply the INSTANT algorithm to. And the nifty thing is that if, for whatever reason, a DDL can't be executed using the INSTANT algorithm, it fails fast and it fails explicitly, so you're aware of it. Very, very cool feature — a lot of customers wanted this, and it's now built into MySQL.
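A sketch of that fail-fast behavior; the table and column are hypothetical, but ALGORITHM=INSTANT is the actual MySQL 8.0 clause:

```sql
-- Metadata-only change: no table rebuild, no copying of row data.
-- If this DDL cannot be done instantly, MySQL refuses it immediately
-- with an error instead of silently falling back to a slow rebuild.
ALTER TABLE orders
  ADD COLUMN order_notes VARCHAR(255) DEFAULT NULL,
  ALGORITHM = INSTANT;
```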
The other thing I mentioned was the unified transactional data dictionary. Now, what is this? There's a lot of metadata that MySQL stores. The challenge, historically, has been that it's stored in disparate places and it's not transactional, so all the ACID properties — the commit, rollback, and crash-safe guarantees that you get on user data — did not typically exist for metadata. And that's a dangerous thing: you can end up with a corrupted table, a corrupted database. With this new improvement, there's now a unified data dictionary stored in InnoDB. The unified part helps because it simplifies things like object caching and will hopefully improve the rate of innovation; but the fact that it's in InnoDB — the fact that it's transactional — is critical, because now you have very much the same guarantees that you would expect for any of your other data. And this is the killer point: you now have atomic and crash-safe DDLs. You're not going to end up in an inconsistent state if you crash at the wrong time. So that's availability.

Let's continue our tour and turn to performance. There are lots of changes under the covers to improve performance in MySQL 8.0 — specifically, in InnoDB they've done a lot of work to improve read/write performance and performance under contended workloads. In our benchmarks — and these are informal benchmarks, and even if they were not informal I'd always say take benchmarks with a grain of salt, because your workload is different from our benchmark and your mileage will vary — we're seeing a 2x to 5x performance improvement versus MySQL 5.7.

A couple of other very interesting features. One is descending indexes. So what are descending indexes? Historically in MySQL you could always traverse an index in the opposite direction from the one it was built in; the issue is that there was a performance hit. With descending indexes, you can now traverse in the forward direction and not pay that penalty. And just as important, your optimizer can now use multi-column indexes where the best scan method is ascending on one column and descending on another — you're not forced to make the choice. So descending indexes are very powerful, depending on your workload.

The other nifty feature I'd call out is invisible indexes. With this, you can turn off an index so that your optimizer doesn't see it, and observe what that does to query performance. The important thing is that you're doing this in a non-destructive way: your index still exists. Because if you actually had to delete an index and then realized, "hey, I needed this index," that's a lot of work — you'd have to go back and recreate the index. So invisible indexes are great for playing around and seeing: what happens if I turn off this index, what do the query plans look like, what's the performance like? The inverse also holds: you can add an index in a staged way, keep it hidden, and then make it visible once you know it's actually helpful. If you have multiple shards, for example, you can try it on some shards before you roll it out across the board.
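A minimal sketch of both index features on a hypothetical orders table; the syntax is standard MySQL 8.0:

```sql
-- Descending index: a mixed-direction, multi-column index the optimizer
-- can scan forward without the old reverse-traversal penalty
CREATE INDEX idx_price_date ON orders (price DESC, order_date ASC);

-- Invisible index: hide it from the optimizer (non-destructively) to see
-- how query plans change, then restore it -- no drop and recreate needed
ALTER TABLE orders ALTER INDEX idx_price_date INVISIBLE;
ALTER TABLE orders ALTER INDEX idx_price_date VISIBLE;
```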
There are other very interesting changes under the hood with the cost optimizer. One example: with MySQL 8, the cost model is now aware of how much of your index is in memory. Previously it would assume that all of the pages required an I/O, which is not always the case, so this is a much smarter cost model. There are a few other things, like resource groups — which I'll talk about in a second — that allow you to assign threads to specific resources, so you have more control over isolating your different workloads. And there's also a ton of improvement under the covers in replication.

The chart that I showed before: we ran a custom benchmark just to play around with MySQL 5.6, 5.7, and 8. The black line here is 8, and it outperforms 5.7 by 2x to 5x. You'll actually notice that 5.7 has much worse performance than 5.6 on this workload, but even so, 8 outperforms 5.6 by a healthy margin.

I mentioned resource groups — it's an interesting feature, and I think it's quite nifty. The idea here is: let's say you have two types of workloads running on your database, one a reporting workload and the other a batch workload, and you want to isolate one from the other. You can create two different resource groups, say that one set of vCPUs is assigned to one resource group and another set of vCPUs to the other, and then assign different threads to these resource groups. And — if you look at the bottom right — let's say that at some point you want to change the priority of, say, the reporting workload because you're running a huge batch job: you can reduce the number of vCPUs assigned to that resource group, and you can also turn down the priority (a higher number is a lower priority in this case). That gives you a lot more fine-grained control over how your resources are consumed across your different workloads.
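A sketch of what that setup might look like; the group name and vCPU numbers are made up, and for user resource groups THREAD_PRIORITY ranges from 0 to 19, where higher means lower priority:

```sql
-- Pin batch work to two vCPUs at a low thread priority
CREATE RESOURCE GROUP batch
  TYPE = USER
  VCPU = 2-3
  THREAD_PRIORITY = 10;

-- Assign the current session's thread to the group
SET RESOURCE GROUP batch;

-- Later, shrink the group and lower its priority further while a
-- big reporting job runs
ALTER RESOURCE GROUP batch VCPU = 3 THREAD_PRIORITY = 19;
```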
So that was performance. Security and manageability are also obviously very critical — for us, for you, for anyone running a real workload — and the key innovation here centers around roles. (There are also a few other things, like password strength enforcement.) The idea behind roles is that a role is a named collection of privileges — a pretty basic idea. In the example you see here, you could have an application where some set of folks have read access and some set of folks have read-write access. What you can do is define these two different roles, read and read-write, grant a set of database permissions to each role, and then, as each user comes on board, decide which role you want to give them. Pretty fundamental, but it makes access management a whole lot easier than it would be otherwise.
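A minimal sketch of that read/read-write split; the database, role, and user names are hypothetical:

```sql
-- Roles are named bundles of privileges
CREATE ROLE app_read, app_write;
GRANT SELECT ON myapp.* TO app_read;
GRANT INSERT, UPDATE, DELETE ON myapp.* TO app_write;

-- New users just get the appropriate role(s)
CREATE USER 'report_user'@'%' IDENTIFIED BY 'choose-a-password';
GRANT app_read TO 'report_user'@'%';
SET DEFAULT ROLE app_read TO 'report_user'@'%';
```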
So that's MySQL 8 in a nutshell. There's a lot going on — lots of exciting new developments from a performance standpoint, availability standpoint, functionality, security, and manageability — and it's available on Amazon RDS as of last month.

What about MariaDB? I know 10.2 is relatively old, but I threw this up because it's an interesting observation: a lot of the new features in MySQL 8 have also been in MariaDB 10.2 — common table expressions, window functions. It's really good to have these two different vendors and communities looking at each other, spurring each other on, to drive more innovation. We also support MariaDB 10.3 as of last month; it's the latest major version, and MariaDB has its own set of priorities. I'd say the four major thrusts of the innovation are these. First, Oracle compatibility — you'd expect MariaDB to do this and not MySQL, and it's great that MariaDB is focusing on giving customers who are currently on Oracle an open-source alternative; a lot drives this — stuff around the parser, new functions, new data types — and I'll go through some of them. The second thing they've done that I think is very interesting is what they call temporal databases: a new construct where they essentially maintain versioned tables, and you can ask a query as of a historical time — I'll give you some examples of that as well. They've also introduced user-defined aggregates, and, just like MySQL 8, they have instant ADD COLUMN.

So let's dig into the Oracle compatibility. The big addition in MariaDB 10.3 is a new SQL mode that they call ORACLE: you can set the compatibility mode of MariaDB with sql_mode=ORACLE, and what happens when you do this is that the engine is able to understand PL/SQL language syntax without you having to change it to MariaDB's native stored procedure language. This is huge: all you have to do is set the SQL mode, and the parser automatically handles a large subset of PL/SQL syntax, so you don't have to go change your stored procedure code for it to run on MariaDB. You can see examples of the things they handle here — it's a very comprehensive list — and stored procedures in the PL/SQL syntax and in MariaDB's native syntax can interoperate with each other, so you don't have to choose between one or the other.

The other set of changes MariaDB continues to make, as it strives for more and more Oracle compatibility, is around data types. They've introduced sequences — the best way to think about sequences is auto-increment++ — and again, they work alongside auto-increment, so you don't have to make any changes to your existing tools or processes. They've also introduced a new data type called ROW: think of it as a vector, basically a tuple, and you can use it anywhere a stored procedure variable would show up. And they've been adding functions. Some of these may seem small, but they're important if you're moving off of Oracle onto an open-source engine — this example is INTERSECT and EXCEPT (set difference); they do what you'd expect, and the syntax is pretty straightforward.

One of the things they've done in terms of driving greater compatibility with Oracle is invisible columns — a little similar to invisible indexes. You can add columns to a table and mark them invisible, and what this really does is make it optional for applications to use them or not. It's a really good way to evolve your database without breaking your existing apps, and also a really good way to deprecate columns that your new applications may not require but your existing applications do. So it's a great feature for preparing your database for a schema upgrade without actually having to go do anything to your applications.

The last thing I'll point out with Oracle compatibility is the changes they've made to cursors: they've basically made cursors take parameters. In the example you have here, there's a table t1 and a stored procedure called p1 with a cursor. The goal is that you're able to pass an interval into the stored procedure — the min and the max, the two arguments — which then gets pushed down through the cursor, so that when you're actually fetching results, that filtering happens for you automatically. Again, that aids Oracle compatibility.

The other very interesting feature in MariaDB, as I mentioned, is temporal databases. The idea here is that you can mark a table as a system-versioned table, and when this happens, it effectively maintains the history of all the changes that have happened to that table — everything is timestamped — and you can use the AS OF syntax, of which you can see some examples on the slide, to ask a query as of a certain time, or over a certain interval. This is very helpful for data analysis — you want to go back and retrospect on trends — and very important for forensics: maybe something went wrong, or you had a data change you weren't expecting, so it's a good way to audit and look at the lineage of data. It's also a neat way to do point-in-time restore. So that's the other big feature in MariaDB 10.3 — and again, MariaDB 10.3 is now available on Amazon RDS.
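The slides themselves aren't in the transcript, but a compact sketch of several of these MariaDB 10.3 features, with hypothetical table names, looks roughly like this:

```sql
-- Oracle compatibility mode: the parser now accepts a large subset of
-- PL/SQL-style stored procedure syntax
SET sql_mode = ORACLE;

-- Oracle-style sequences, usable alongside auto-increment
CREATE SEQUENCE order_seq;
SELECT NEXTVAL(order_seq);   -- MariaDB also supports NEXT VALUE FOR order_seq

-- Invisible column: omitted from SELECT *, so existing apps keep working
ALTER TABLE orders ADD COLUMN audit_note VARCHAR(100) INVISIBLE;

-- System versioning: MariaDB keeps timestamped row history automatically
CREATE TABLE accounts (
  id      INT PRIMARY KEY,
  balance DECIMAL(10,2)
) WITH SYSTEM VERSIONING;

-- Query the table as it looked at a past moment, or across an interval
SELECT * FROM accounts
  FOR SYSTEM_TIME AS OF TIMESTAMP '2018-11-01 00:00:00';
SELECT * FROM accounts
  FOR SYSTEM_TIME BETWEEN TIMESTAMP '2018-11-01 00:00:00'
                      AND TIMESTAMP '2018-11-26 00:00:00';
```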
So clearly there's a lot going on with both MySQL and MariaDB in terms of innovation. The third big reason is that both of these engines are open, and this matters. A lot of customers have suffered with lock-in. Just last year, one of the two commercial vendors that I showed on an earlier slide overnight doubled the cost of using their commercial database on AWS. I mean, who does this to their customers? Someone who's much more focused on driving finances than obsessed with their customers' success. So for folks who have been burned this way, or who are afraid of being burned this way, open matters. I'm reminded of the song by George Michael: "I don't belong to you, and you don't belong to me" — it's very liberating. Every day, we want customers to choose these open-source engines and choose RDS because we've earned your trust, because we're the best solution for you — not because you're locked into us. So with that, I'll hand it off to Tryon, who will walk through the benefits of Amazon RDS.

[Applause]

Thank you, Suresh. All right, hi everyone. In the previous segment, what Suresh really showed you was how, when you consider database options, MySQL databases stand out. They have been battle-tested for close to two decades now, on workloads as critical as you can imagine, and with the recent innovations in MariaDB 10.3 and MySQL 8.0, it's becoming increasingly easy to migrate from, let's say, commercial databases to MySQL databases. Recent innovations in the form of deeper JSON integration and geospatial functions also position MySQL databases really well for modern app development. And as Suresh ended with: the reason behind using MySQL databases — and, for that matter, any open-source database — is that you can operate them wherever you like. You have the flexibility of running your databases on premises, on EC2 if you self-manage them, or on RDS.

So in the next segment, I'm going to focus on the reasons why RDS MySQL makes sense for you — how RDS can help you run your MySQL databases better and help improve your productivity, so you can do more with your resources. The premise for that is really straightforward: with RDS, we do the heavy lifting for you. And by heavy lifting, what do I mean? It's stuff like backups, patching, monitoring, replication, high availability — things you'd otherwise have to think about, manage yourself, and worry about. At the outset it might seem simple to operate, but over time customers tell us these things become really cumbersome. We do this heavy lifting so you get the time to focus on what really matters for your business. Think of it: if you are an application developer, what do you want to do? You want to build awesome apps for your customers. If you are a database administrator, you want to spend time with your development teams, make them more productive, help them improve their queries, design schemas, that sort of thing.

With that premise, what I'm going to focus on is a set of three reasons — the same three reasons you'd use MySQL databases are the reasons you should be using RDS MySQL databases — and I'll start off with popular. A little bit of context here: back in 2009, when RDS was launched, it was one of the first AWS services, and it was built on a core principle of helping our customers. By then we had developed over a decade of experience running databases for the world's largest e-commerce website, amazon.com, and we thought: how could we take the lessons we had learned and use them to help AWS customers run their databases better? That's how RDS came into being, and since then we have covered a lot of ground. Now hundreds of thousands of customers use RDS for their databases, and you can imagine — pick any industry vertical, and we have customers from it using RDS. Think about internet-scale companies like Netflix or Airbnb, pharma companies such as Bristol-Myers Squibb and Merck, and manufacturing companies like GE.

So what do we get with this popularity? The fact is that we are likely running one of the largest MySQL fleets in the world. No one individually can cover the surface area that we come across, which means our code gets exercised very heavily, and our tools are purpose-built for what you can do in the cloud. It also means that our operators, who are looking at all these issues at that scale, are engaging very closely with our customers and developing solutions — think of a security patch, maybe an OS issue. As we discover these issues, we have a really phenomenal, world-class team of engineers who go back and automate the fixes. I see this as a virtuous cycle: the more databases that run on RDS, the more surface area our team covers, the more lessons we learn, the more tools we build and put back into the system — so it becomes better and better. One example of this is recommendations, so it's not just limited to automation: as our operators learn more about running databases at scale, we take those experiences and share them with you as best practices. RDS Recommendations is a feature we launched in the last year — I'll touch upon it more later — that essentially shares these practices with you in an easily consumable way on the console.

With that squared away, I want to focus on the innovation — the stuff that we have done to make it easier for you to run your databases on RDS.
The first point there is automated, zero-recovery-point failover across Availability Zones. Let me clarify a little: what that implies is our Multi-Availability-Zone, or Multi-AZ, deployment scheme. An Availability Zone in AWS is essentially a fault-isolated data center. We synchronously manage a standby for your database in another Availability Zone, in order for you, or your applications, to tolerate failures — and the failover between the standby and the primary is managed by us. It's automated, so it's transparent to your applications.

The other great piece of functionality we offer is read replicas, including read replicas across regions. AWS operates across the globe in multiple regions. Previously it was unfathomable for most companies, at almost any scale, to deploy data centers in, you know, fourteen or fifteen countries. With AWS it's straightforward: it's a click of a button, and you can create a read replica in another AWS region. Some of the reasons customers create these read replicas are to offload read pressure from the primary database, for read scaling, or for serving reads closer to their end customers. And finally, DR, of course — it's a critical tool. I'm not sure we'll ever lose a region, but if that ever happens, you can use these read replicas for disaster recovery as well.

Then we offer automated backups — backups that occur continuously for you, at every five-minute interval — which means we can give you capabilities like point-in-time recovery: within your backup retention window, you can flip your database back to any point in time with a granularity of five minutes. On top of that, you also have the option of creating manual snapshots — point-in-time snapshots of your database — which you can keep for record-keeping, or, if you want to use them for disaster recovery, copy to another region; or keep for your confidence at the very least. Another thing we added support for in the last year is that we now allow you to scale your database storage up to 32 terabytes, and this scaling is elastic — that is, you can do it online. This matters because if your database gets full, you are not available anymore; you want to maintain availability even when you're going to hit the limits of storage in your database. So we allow you to seamlessly add additional storage, right from the console, up to the 32 terabytes that are supported.

With that, I do want to double-click on the availability pieces, in particular the Multi-AZ deployment — because just because the option is there doesn't mean your app is designed or configured to run in this configuration in a seamless manner. To give you a little more detail into how a Multi-AZ setup is configured: you essentially get one instance when you create a Multi-AZ database instance in RDS, but under the covers there's a standby instance running in another Availability Zone, and the data between the primary and the standby is replicated synchronously at the block storage level. As you can see in the diagram, the storage volume is where we are copying the bits and bytes over to the standby — so it's not logical, it's physical replication. And monitoring all this is an RDS process, essentially a third party acting as an observer, to ensure that there's no confusion as to who the primary is and who the standby is. The other key function the observer performs is that whenever there's some sort of unavailability in the primary — a network outage, a bad host, something going on in that Availability Zone — the RDS process will automatically detect that and fail over to your standby.

And here's where I want you to also go and make sure you test this failover — you can initiate it yourself through the console or through the APIs. What you want to make sure of is that your app is honoring TTL (time-to-live) values for DNS, set small enough that when the failover happens, you get back the IP address of the newly promoted standby. You do want to be able to connect to that standby: all your connections will of course break, so your app should keep retrying, and if the TTL values are small enough, you'll pick up the new IP address. Once that happens, your app reconnects to the new primary, and under the covers RDS flips the roles — the old primary becomes the standby, and replication is re-established — which means you continue to have a high-availability configuration.
The other aspect I want to talk about in terms of availability, and dive deeper into, is how you manage backups and snapshots, and how you can do point-in-time recovery from them. As I mentioned, you have the option of automated backups, and these backups are by default retained for seven days. What we essentially do with automated backups is: every day we take a full snapshot of your database, and every five minutes we take the transaction logs and ship them out to S3 for retention. So whenever you're doing a point-in-time recovery, we find the nearest snapshot that we have, we replay the transaction logs, and you have your database pointing to a state in the past. One key benefit, tying back to Multi-AZ deployments, is that backups and snapshots have no impact on the performance of your database — and you ask why? That's because backups are taken from the block storage device on the standby, so your primary remains fully available to serve your application. So that's another reason you may want to use Multi-AZ deployments.

The other piece I wanted to talk about is, of course, manual snapshots. You also have the option, on the console or through APIs, of taking a point-in-time view of your database and storing it. Many customers do that for their own confidence; sometimes they do it for compliance reasons — they might need to go back to an old state at some point. Another interesting reason I have seen customers use them for is disaster recovery: you can copy a snapshot over to another region, and if for whatever reason your primary database becomes unavailable, you can use that snapshot to restore your database. Snapshots can also be shared with other accounts. I've seen this used in situations where you have a separate dev/test AWS account: you want to use the most recent production data, but of course you don't want to impact production, and you want to perform, maybe, DML or DDL operations — so you restore a snapshot in the other account, do all you want to do there, and eventually make the change on your production account.

Moving on to security and manageability — again, many options, many switches and knobs that you can use. One key, fundamental thing: from the ground up, RDS is designed to be secure. It uses the same schemes that you apply on EC2: you get your VPC, Virtual Private Cloud, so you get your own network isolation there. One best practice I want to share here: when you're setting that up, make sure you are not opening up your production databases to public access; you only want your actual applications talking to your production database. Another key capability here is IAM database authentication. IAM, or Identity and Access Management, is the service you use to control and grant access to resources within AWS, and much like you use IAM users and roles within your AWS account, you can use them to also manage authentication into your database — so it becomes simpler for you to manage who can connect.
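The transcript doesn't include the exact statement shown, but per the RDS documentation, enabling an IAM-authenticated MySQL user looks roughly like this (the user name and grant are hypothetical); the client then connects with a short-lived IAM authentication token instead of a password:

```sql
-- Create a database user that authenticates via IAM rather than a password
CREATE USER 'jane_doe'@'%' IDENTIFIED WITH AWSAuthenticationPlugin AS 'RDS';

-- Grant it whatever privileges the application actually needs
GRANT SELECT ON myapp.* TO 'jane_doe'@'%';
```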
Many customers are concerned about compliance; we are compliant with FedRAMP, HIPAA, and many other regulations, if you want to look into it. Then, coming to scaling — both instance scaling and, as I mentioned, storage scaling — you can do it with the push of a button. Similarly, managed binlog replication: we manage it for you. Recommendations I already touched upon briefly: essentially, we share best practices with you. And the final thing we added in the last year that I want to talk about is log upload to CloudWatch Logs. Audit, general, slow query, and error logs — oftentimes customers want to retain them for longer durations, so now you can stream them out to CloudWatch Logs and retain them for whatever duration you want. With Recommendations, as I was saying, we are really taking the best practices we observe and sharing them with you. It could be that your database is not encrypted, or maybe you are running an outdated configuration: you can use the notifications you receive on the recommendations console — the RDS console — and act on them, and you can schedule any remediation for the next maintenance window, or apply it immediately. It's just a point-and-click operation.

Again, on the performance front, we added the latest generation of instances: m5 instances, our general-purpose instances; r5 instances are coming soon for MySQL and MariaDB. The other piece is Performance Insights. With Performance Insights you can of course dive deeper into your database and really see what's going on: what queries are running, which queries are taking the longest amount of time, which hosts are driving what kind of workload — if there's some sort of DDoS happening, you can view that through a really nice dashboard. Similarly, you can find out where contention is going on in your database, and which queries are performing badly. The data is retained for you for free for up to seven days; beyond that, for a small fee, you can retain all of this information longer.

With that, the next thing I want to focus on is openness. Much the same way, you can import data into RDS — we provide tools for that. Data Migration Service (DMS) is a service you can use to migrate data into RDS from the various different sources illustrated there. I would encourage you to use the service in situations where you are doing heterogeneous migrations — meaning, let's say, you're migrating from a non-MySQL-compatible database; it could be SQL Server, Oracle, whatever. If you are doing a homogeneous migration — that is, migrating from a MySQL source — you're potentially better off relying on the native MySQL tooling, mysqldump and so on; you could also use Percona XtraBackup and third-party tools like that. And as easy as it is for you to get data into RDS, we also make it easy for you to take the data out: you can use DMS or the MySQL tooling to take your data out to all the different systems I mentioned there. So again, pointing back to George Michael: "I don't belong to you, you don't belong to me." If for whatever reason you don't like RDS, you have the option of going and taking your data out if you want.

With that, I want to move on — hopefully I've convinced those of you who are not already running your MySQL databases on RDS to do so now. One of my favorite things as a product manager is the interaction I have with customers: understanding their problems and giving them solutions. And oftentimes these solutions are not about building new features; they're about providing guidance on what already exists. So I'll touch very quickly upon six frequently asked questions — questions that we often get.

I'll start off with the first question, whether you're coming to RDS new or already running your databases here: what type of instance should I use? We offer three broad families of instances, and the lineup has grown — we offer t3s now. The T2 and T3 instances are the burstable family of instances: you get a certain number of CPU credits, which you can monitor through CloudWatch, and those credits you can consume by bursting. You also get moderate networking with these. We typically recommend using these instances for dev/test workloads, and you also get access to the free tier with t2 and t3 instances, so you can run a t2.micro or t3.micro instance for a year, try out RDS, and see how it works for you. Then there's the M family of instances, of which we now support m5: our general-purpose instances, really meant for query-intensive — or rather, compute-intensive — workloads. So if you are doing heavy writes, these instances are great for that; with m5 you can now go up to 96 vCPUs and 384 GiB of RAM — that's tremendous, that's amazing. The next family is the R family, which is memory-optimized: you get twice the memory per vCPU. We will add support for r5 soon; with it you again get 96 vCPUs, but 768 GiB of RAM. If you have a workload that needs a very large working set in memory, these are the instances for you — that's close to three quarters of a terabyte you can hold in memory. You can also use them for query-intensive workloads where you need that kind of memory footprint.

The other question I get is: when should I use Multi-AZ versus read replicas? My simplest answer is that it's not either/or; it probably makes sense for you to use both. Multi-AZ is great for maintaining high availability — we are, of course, doing synchronous replication there — but the point to note is that your standby is not available for reads; only the primary instance serves reads. With a read replica, it's more general-purpose. The replication is asynchronous, you must note, which means that depending on your workload and on where your read replica is — as I mentioned, your read replica could be in another region, in another AZ, or for that matter in the same AZ — and depending on the replication lag, how far behind it is from the primary, you could lose some data if you happen to fail over. But it's great for situations where you want to offload read pressure from your primary, serve reads closer to your end customers, or have disaster recovery. Another key thing to note here is that engine version upgrades happen independently on read replicas: you can upgrade your read replica first, see if it works fine with your application, test it out, and then upgrade your primary as well, satisfied that it looks fine. With Multi-AZ, the failover is automatic, as I was mentioning: we detect, we fail over, we manage it for you. With read replicas it's manual, so you have to make the choice: if you do want to fail over, you go and do a manual promotion — and especially if you're doing it for disaster recovery, it might make sense for you to do that anyway.
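Since replication lag is the key caveat with read replicas, one way to check it — beyond the ReplicaLag metric RDS publishes to CloudWatch — is directly on the replica with standard MySQL tooling (on the 5.x/8.0 versions discussed here):

```sql
-- Run on the read replica; in the output, Seconds_Behind_Master shows
-- how far the replica currently lags the primary
SHOW SLAVE STATUS;
```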
Next: automated backups and manual snapshots. Automated backups are great — turn them on and keep them for a duration you prefer, so you have the ability to go back in time. If you think seven days are fine, that's the default; go with that. You can go up to 35 days. A new capability we introduced very recently is the ability to retain automated backups even after your instance is deleted. What that means is that if you delete your instance by mistake, you can still perform point-in-time recovery from your automated backups within the retention window — so if you have configured seven days, up to seven days out you can go back and get your instance and your data back. Manual snapshots, as you know, are retained for however long you want, and are also used for compliance and disaster recovery purposes.

Another question I often get from customers is: how do I secure my data? I talked about VPC; you have access to security groups — consider them your firewall rules — to make sure that only authorized instances are accessing your database. IAM I have touched upon. With KMS, the Key Management Service, you can either bring in your own keys or use KMS-generated encryption keys and encrypt RDS data at rest. And finally, of course, SSL is used for encrypting anything that goes over the wire.

Monitoring databases is again a very key aspect: if you're running databases, you want to know what's going on in them. CloudWatch metrics and Enhanced Monitoring — make sure you have turned them on; you get more than 50 metrics emitted to CloudWatch. CloudWatch Logs I touched upon briefly: once you have your audit logs and general logs in CloudWatch, you can also set up monitoring on top of them. You can say that user X should never have access to table Y, and be told if that ever happens — you can set up things like that, and alarming similarly. I talked about monitoring tools and other key capabilities; Performance Insights also gives you APIs, so if you are used to using third-party tools on premises, you can use those tools to ingest the data from Performance Insights and get visibility into your cloud infrastructure. Another new capability we added is RDS events. RDS events are special types of events published on an SNS topic, on which you can of course set up email or text alerts. What we are now also doing is emitting these out to CloudWatch Events, so now you can use CloudWatch Events to programmatically react to things happening in your database — things like storage full, or a backup failure, where a backup was not taken properly. And CloudWatch Events makes it really easy for you to point an event at a Lambda function for automation on top of that.

A few other sessions I really want to point you to — we have a lot of great stuff packed into the week. Amazon Aurora: as mentioned, it's our database built for the cloud; if you want to learn more about what we've done in the recent past, do attend that session — it's today. Then migrations: we talked about the Data Migration Service, and there's a session on that too. And finally, what's new in the broader relational database service space as well. So with that, thank you very much — thank you for joining us today.

[Applause]
Info
Channel: Amazon Web Services
Views: 3,118
Rating: 4.8823528 out of 5
Keywords: re:Invent 2018, Amazon, AWS re:Invent, Databases, DAT322, Amazon Aurora, Relational Database Service, (Amazon RDS)
Id: b5vvW0l78Vs
Length: 60min 27sec (3627 seconds)
Published: Tue Nov 27 2018