Scaling Your Application with AWS Relational Databases

Captions
Hi everybody, my name is Darin Briskman. Since we're in the last session of the day, I'll apologize up front: I'm going to stick with English instead of my terrible German, and thank you for living with my failure to speak your language while I visit your country.

In this section we're going to talk about relational databases. This is supposed to be an advanced session, so I'm not going to explain what a table or a row or a column is. Everybody knows what that means, right? Yes, you've heard of a table. This will be a really tough section for you if you don't know what a table is. Okay, good.

So, is everyone here using some relational database right now? Which ones? Let's do the common ones. Who's using Oracle? Okay. How about Microsoft SQL Server? Okay. DB2? One lonely DB2 user. How about Postgres? There we go. MySQL? Okay. Something I didn't name, yell it out. DynamoDB is not a relational database. MariaDB? Absolutely. You can argue whether or not that's a variant of MySQL; today it is, but I think over the next few years they're going to diverge, and we'll talk a bit about that.

Relational databases are popular for a number of reasons. For me personally: 21 years ago, when I was a junior architect at IBM, I got to work with Dr. Ted Codd, the person who actually invented relational databases. My first question for him was, why did you call this "normalization"? It's a very strange word. And I did not expect the answer he gave me, which was that when he was inventing the relational database, what was in the news was President Nixon normalizing relations with China. So he said: if America and China can normalize, the data can normalize. And that's why we have normalization and denormalization in relational databases.

There are good and bad things about relational databases. Later today we'll talk about the non-relational revolution, but there's an argument that the best thing about a relational database is also the worst thing about a relational database, and that's the join. Joins are great because they let you do flexible things; joins are horrible because they add a lot of complexity. So is a join a good thing or a bad thing? Yes. It depends. Those who were here earlier know that if you're an architect, you can answer any question with "it depends" and usually be right.

So how did Amazon get into this? It gets back to the roots of AWS. There's a rumor that AWS was created because we had extra compute capacity and we just wanted to find a way to sell it. Not true. Where AWS came from was that when Amazon started growing at a really big pace around 2003, 2004, 2005, it was really hard for us to get hardware and systems and equipment to match our growth. The vendors were not able to keep up with our pace and weren't able to give us what we needed quickly enough. We ended up having to create a lot of our own processes and procedures, and we were thinking, hmm, maybe we can make it easier for the next people who want to grow really fast. So AWS was developed deliberately, and it was developed to get rid of what we call undifferentiated heavy lifting: the things that are a lot of work, that every company has to do, but that don't really make you different from anyone else. So we started putting out services that did that. Here's a good question: does anyone know the very first AWS service? SQS, yes. Why did SQS come out before EC2? Because that team finished on time and EC2 was late. When we first launched SQS the whole industry said, why is this bookstore selling queueing on demand? That makes no sense. But then we had EC2, and then we had S3, and it started making sense.

Then we started getting questions around databases. People said, okay, what is the undifferentiated heavy lifting here? There are some things that are very specific to your database, like schema and query, but what is the stuff that you spend a lot of time and effort on that you just have to do? Well: backup and restore, but especially backup. And high availability. High availability is hard. Anybody here been a DBA? Ever done high availability? What makes it hard is that there are a thousand little details, and what happens if you get 999 details right and one wrong? It doesn't work, because oh, I forgot to upgrade the new driver on the failover side, and then when it fails over, it doesn't fail over. Another big piece is patching and upgrading, particularly the low-level systems: keeping your Linux version or your Windows version or whatever up to date. These are all things that don't really add value but are necessary to make it work. So our customers came to us and said, can you do that for us in the database? Get rid of that low-level stuff.

We originally started doing that with MySQL, because in the early days of Amazon Web Services our single largest customer, Amazon, had a lot of MySQL, and many of our other customers were startups, like Rocket Internet in Germany, or Airbnb, or Netflix, and they wanted MySQL. So we introduced this MySQL Relational Database Service, which does all of those things we just talked about: it does installation, it does patching, it does upgrades, it does high availability, it does backup and recovery. Since we introduced MySQL, we've introduced more engines: Oracle, Microsoft SQL Server, Postgres, and most recently MariaDB. So we have these five different engines that we can run to automate and make scalability easy. We've also since added Aurora, but Aurora is kind of special, so it gets its own conversation a little later.

One of the things we're trying to do here is eliminate the need for specialists. If you do DevOps and you want small teams that own and manage services, you can't have three people on that team doing nothing but database backups all day; it's not going to be effective. These kinds of managed services let you focus on what's important. This has been critical to Amazon's use of microservices, and to many of our customers'. You get more leverage from your team and focus on the things that matter. One that really shocked me when I learned it: I recently visited Airbnb in San Francisco. They are currently running nine hundred and forty databases with no DBAs. They have a small service team whose job is to advise other service teams on schema optimization and query tuning, but each service team can handle their own stuff, because they don't need to do the traditional DBA work. And yes, I've heard the question: does this mean Amazon wants to put DBAs out of business? No. We want to put boring work out of business. We want DBAs to be able to do data engineering and data analysis and other things that add a lot of value.

So what do I have here? High availability, cross-region replication, and then this other piece I find really interesting: even a startup of two people in an office now has access to the same kind of database capability that only the largest organizations used to have. I'm starting to watch this change the world. It wasn't that long ago that if you wanted to do a startup, you had to go to Berlin or London or San Francisco or New York or someplace like that. That's not true anymore. We're starting to see a lot more startups in places like Darmstadt and Hamburg and the middle of Iowa in the US, places where historically you couldn't do it, but now you can, because of the cloud.

So which engine should you use? Well, the price is the same no matter which one you use as far as Amazon is concerned, but if you use one of the commercial engines you get to give extra money to either Oracle or Microsoft. They all have their advantages and disadvantages, but the truth is there aren't huge differences between them; it's a lot of small differences. Oracle is a hugely capable database. When running Oracle on RDS you have two choices: you can bring your own Oracle license and apply it to the RDS environment, which they call bring your own license (BYOL), or you can do what's called license included, which means you pay by the hour, and for each instance size you can see how much is for Amazon and how much is for Oracle. With bring your own license you can use pretty much any Oracle license; with license included we don't support Enterprise Edition, but you can do Standard Edition, SE1, or SE2. Microsoft has chosen to only work with us with license included, so again, I have many different versions of Microsoft SQL Server and I pay by the hour: a certain amount to Microsoft, a certain amount to Amazon.

With the open-source engines there's a little more differentiation, I think. MySQL was designed about 20 years ago to be a lightweight database for the internet. It's designed to be easy to manage, and it's designed not to do some of the things other databases do. It deliberately has a weak stored procedure language; it's designed to just be a data store, because in the internet world you should be putting your logic not in the database but outside it, in the application layer. It's very scalable, and we have a lot of people running very large environments on MySQL. Now, MySQL took a kind of interesting turn, because it was originally created by Monty Widenius and a team of developers, and they had a company that owned the copyright, called MySQL AB. MySQL AB got purchased by Sun Microsystems, and the team looked at what was happening at Sun and said, you know, this is not quite the direction we wanted to go, so they pretty much left. Shortly after they left, Sun was purchased by Oracle, so MySQL is now actually the property of Oracle, and to be honest, Oracle has not done that much to advance it since they purchased it. Recently, though, Oracle announced that they're going to be putting in a lot of new features: the current version of MySQL is 5.7, and Oracle has announced they will soon be shipping MySQL version 8, because that's how Oracle does math, right? 5.6, 5.7, 8. But MySQL 8 does have some interesting new functionality that they've promised.

Meanwhile, by the way, the reason it was called MySQL is that Monty's first daughter's name is My. So when they formed a new company, guess what his second daughter's name is? Maria. As Monty put it, "I only have two daughters, I can't do this again." And when they founded MariaDB they did the same thing Postgres had done earlier: neither Postgres nor MariaDB is owned by a company; they're owned by foundations, so nobody can come and buy them and take them off the market or crush them, the way things have happened before. The MariaDB Foundation owns Maria. There's also MariaDB Inc., a company that makes a living providing support and development for MariaDB, and they're a great AWS partner. You can think of Maria as a fork of MySQL. It's had a lot of innovation in the last three years, adding all sorts of interesting things like GTIDs, which make transactions independently identifiable for replication and recovery operations; it recovers from failure much faster than MySQL, and it has a client that's much faster than MySQL's and fully compatible with it. So if you're using MySQL, you might consider using the MariaDB client in your code instead of the MySQL client; they work together nicely, and we see a lot of people using a MySQL database with the MariaDB client.

Then there's Postgres. Postgres came from a group of developers who were working on a commercial project called Ingres, which got sold to CA and pretty much shut down. So they went and made their own version of it, which is Postgres. Get it? Post-Ingres. That's about as good as the jokes are going to get in the Linux world. Postgres is from the mid-1980s, from Mike Stonebraker and his team, and the people on it now are pretty much the same people who were writing it 30 years ago, so you have a very mature team who have been developing it over time. Because it's a mid-80s database, like Oracle, like Microsoft (well, Microsoft is really based on Sybase, which also comes from the mid-80s), it's semantically similar to them. So one of the things we see a lot is that when people are using Oracle or Microsoft or DB2 and want to move to open source, the one that makes the most sense to move to is Postgres, because it's the most similar in terms of procedural language, stored procedures, functions, and the way it operates.

One thing I'm not going to talk a lot about today, but I'll mention on the side, is the AWS Database Migration Service. That's the service we have that lets you move one database to another. You can use it just to do major version upgrades without taking downtime. For example, if you're on MySQL 5.6 we don't recommend upgrading in place to 5.7, because that'll probably break your code. Instead we say: take the 5.6, use Database Migration Service to synchronize it to a 5.7 copy, let it automatically keep those two in sync so you can test the new one, and then, once you're sure your code has any changes it needs, turn off the old one. Of course a lot of customers came to us and said, okay, great, but what if instead of going from MySQL 5.6 to 5.7 I want to go from, say, Oracle 10 to 11? All right, we support that. What if I want to go from Oracle 11 to Postgres? We support that too. In the Database Migration Service it can either be the same engine or different engines, and we support a whole bunch of sources and targets. The sources include Oracle and Sybase and Microsoft and DB2 and Greenplum and Teradata and a whole bunch of others, and the targets, as you would expect, include the various things you can run in the cloud. I would also point out that these go both ways: if you are currently running, say, Postgres in the cloud and you say, I don't like this cloud thing, I think it's a passing fad, I want to move all my stuff from the cloud to Oracle in my own data center, you can use our tools to do that as well.
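As an aside, here is a minimal sketch (not anything shown in the talk) of what driving the Database Migration Service might look like with boto3; the ARNs, identifiers, and table mappings are placeholders you would replace with your own resources.

```python
# Hypothetical DMS task: full load from a MySQL 5.6 source to a 5.7 target,
# then ongoing change data capture (CDC) to keep the two in sync.
import json
import boto3

dms = boto3.client("dms", region_name="eu-central-1")

table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-all",
        "object-locator": {"schema-name": "%", "table-name": "%"},
        "rule-action": "include",
    }]
}

task = dms.create_replication_task(
    ReplicationTaskIdentifier="mysql56-to-mysql57",
    SourceEndpointArn="arn:aws:dms:...:endpoint:SOURCE",    # placeholder
    TargetEndpointArn="arn:aws:dms:...:endpoint:TARGET",    # placeholder
    ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",  # placeholder
    MigrationType="full-load-and-cdc",  # full copy, then ongoing replication
    TableMappings=json.dumps(table_mappings),
)

dms.start_replication_task(
    ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)
```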
So occasionally people ask: why would you make it easy for people to move off of Amazon? Our answer is that we are not trying to lock you in; we're trying to make sure you get what's right for your organization, and if the right thing for you is to move off of Amazon, then that's the right thing, and we'll help you get there, because we figure you'll come back when we're the right thing to do. Or, to quote Jeff Bezos, who likes to say: Amazon customers are loyal to Amazon until one second after they find a better deal. We like it this way. It means we have to wake up every day and ask, how am I the better deal today? That's how we've been able to grow to the size we have and still act like a startup.

All right, back to the engines. They're all good. One key difference between them is commercial versus open source. First of all, open source has a very attractive license price: zero. What's more important is elasticity, and this is the biggest reason I see people move off the commercial databases. I have never had a customer come to me and say, Oracle's technology doesn't work well enough, I don't want to use Oracle. I've had many customers say that the price, or the limitations, or the licensing of Oracle aren't working for them; the same thing with Microsoft. Because why do you use the cloud? One of the key reasons is elasticity: I can use more when I need it and less when I don't. If I'm on open source, no problem. If I'm on a commercial engine, I'm locked into my license: if I need more, I have to go buy a more expensive license, and then when I need less, I still have to pay for the more expensive license. When you use RDS with license included, that gives you some of that flexibility with the commercial engines, but when you move to open source completely, you get rid of the commercial problem altogether. So almost all of our customers say: I'd rather use open source if I can, but use commercial if I must. What are the cases where people have to use commercial? One reason is you might be using a packaged application that only supports Oracle or Microsoft. Another could be that you've written a whole lot of code and it's just too hard to change, especially if you use things like OCI in Oracle or a lot of the .NET constructs. And yes, we're going to keep supporting the commercial engines. I joke about "forever," because in our industry forever means about three years, but we have no plans to ever stop supporting them.

Why would I use Postgres versus MySQL or Maria? There are a couple of technical differences. Postgres, as I mentioned, has a really powerful stored procedure engine, which will run both native stored procedures and about a dozen other languages inherently: Java, even Perl. I don't know why you'd want to run Perl in your database, but if you do, you can do it with Postgres. It also has a really powerful geographic information system, so if you're doing work that requires complex bounding boxes or calculations with latitude and longitude, Postgres is a really good choice. Postgres also uses something unusual in databases called MVCC, which means it's not log based. It still has some logs, but instead of pushing everything off a log, in Postgres you get what's called an immutable row. Whenever I write something in Postgres, I get a row. If I update that row, it doesn't really update the row; it marks the row for deletion and writes a new row. When I delete a row, it doesn't really delete the row; it marks it for deletion. This has a good side and a bad side. The good side is that when I have a failure and have to fail over, Postgres is much faster, because it doesn't have to replay redo logs: the rows are already there and can be pulled back quickly. The bad news is, say I have a five-terabyte Postgres environment and I delete half of my rows: I'm still using five terabytes of disk, because all those rows are marked for deletion but not gone yet. To actually get rid of them in Postgres you have to run the vacuum function, and vacuum, like garbage collection in Java, has an impact on performance. So if you have a database that does a lot of updates, Postgres might not be the best choice, but for a regular insert-and-read workload it works great. That's as deep as I'm going to get for now into the differences between the databases.

If I'm going to run on RDS, I've got some other choices to make. First, I have to choose an instance type. The instance types are the same as the EC2 instance types, except that instead of being, say, m4.large, it's db.m4.large, and the difference, of course, is that now it's under our management. What does that mean? It means a couple of things. One is that you will never get the root password: it's a managed service, and the root password is only accessible to Amazon's management layer. But it also means we're going to take care of it: keep the Linux updated, keep all the software up to date, and so on.

The family we most commonly see for databases is the R family. The R family is high memory: 8 gigabytes of memory per vCPU. For example, an r4.large is 2 CPUs and 16 GB of memory. This goes all the way up to the crazy big r4.16xlarge, which is 64 CPUs by 488 GB. Now, why 488 instead of 512? We hold a little bit back for management and other pieces, so that's how much memory you actually have for the database, not counting the memory used for the operating system, swap space, and so on. The M family is good for smaller databases or things that need more compute power because you're doing complex aggregates; in M you get 4 GB per CPU, so an m4.large, instead of 2 by 16, is 2 by 8, and so on up that family. Then there's the T family. T stands for test: they're small, and, this is the big deal, they're shared. In all the other families, if I'm using an r4.large, no other user ever gets that memory or those processors; they're only mine. In a T, you're sharing among others in a pooled resource, so your performance will be a little unpredictable. We really don't recommend them for production environments, but they're great for non-production and they're really cheap. And indeed, if you want to play with this without paying for it, you can use the free tier. The free tier lets you run a t2.micro, which is one processor and one gigabyte of memory, for 750 hours a month. The longest month has 744 hours in it, so why do we allow 750? Well, you could run 25 of them for 30 hours each if you want; that's up to you. But you could run one of them indefinitely, for free, to test things. Just don't expect a lot of performance, because one gigabyte of memory is not going to make a very big database. So you pick your instance type; the other limitation of the R family is that the smallest R is a large, while with the Ms I can get down to smalls and mediums.
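To make the instance-type choice concrete, here is a minimal sketch of launching an RDS instance with boto3; the identifier, password, version, and sizes are placeholders, not values from the talk.

```python
import boto3

rds = boto3.client("rds", region_name="eu-central-1")

rds.create_db_instance(
    DBInstanceIdentifier="myapp-mysql",      # placeholder name
    Engine="mysql",
    EngineVersion="5.7.21",                  # any supported 5.7 version
    DBInstanceClass="db.r4.large",           # R family: 2 vCPUs, 16 GB RAM
    # DBInstanceClass="db.t2.micro",         # free-tier option for testing
    AllocatedStorage=100,                    # GB of General Purpose SSD
    StorageType="gp2",
    MasterUsername="admin",
    MasterUserPassword="change-me-please",   # placeholder
    MultiAZ=False,                           # covered in a moment
)
```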
So you pick your instance type, then you pick your storage type. We support three types of storage. One type is magnetic; that's only there for legacy reasons. Don't use it: it won't save you any money and it performs horribly. Why don't we turn it off? Because we don't like to deprecate things, and we have some customers who, nine or ten years ago when we first launched the service, turned on a copy of MySQL on magnetic storage and it's still running, so okay, we won't force them to change. Usually you can use general purpose. What is general purpose? It's SSD, it's fairly high speed, and you get three IOPS per gigabyte: if you have a terabyte, you get 3,000 IOPS, which is pretty good performance for most databases. It never goes above 3,000 IOPS, though, so if you have two terabytes you still get 3,000. The maximum size is 16 terabytes, which is the largest size we support under the Relational Database Service. If you're below a terabyte it will actually burst up to 3,000 IOPS when you need it. If you need more IOPS than that, you can use provisioned IOPS. It costs a little more, but with provisioned IOPS you pick the size and you pick the IOPS, and you can pick any level of IOPS up to 40,000, unless you're using Microsoft SQL Server, because Windows has a limit and can only handle 20,000. So you pick how many IOPS you want and let it run. How do you decide which one to use? If you don't know, use GP2. By the way, why is it a "2"? It's just the second generation of general purpose. Use general purpose, and if that doesn't give you enough performance, go ahead and use provisioned IOPS. You can change these out quite quickly.

What if I want to change my instance? I might want to make it bigger or smaller. The disks are elastic: I can make the storage bigger or smaller while it's running, and the only limits are that I can't make it bigger than 16 terabytes and I can't make it smaller than the actual amount of data I have. If I've got 5 terabytes of data, it won't let me go smaller than 5 terabytes. Keep in mind, if you're using Postgres, you might need to do some vacuuming to get rid of data that's not really there. If I want to change the compute instance itself, I can do that very quickly, because I don't have to copy any data; it's an elastic block. But it does take a brief outage: I shut down my database, start it up on another instance, and connect to that block. You can almost always do this in less than five minutes; worst case, for really big environments, it might take seven to ten minutes. So with a very short outage I can say, okay, I'm on an m4.large with two processors and 8 GB, I know I'm going to need some extra capacity, I want to step up to, say, an r4.xlarge with four processors and 32 GB: I just shut this one down and bring that one up. There are ways to do that even faster if you want, but this lets me move things around: instance size with a short outage, storage dynamically, and if I'm using provisioned IOPS I can change the number of IOPS dynamically too.

Okay, how do I do high availability? It's really easy: you check the Multi-AZ box. That's it. When you do that, it will set up another server in another Availability Zone.
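Here is a minimal sketch of those resize operations and of turning on Multi-AZ with boto3; identifiers and sizes are placeholders. Growing storage happens online; changing the instance class causes the brief outage described above unless you defer it to the maintenance window.

```python
import boto3

rds = boto3.client("rds")

# Grow the volume and raise provisioned IOPS dynamically (io1 storage).
rds.modify_db_instance(
    DBInstanceIdentifier="myapp-mysql",
    AllocatedStorage=2000,          # GB
    StorageType="io1",
    Iops=10000,
    ApplyImmediately=True,
)

# Move to a bigger instance class and turn on Multi-AZ; applying immediately
# triggers the short outage/failover mentioned in the talk.
rds.modify_db_instance(
    DBInstanceIdentifier="myapp-mysql",
    DBInstanceClass="db.r4.xlarge",
    MultiAZ=True,
    ApplyImmediately=True,
)
```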
Now, does everybody know what an Availability Zone is in Amazon? Okay, not everyone is nodding, which I hope doesn't mean I'm putting you all to sleep. Although if you all fall asleep I get to go home early. All right, let's describe it. AWS has regions. A region is a place in the world: Frankfurt, Dublin, Paris, London, Virginia, Oregon. In each region I have multiple Availability Zones. An Availability Zone is one or more facilities that are all within one millisecond of each other, so some Availability Zones are three or four or five or six buildings, but they're all clustered together and act like one large data center. Each Availability Zone has power and network isolated from the other Availability Zones, and the other Availability Zones are usually two to three milliseconds away, which is usually a couple hundred kilometres. The idea is that no single storm, flood, fire, or other problem can take out more than one Availability Zone, and we have occasionally had things like tornadoes or floods that impacted an Availability Zone.

So when I check that box, I get availability: in this case I'll just choose zones A and B, and I've got another copy of the data and the server in B, and all writes are synchronous. Whenever I write something, it doesn't come back to the client as written until it has been successfully written in both Availability Zones. This does mean your writes get a little slower when you turn this on, because every write now adds about four milliseconds round trip: out to the other Availability Zone, get the confirmation that it's written, and back. So turning this on makes your writes a little slower; it won't affect your read time. The upside, of course, is that if something happens to zone A, it will automatically detect it, automatically do a crash recovery, and come back up in zone B.

Okay, how long does crash recovery take? We're back to that architect answer: it depends. I mentioned earlier that Postgres is really fast in crash recovery. A really big, really busy Postgres environment, say 15 or 16 terabytes with lots of open transactions, takes about two minutes to do its crash recovery; a smaller one, more like a terabyte, should do it in only a few seconds. MySQL or Oracle can take minutes or even hours, depending on how many transactions are in flight and how long it takes to run through the redo logs. But the good news is this is all totally automated: you don't need to do anything. As soon as one goes down, the other automatically promotes itself to primary, and as long as you're using endpoints that are DNS addresses, not IP addresses, it will automatically repoint the endpoint, so clients can reattach once it becomes available again. This takes all of the headache out of high availability; all you do is click the box and it's done for you. Each host manages its own set of Elastic Block Store (EBS) volumes, and there's an external observer, which is itself redundant, watching all of this; through DNS it makes sure traffic goes to the new primary. The first instance is then replaced, but it comes back as the secondary, as the mirror.

How long does that replacement take? Well, if the failure is just that the instance failed, which doesn't happen much, but EC2 instances are machines and they do occasionally fail, it takes about five minutes for a new one to get provisioned and come back. On the other hand, if a whole Availability Zone is lost, it might take us a while to rebuild the data center, so who knows how long that takes.
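Here is an illustrative sketch (assuming the PyMySQL driver and a placeholder endpoint) of why you connect to the DNS endpoint rather than an IP address: after a Multi-AZ failover the same hostname resolves to the new primary, so a simple retry loop is usually all the application needs.

```python
import time
import pymysql

ENDPOINT = "myapp-mysql.abc123xyz.eu-central-1.rds.amazonaws.com"  # placeholder

def get_connection(retries=30, delay=5):
    """Keep retrying against the DNS endpoint until the new primary answers."""
    for _ in range(retries):
        try:
            return pymysql.connect(
                host=ENDPOINT,
                user="admin",
                password="change-me-please",   # placeholder
                database="myapp",
                connect_timeout=5,
            )
        except pymysql.err.OperationalError:
            time.sleep(delay)                  # failover in progress, wait and retry
    raise RuntimeError("database did not come back after failover")

conn = get_connection()
```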
I would point out that one of the guarantees we make at AWS is that in every region, every Availability Zone has enough surplus capacity so that if we lose an Availability Zone, there's enough extra capacity in the others to cover it. For example, the Frankfurt region has three Availability Zones, which means each Availability Zone carries at least 150 percent of its current capacity, so there's extra to take over if one is lost. Does that make sense? By the way, has anybody ever used EC2 Spot? Spot lets you use EC2 at about eighty percent off, with the warning that we can take it back with two minutes' notice. That's what Spot is: that excess capacity. It sits there in case of an emergency, but while it's just sitting there we'll let you use it, and that's also why we might take it back on two minutes' warning. I live in Oregon; it's a beautiful place, I wake up in the morning and look at the beautiful snow-capped volcanoes out my window, and one day one of those volcanoes will erupt and take out one of our Availability Zones in Oregon. We know it's going to happen sooner or later.

All right, so my failover is all taken care of. I have another option here that's really useful, called read replicas. Read replicas are for when I need the data more local, or I need faster reads. When you create a read replica, it's an asynchronous copy that can be in the same Availability Zone as your primary, in a different Availability Zone, or, and this is the fun one, in a different region, so I can have another copy of the data someplace else. One reason is simply higher read throughput: I need a higher number of read transactions per second, so I add a read replica or a couple of read replicas. Another reason is lower latency: I might have some processing in Frankfurt and some processing in Dublin, so I put a read replica in Dublin and that processing works off the Dublin copy. A third reason is that I want my developers to have their own copy of the database to play with, because here's an interesting thing: you can write to a read replica, but things you write on the replica don't get replicated back. Everyone with me? Here's my primary, here's my replica. If I give the replica to my developers, they can add their own stuff, and it only replicates one way, so they always have the most current data from the production system but can add or test their own things. Now, a warning: if they drop a table on the replica, it's not going to automatically recreate it. But it does give me a way to play around. We can only do read replicas with the open-source engines; we don't do this with Oracle or Microsoft.

So sometimes people get confused and ask, why would I use one or the other, or should I use both? When I'm doing Multi-AZ, it's synchronous. Read replicas are asynchronous but very scalable: each primary can have five read replicas, each read replica can have five read replicas of its own, so I can keep chaining and have a crazy number of replicas if I want to. It's asynchronous because we allow it to be in other regions, and if I have a primary in Frankfurt and a replica in Singapore, I don't want every write to wait for that round trip.
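Here is a minimal sketch of adding read replicas with boto3; identifiers and the account number in the ARN are placeholders. For a cross-region replica you call the API in the destination region and pass the source instance's ARN.

```python
import boto3

# Same-region replica for read scaling.
boto3.client("rds", region_name="eu-central-1").create_db_instance_read_replica(
    DBInstanceIdentifier="myapp-mysql-replica-1",
    SourceDBInstanceIdentifier="myapp-mysql",
    DBInstanceClass="db.r4.large",
)

# Cross-region replica, e.g. to serve reads out of Dublin.
boto3.client("rds", region_name="eu-west-1").create_db_instance_read_replica(
    DBInstanceIdentifier="myapp-mysql-dublin",
    SourceDBInstanceIdentifier=(
        "arn:aws:rds:eu-central-1:123456789012:db:myapp-mysql"  # placeholder ARN
    ),
)
```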
Actually, I think the most distant region from here would be Sydney, in Australia; that's about 230 milliseconds one way, and sorry, I can't make that any faster. I actually had a customer ask me once, why don't we drill straight through the earth to get a shorter distance? I didn't want to tell him what I thought of that idea, so I said, well, we'll look into it, there may be an engineering problem there. One of the fun things about living in Oregon is that sometimes people doing construction projects set off small volcanic events when they dig too deep.

Another difference: in Multi-AZ, only the primary instance is active. The secondary basically does only two things: it waits for a failover, and when you do your backups, the backup runs off the secondary, because, hey, it's sitting there and not otherwise busy, so it's a good place for the backup source. With read replicas, on the other hand, all the replicas are active and can be used for read scaling. Read replicas don't normally do the automated backup, although you can run a manual backup off them. Multi-AZ is, of course, always two AZs in the same region, while read replicas can be wherever you want them to be. And engine upgrades on read replicas are independent from the source instance. If I upgrade an engine, usually a minor upgrade, say going from Postgres 9.6.2 to 9.6.3 (sorry, there's no 9.7), in Multi-AZ that happens on the primary and then flips over and happens on the other side; read replicas are upgraded independently. And of course with Multi-AZ I have automatic failover, whereas if my primary fails and I don't have Multi-AZ, I can choose to manually promote a read replica, so some people use those for a sort of high availability. So there are uses for both.

What I find most interesting here, though, is the other piece we talked about: backup. Whenever you're running RDS, backup is completely automatic. It does a full backup once a day; you pick when that happens. By default it's at midnight UTC, so 1:00 a.m. Central European time, but you can change it if there's a better time for you. If you're using Multi-AZ it takes the backup off the secondary; if not, it's taken off the primary. In addition, it backs up all the logs every five minutes, and it keeps all of those for as long as you tell it: the default is seven days, the maximum is 35 days. This means I can recover to any point in the last 35 days. When I want to do a recovery, it's pretty simple: you set up a new database instance, and as part of setting it up it asks whether you want to recover from a backup. If you say yes, it gives you a list of all your backups, and when you pick one it asks when you want it: the most recent, or any five-minute point going back as far as last month. You pick that point and recover. This is sometimes useful when you have data corruption and need to recover to the point just before it, or when something else strange has happened; we call it a point-in-time recovery. There is no performance penalty for backups: we've designed this so that while you're backing up, you won't see an impact on read or write performance, and of course it's stored in S3, like most of your backups probably are. You can also do a backup manually, on demand, by taking what's called a snapshot.
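Here is a minimal sketch of a point-in-time restore with boto3. As described above, the restore always creates a new instance; the identifiers and the timestamp are placeholders.

```python
from datetime import datetime, timezone
import boto3

rds = boto3.client("rds")

rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="myapp-mysql",
    TargetDBInstanceIdentifier="myapp-mysql-restored",
    RestoreTime=datetime(2018, 4, 24, 9, 35, tzinfo=timezone.utc),
    # Or, instead of RestoreTime, restore to the newest recoverable point:
    # UseLatestRestorableTime=True,
)
```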
A snapshot just goes ahead and makes another backup whenever you ask for one. You might wonder, hmm, how can they do this with no performance penalty? The answer is: that's why it's a managed service; we do it for you. With the automated backups I can specify the retention window, and they support point-in-time recovery; with manual snapshots I can take one whenever I want, and they stay until I delete them, whereas the automated backups age out after 35 days or whatever retention you've set. I can use both; they're both effective. And restoring, as I said, means you create a new environment and tell it you want to restore from the backup.

Finally, how do I secure all this? Well, it's secure by default. It normally runs inside a Virtual Private Cloud, and all of the traffic going in and out is TLS. You can turn encryption on or off; there's no good reason to turn encryption off, and you should always run your database encrypted. When you turn it on with the open-source engines, it's encrypted using the AWS Key Management Service. If you're using Oracle or Microsoft, you can use KMS, or you can use their native TDE, transparent data encryption. Running inside a VPC means I get all of the network protection: turn off public accessibility and so on, because normally there's no reason for anyone on the internet to access your database directly; a database should only be accessible by applications in the VPC. The only exception would be an administrator, and if you're an administrator you should be tunneling in through a bastion server. Now, a little bit of a warning here: I can control access to the database through IAM, so I can use AWS Identity and Access Management to say who's allowed to write and who's allowed to read, right down to the table level. I can also issue grants from the database itself: I can use Postgres or MySQL or Oracle or Maria or Microsoft to say who's allowed to write and who's allowed to read. What happens if you do both? The most restrictive one wins. So if you're going to use both systems, be careful that you don't lock people out by accident because you granted access in the database but didn't give them permission in IAM, or the other way around.

Then, finally, encryption: I just click the button that says encryption, and you pay no penalty for encrypting the data. How do we do that with no penalty? We offload the encryption, because, hey, we've got a really big cloud with a lot of machines in it, so we can use some of those machines to carry your encryption load, and you won't pay a computational price for it. It should always be encrypted. When you use the Key Management Service, you own the master key. Without the master key there is no access, no way to decrypt or understand or get to the data. We automatically do six-month key rotation and the other pieces. The big warning I have to give you here is: don't lose the master key. We don't keep a backup. The master key is yours; we don't want a copy, because when a government comes to us and says "give me his data," we want to be able to say, sorry, we don't have the key, go talk to him. It's your data; we shouldn't see it either, it's none of our business. We strongly believe in our customers' privacy. That means you have to manage the keys, or at least the master key; we'll manage all the other keys for you, and they are themselves encrypted by the master key.
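Here is a minimal sketch of turning on encryption at creation time with boto3; the KMS key alias is a placeholder for a customer-managed master key, and omitting it falls back to the default RDS key.

```python
import boto3

boto3.client("rds").create_db_instance(
    DBInstanceIdentifier="myapp-postgres-encrypted",
    Engine="postgres",
    DBInstanceClass="db.r4.large",
    AllocatedStorage=100,
    MasterUsername="admin",
    MasterUserPassword="change-me-please",   # placeholder
    StorageEncrypted=True,                   # encrypt data, logs, and backups
    KmsKeyId="alias/myapp-rds-key",          # placeholder customer-managed key
    PubliclyAccessible=False,                # keep it reachable only inside the VPC
)
```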
And then, how do I know what's going on? Well, I can use CloudWatch, and I should use CloudWatch. CloudWatch lets me both see what's happening on the instances and set alerts: don't run out of CPU, don't run out of memory, don't run out of disk space. What do you do if you're running out of disk space? Add more. What if I'm running out of memory? Well, then I'll have to switch to a bigger instance, or I need to ask: wait a minute, why am I running out of memory?

For that I can use Performance Insights. Performance Insights lets me see what's happening per query, so I can look in here, in this case at MySQL, see specific queries, and see what's causing the load. In this example the load is all in the blue, and when I drill down, my two big blue queries are doing these big INSERT INTO statements, so I can decide whether I want to rewrite those statements, slow them down, or otherwise throttle them. I can also look at this by host, by user, or by other dimensions. This gives me a low-level understanding of MySQL and Postgres, and of course we'll soon be extending it to SQL Server and Oracle as well, so you can get a really low-level understanding of what's actually happening inside the database. It brings back memories for me, because my first job was as a data scientist; I worked for NASA and JAXA, and my nickname was Doctor Heat Death, because I would write queries that would not complete until the heat death of the universe. In NASA IT they had a flow chart for troubleshooting, and one of the branches said: "Is Dr. Briskman running a query? If yes, kill the query. Did that fix the performance problem?" I guess it's good to be known for something. But today we can actually see those things, and through Performance Insights get the information we need to pick out specific queries and kill them.

When service events are going to happen, I can report that out of CloudWatch, or directly out of RDS, for seventeen different kinds of events around availability, backup, deletion, and so on, and have it send alerts. The alert can be an SNS notification sending a text or an email, or it can be a Lambda that actually runs some code, whatever you want to happen when things go on. And all of this goes into CloudTrail, our audit system, so I have an audit trace of what's happening in my database.

Then it gets maintained. You define a maintenance window; typically two to three times a year we're going to have to do maintenance that requires an outage, and usually that means changing the minor version of your database. If you do not want us to do that, you can tell us not to change the minor version and we won't. If you're on anything other than Microsoft, we can do all the operating system changes without a reboot; we're now entirely on Linux kernel 4, so we can make those changes dynamically. Remember the Meltdown flaw in January? We were able to update all of our millions of RDS instances within 24 hours with no downtime, except for the ones running on Microsoft, which required a little downtime, and on those you get an alert. We do occasionally deprecate versions; we usually deprecate minor versions when they're more than five or six years old, you'll get many weeks or months of notice that it's coming, and then we'll help you get onto a current version you can use.
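Here is a minimal sketch of the "don't run out of disk space" alert described above, as a CloudWatch alarm set up with boto3; the SNS topic ARN, threshold, and instance identifier are placeholders, and the same pattern works for CPUUtilization or FreeableMemory.

```python
import boto3

boto3.client("cloudwatch").put_metric_alarm(
    AlarmName="myapp-mysql-low-free-storage",
    Namespace="AWS/RDS",
    MetricName="FreeStorageSpace",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "myapp-mysql"}],
    Statistic="Average",
    Period=300,                        # evaluate five-minute averages
    EvaluationPeriods=1,
    Threshold=10 * 1024 ** 3,          # alarm when less than 10 GB is free
    ComparisonOperator="LessThanThreshold",
    AlarmActions=["arn:aws:sns:eu-central-1:123456789012:db-alerts"],  # placeholder
)
```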
All right, and finally, how do we charge for this? It's by the hour, by instance, plus storage, like most other things. There are two tricks you can use to save money. One of them is Reserved Instances: a reserved instance essentially says if I prepay for a year I get about 30% off, and if I prepay for three years I get about 55% off. Either way it's a flexible RI, so if I prepay for, say, an r4.large and I discover I really need an r4.xlarge, I can flexibly move that prepayment over, because it's hard to predict a year or three years in advance. My favorite trick, though, is stop. When you're done with a database, one choice is to hit delete; delete wipes that database out, wipes the disk, and it's gone. But often the better choice is stop. Stop is like hibernating your laptop: when you hit stop, it writes the memory onto disk and turns off the processing, so you're still paying for the storage, but storage is cheap, and you're not paying for the database anymore. When I need it again, I hit start, and in less than a minute I have that database back in a running state. This is great for test systems, it's great for development systems, or any time people aren't using the system: hibernate it, stop it, and stop paying for it.

So that's the basic piece on RDS. It's a useful service, it's one of our biggest services, and a huge number of customers run their businesses on it. But we had a problem. A lot of customers came to us and said: you know what, I love open source, it's flexible, and it's got a good price; I love Oracle and Microsoft and IBM because they're really reliable, they don't go down. Could you give me something that has the same reliability as IBM pureScale or Oracle RAC, but still has the open-source price point? And we went: yeah, that's a tough problem. If we're going to do that, we're going to have to redesign how a database works. So we did. We call it Amazon Aurora. We redesigned the storage level of a database to get to enterprise-level availability, reliability, and performance while still being open source.

How did we do that? I could say we did it with a lot of engineering; I could also say we did it by cheating, and the way we cheated is that we created a new storage layer. The storage layer is a shared storage volume that has thousands, actually more than thousands, of processors in it, as well as dedicated storage, and that means we're able to do things much faster and much more effectively. Like RDS, when you use Aurora you do pick an instance: that box at the top doing SQL and transactions is an instance, so it might again be an r4.large or an r4.4xlarge, whatever you pick. But you don't pick the storage, because we grow the storage for you dynamically, and it actually stores six copies: two copies of the storage in each of three Availability Zones, on a common caching layer. That lets us get five times the throughput of MySQL, or three times the throughput of Postgres. Now wait a minute, how is that even possible? Let me make this clear first: it's not five times faster, it's five times more throughput. A query that takes five milliseconds will still take five milliseconds, but a thousand queries that used to take ten minutes will now happen in two minutes. Why is it not as much of an improvement on Postgres? Because Postgres is more efficient than MySQL; there's not as much room to improve.
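Here is a minimal sketch of creating an Aurora MySQL cluster with boto3. Note that, unlike RDS, you create the cluster (the shared storage volume) and then add instances to it; you never specify a storage size. Names, the password, and the backtrack window are placeholders.

```python
import boto3

rds = boto3.client("rds", region_name="eu-central-1")

rds.create_db_cluster(
    DBClusterIdentifier="myapp-aurora",
    Engine="aurora-mysql",                  # MySQL-compatible Aurora
    MasterUsername="admin",
    MasterUserPassword="change-me-please",  # placeholder
    BacktrackWindow=86400,                  # keep 24h of backtrack history (used later)
)

# The writer instance that runs SQL and transactions on top of the shared volume.
rds.create_db_instance(
    DBInstanceIdentifier="myapp-aurora-writer",
    DBClusterIdentifier="myapp-aurora",
    Engine="aurora-mysql",
    DBInstanceClass="db.r4.large",
)
```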
But when we look at some actual benchmarks, we get numbers like this. Using the same r3.8xlarge in both cases, that's 32 CPUs by 244 GB, the write performance on Aurora is up above 110,000 per second, while MySQL on RDS is down around 20,000; the read performance on Aurora is around 600,000 instead of 100,000. How do we do this? By doing less work. We're cheating, and here's how. I'll use MySQL because it's a good illustration, but Postgres is similar. On the left is how plain MySQL works: whenever I do a transaction in MySQL, it has to write five things to disk. It writes the binary log (the binlog), it writes the redo log, it writes the actual data, then it does what's called the double write, which makes sure the data matches the log, and finally it writes the metadata in the FRM. It has to write all of that to disk; that's step one. Then, of course, if I'm using mirrored disks, which of course I am in production, it has to write it all to the mirror. And if I'm doing high availability, it has to write all of that to the replicas, again and again. So even ignoring the high availability, each transaction in MySQL requires about seven I/Os, because I have to do all these writes.

In Aurora, I cheat: I only write the log. Wait a minute, what happens to all the other stuff? All those processors down in the storage layer take the log and derive from it the binlog, the data, the FRM, and the double write. So I still get the full MySQL semantics, but most of the work is done at that lower level rather than up at the top. The other write you see goes to Amazon S3: that's backup; it's writing a continuous full backup at all times. This is how I get more throughput: I need fewer I/Os. Because of the tricks we do in Aurora, I only need one I/O per transaction instead of seven, and that's why I get higher throughput. We see this across MySQL and we see it across Postgres.

Lock management is really simple: whenever it writes, the write doesn't come back as committed until it's written to four different storage nodes, and whenever it reads, it checks three, and if the three don't give the same answer, it uses the one with the newest timestamp. If I always write to four and always read from three, I always pick up the newest copy, so I get reliable ACID transactions, not eventual ones. There is some eventual consistency underneath: I write to four, and it eventually propagates to the fifth and the sixth, but for any given transaction it's immediately consistent.

There are a whole bunch of other tricks in there, like smart selects for read performance, and my favorite one, which we recently introduced, is parallel query: when I'm doing a complex query, instead of running it up at the top level, it gets pushed down to the thousands of processors in the shared storage layer, so for some things, if it's a highly parallel query, we can get ten or twenty or a hundred times the throughput. We also do some tricks for write performance and concurrent data access, we allow read replicas to do higher-speed reads, and we've now announced, currently in preview, which means you can use it if you want to, it's kind of a beta program, what's called multi-master, which means I can have multiple write masters as well. So I can have up to fifteen read replicas and up to fifteen write replicas, and my cluster can be up to thirty-one systems: the original primary plus up to fifteen plus up to fifteen.
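This is not AWS code, just a toy illustration of the quorum arithmetic described above: with six copies, acknowledging writes on any four and reading from any three guarantees every read set overlaps every acknowledged write set (4 + 3 > 6), so a read always sees at least one up-to-date copy.

```python
from itertools import combinations

COPIES = set(range(6))          # six storage nodes, two per Availability Zone
WRITE_QUORUM, READ_QUORUM = 4, 3

overlaps = all(
    set(w) & set(r)             # at least one node in common?
    for w in combinations(COPIES, WRITE_QUORUM)
    for r in combinations(COPIES, READ_QUORUM)
)
print(overlaps)                 # True: a read always finds the newest write
```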
That gives me the ability to scale performance really high. For total data, I can store 64 terabytes, and if you need more than that we can actually increase it on request up to 256 terabytes. Should you have a single 256-terabyte database? Probably not, but some of our customers do it anyway.

On the other hand, it doesn't matter how fast it is if the database is down, so how do we make sure the database is always up? Well, that's why we have so many copies of the data: six-way replicated storage, with the volume spread across hundreds of storage nodes. Let's say Availability Zone 3 completely fails: no problem, still running. Let's say zone 3 fails and another storage node fails: okay, I haven't lost any data, but since I no longer have four working copies to write to, I can't write anymore; I'll still let you read. If I lose another storage node, now it's going to turn the database off, because we don't want to lose data. But of course these things get replaced automatically underneath. Also think about failover: I don't have to do any crash recovery, because all the data still has its integrity; all I have to do is switch the master. When I'm doing crash recovery on Postgres, and Postgres has some of the best crash-recovery performance, as I said it's about two minutes if it's really big, say 30 gigs of redo, and about nineteen seconds if it's just three gigs of redo. When I'm running on Aurora, the number is way down in the lower-right corner: there's no redo, and you always recover in three seconds or less. If you have multi-master, you recover in 100 milliseconds or less, because there's one endpoint going against all the masters, and all it has to do is stop routing to the failed one. A lot of our customers say, I need a relational database that never goes down; that's what this is designed for. As I said, I can have up to fifteen promotable read replicas. What does that mean? Well, to get that three-second recovery I have to have at least one read replica, so there's something to fail over to, and of course I can read from them while they're live. And as mentioned, I can do cross-region read replicas; those are asynchronous.

All right, so that covers availability when hardware fails. We can also do zero-downtime patching: when I need to change my database version, not major upgrades but minor ones, I can do that with zero downtime in Aurora. When I want to do large reorganizations, Aurora has a trick for fast DDL, which means large DDL changes happen almost instantly: with open-source RDS MySQL, a large DDL change could take hours; with Aurora it can happen in seconds, because of that parallelization. And we are continuously backing it up, and I can recover that backup to any point in time as needed, again out to the last 35 days. But we also have a neat trick called backtrack: I can rewind the database and say, I want to go back to this point; no, actually that point. It's non-disruptive, so you can play around with where you're recovering to. The one thing you can't recover to is the future; it's not actual time travel.
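Here is a minimal sketch of the backtrack call with boto3. The cluster must have been created with a backtrack window (as in the earlier Aurora sketch), the feature applies to Aurora MySQL, and the identifier and timestamp are placeholders.

```python
from datetime import datetime, timezone
import boto3

boto3.client("rds").backtrack_db_cluster(
    DBClusterIdentifier="myapp-aurora",
    BacktrackTo=datetime(2018, 4, 25, 10, 12, tzinfo=timezone.utc),  # rewind target
)
```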
Backtrack does let you pick a spot. I was working with one of our German financial customers where, fortunately in a non-production system, somebody managed to deploy a bad piece of code that replaced all the fives with sevens. And no, you can't just turn all the sevens back into fives afterwards; that won't work. They realized it about half an hour later, so this was really useful: okay, we need to restore to right when that happened, and they moved around a bit until they found the right spot just before the bad code ran. And of course they still have Performance Insights as well.

I also have this cool trick called fast database cloning: on demand, I can do what's called a copy-on-write clone. I've got my running database, I say make a clone of it, and it makes another copy of that database instantly, sub-second, and I have a fully functional database to use, totally separate from the first one, with the same data as the first one initially. How do I do that in a second, even if it's a 60-terabyte or a hundred-terabyte database? It didn't really copy the data; it just made a list of pointers. That's why it's called copy-on-write. Say I delete a row from the original database: the pointer wouldn't work if I just did that, so before the row is deleted, it gets written over on the clone's side, and then deleted. And say I update a row on the clone: the old version gets written over there first, and then updated. Changes on one side don't need to be replicated to the other. So I have another copy of the database; it's great as a test copy, as a development copy, and I use it as long as I need it and then drop it. This gives me a really effective way to make test copies. Earlier I said, wait, don't I want to use a replica for the test copy? Both are good: do you want your test people to have a point-in-time copy of the data, or a continuously updated copy of the data? Depending on what you need, you use one approach or the other, so I have ways to build test and dev environments either way. (There's a sketch of the corresponding clone call after these captions.)

Combining all of these together gives me a full way to manage my environments across a number of different use cases, for almost anything you need to do with relational databases. This is our goal, this is what we try to get to, and this is something we're very happy people can do. Normally I would ask if you have any questions, but I can't do that because I used the whole hour talking; let me double-check my time. Yes. So I would say go visit our sponsors, they're really nice people who are helping us out, including the ones I somehow left off this slide, Unbelievable Machine and AllCloud, and I will be here during the next break if you have any questions. I do have one request for you: go out there and do really interesting stuff with relational databases. Thanks for your time.
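To round out the cloning discussion above, here is a minimal sketch of an Aurora copy-on-write clone with boto3: a new cluster that shares the source's pages through pointers and only copies what either side changes. Names are placeholders.

```python
import boto3

rds = boto3.client("rds")

rds.restore_db_cluster_to_point_in_time(
    SourceDBClusterIdentifier="myapp-aurora",
    DBClusterIdentifier="myapp-aurora-devclone",
    RestoreType="copy-on-write",       # clone via pointers, not a full copy
    UseLatestRestorableTime=True,
)

# The clone is a cluster; give it an instance so developers can connect to it.
rds.create_db_instance(
    DBInstanceIdentifier="myapp-aurora-devclone-1",
    DBClusterIdentifier="myapp-aurora-devclone",
    Engine="aurora-mysql",
    DBInstanceClass="db.r4.large",
)
```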
Info
Channel: Amazon Web Services
Views: 4,808
Keywords: AWS, Amazon Web Services, Cloud, cloud computing, AWS Cloud
Id: lQEMV_Qgjrw
Length: 60min 27sec (3627 seconds)
Published: Wed Apr 25 2018