Managed PostgreSQL Databases on AWS

Captions
Good morning everybody, and thanks for coming to AWS Migration Day, part of the 2019 edition of the Postgres Conference. I'm Kevin Jernigan, a product manager on the Amazon RDS team focused on Postgres, both RDS for PostgreSQL and Aurora PostgreSQL. At AWS we really appreciate you spending time with us today, and a lot of you will be staying for other parts of the conference later this week. We love events like this one because they give us the opportunity to talk with lots of customers in a short period of time, and we can quickly get very deep on customer requirements, especially for database workloads on AWS. As some of you may know, we spend most of our time building features for customers: about 95 percent of everything we release is based directly on customer requirements.

Before I dive into the presentation I want to get a sense of who's in the audience, so maybe you can tell me with a show of hands: how many of you are new to Postgres? OK, we've got a few people new to Postgres. How many of you are Oracle users? OK. How many are SQL Server users? And how many use other commercial databases like DB2 or Sybase? Great, so we've got a pretty good cross-section of database experience.

Let me briefly review the schedule for today; it's all in this room. If you're wondering what we mean by migration, we mean migration from on-premises databases, and more specifically from commercial databases such as Oracle and SQL Server, to Postgres: specifically RDS Postgres and Aurora Postgres, which are our managed Postgres services. We start off with this session, in which I'll give an overview of Amazon RDS and the two flavors of Postgres that are available in RDS. We'll then dive deep into Aurora Postgres and move from there to a focus on migration from commercial databases to Postgres, culminating in a session on Amazon.com's migration from Oracle to Aurora Postgres. Then we'll wrap up with an ask-me-anything session about Postgres on AWS.

But let's get started. Before we dive into managed Postgres on AWS, I want to cover a little history, starting with me. I've been with AWS just over three years, but I started my career 32 years ago at a small company called Oracle. Back in 1987 I was a product manager on the Oracle database team. I stayed there for four years as the company grew from 800 to 8,000 people, but then I left and built a consulting business with some other ex-Oracle people in the 90s. We mostly did data warehousing work, on Oracle, Informix, IBM UDB (which became IBM DB2 LUW), Sybase, and so on. We sold that company, and in the year 2000 I started another business, a software-as-a-service company built on top of Oracle. Nine years later, when it was clear my business wasn't going to take over the world, I went back to Oracle. So in 2009 I was back at Oracle, running a product management team in the Oracle database group focused on storage and performance related features. Late in 2015 I decided to leave Oracle and come to AWS, and I've been here ever since. Of course, that meant a big transition from being an Oracle expert of almost thirty years into the world of Postgres.

To put all that in context, you may have seen this graphic from the Hasso Plattner Institute: a timeline of relational databases. It's far too detailed to put on one slide, but I've called out a few important database projects from the middle of it.
You can see Oracle way back in 1977, and Postgres in the mid 80s. Postgres actually started not as PostgreSQL but just as Postgres, without the SQL part, and we'll talk more about that in a moment. It's fun to look up this graphic; it's publicly available and easy to find, and you can trace the history of all kinds of relational database projects.

While I was trying to grow my SaaS business, in 2006, Amazon launched this service called AWS, initially with S3, SQS, and EC2: simple, basic services for providing storage, managing queues and workflows, and providing servers. Three years later we launched Amazon Relational Database Service, or RDS, with support for MySQL. In 2012 we added Oracle and SQL Server, and in 2013 we started with Postgres, specifically Postgres 9.3. A couple of years later we launched Aurora with MySQL compatibility, and a couple of years after that, Aurora Postgres. As you saw, we're going to dive deep into Aurora Postgres in the next couple of sessions, so I won't talk too much about Aurora here. In this session I want to dive into what RDS is all about, with some specific details for RDS Postgres and Aurora Postgres.

Since the beginning of Amazon.com, way back in 1996, we've run some of the world's largest and busiest production relational databases, so we understand how expensive and complex it can be to manage them: administrative functions like patching, performance optimization, backup and recovery, disaster recovery, replication, and failover. It's tough to do all that for constantly changing applications, and adapting those operations to the cloud requires different skills, new skills and new kinds of training.

When you look at your deployment options for Postgres, there's the traditional way of running Postgres, or really any database, on premises. On AWS you have a couple of different options: you can run Postgres yourself on EC2, self-managed Postgres, or you can move it into the managed environments provided by RDS. Let's take a quick look at what it means to run Postgres on EC2. On EC2 you spin up a server, attach some storage, and load whatever software you want, such as Postgres. You have full access to the server, the OS, and the database: you set the parameters, you have remote access, you can install third-party apps and whatever extensions you want. But that also means you're responsible for all of it. You have full responsibility for managing upgrades, patching, and backups, and most of the responsibility for the security of that environment. And if you want to set up HA and replication, it takes a lot of time and complexity, not just initially but on an ongoing basis.

So what if you could do all of that in an automated way, reducing your costs while improving your availability and performance? What if HA and DR could be set up automatically for you, turned on with a single API call or a single click of a button? This approach lets you get the full advantage of enterprise-level capabilities for your databases even if you're just a small startup. You don't need to hire a bunch of expensive, experienced staff; you can literally just click a few buttons and implement those features.
That's why we created RDS. We built it way back at the beginning because we had given customers those basic compute and storage services, and they asked us to make it easier to run databases. We started with MySQL and then added the other engines you see here; technically we think of it as seven engines, since we count Aurora as two different engines. RDS automates time-consuming administrative tasks such as hardware provisioning, database setup, patching, and backups, while also providing cost-efficient and resizable capacity. It frees you up to focus on your applications, so you can give them better performance, HA, and other capabilities without having to worry about the details of the underpinnings. You can focus your people on the things that are unique to your business rather than the generic, standard things everybody has to do with databases.

I'm going to dive into a bunch of different things you get with RDS, and the first is security. We don't really consider security job one; we consider it job zero. Security is the top priority for us and for our customers, so being able to secure your database environment is really important, and RDS has a bunch of features for that.

The first is security groups. It's easy to launch an RDS database instance and make it publicly accessible, and you can use security groups to lock it down, but it's also easy to make a configuration mistake that exposes your database to a wider audience than you want. You probably don't want your database to be fully publicly accessible, so you can use security groups combined with Virtual Private Cloud, or VPC, to really protect it. VPC lets you put your database in a private address space, and it lets you carve out different subnets within that address space to further segment and isolate your application components, including your database. When your database is launched inside a VPC, you control which users and applications access it, and how.

Once it's running inside a VPC, you have lots of options for managing connectivity to that database. You can create a VPN connection from your corporate data center into the VPC. You can use Direct Connect to link your data centers to an AWS region, giving you a more consistent, lower-latency connection to your VPC. You can peer two VPCs together, allowing applications in one VPC to access your database in another. You can grant public access to your database if you want by attaching an internet gateway to your VPC, and you can control routing using route tables attached to each of the subnets. So you have lots of flexibility while still protecting access to your database.
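To make that concrete, here's a minimal boto3 sketch (not from the talk) of launching an RDS Postgres instance locked down inside a VPC. The instance name, subnet group, security group ID, and password are all hypothetical placeholders.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance(
    DBInstanceIdentifier="demo-postgres",          # hypothetical name
    Engine="postgres",
    EngineVersion="11.1",
    DBInstanceClass="db.m5.large",
    AllocatedStorage=100,                          # GiB of gp2 storage
    MasterUsername="postgres",
    MasterUserPassword="change-me-please",         # placeholder only
    DBSubnetGroupName="demo-private-subnets",      # keeps it inside the VPC
    VpcSecurityGroupIds=["sg-0123456789abcdef0"],  # locks down network access
    PubliclyAccessible=False,                      # no public endpoint
    StorageEncrypted=True,                         # encryption at rest
    MultiAZ=True,                                  # synchronous standby copy
    BackupRetentionPeriod=21,                      # days of PITR window
)
```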
Another part of security, of course, is encryption. We provide a Key Management Service (KMS) that manages keys for you. You can bring your own customer-managed keys, or you can use default keys as an alternative. One of the problems with default keys is that they prevent you from sharing encrypted snapshots with other accounts. You may ask why you'd want to do that. A typical setup is a production AWS account that manages your production databases, and separate AWS accounts for your dev and test users, because they shouldn't have the same privileges and shouldn't have access to production resources. You take a snapshot of your production database to share it with the dev users, but if your production database is encrypted and you're using the default keys, you won't be able to share it. If you use your own customer-managed keys, you can share those encrypted snapshots. You should also really use a separate key for each instance; it segments access, so if a key gets compromised, it limits the blast radius.

One other important thing: you can take an unencrypted RDS Postgres instance, take a snapshot of it, and restore the snapshot into an encrypted instance. That's how you convert unencrypted to encrypted, and it's a pretty straightforward step. It's also a feature we launched for Aurora Postgres about a week ago, so it's very easy to convert unencrypted to encrypted with your own KMS-managed keys.

Now let's walk through some of these security features together. In this picture we have a set of applications accessing a database instance, plus snapshots or backups of the database and backups of the log files. How do I secure all this? I can start with a security group, as we discussed, to protect access to my database instance. I can use SSL so the traffic from my application hosts to the database is encrypted. Then I can put it all inside a VPC, which is an even better way to secure the whole environment. And then I can enable encryption at rest, which encrypts the database storage, the backups, and the backups of the log files, so all the different types of data stored at rest for your instance are encrypted.

But what if a user decides to set sslmode to disable on the client side? That would let them access the database over an unsecured connection. In RDS and Aurora we let you force SSL to be on: if you set the rds.force_ssl parameter to 1, all connections are forced to use SSL, and somebody who tries to connect with sslmode disabled will be blocked. So you can really lock down your environment with all these settings.

Let's talk about IAM, Identity and Access Management, for a minute. There are a couple of different things you can do with IAM in RDS and Aurora. First, you can use IAM to control who can perform actions on your instances: things like creating an instance, modifying it, rebooting it, or deleting it. That's not a database user, and it's not controlling access inside the database; it controls who can do what to your databases. You can of course set up database users, standard Postgres stuff, to control who can access which tables, objects, and schemas. But we've also integrated both Aurora Postgres and RDS Postgres with IAM for authentication. IAM auth, as we call it for short, generates very secure tokens using AWS Signature Version 4 that have only a 15-minute lifetime, so they get rotated frequently, and it's really hard to break through the security of this approach. It's seamlessly integrated with both RDS Postgres and Aurora Postgres, and it's a fairly new feature for both services.
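Here's a rough sketch of what IAM auth might look like from Python, using boto3's generate_db_auth_token together with psycopg2. The endpoint, database, and user names are hypothetical, and the database user is assumed to have been granted the rds_iam role.

```python
import boto3
import psycopg2

ENDPOINT = "demo-postgres.abc123.us-east-1.rds.amazonaws.com"  # hypothetical

rds = boto3.client("rds", region_name="us-east-1")

# The token is signed with AWS Signature Version 4 and expires after 15 minutes.
token = rds.generate_db_auth_token(
    DBHostname=ENDPOINT,
    Port=5432,
    DBUsername="app_user",   # a database user granted the rds_iam role
)

# sslmode="require" matters: with rds.force_ssl=1, non-SSL connections are rejected.
conn = psycopg2.connect(
    host=ENDPOINT,
    port=5432,
    dbname="appdb",
    user="app_user",
    password=token,          # the IAM token stands in for a password
    sslmode="require",
)
```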
Finally, something very recent we added is support for restricting password changes on your Postgres instances. This simplifies integration with third-party and homegrown password management tools. It's enabled with a simple parameter, and it has the flexibility to assign roles that allow certain users to modify the policy.

One reason customers care about security is that they have compliance requirements to meet. We have lots of customers in the startup, enterprise, and public sector spaces running a wide range of applications and workloads across lots of industries, and some of those workloads have to meet regulatory requirements. To support the audit and compliance needs of these customers, we work to achieve certifications so you can run your workloads on RDS and build a fully compliant application. RDS currently offers nine compliance attestations, including SOC 1, 2, and 3, FedRAMP, PCI DSS, and HIPAA. These certifications mean you can build and run applications and workloads related to those compliance regimes on RDS, but they don't relieve you of the responsibility of making sure your application meets the appropriate requirements. AWS takes responsibility for the compliance of the RDS service and the infrastructure underneath it, and you, the customer, take care of the applications you build on top of AWS. With this shared responsibility model, you can take the audit findings for your application, combine them with the appropriate attestations from AWS's third-party auditors, and have a complete verification of compliance for your application running on AWS.

Another feature of RDS is what we call parameter groups. A parameter group lets you set custom parameters for your instance: things like force SSL, preloading specific Postgres extensions, turning on auditing, turning on huge pages, and other features. It lets you create standard groups that you can copy across multiple instances or groups of instances, which avoids errors and configuration drift. This is a standard feature of RDS across all our engines. In RDS Postgres you just have a database parameter group; in Aurora Postgres we have both instance parameter groups and cluster parameter groups, because, as you'll learn, Aurora supports clusters where multiple instances attach to the same database underneath. Some parameters need to be cluster-level, applying to all the instances, and some can be different for each instance, so Aurora has both.
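As a sketch of how you might standardize settings like force SSL across instances with a parameter group, assuming hypothetical group names:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Create a reusable parameter group for all Postgres 11 instances.
rds.create_db_parameter_group(
    DBParameterGroupName="demo-pg11-standard",   # hypothetical name
    DBParameterGroupFamily="postgres11",
    Description="Standard settings: force SSL, preload pg_stat_statements",
)

rds.modify_db_parameter_group(
    DBParameterGroupName="demo-pg11-standard",
    Parameters=[
        # Reject any connection that does not use SSL.
        {"ParameterName": "rds.force_ssl",
         "ParameterValue": "1",
         "ApplyMethod": "pending-reboot"},   # static parameter: needs a reboot
        # Preload an extension's shared library at server start.
        {"ParameterName": "shared_preload_libraries",
         "ParameterValue": "pg_stat_statements",
         "ApplyMethod": "pending-reboot"},
    ],
)
```

The same group can then be attached to many instances, which is how parameter groups prevent configuration drift.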
Now, in RDS in general we support multiple EC2 instance types, and you can see the main ones here. It starts with the t2 and t3 instance types. Those are smaller instances with moderate networking capabilities; they go up to 8 vCPUs and 32 GiB of RAM, they're good for smaller or variable workloads, and they support some level of CPU bursting. At the low end, the t2.micro is eligible for the free tier: we give you a certain amount of free processing on the t2.micro on a monthly basis. The general-purpose m4 and m5 instances have better networking performance and, as you can see, go a lot higher in CPU count and memory; they're optimized for CPU and good for running CPU-intensive workloads. And the r4s and r5s have twice as much memory per vCPU as the m4 and m5, so they're optimized for the heaviest database workloads: workloads that do lots of database operations and might have high connection counts. RDS Postgres runs on all these instance types. Aurora Postgres currently runs on r4 and r5; we're looking at adding support for smaller instance types in the T class, and that's coming soon.

For storage in RDS Postgres you have two choices. One is what we call general purpose (gp2) storage. It's the lower-cost option, it's SSD storage, it has a maximum size of 32 TB, and the IOPS available are determined by how much storage you've allocated, with some burst capability. If you need to guarantee a certain level of IOPS for your database, you'd choose Provisioned IOPS storage instead, which costs more but lets you provision up to 40,000 IOPS, a pretty high level for most applications; like gp2 it goes up to a maximum of 32 TB. That's RDS Postgres. As you'll learn a little later, Aurora Postgres runs on its own special Aurora storage system, which has totally different characteristics.

Let's talk about backups for a minute. In RDS Postgres we automatically take a daily backup of your entire instance, and you can control a lot of the parameters of how that's done. You can set retention of up to 35 days, and we'll also archive all your WAL files for up to 35 days, so you can do a point-in-time recovery to any point within your retention window. If you're running Multi-AZ, which you'll learn about in a minute, we keep copies in each availability zone, and the backup is taken from the standby, not from the primary. Aurora Postgres is different, because we totally control the storage layer: we take automatic, continuous, incremental backups. There's no backup window and no performance impact. As with RDS, you can have up to 35 days of retention and point-in-time recovery within that window, but there's nothing to schedule and nothing to worry about; backups are always happening continuously.
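A point-in-time restore always creates a brand-new instance from the archived backups and WAL. A minimal sketch, with hypothetical identifiers and an arbitrary restore time:

```python
from datetime import datetime, timezone
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Restore to a specific moment inside the retention window; the archived
# WAL lets RDS roll the daily backup forward to exactly this point.
rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="demo-postgres",
    TargetDBInstanceIdentifier="demo-postgres-pitr",   # a new instance
    RestoreTime=datetime(2019, 3, 28, 13, 30, tzinfo=timezone.utc),
    # Or, instead of RestoreTime, ask for the latest restorable moment:
    # UseLatestRestorableTime=True,
)
```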
All right, let's go through some scenarios with availability and Multi-AZ. In RDS we have a feature called Multi-AZ, or multi availability zone. Within a region we always have multiple availability zones, and you can think of an AZ as a data center. In the Northern Virginia region, which some people call us-east-1, we have a bunch of AZs; each AZ is a separate data center on a separate floodplain with a separate power network, but close enough to the others to do synchronous replication for most use cases. Initially you might be running an application on top of a database in AZ1. When you enable Multi-AZ for RDS Postgres, we automatically make a full-size copy of your database in a separate AZ and set up synchronous physical replication, so the secondary copy always has all the changes from the primary as of transaction commit time. It just runs that way: you can run applications in AZ2 still connected to the primary in AZ1, with physical synchronous replication going on underneath.

When the primary database fails, we automatically detect it, promote the secondary to be the new primary, and remap the CNAMEs in DNS, so applications trying to connect to that endpoint keep retrying and eventually get connected to the new primary. And to make sure Multi-AZ keeps working, once the failed AZ recovers we start the replication in the other direction, so you still have a second copy in a separate AZ. Note that this is all automatic and transparent to the applications.

Another form of replication we support in RDS Postgres is read replicas, and customers use read replicas for roughly three reasons. One is to relieve pressure on the master by offloading read traffic to the replicas. Another is to get data closer to where your customers are: you might have customers spread around the world, with your read/write master in one region, but you can put read replicas in other regions so read traffic for those customers gets lower latency. And a third purpose is disaster recovery: keeping a replica in another region so that if your master fails in your primary region and you need to fail over, you have a replica that's ready to go.

Let's walk through some of these steps. Say you have an application attached to a primary with Multi-AZ enabled, like the picture we looked at a minute ago. Separately, you can set up asynchronous replication to hydrate multiple read replicas, and then route some of your read traffic to those replicas to offload it from the primary. What happens if the primary fails? The secondary becomes the new primary and continues the replication, so it keeps hydrating the read replicas, your eventually consistent reads still happen, and the replication flips around so your writes go to the new primary. If you want to do an upgrade, you can actually upgrade the read replicas independently of the primary; remember, they're separate full-size copies, and you can even give the replicas different parameter groups if you need to as part of the upgrade process.

Now let's talk about cross-region replicas as a way of reducing latency. In this picture we've got a primary, a Multi-AZ secondary, and a read replica, all in us-east-1 in Northern Virginia, and say I want a read replica in Ireland, in eu-west-1. That's easy enough to do with RDS Postgres: you just set up the cross-region read replica and point your application's reads at it. If you then want to migrate, you can use that setup: turn off writes from the application in us-east-1, wait for the last set of changes to replicate over to eu-west-1, break the replication, and promote the replica to a new read/write master. You start your application against that new master, turn on Multi-AZ there, and now you've moved your application from one region to another with minimal downtime.
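A sketch of creating both kinds of replicas with boto3; the identifiers, account number, and ARN are hypothetical, and cross-region sources are referenced by their full ARN:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# In-region replica to offload read traffic from the master.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="demo-postgres-replica1",
    SourceDBInstanceIdentifier="demo-postgres",
)

# Cross-region replica (lower read latency in Ireland, plus DR).
rds_eu = boto3.client("rds", region_name="eu-west-1")
rds_eu.create_db_instance_read_replica(
    DBInstanceIdentifier="demo-postgres-dublin",
    SourceDBInstanceIdentifier=(
        "arn:aws:rds:us-east-1:123456789012:db:demo-postgres"
    ),
    SourceRegion="us-east-1",  # lets boto3 presign the cross-region request
)
```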
Let's talk about upgrades for a minute. As some of you know, the Postgres community has a very regular schedule for releasing new minor and major versions. New major versions come out once a year, late in the year, usually in the September or October time frame, and that's where the community adds new features. Minor versions are where the community fixes bugs, and they generally release a minor every quarter. We support both minor and major version upgrades. For minor versions, you can enable automatic minor version upgrades so we automatically apply new minors during the maintenance window for your instance. You don't have to check that box; you can apply minors on your own schedule if you prefer. A minor version upgrade is pretty simple: we shut down your instance, replace the binaries, and restart, because that's all a minor version is, a new Postgres binary with bug fixes and security fixes in it.

Major version upgrades are a bit more involved. When the Postgres community releases a new major version, they release a version of pg_upgrade that helps you upgrade from older majors. pg_upgrade has to go in and change things in your system catalogs based on the changes the community made in the new major version, so there's some real downtime while those changes are made to your database. We don't run major version upgrades automatically; we consider them too intrusive, and there's a risk your application might not work correctly on the new major version. What we recommend is that you take a snapshot of your production instance, restore it to a test instance, run the major version upgrade there, and then do application testing on that upgraded test instance, to make sure you aren't introducing functional problems, or performance problems for that matter, with the new version of Postgres. We recommend you do all of that before actually upgrading production.
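A sketch of that snapshot-restore-upgrade test flow, with hypothetical identifiers; in practice you'd wait for each step to finish (for example with boto3 waiters) before starting the next:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# 1. Snapshot production.
rds.create_db_snapshot(
    DBInstanceIdentifier="demo-postgres",
    DBSnapshotIdentifier="demo-postgres-pre-upgrade",
)
rds.get_waiter("db_snapshot_available").wait(
    DBSnapshotIdentifier="demo-postgres-pre-upgrade",
)

# 2. Restore it to a disposable test instance.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="demo-postgres-upgradetest",
    DBSnapshotIdentifier="demo-postgres-pre-upgrade",
)
rds.get_waiter("db_instance_available").wait(
    DBInstanceIdentifier="demo-postgres-upgradetest",
)

# 3. Run the major version upgrade on the test instance only.
rds.modify_db_instance(
    DBInstanceIdentifier="demo-postgres-upgradetest",
    EngineVersion="10.6",              # target major version (example)
    AllowMajorVersionUpgrade=True,     # required for a major jump
    ApplyImmediately=True,
)
```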
You can also minimize downtime for these major version upgrades using Database Migration Service, or DMS. Say you have an application running against RDS Postgres 9.5 and you want to upgrade to RDS Postgres 10. First you create a new RDS Postgres 10 instance and use the Schema Conversion Tool to copy the schema from the production instance to the new target. Then you start a DMS replication instance, connect it to the source and the target, and select which parts of the source to replicate, maybe the entire instance, all the databases in it. You let DMS truncate whatever is in the target, presumably nothing, and load the data from the source. DMS can then do ongoing change data capture (CDC), keeping the two databases in sync indefinitely until you're ready to switch over; you don't have to migrate immediately. Notice there's been no downtime so far; none of these steps interfere with production use. The only downtime is when you disconnect and reconnect, so this minimizes downtime even for a major version upgrade. This all assumes you already did the testing I talked about on the previous slide, where you took a copy, upgraded it, and tested your application.

As for RDS Postgres's support for minors: we support the latest minors of the major versions supported by the Postgres community. That includes 9.4.20, 9.5.15, 9.6.11, and 10.6, and as of last week we now support Postgres 11.1. Some of you who pay attention might know that 11.2 is the latest minor for Postgres 11, and we'll be adding support for 11.2 in RDS Postgres very soon.

Along with the minors, we support lots of extensions from the various open source efforts that build extensions for Postgres. We currently support more than 60 in RDS Postgres, and almost as many in Aurora Postgres: everything from RDS Postgres except the extensions related to outbound replication, a feature that's coming to Aurora Postgres but isn't there today. You can see I crossed out the line that says Postgres 11 is available in preview, because it's not in preview anymore; it's in production. We have a special preview environment for RDS where we make pre-release versions of Postgres available for testing, and Postgres 11 was in that preview environment, but now it's GA. We do plan to make Postgres 12 available in that preview environment some time before the community makes it GA; if the community stays on schedule, Postgres 12 will go GA in the September or October time frame, so our goal is to get Postgres 12 into the preview environment between now and then, so you can test your applications and workloads against Postgres 12 in an RDS-managed environment.

Here's a graphical picture of the extensions we support. Way back in 2013, when we launched RDS Postgres 9.3, we started with roughly 32 extensions, and we try to add new extensions with every minor we ship. We're always looking at the extensions customers ask for and evaluating which are the right ones to add next, and the email address on the slide is where we encourage you to send requests for specific extensions, along with your use cases: why do you need this extension, and how are you using it? That really helps us figure out how to prioritize which extensions to add in the next minor, and although the email address says RDS Postgres, that's true for both RDS Postgres and Aurora Postgres; we'd love that kind of feedback.
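Installing a supported extension is just SQL run as the master user; a minimal sketch with a hypothetical endpoint and credentials:

```python
import psycopg2

conn = psycopg2.connect(
    host="demo-postgres.abc123.us-east-1.rds.amazonaws.com",  # hypothetical
    dbname="appdb", user="postgres", password="...", sslmode="require",
)
conn.autocommit = True

with conn.cursor() as cur:
    # Extensions on RDS are installed per database with plain SQL; no OS
    # access is needed as long as they're on the supported list.
    cur.execute("CREATE EXTENSION IF NOT EXISTS postgis;")
    # pg_stat_statements assumes its library was preloaded via the
    # parameter group (shared_preload_libraries), as sketched earlier.
    cur.execute("CREATE EXTENSION IF NOT EXISTS pg_stat_statements;")
    cur.execute("SELECT extname, extversion FROM pg_extension;")
    for name, version in cur.fetchall():
        print(name, version)
```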
A little more detail on the extensions we support: these are some of the new ones we've added in recent releases. For some of you, orafce might be really interesting. It's an implementation of some of the built-in packages from Oracle's PL/SQL. When you install an Oracle database, it installs a bunch of pre-built packages, and you then write a lot of your own PL/SQL code that calls those standard packages. If you're trying to convert all that PL/SQL to Postgres, it's nice that some of those built-in functions already exist, so you don't have to re-implement them yourself. orafce gives you a lot of those built-in packages, the common ones you'd typically call from your own PL/SQL, which makes it much easier to convert PL/SQL into PL/pgSQL. As a side note, I believe SCT, the Schema Conversion Tool from Database Migration Service, is now orafce-aware, so it can automatically convert your PL/SQL to take advantage of the orafce built-in packages.

These are some of the extensions from the larger supported list in RDS Postgres that our customers use a lot. One that isn't directly listed here, and is probably the most popular, is PostGIS, for geospatial capabilities; pgrouting is on here, which extends PostGIS in certain ways. Let me call out a couple of others. Turn on pgaudit and you get detailed auditing information for all the access and SQL statements being thrown at your database. wal2json is one of the logical decoders: if you enable logical replication from RDS Postgres, you can use wal2json to convert your WAL records, all the change records, into JSON format and send them to whatever tool you want to consume that JSON. pg_hint_plan works well together with a feature we've added to Aurora Postgres called Query Plan Management; it lets you put hints on your queries to control their execution plans. And log_fdw is an extension we wrote that lets you run SQL statements against your Postgres log files, where the diagnostic and error messages are. You can use log_fdw to run SELECT statements on those log files, so it's one way to do log analytics from inside your running instance.

All right, let's talk a bit more about replication in RDS Postgres. There are three kinds of replication we support. One is purely at the logical SQL level: statement-based or trigger-based replication, which is fully supported in RDS Postgres. It's not the most efficient way, but sometimes it's the best way, depending on your needs. We also support standard Postgres logical replication at the engine level, both with replication slots and with the newer pglogical capability. This is what enables tools like DMS, which actually sits on top of replication slots, and third-party tools that use replication slots or pglogical. And we've already talked about what we do down at the physical engine level to support read replicas and Multi-AZ.

Let's take a closer look at the logical replication support. To enable it, you set rds.logical_replication to 1, and then users with the right privileges, the rds_replication and rds_superuser roles, can access those replication slots. We provide support for event triggers, and we now support native logical replication, pglogical, wal2json, and decoder_raw. This lets you set up things like you see in the diagram: you can create your own custom logical replication handler, process the changes being made to your database into almost any format you can think of, and send them to almost any target. You can send the changes to another relational database, and it doesn't even have to be RDS or Aurora; it can be an on-premises database. You can send them to a NoSQL database, to a file in an S3 bucket, or into a Kinesis stream that you consume somewhere else. So there are lots of things you can do with the logical replication capabilities we support in RDS.
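A minimal sketch of consuming changes through a wal2json slot, assuming rds.logical_replication is already set to 1, the connecting user holds the rds_replication role, and the connection details are hypothetical:

```python
import psycopg2

conn = psycopg2.connect(
    host="demo-postgres.abc123.us-east-1.rds.amazonaws.com",  # hypothetical
    dbname="appdb", user="postgres", password="...", sslmode="require",
)
conn.autocommit = True

with conn.cursor() as cur:
    # Create a logical replication slot decoded by the wal2json plugin.
    cur.execute(
        "SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'wal2json');"
    )
    # ... make some changes to a table elsewhere, then drain the slot:
    cur.execute(
        "SELECT data FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);"
    )
    for (change,) in cur.fetchall():
        print(change)  # each row is a JSON document describing a change
```

From here the JSON could be forwarded to S3, Kinesis, another database, or anything else, which is the flexibility the diagram illustrates.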
I want to change gears a little and talk about monitoring. With Aurora Postgres we launched a new feature called Performance Insights, and we've since added support for it to RDS Postgres versions 10 and 11 and to other RDS engines, so it's not limited to Postgres at all; we just started with Aurora Postgres. Performance Insights gives you a view of the load running on your database, broken down by SQL statement. It sorts the SQL statements by load, shows you the wait states those statements are waiting on, and helps you identify the top SQL, the statements generating the most load, so you can focus your tuning efforts in the right places. It works well with other tools like Query Plan Management in Aurora Postgres and pg_stat_statements. It also lets you pull the data out through an API, so you don't have to use the GUI we provide; you can consume the data in other ways as well.

In addition to Performance Insights, we provide a tool called Enhanced Monitoring, which gives you more than 50 different OS-level stats about what's happening on your database instance. I didn't call this out directly, but in RDS and Aurora you can't log into the host and run your own commands like top and vmstat, so we provide those same stats for you through Enhanced Monitoring, with granularity down to one second if you need it.

All of that data is also available through CloudWatch. CloudWatch is another AWS service; it lets you graph various metrics and set alarms on them. If you want, you could set an alarm saying that if the CPU usage on my RDS instance goes above 80 percent, fire off an alarm that rings my pager. Or you can automate things: we have some customers who set an alarm that causes their database to automatically scale up if CPU gets above a certain level. They do a scale compute, as we call it, which is literally an API call that says change the instance under this RDS database from this size to that size, and some customers automate all of that just by putting alarms on CloudWatch metrics. CloudWatch data can be pulled out through APIs as well.

We also recently added a feature to RDS Postgres where your Postgres logs can be automatically uploaded to CloudWatch. Those Postgres log files I mentioned before, the ones you can access with log_fdw in a running instance, can now flow automatically into CloudWatch Logs and from there into an S3 bucket. Lots of customers want to put the log files for all their application components into one bucket and do log analytics on the whole pile, and this lets you include your Postgres logs in that kind of setup. We're working on the same feature for Aurora Postgres; that's coming soon.
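The pager example above might look roughly like this with boto3; the SNS topic ARN and instance name are hypothetical:

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Notify an SNS topic when average CPU on the instance stays above 80%
# for three consecutive 5-minute periods.
cloudwatch.put_metric_alarm(
    AlarmName="demo-postgres-high-cpu",
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "demo-postgres"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:dba-pager"],
)
```

The "scale compute" pattern is the same idea with the alarm action wired to automation that calls modify_db_instance with a larger instance class.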
I want to quickly review some tips based on what we've covered, and most of these apply to both RDS Postgres and Aurora Postgres. First, for production applications we always recommend you use Multi-AZ for high availability, to make sure you have resiliency across lots of different failure cases. As you'll learn, Aurora Postgres has Multi-AZ too; it's done in a slightly different way, with some interesting benefits, but we'll dive into that in the next couple of sessions.

We also recommend you use the automated backups that come with RDS, and we typically like to see customers keep at least three weeks of backups around, to cover you against various failure cases; remember, you can do point-in-time recovery across that entire range. We also recommend you actually test your restores, not necessarily because there will be bugs in our backups, but because there might be bugs in your process for doing restores. It's about your people having the right playbooks, the right training, and the right familiarity, because when you have to do a restore under pressure, with something burning down in your data center or your running instances, you don't want to be doing it for the first time in months. It makes sense to always practice restores. You should probably also take periodic logical exports, dumps of subsets of your database, maybe certain schemas or certain tables, because you might need to do more precise restores than a point-in-time restore of your entire database; see the pg_dump sketch below.

I don't see why you wouldn't force SSL; you should probably turn it on in all your environments just to make them more secure. I strongly encourage you to think about enabling automatic minor version upgrades, so we can automatically apply patches and security fixes from the community when they're available, rather than you remembering to do it at some point. I know in some environments you need to schedule downtime very carefully, so you might not want to enable it; it's a recommendation, obviously not a requirement. You should also make sure you follow upgrades for your extensions, because the extensions aren't released by the core Postgres community; each has its own little community with its own schedule for upgrades and enhancements.

You really should enable Performance Insights. It's a free feature that costs nothing to turn on, and by default it maintains seven days of one-second-granularity data for your running instance, so you can go back in time and see which query was generating the load four days ago at 1:30 p.m.
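As referenced above, a sketch of a periodic logical export, shelling out to pg_dump from Python; the endpoint, schema, and file names are hypothetical, and credentials are assumed to come from the environment (PGPASSWORD or ~/.pgpass):

```python
import subprocess

# Dump one schema in pg_dump's compressed custom format, suitable for
# selective pg_restore of individual tables later.
subprocess.run(
    [
        "pg_dump",
        "--host", "demo-postgres.abc123.us-east-1.rds.amazonaws.com",
        "--username", "postgres",
        "--dbname", "appdb",
        "--schema", "billing",    # export just the subset you may need
        "--format", "custom",     # compressed, restorable per object
        "--file", "billing.dump",
    ],
    check=True,   # raise if pg_dump exits non-zero
)
```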
You can actually ask us to keep more Performance Insights data than one week; there's a nominal charge for the storage, and we'll keep data for up to one or two years, but again, at the initial level there's no additional charge. You should also turn on Enhanced Monitoring, at least at ten-second granularity if not finer. These are all really useful tools for keeping track of your instances.

OK, I'll take your question. Yes, you can choose specific extension versions; it depends on the version of Postgres, so there are some dependencies there. We can talk afterwards, since there are a bunch of experts in the room, about specific extensions and versions.

You should enable pg_stat_statements; it works well with Performance Insights and helps you see more deeply into what's going on with the SQL statements in your instance. And you should look at putting alarms on things like transaction IDs. I didn't get into the guts of how Postgres works, and you can learn a lot over the next few days about transaction IDs and vacuum, but one of the risks is that if you're not monitoring how you're using transaction IDs, you can run out of them, and running out of transaction IDs is a really bad thing in Postgres; it can render your database unusable. We have some automation in RDS that monitors transaction IDs to warn you when you're heading to a bad place, and we try to help you avoid it, but you should really learn about transaction IDs and make sure you know how to avoid running out; a monitoring sketch follows below.

You should of course configure alarms on things like database load, free disk space, memory, and swap usage in general, so you know when you're getting close to any limits in your setup. One thing to note about free disk space: in RDS Postgres you have to pre-provision storage for your database, either gp2 or Provisioned IOPS storage. You provision it, then you start putting data in and filling it up, and that's why we say you should configure alarms on free disk space, because when you run out of space you have a problem. You don't want to run out; you want to see when you're getting close and add more storage. In Aurora we automatically grow the storage for you. There's no pre-provisioning; it just grows as you insert rows, all the way up to 64 TB, so there's less need to monitor free disk space in Aurora, and again, you'll learn more about that later today.
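The monitoring sketch mentioned above: a simple check of transaction ID age per database, with a commonly used (but arbitrary) warning threshold and hypothetical connection details:

```python
import psycopg2

conn = psycopg2.connect(
    host="demo-postgres.abc123.us-east-1.rds.amazonaws.com",  # hypothetical
    dbname="appdb", user="postgres", password="...", sslmode="require",
)

with conn.cursor() as cur:
    # age(datfrozenxid) counts transaction IDs consumed since the last
    # aggressive vacuum; it must stay well below the ~2 billion ceiling.
    cur.execute("""
        SELECT datname, age(datfrozenxid) AS xid_age
        FROM pg_database
        ORDER BY age(datfrozenxid) DESC;
    """)
    for datname, xid_age in cur.fetchall():
        # 1 billion is a common alarm threshold: vacuum well before then.
        flag = "WARN" if xid_age > 1_000_000_000 else "ok"
        print(f"{flag:4} {datname}: {xid_age}")
```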
There are a few forums we maintain for getting more information about RDS, Aurora, and EC2, and of course there are forums for all the other AWS services as well; I'm just listing these three here. It's really useful to go into the forums and post questions, because often other AWS customers will answer, and the service teams always monitor the forums and answer questions too. These are great places to get information.

I've listed a few RDS Postgres customers here, and I want to cover a few details on how they're using RDS and the benefits they see. Most of you are probably familiar with Instacart; they run a service for ordering same-day groceries online, and to avoid the complexity of building a new production database from scratch, they turned to RDS for Postgres instead of self-managed Postgres. This lets them add millions of new products and items to their database every month without having to focus their engineering team on the blocking and tackling of just running databases, so their engineers work pretty much entirely on improving the application experience for their customers rather than managing database infrastructure.

Infor is one of the largest ERP application providers out there. They generally focus on building applications for specific industries, unlike the traditional providers, primarily SAP and Oracle, and they have a SaaS model based entirely in the cloud. By using AWS with RDS Postgres and Aurora Postgres, they're able to quickly deploy their apps in the cloud across a wide range of industries and applications.

Vessel is a startup based in San Francisco; they give their online subscribers early access to some of the best web series, music videos, and TV segments across multiple devices. They built and launched their video platform entirely on AWS, and with RDS Postgres it's pretty much the same story: they can focus their time and resources on optimizing their applications and the experience for their customers.

Finally, I'll talk a little about Wave, and we actually have a quote from them. Wave helps users locate their contacts on a private map, and they like to say there's a new "wave," as they call it, opened every second. Their CTO says: "We found AWS to be the best way to achieve a vast, scalable pool of resources that can be triggered on demand. With that kind of flexibility, we were looking for the same pay-as-you-go, maintenance-free experience for our databases, and Amazon RDS was the solution. It simplifies database maintenance, setup, and optimization, and handles the heavy lifting by automatically managing connections, backups, security, and updates. Also, with Multi-AZ we can make changes with no downtime whatsoever." So obviously the CTO of Wave is pretty excited about what they're getting from RDS.
Now I'm going to cover just one slide on Aurora Postgres, because we'll go much deeper in the next session. We built Aurora because customers were using RDS to run databases from other companies and other projects; we were basically running other people's databases. Customers were seeing MySQL, Postgres, Oracle, and SQL Server side by side, and they started asking us for a combination of capabilities: some of the enterprise database features they liked, with the simplicity and low cost of open source. That's what led us to build Aurora.

The core innovation of Aurora is to separate the logging and storage layer from the other layers of the database engine. We reimplemented the logging and storage layers, and that's what we call Aurora storage. Aurora storage maintains six copies of all your data across three availability zones. We changed the database engine so that when it writes a log record, it actually writes it to six places in parallel and waits for four responses; you'll learn a little later why four out of six is the right number. We write log records to the storage nodes, and we also changed the engine so it doesn't write dirty database pages, which removes a bunch of I/O from the system. We push a bunch of database work down into the storage nodes, because the storage nodes are database-aware: they take the log records and apply them to database pages down in storage. That storage system can have hundreds or sometimes thousands of storage nodes in a single Aurora storage volume, spread across those three AZs, so there's a lot of IOPS and a lot of processing available down in the storage layer. We've offloaded a bunch of work from the database nodes to the storage nodes, which is how we deliver three times the throughput of standard Postgres; in some cases customers have told us they see up to 11 times more throughput, though it's very much workload dependent. There's lots of cool stuff we can do because we fully control that storage layer, and again, you'll learn more about it later. We launched Aurora Postgres with support for Postgres 9.6, we've added Postgres 10 support, and we'll soon add support for Postgres 11; that's coming a little later this year.

Finally, taking a big step back: what's the bigger environment all of this lives in? RDS is the box up there, where you can see the engines we support, but we offer a lot of non-relational database services as well: DynamoDB, ElastiCache, Neptune, and DocumentDB, the MongoDB-compatible service we launched recently. There's a bunch of other database services we offer because we know lots of customers want to use the right tool for the right job: sometimes they need to do graph processing, or caching with Redis- or Memcached-compatible services. We keep making different services available because customers keep asking for them.

That's all I have time for; I appreciate your time, and I'm happy to take a couple of questions. Our next session starts in eight minutes, though. Yes? So the question is why Aurora has a 64 TB size constraint. There's no real constraint other than that we just hadn't had a lot of customers ask for more; that's what I used to say to this question a couple of years ago. We've since had more customers ask, and we are going to increase that limit. There's no hard technical limit other than that not many people have asked for it yet; I mean, there is a limit in the sense that we have to do testing and validation and all that. Right, other questions?

Right, so the question is about feature parity between RDS and Aurora, where they converge and where they diverge. Aurora itself is built by taking the Postgres open source code and changing it to interface with the Aurora storage system, but that's down at the lowest layers, layers you generally won't see as an application developer or user, so the behavior above that is mostly identical, because it's mostly the same code. Having said that, we are starting to add things in the upper layers, like Query Plan Management (QPM). QPM is specific to Aurora Postgres, because we had to build it into the Aurora Postgres environment; it isn't available in RDS or standard Postgres, but you don't have to use it. So technically Aurora Postgres is kind of Postgres plus things, and you'll see us continue to add things to Aurora Postgres that may not exist in standard Postgres. That said, there are a bunch of things we're looking at that we want to contribute back to the community, because we want to see them get into mainline Postgres, not just RDS but anybody's Postgres. There are some optimizations we're working on in query optimization, for example, where we'll work with the community.
Of course, the soonest anything new can land in the community is Postgres 13; Postgres 12 is functionally done, and they're just pushing it through the process to get it out. So we'll release things in Aurora Postgres that won't get into the community for a while, and you'll see Aurora always be a bit ahead, because it takes a while to get changes back through the community process.

Right, so the question is what our plans are to add TDE, transparent data encryption. Are you referring to column-level encryption? Well, we have encryption at rest; how is that different from TDE in Oracle and SQL Server? Our encryption is implemented at the storage level by us, so it's conceptually the same. The part that's different is that TDE in Oracle and SQL Server also lets you do transparent column encryption, which is why I asked whether you meant column encryption: encryption at rest in RDS and Aurora is functionally the same as the encryption at rest you get from TDE in Oracle and SQL Server. The column-level part is different; we don't yet have transparent column-level encryption. In RDS and Aurora we support an extension called pgcrypto that lets you encrypt columns, but you have to change your application; it has to be aware of the encryption. Full TDE lets you encrypt columns without the application knowing about it, so that's the difference, but maybe we can talk afterwards if that doesn't fully answer your question. No, we do not have TDE at the column level; we do encrypt the entire storage volume. We've been talking about it; we should talk more offline.

Right, so one of the challenges is that a lot of those extensions are what are called unsafe or untrusted, because they allow you to run arbitrary code on the host, which would then allow you to do arbitrary things to the host. Not that you would necessarily want to, but somebody might someday, and it creates security holes that we can't afford to open up in the RDS or Aurora infrastructure. The challenge is figuring out how to convert those to things that are supported, safe, and trusted in our environments. We have SCT, which helps convert some things, but I don't think SCT currently supports those kinds of languages, converting them to something like PL/pgSQL or PL/v8, which we do support. So that's more of a manual process at this point. If you really have a big install base of code like that and you don't want to change anything, you'd run self-managed Postgres on EC2. And if you have a legacy pile of code running in extensions that can't be supported in RDS, then you have to do something if you want to get to RDS or Aurora; my answer is the same for both, they aren't any different, they have the same approach to handling extensions. For the other questions, I think we should let the next presenter set up, since he only has two minutes, but thank you for your questions, and I'll be around all day so we can keep talking. [Applause]
Info
Channel: Amazon Web Services
Views: 4,296
Rating: 5 out of 5
Keywords: AWS, Amazon Web Services, Cloud, cloud computing, AWS Cloud, PostgreSQL, Amazon, Aurora, RDS
Id: hdQ-geGBsq4
Length: 55min 50sec (3350 seconds)
Published: Mon Apr 01 2019