AWS re:Invent 2018: Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324)

Captions
Hi everybody. Wow, this is really cool, seeing a room like this full of people who want to learn about Postgres. I've watched Postgres grow over the years and decades, and now we have a room full of people learning about it. I'm Jim Mlodgenski, a principal database engineer at AWS on the Aurora PostgreSQL and RDS PostgreSQL team. We have a screen over here and a clicker that doesn't work, but we'll go on.

We're about halfway through re:Invent now, the middle of the day of the middle day of the week, and a lot of the Postgres sessions have already happened, but there are still more related sessions coming up throughout the week. There's also a new one that just recently launched about RDS on VMware; check that out tomorrow for a deep dive.

So what is Relational Database Service, or RDS? It's a simple way to run your relational database in the cloud. It's available in six different engines: the cloud-native Aurora, which comes in two flavors, MySQL and PostgreSQL compatibility; the open-source engines MySQL, MariaDB, and PostgreSQL, which we're going to focus on today; and the commercial engines SQL Server and Oracle. When you want to run Postgres on AWS there are a few different options. You can run it self-managed on EC2 and put your data on EBS volumes, or you can run it as a managed service, either through RDS for PostgreSQL or Aurora with PostgreSQL compatibility. (And now the clicker is working, cool.)

When you're making that choice, you have a few considerations. Postgres is open source, and some people have really customized it, building it with custom configuration options; when you have something like that, you're going to have to run it in a self-managed environment. When you do, your DBAs take on more of the work of setting up the platform, installing Postgres, configuring high availability, monitoring, and so on, and they have less time to deal with application-level problems. When you use a managed service, your DBAs have time to work on the business-critical problems, whether that's doing schema design earlier in the process, so you have good schemas as you're building your applications, or doing query tuning. RDS and Aurora PostgreSQL take care of the platform issues for you.

RDS for PostgreSQL closely follows the Postgres community's release cycle. The community has yearly major releases, shipped in the early fall. Notice that the major release numbering scheme has changed a little: it used to be the second number, so 9.6 was a major release over 9.5, but starting with Postgres 10 the first number is the major release. The latest major release is Postgres 11, which is available in the preview environment if you want to test some of the new features and functionality, and we'll talk about some of that today. Postgres also has quarterly patch and maintenance releases; apply them quickly, since they're usually there for bug and security fixes.

Aurora is built from the ground up leveraging AWS services. It gives you some additional performance characteristics, it automatically scales your storage up to 64 terabytes so you don't have to pre-allocate it, and it gives you additional durability and
high availability options.

Speaking of high availability, how do you set up Postgres to be highly available? The first thing you do is backups. Even if you have automatic failover, replication, or anything else, any database you care about needs backups, and that includes your development and test instances; you may want to be able to roll back from a change, and having those backups will help. With RDS you can just click to set up daily backups. Your backups will run during your backup window, and every five minutes RDS takes a snapshot of your transaction logs and saves them off, so you don't lose any data in the event that you need to restore. When you're setting up production, RDS gives you seven-day retention by default; set that up for more like three weeks, and you can have up to 35 days of retention. It's not uncommon to want to go back and look at a backup from a couple of weeks ago to help diagnose a problem you're experiencing today, and if you have those backups you can look back in time.

When we talk about backups, the most common reason you're going to restore from one is not some catastrophic failure on your database server, it's human error: somebody accidentally dropped a table, truncated a table, or there was a bug in a data maintenance script and you need to go back and fix some of the data. With point-in-time restore you can restore your database to the instant before that event happened, and you don't have to restore it back to that main database. You can restore to another instance, and it doesn't have to be the same instance size. If you just want to get back a dropped table from a temporary restore, there's no reason to restore to the same instance type you had in production; it can be smaller, with fewer IOPS, just enough to get that table back and load it into production.

For any production server you also want automatic failover, and for that you set up Multi-AZ. With Multi-AZ your data is synchronously replicated out to a second Availability Zone, and in the event of a failure RDS will automatically promote that standby and redirect your DNS entry, so your application reconnects and you're serving your customers again.

With a highly available database, you also need to secure it, and security starts with communication to the database. You want encrypted communication the same way you do to your web servers: use SSL. Even if you're running everything in a protected VPC, DBAs often connect from their laptops and other places, so you want to ensure everything is encrypted across the entire path to your database server. When you create a new RDS instance there's an SSL certificate on it that you can leverage, but it's optional. What I recommend is to turn it on all the time by setting the rds.force_ssl parameter. You'll even want to do this on your development servers, because a lot of times you'll take data from production and roll it back into development, and you want it encrypted all along the path.
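Not from the original slides, just a rough sketch: assuming an RDS parameter group that exposes rds.force_ssl, and PostgreSQL 9.5 or later for the pg_stat_ssl view, you can confirm the setting and see which live connections are actually encrypted with something like this:

    -- Show whether the instance is rejecting non-SSL connections
    -- (rds.force_ssl is an RDS-specific parameter; assumed available here).
    SHOW rds.force_ssl;

    -- List current connections and whether each one is using SSL/TLS.
    SELECT a.pid, a.usename, a.client_addr, s.ssl, s.version, s.cipher
    FROM pg_stat_activity a
    JOIN pg_stat_ssl s ON s.pid = a.pid;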
Encryption over the network with SSL is a two-way negotiation between the client and the server, so the client also has influence over the type of connection it requests. With Postgres, the client requests the type of SSL connection by setting the sslmode parameter in its connection string. There are six options. Two of them, disable and allow, you should never use; when it's set like that you're probably going to be sending things over clear text. The default is prefer: if SSL is enabled on the server, the client will use an SSL connection. So by default, with SSL enabled on the RDS server and the client defaulting to prefer, you're going to get SSL connections. But what you really want is to set it to require, just in case somebody disables SSL on the server; the client is then also saying "I'm only going to talk to you if the connection is encrypted," which guards against SSL being turned off on the server. You might have higher requirements from a security perspective, where you set it to verify-ca. That verifies that the certificate coming back from the server is from a known certificate authority, which essentially verifies that you're connecting to an RDS server and prevents somebody putting another Postgres instance in between, or something else responding on the default port. You may have even more stringent requirements and want to verify the hostname as well; setting it to verify-full will do that. Be careful with verify-full if you spin up multiple read replicas and load balance across them, or you set a DNS entry for convenience: it verifies the hostname against the hostname given to you by RDS when you created the database, so if you create different DNS entries on top of that, those connections will fail.

Once you have secure communication, you want to authenticate to the database. Postgres by default uses md5 with a salt for authentication against database users: you create your user, give it some sort of strong password, and optionally a VALID UNTIL. That works really well for a lot of people, and in a lot of cases that's enough security. But some people have more stringent requirements. Some people have five or ten thousand Postgres databases and need to centrally manage their users; if a user leaves the company, you want to be able to disable access for that user across all the databases. You may also have more complex password requirements. To accomplish this, you have the ability to authenticate with Identity and Access Management (IAM). You can just enable it for your RDS and Aurora PostgreSQL databases, and when you do, you're authenticating against IAM and getting a token that lives for 15 minutes, then using that token to connect to Postgres. You're never authenticating directly against Postgres; it's leveraging IAM for that authentication, and Postgres goes out and verifies with IAM that you have a valid token. As you update to the latest versions you now have a new role inside your Postgres databases, rds_iam. If you have existing users that you want to enable for IAM authentication, you just grant that role to the existing user, and you can also create new users and grant that role to them.
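As a rough sketch of what that looks like on the database side (the user names here are hypothetical; rds_iam is the role RDS and Aurora PostgreSQL provide):

    -- Existing database user: just grant the role.
    GRANT rds_iam TO app_user;

    -- New user that will only ever authenticate through IAM;
    -- note that no database password is set.
    CREATE USER reporting_user WITH LOGIN;
    GRANT rds_iam TO reporting_user;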
Notice that on that CREATE USER command we're not setting a password, because Postgres isn't using a password; you're authenticating against IAM, not against Postgres, and Postgres is trusting IAM to say that the user is valid. So you have to create a user account within IAM. When you do, give it the same name as the database user; it just simplifies management and administration. If you have different names, trying to associate them gets tedious; having the same name keeps it simple. That user also needs programmatic access in order to generate a token. Once you have the user, you need to grant it a policy allowing rds-db:connect, and in that policy you have to give it a little bit of information: what region the database is in, what account it's in, and, most importantly, the database resource ID. You could give it a specific database ID, but give it the wildcard star instead, because if you spin up a read replica off that master, or you restore the database, you'd otherwise have to change your policy to log in to the restored database or the replica. With the wildcard, you're still restricted to the databases within a region for a given account. Finally, you give it the database user you're connecting as; that's the association between the IAM user and the database user, and again, keeping everything the same name simplifies things.

Then, when you connect, you need to generate that token, whether you're connecting with psql on the command line, as we'll see here, or using one of the GUI tools like pgAdmin. The token is essentially used as the password. With Postgres on the command line there are a number of environment variables like PGPASSWORD, so if you set the result of the generate-db-auth-token command as PGPASSWORD you can use that directly. This is what that token looks like; if you were connecting with a GUI tool you would copy and paste the token in order to connect, while on the command line you don't see it because you set it via the environment variable. It does require SSL.

Once you're connected, you need to authorize access to objects. You have the same ability you do in other databases to lock down individual objects: Postgres lets you grant and revoke privileges on individual objects to different users and roles, so you could revoke UPDATE privileges from your BI user. You can also go more fine-grained at the column level, for example revoking SELECT from everybody on a sensitive column like a Social Security number. Postgres also has row-level security, so you can create policies on something like a sales table where a sales manager can only see the rows he or she is allowed to see through the policy you define.
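A rough sketch of those authorization features, with hypothetical employees and sales tables and roles (not the schema from the talk):

    -- Column-level privilege: the BI role can read only non-sensitive columns.
    GRANT SELECT (emp_id, emp_name, department) ON employees TO bi_user;

    -- Row-level security: a sales manager sees only the rows assigned to them,
    -- assuming the application connects as a role matching the sales_manager column.
    ALTER TABLE sales ENABLE ROW LEVEL SECURITY;

    CREATE POLICY sales_manager_policy ON sales
        FOR SELECT
        USING (sales_manager = current_user);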
Once you have your database, it's invariably going to scale: your databases are going to get bigger as they collect more and more data. There are a number of instance sizes for RDS that you can use, everything from a small T2 instance with one virtual CPU up to a 96-vCPU m5 instance, and you can now scale your storage in RDS up to 32 terabytes with 40,000 provisioned IOPS. But with any given instance you want to take best advantage of the resources you have. A not-uncommon scenario in a lot of schemas is one or two really large tables and a lot of other small ones, and within those large tables it's also not uncommon to have a small set of hot data and a lot of historical data. Maybe it's a sales table where you're always dealing with the most recent orders, and last year's and the year before's orders you only touch for yearly reports. That sort of scenario is a prime example of a good candidate for partitioning. Partitioning takes one big table, splits it up into multiple smaller chunks, and makes them appear as one large table to the application. That gives you a couple of advantages: performance, because when you run your queries Postgres can prune away large portions of the table without even accessing them, and maintenance, because if you need to perform maintenance operations on the partitions they're much smaller chunks.

Partitioning is not something new; Postgres has had table partitioning for 15 years. But the way it used to be done was via an innovative feature called table inheritance. Table inheritance lets you create child tables underneath a parent table, you restrict what lands in the individual partitions (the child tables) with constraints, and Postgres uses those constraints to prune away partitions. But because people used table inheritance for other things besides partitioning, Postgres didn't have enough information to know how to route data into the partitions, so before Postgres 10 you had to create triggers to route rows. I like writing code, but writing partitioning triggers is not a fun thing to do, and beyond the effort of writing the trigger in the first place, the trigger itself adds overhead.

Starting in Postgres 10 we now have partitioning syntax, which lets us create actual partitions. We're giving Postgres more metadata about what we're trying to do with the table: instead of plain table inheritance, which could have many other uses, Postgres now knows we want to use it for table partitioning. Like any large feature in Postgres, it's something that evolves over time; with yearly release cycles, Postgres iterates on large features like this over and over. In Postgres 10, if you wanted to create an index on the parent table you would get an error; you had to create it on all the individual partitions themselves. But Postgres 10 did give us partitioning syntax for range and list partitioning, and the big thing is that triggers are no longer needed, which gives a big performance gain on data loads. What does that performance impact look like? Take the orders table we've been looking at and load seven and a half million rows, about a gigabyte of data. With the old approach, table inheritance with triggers, that load takes about eight minutes; with the new partitioning syntax, the same amount of data loads in about 40 seconds. That's more than a 10x performance gain, very close to loading the data as if there were no partitioning on the table at all.
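A minimal sketch of that Postgres 10 declarative syntax, using a hypothetical orders table range-partitioned by month (not the exact schema from the slides):

    CREATE TABLE orders (
        order_id   bigint        NOT NULL,
        order_date date          NOT NULL,
        amount     numeric(10,2)
    ) PARTITION BY RANGE (order_date);

    CREATE TABLE orders_2018_11 PARTITION OF orders
        FOR VALUES FROM ('2018-11-01') TO ('2018-12-01');

    CREATE TABLE orders_2018_12 PARTITION OF orders
        FOR VALUES FROM ('2018-12-01') TO ('2019-01-01');

    -- Rows are routed to the correct partition automatically; no triggers needed.
    INSERT INTO orders VALUES (1, '2018-11-28', 49.99);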
Postgres continues to evolve. In Postgres 11, again available in the preview environment, you can now create indexes on the parent table and they are pushed down onto the individual partitions, and you can create primary keys and foreign keys, which you couldn't do before. Postgres 11 can also move data via updates: if you update a column that's part of the partitioning key, which would force that row to move to a different partition, Postgres will now move it. You also have a default partition, so if data comes in with, say, a date you weren't expecting, like January 1st, 1970, and there's no partition for it, that row is pushed into the default partition.

The biggest advance is really partition elimination at execution time. Before Postgres 11, Postgres would only prune away partitions at planning time, so you had to have something in your WHERE clause telling Postgres to eliminate the other partitions. But that's not how people write code; people write queries that join to other tables, and you want to be able to prune partitions based on that join. Here's an example: summing up all the sales orders for the Mondays in Q2 of 1992. With monthly partitions, we know we really only need to access three partitions, but Postgres can't determine that just by looking at the SQL, because the qualifying dates really come back from a dates table. With Postgres 11 it can do that pruning at execution time, and that's a 35% performance improvement from being able to prune at execution time, a huge benefit on top of where Postgres 9.6 was, which itself was a 74% improvement over the old way of doing it with table inheritance.
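A rough sketch of that kind of query, with a hypothetical dates dimension table; the point is that the qualifying dates come from the join, not from a literal predicate the planner can see:

    SELECT sum(o.amount)
    FROM orders o
    JOIN dates d ON d.date_value = o.order_date
    WHERE d.year = 1992
      AND d.quarter = 2
      AND d.day_of_week = 'Monday';

    -- Postgres 10 has to scan every partition of orders for this query;
    -- Postgres 11 can skip the irrelevant partitions at execution time.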
With that, I'm going to turn it over to Jeremy.

Isn't that fun? All kinds of stuff to talk about with Postgres. We're pushing hard, and I hope you still have lots of room in your brain. I'm going to use the rest of the time I have today to talk about two more topics that I think are important to cover this year at re:Invent related to Postgres, and the first one is maintaining multiple copies of our data. We're all doing this, right? In fact, I've put one of the reasons on the screen to your left: high availability. Another reason might be that you want to squeeze the downtime window as small as possible while you're doing a major upgrade, or maybe you're moving from one platform to another; you maintain multiple copies to do that. You might need access to more compute capacity than you can get from the biggest single server available. Or you might have a geographically distributed application and want to move your data and your apps as close to your end users as possible, all over the world. These are some of the different reasons we do replication and maintain multiple copies of our data.

As we've been saying, Postgres isn't a new database; it's been around for a couple of decades, and replication is nothing new either. The first production replication solutions for Postgres were over a decade ago. In RDS Postgres there are eight patterns for replication that are really common, and I want to run through all eight of them quickly. The first category is what I'll call SQL-based: ways you can do replication where you don't need anything special from the database beyond the SQL standard. First, you can manage it entirely yourself outside of the database, or second, there are replication solutions based on triggers. Trigger-based replication is the oldest way of doing replication in Postgres; it's been around the longest, and you might be tempted to think that because it's been around for so long it's getting obsolete, but that's not the case. In fact, I've talked to people just this past year where, in spite of the complexity and the cost, it made sense for them to manage it themselves. Or think about this: you could have a Postgres database in one of your data centers that you haven't touched for ten years, and you could still use a trigger-based solution on that database to move that data pretty much anywhere you wanted.

However, there are still some drawbacks to these approaches. If you manage it yourself there's a lot of complexity, there are a lot of corner cases, and it's your responsibility to cover all of them; you also own the maintenance, so it's hard and it's expensive. Trigger-based solutions have been around for a long time and are very well understood, but it's also pretty widely acknowledged that they increase the write I/O load on your source database, and there's a noticeable CPU overhead. The thing with these drawbacks (and the clicker works for me) is that you can't solve them by just tuning some parameters; they're inherent to the approach. If you want to solve the inefficiency problem and you don't want the complexity of managing it yourself, we need a different approach to replication.

That brings us to a whole different approach, which Postgres brought in around a decade ago: physical replication. If you're not familiar with the concept, in a nutshell, physical replication means that at the end of the day your replica, your copy, looks almost like what you would get if you just copied the files from one server to another. That's not the way SQL-based replication works: you can have the same schema, the same rows, the same data, you could even run the same SQL statements in the same order on two databases, but if you open up the files in a hex editor you'll find they don't look the same. For example, the transaction identifiers and the data structures underneath your tables will be different. I have to take a short tangent here and point out that you can see this for yourself if you're on Postgres 10.5 inside RDS. This is an actual screenshot from a hex editor that you could use to open up your Postgres data files from RDS and look at what's going on at the byte level; the color coding maps each individual byte back to the data structure in the Postgres source code that you can download from the internet. It's a level of transparency that's absolutely amazing, and I'm really excited that we're able to give that capability. In addition, you can also use SQL (there's an example at the top of the slide) to query the data structures underneath your tables.
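The slide itself isn't reproduced in these captions, but one way to query page-level structures from SQL, assuming the pageinspect extension is available to your role on the instance, looks roughly like this:

    CREATE EXTENSION IF NOT EXISTS pageinspect;

    -- Line pointers and tuple headers (including transaction ids) from the
    -- first page of a hypothetical orders table.
    SELECT lp, lp_off, lp_len, t_xmin, t_xmax, t_ctid
    FROM heap_page_items(get_raw_page('orders', 0));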
Coming back to Postgres and replication: we have basically two patterns for physical replication in RDS. Multi-AZ we introduced earlier; Multi-AZ is the checkbox that you hit, and the great thing about it is that you don't have to worry about tracking the state of your cluster, you don't have to manage the endpoints, and you don't have to try to make sure you configure your database so that it's impossible to lose any data. We've done all of that heavy lifting for you and made sure it's set up properly; the only thing you have to do is check the box. That's why we recommend to pretty much all of our customers to use Multi-AZ for production systems.

The second pattern for physical replication in RDS is read replicas, and I want to be careful here: on RDS for MySQL, read replicas are not physical replicas; they're more like logical replicas, closer to what I was describing earlier. But here, for Postgres, RDS read replicas are in fact physical replicas, so we benefit from the efficiency of just moving the log records across. Whereas Multi-AZ is designed to give you higher availability, read replicas have a different goal: they're designed to let you run queries against your replica, even if that replica is on the other side of the planet in a completely different region.
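As a rough sketch of keeping an eye on physical replicas from the source instance, using the pg_stat_replication view (column names as of Postgres 10):

    SELECT application_name,
           client_addr,
           state,
           pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes,
           replay_lag
    FROM pg_stat_replication;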
There are still some drawbacks with physical replication, though; maybe you guessed I was going to go there. First of all, with this model you can't pick one table, or just part of your database, and replicate that. You also can't go across different major versions of Postgres. And on your replica, your copy, you can't create another table and start doing writes on it; it's pretty much read-only. So we've solved the inefficiency problem we had with SQL-based replication, we don't have all that additional I/O and CPU overhead, but we've lost all the flexibility we had. That brings us to Postgres's answer, which is to build a replication solution that can meet all of these needs. I'm going to start from the bottom and step through how this thing is built, because I think there are some really important lessons as we look at it.

Remember that one of the first problems we were trying to solve was a write amplification problem: with triggers, when I flip the switch to turn on replication, every time I make a change to the database I have to do two writes instead of one, because I make my change and then I have to keep track of that change somewhere else for the replication machinery. There's a simple solution to that, and it's used in a lot of databases. Every modern database has a log; it might be called a redo log, or in Postgres we call it the write-ahead log, WAL for short, and every change in the database goes into this log. The simple solution to the double-write problem is to take advantage of that log: we already have all the changes in there, and if we can just read the log back and use it, we don't need the second set of writes for replication. That's exactly what Postgres did. The very first physical replication solutions for Postgres, ten or more years ago, just copied those log files over to another server and replayed them there, and bingo, you've got a physical replica.

But there's a problem: I have to wait for the file to fill up before I can copy it across, so my copy might be five minutes behind, or more, depending on how you configure it. To solve that problem, Postgres introduced streaming replication, and the key thing is a new process called the WAL sender. It runs on your database server, and instead of waiting to fill up a file before copying it to the second server, it sends those bytes over the network to your replicas as soon as possible. This works beautifully: with streaming replication, even on old versions of Postgres, your physical replica is seconds if not milliseconds behind the master instead of minutes.

But there's still another problem. What if I have to shut down my replica for a while, for maintenance or because of an issue? While it's shut down, the source system is happily chugging along, and since it doesn't need all those logs anymore, it cleans some of them up. Then I bring my replica back online, and the WAL sender can't send the changes because they're gone; now I have to go to my backups, pull the logs out of the backup system, and apply them the old-fashioned way. That's not optimal. To solve that problem, Postgres introduced replication slots. A slot is really just a little bit of metadata inside the database that tells Postgres: "hey, somebody is reading the logs and hasn't read this one yet, so keep it until they come back and read it." That worked beautifully and solved the problem. With replication slots combined with streaming replication, building on top of the write-ahead log, we have a great physical replication solution where the logs don't disappear and the replicas can stay current with very low lag from the master.

With all of that in place, we started thinking: what if we could make that WAL sender process smarter? What if we could upgrade its brains so it could understand the records it's reading, instead of just blindly sending bytes over the network? In Postgres 9.4, still a number of years ago, Postgres introduced logical decoding, and that's exactly what it did: it made the WAL sender a whole lot smarter. The WAL sender can now understand the changes that are happening, group them by transaction, and keep track of the status of those transactions. If a transaction aborts or rolls back, it just throws it away; if a transaction commits, at the point it commits it can ship it over the network to wherever the destination is. Then Postgres did something that is totally the Postgres style of doing things: they made the piece that formats and sends the changes over the network plug-in based. The more you work with Postgres, the more you see this; they like things to be extensible and pluggable. What that means is, maybe I want to build some kind of service that consumes all the changes coming out of a Postgres database, but I'd like to get them as JSON; I just use a JSON output plugin, and the WAL sender sends those changes to my service as JSON. Or maybe I want them as raw SQL; you can do that too, use that plugin, and then any downstream system that understands simple, basic SQL statements can receive those changes as SQL.
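A minimal sketch of those logical decoding building blocks, using the test_decoding output plugin that ships with Postgres; on RDS this assumes logical replication has been enabled (for example via the rds.logical_replication parameter), and the slot name is hypothetical:

    -- Create a logical replication slot that decodes WAL with test_decoding.
    SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');

    -- ...make some changes, then look at them decoded and grouped by transaction.
    SELECT * FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);

    -- Drop the slot when finished so the server can recycle its WAL.
    SELECT pg_drop_replication_slot('demo_slot');

JSON-oriented output plugins (wal2json is a common one) follow the same pattern with a different output format.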
With all of this in place, we're finally in a position to introduce what was a real big announcement last year in Postgres 10, called logical replication. If there's one thing I want you to take away from this part of the talk, it's this: the big logical replication announcement in Postgres 10 was not a big feature built from the ground up, merged into Postgres 10, and shipped out. It was actually a small layer on top. And by the way, you saw this too when Jim talked about partitioning; this is how the Postgres database is developed, and it's an important thing to understand. It's very common for the community to iterate on features: they get a core piece of functionality in, then they harden it, mature it, work it out, and then they add a little bit more the next year, and a bit more the year after. It's a great process. It's not super marketing-friendly sometimes, but it results in a wonderfully engineered piece of software.

Last year, in Postgres 10, we got logical replication, and really the big remaining pieces were these: first you need a process to receive and apply the changes on the far side; then you need the SQL syntax, which had to be worked out; and you have to work out a security model, what needs superuser and what doesn't, how grants and privileges work, things like that. But all of this was built on top of logical decoding. Let's go back to the other slide and look at the layers. Physical replication and Postgres read replicas leverage the log for efficiency and are built on streaming replication and replication slots. The Database Migration Service, and a whole bunch of third-party solutions for streaming all the changes out of your Postgres database, are built on the log for efficiency, streaming replication, replication slots, and the logical decoding framework introduced in Postgres 9.4. And then we have publications and subscriptions, the logical replication introduced in Postgres 10, which again is built on the log for efficiency, streaming replication, replication slots, and the logical decoding framework that has been there since 9.4.

So if you're looking for a logical replication solution in Postgres today, it makes all the sense in the world to start by looking at what's baked right into the engine. It's simple and it's available: just a couple of SQL statements where you create publications, on a table-by-table basis if you want, saying "publish the changes for this table," and then over in another database you say "I want to subscribe to that publication and receive all those changes." It's efficient, it's simple, and it works well.
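A minimal sketch of that publication/subscription syntax, with hypothetical object names and a placeholder connection string:

    -- On the publishing database:
    CREATE PUBLICATION orders_pub FOR TABLE orders;

    -- On the subscribing database:
    CREATE SUBSCRIPTION orders_sub
        CONNECTION 'host=source.example.com dbname=sales user=repl_user password=secret'
        PUBLICATION orders_pub;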
It's also worth mentioning one other important choice you have on RDS, which is an extension called pglogical. A quick word about extensions, because this is a concept not every database has: an extension in Postgres is really another piece of code that can plug into the database to give it skills it didn't have before, a sort of upgrade to its capabilities. Sometimes that separate bit of code is maintained by a different, maybe overlapping, group of people. A fairly widely known one is the geospatial capability: there's an extension called PostGIS, and a whole community of people maintains that code. You plug it into your Postgres, you type a SQL statement that says CREATE EXTENSION, and it loads everything in, adding all these capabilities to your database engine that you didn't have before. We also have one called pglogical, and pglogical is a set of code that enables Postgres to do logical replication. It's been around for a bit longer than the in-core feature, and I wouldn't be surprised to see more capabilities trickle from pglogical into core over the coming years; some of the people who work on pglogical are the same people who have been helping develop the logical replication capabilities in core Postgres. So if you need support for older versions of Postgres, or a little more sophistication in how you handle conflicts, DDL, or sequences, take a look at the pglogical extension as well.

Logical replication was something I was really excited to talk about, but this last topic might be the single most important thing we cover in our presentation today: monitoring. When you think about monitoring in RDS, I'd like you to think about four main tools in your tool belt: Performance Insights, Enhanced Monitoring, CloudWatch, and third-party solutions or custom things you build yourself on top of our API. Performance Insights is something I was just telling somebody about before the talk started; I'm a huge fan, and I've been talking about it with everybody I've spoken to this past year. Performance Insights is like having a window into the inside of your database to see what's really going on. In a nutshell, at any given point in time you have a whole bunch of connections; some of those connections are active, processing a SQL statement, and some of them are idle. Performance Insights summarizes the active sessions for you over time, like in this picture. It tells you what SQL statements are currently running, and more than that, it tells you what they're doing: it will break down each SQL statement and tell you that, say, 60% of the time was on CPU and maybe 40% of the time was doing I/O. From there you can pull your query plan, your execution plan, and drill down. The great thing about Performance Insights is that it lets me get to where I need to go next much, much more quickly.

In addition to the query plans, one other thing I want to point out: on Aurora this is on by default, but it's not always on by default in RDS Postgres; there's an extension called pg_stat_statements, and it gives you detailed statistics about the SQL statements running in your database. I generally recommend that people turn this on, because the value of that data is huge when you're trying to troubleshoot systems. It tells you things like the average execution time, how long that SQL statement usually takes to run.
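A rough sketch of what pulling those statistics looks like, assuming pg_stat_statements has been enabled for the instance (column names as of Postgres 10/11):

    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

    -- Statements ranked by total time, with call counts and average execution time.
    SELECT calls,
           round(total_time::numeric, 1) AS total_ms,
           round(mean_time::numeric, 1)  AS avg_ms,
           left(query, 60)               AS query_snippet
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 5;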
Performance Insights won't tell you whether this particular SQL ran for 30 seconds or 23 seconds; it tells you that it was running, so you need the statement stats to pull things like that.

One quick tip I want to throw in, something I don't normally do, but I've seen a couple of customers do one super clever thing that's worth passing along: if you're able to, stick a comment at the beginning of your SQL statement with something unique in it. I don't know, a filename and a line number, a GUID, something; the key is that it's searchable. The really clever part is that when a SQL statement pops to the top of Performance Insights or pg_stat_statements, you can search on that string and, in basically a second, you're right at the line of code in your application source; you know exactly where that SQL statement came from. Trust me, I've spent so much time working with people trying to figure out exactly which part of the code was running a SQL statement, because there are three different places that look like maybe they could have issued it. It's a clever little trick that I mention because I've seen it turn out to be so useful.

Where Performance Insights is like a window into your database, I think of Enhanced Monitoring as a window into your operating system. It's almost like being able to log into the box with a shell: you can see your top process list and a whole bunch of metrics, from page tables to detailed memory statistics, the load, CPU usage, and what the operating system is doing for I/O. The one thing I often find myself telling people about Enhanced Monitoring is to pay attention to the granularity. When you turn it on, you choose how frequently you want snapshots: every 15 seconds, every 10 seconds, 5 seconds, every second. Literally just last week I was working with a customer trying to answer some questions, and even though they were on an old version of Postgres, with Enhanced Monitoring we could identify the exact second when the kernel changed its behavior because of memory pressure on the box. That level of granularity is invaluable when you're troubleshooting, so I'd encourage you to flip on Enhanced Monitoring and run it with a fine granularity; don't run it at a minute, it's really valuable to have less than that.

Finally, CloudWatch. CloudWatch has been around for a while, but the thing I find most valuable is not just the metrics but what you can do with them: you can use CloudWatch to make sure you get alerted, that you get a page or a notification of some kind, when a threshold gets crossed. There are four things I often find myself telling customers to keep an eye on, and they're worth pointing out here. First is your transaction ID usage. Postgres has a metric that will tell you how many transaction IDs have been used, and what this is really about is that Postgres has a sort of garbage collection process called vacuum, and it's an important thing to keep an eye on. If you zoom out a little, you should see there's a normal range that this metric runs in, and you want to know if it goes outside that range, so set an alarm so you know if it's doing something unexpected.
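A rough sketch of checking the same concern from inside the database, alongside whatever CloudWatch alarm you set:

    -- Transaction ID age per database; vacuum needs to freeze old rows long
    -- before the age approaches the ~2 billion wraparound limit.
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;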
Secondly, there's a newer metric called DBLoad, or database load, which comes from Performance Insights. It's real simple: at any given point in time it tells you how many of your connections are doing something right now. The neat thing about that metric is that if you build an alarm on it (and I've been doing this for years, even on other database systems in other places), that simple little metric can catch CPU problems and all kinds of other problems, because often when something happens, one of the ways it manifests is that SQL slows down, and if SQL slows down, the number of active connections starts to go up as your application picks up more connections from its connection pools. So this metric can alert you to a CPU problem or a whole bunch of other things. The third thing to watch is memory. On any database, it doesn't matter the engine, when you have high connection counts, memory management is of paramount importance, to quote a colleague of mine; it's really true, and it's certainly true with Postgres. We have a couple of CloudWatch metrics for this, freeable memory and swap usage; keep an eye on them, it's one of those things where it's good to be proactive, to watch and possibly alarm on. The final thing is your disk usage. It sounds a little old-school, but for most people disk usage is going to be a line with a slope, and the thing you probably want to know is if that slope changes unexpectedly. A little earlier I talked about replication slots: as one example, you could build a service that uses the logical decoding framework to efficiently consume all the changes from the database. Well, what if somebody shuts down that service for one reason or another and forgets to clean up the metadata in the database? The database doesn't know whether that was temporary, whether they're coming back online in 15 minutes and need those logs, or whether it's shut down permanently. So it's possible to accidentally get into a situation where something is shut down, the database can't clean up its logs, and they start eating disk space. That's the sort of thing where the sooner you find out, the better: if you find out three or four days down the road, you have a lot more work to do to deal with it, and it's great if you can find out quickly.

So, it is slightly crazy to try to cover five different topics in a deep-dive session that's only one hour long, but we've given it our best shot, and I think there's a lot of valuable stuff in there. I'm going to leave this slide up, and Jim and I will be available for questions over here on the side if anybody wants to come up and chat. Thank you.

[Applause]
Info
Channel: Amazon Web Services
Views: 7,861
Rating: 4.9595962 out of 5
Keywords: re:Invent 2018, Amazon, AWS re:Invent, Databases, DAT324, Amazon Aurora, Relational Database Service (Amazon RDS)
Id: 6YmcteXoBvU
Length: 46min 25sec (2785 seconds)
Published: Thu Nov 29 2018