Deep Dive on Amazon Aurora with PostgreSQL Compatibility

Video Statistics and Information

Captions
Welcome, everybody. This is the short time during this conference where I get to put my day-job hat on for an hour and talk about AWS. For most of this conference I put my community hat on and try to help arrange this thing, so you'll see me running around a lot; I'll take off my nice shirt, put on a t-shirt, and move boxes for the rest of the week. I'm Jim Mlodgenski, a database engineer on the Aurora and RDS Postgres teams. We'll talk a little bit about Aurora, go a bit deeper than what Kevin just did, and end with some brief material on performance, but we'll let Jim Finnerty, who follows me, go a lot deeper on the performance aspects. The idea is to build on this throughout the day: educate everybody about Postgres in the AWS world, leading up to this afternoon's sessions on how people are using it.

One thing about vernacular: I talk about "Aurora" and "RDS," but really Aurora is part of RDS. It's one of the main engines of RDS and part of the same platform, but just to distinguish RDS PostgreSQL from Aurora PostgreSQL I end up calling them RDS and Aurora, and a lot of folks at AWS do the same. Aurora is part of RDS, so all the same tools apply — the same console and everything works — but throughout today, when you hear "RDS" it means RDS PostgreSQL, which is a lot closer to community PostgreSQL, versus Aurora PostgreSQL, which has Aurora storage underneath it.

One of the questions asked at the end of Kevin's presentation was: what's the difference? When you're making the decision, you have RDS PostgreSQL sitting on top of EBS volumes versus Aurora PostgreSQL sitting on top of Aurora storage, but from your application they're going to look the same. You connect with the same JDBC drivers, ODBC drivers, or psql; it looks like a regular Postgres database. You get the same backup and restore capabilities and point-in-time recovery, even though Aurora PostgreSQL does them in a different manner. The same goes for high durability and high availability — Aurora is built to give you higher availability and higher durability, but you get both in each. On security, features like IAM database authentication let you log in without passwords in higher-security environments by getting a token that's only alive for 15 minutes. The same is true for read replicas, cross-region snapshots, and scaling out. Two things that aren't in Aurora PostgreSQL yet are cross-region replication and outbound logical replication: if you want to replicate your database from the east coast to the west coast, that's not there yet in Aurora PostgreSQL, but it's on its way, and the same goes for outbound logical replication, which Kevin talked about in his presentation — replication to targets outside of Aurora isn't quite there yet either, but it's also on its way. Those are a couple of the differences between the two today, and the idea is to bring them to parity. We have Postgres 10 and Postgres 11, and as Kevin announced, Postgres 11.1 was released last week and is available in RDS PostgreSQL — it's no longer in preview, it's GA.
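As a rough illustration of the IAM database authentication mentioned above, here is a minimal sketch using boto3's generate_db_auth_token together with psycopg2. The endpoint, user name, and region are placeholders, and it assumes the database user has been granted the IAM authentication role and the caller has rds-db:connect permission.

import boto3
import psycopg2

HOST = "mycluster.cluster-abc123.us-east-1.rds.amazonaws.com"  # placeholder endpoint
PORT = 5432
USER = "app_user"        # placeholder database user enabled for IAM auth
REGION = "us-east-1"

rds = boto3.client("rds", region_name=REGION)

# The token is a short-lived credential (valid for about 15 minutes)
# used in place of a password.
token = rds.generate_db_auth_token(DBHostname=HOST, Port=PORT,
                                   DBUsername=USER, Region=REGION)

conn = psycopg2.connect(host=HOST, port=PORT, user=USER,
                        password=token, dbname="postgres",
                        sslmode="require")   # SSL is required for IAM auth
cur = conn.cursor()
cur.execute("SELECT version()")
print(cur.fetchone())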
So, Aurora storage is log-based storage. What that means is that as you're making your writes, you're just sending your transaction log records. That gives us a lot of advantages. In a traditional database — whether you're talking about Postgres or other databases — as you have a bunch of work to do, it goes into a queue, into a log buffer, and that log buffer has to flush out to disk; you have to fsync it in order to be durable. So as you have more and more data flowing in, more and more concurrent transactions in a group-commit kind of scenario, that log buffer gets full and you have to wait for it to flush out to disk before you can do the next amount of work: flush, fsync, come back, do more work.

One of the advantages of the Aurora storage architecture is that we don't have that same bottleneck. Each transaction, as it's ready, just flushes its records out to storage. So you might have five different transactions, all flushing out to storage as they're ready, while we keep track of durability. As Kevin mentioned in his talk, we need a four-of-six quorum for durability: when four of the six storage nodes come back and say the write is durable, we can acknowledge the commit back to the client. (We'll go a little deeper on storage nodes and how that works shortly.) At first, nobody has come back and said "we're durable" — we've just flushed out those five transactions. Wait a little bit, fractions of a millisecond, and two of the six come back for transactions A and B — almost there, halfway done. Wait a little longer and transaction A hits four: it's durable, it's across four of the six. But notice that C is also at four, and we haven't acknowledged it back to the client. Postgres is an ACID-compliant relational database; everything has to happen in a specific order, so we can't acknowledge C until B has been acknowledged. It waits, in order, and won't acknowledge C back to the client until B is also done. Wait a little longer and you see B and C also get acknowledged — they hit their four-of-six quorum — and A is at five, but we also have to wait there because D isn't quite at four of six yet. So each transaction, as it's ready to flush out to storage, just goes ahead and does that; there's no waiting on log buffers, no waiting for everybody to fsync on the same transaction log volume.

The other big advantage of Aurora Postgres is that we write a lot less. In a traditional database you have to do checkpointing: making sure that all the changes you've recently written to your transaction log are flushed out to the main data files themselves, otherwise recovery takes a really long time replaying all your transaction logs. So periodically you have to do that, and in Postgres you end up doing extra work when you do it, called full-page writes — we'll get into that.
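The four-of-six quorum and the in-order acknowledgement described above can be illustrated with a small toy simulation. This is not Aurora's actual protocol, just a sketch of the idea: a commit is durable once any four of its six storage-node writes return, but commits are acknowledged to clients strictly in commit order.

import random

QUORUM, COPIES = 4, 6

def storage_write_latencies():
    """Simulated per-node write latencies in milliseconds."""
    return sorted(random.uniform(0.5, 5.0) for _ in range(COPIES))

def commit_durable_at(latencies):
    """Durable when the 4th fastest storage node has acknowledged."""
    return latencies[QUORUM - 1]

# Five concurrent transactions (A..E) flush their log records independently.
durable_times = {txn: commit_durable_at(storage_write_latencies())
                 for txn in "ABCDE"}

# Acknowledgements still go back in commit order: a later transaction can be
# durable before an earlier one, but it is not acknowledged until every
# earlier transaction is durable too.
acked_at, high_water = {}, 0.0
for txn in "ABCDE":
    high_water = max(high_water, durable_times[txn])
    acked_at[txn] = high_water

for txn in "ABCDE":
    print(f"{txn}: durable at {durable_times[txn]:.2f} ms, "
          f"acked at {acked_at[txn]:.2f} ms")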
So say you're doing an update in Postgres. One of the things about Postgres is multi-version concurrency control, so different transactions aren't waiting on each other the way they would with locking. An update in Postgres is a logical deletion of the old version of the row and an insert of a new one, so every update really affects two different rows. That's written to the transaction log, and for the first change to a page after a checkpoint, Postgres has to write the full page out to the transaction log — we'll get into why in a second. Then we insert the new row, which also gets written out to the transaction log. Then we do our checkpoint, flushing out to the data file — actually writing that page down to the data file itself on storage. We also want to archive that log, pushing it out so that for point-in-time recovery we can get that log file back and roll forward to any point in time; for our backups, that's pushed out to S3.

The reason we need that full-page write is the mismatch between Linux and Postgres: Postgres writes everything in 8K pages, Linux writes everything in 4K pages. If there's an error or a crash in the middle of flushing out a page, you get what's called a torn page — 4K of that 8K might have been written out to disk — and in that event your database, or at least that page, would be corrupted. So what Postgres does in a recovery scenario is take that full page that was written to the transaction log and overwrite the entire page, so you never end up with a torn page.

In Aurora, we update that row and we get the two versions of the row, just like any update — Aurora Postgres works the same as Postgres in that respect, with multi-version concurrency control, and that's going to be important when we talk about vacuum in a little while — and that's flushed out to Aurora storage. You don't need that full-page write, because Aurora doesn't do any checkpoints; all that work is pushed down into the storage nodes themselves. Then we just happily go along, add the next row, and Aurora storage is flushed out to S3 for backups, but there are no checkpoints, and because there are no checkpoints we don't need to do full-page writes. It's one of the ways we get higher throughput: as your application is running, you're just writing a lot less to storage with Aurora compared with RDS Postgres or community Postgres.

So what happens inside the storage nodes? We talked about the four-of-six quorum. On a very small database you might be across six storage nodes; on big databases you might be across hundreds or even potentially thousands of storage nodes. This is what happens inside one of them. When the Aurora read/write node makes a change, it flows into an incoming queue that's in memory; it then goes into the update queue, where it's persisted to storage. At that point it's durable and it can acknowledge back. That's the entire synchronous part — waiting for that acknowledgement back from the individual storage nodes while you're waiting for the four-of-six quorum: the change comes into the incoming queue, flushes out to the update queue so it's persistent on storage, and then you get the acknowledgement back; you're persistent on disk at that point.
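For the checkpoint and full-page-write behaviour described above, here is a quick way to look at the relevant knobs on a community or RDS PostgreSQL instance (on Aurora PostgreSQL these writes are handled differently by the storage layer). Connection details are placeholders.

import psycopg2

conn = psycopg2.connect(host="mydb.example.com", dbname="postgres",
                        user="postgres", password="...")
cur = conn.cursor()
cur.execute("""
    SELECT name, setting, unit
      FROM pg_settings
     WHERE name IN ('full_page_writes', 'checkpoint_timeout',
                    'max_wal_size', 'shared_buffers')
     ORDER BY name
""")
for name, setting, unit in cur.fetchall():
    print(f"{name:20s} {setting} {unit or ''}")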
Behind the scenes, that update queue does some extra work asynchronously that you don't have to wait for. Because the storage is intelligent and understands the WAL format, it writes the update out to the data blocks — when you do a read, you need those data blocks returned — so it coalesces those changes into the individual data blocks. It also writes to the hot log, where the most recent changes live. One of the advantages of Aurora storage with that four-of-six quorum is that the different storage nodes use peer-to-peer gossip to fix one another, so you don't have to coordinate that from the main Postgres engine; the individual storage nodes fix themselves in the event of any errors, and the hot log pushes data out into the individual pieces. All of that is pushed out to S3 for backup all the time, so you don't have to create a backup window or anything like that; everything is backed up automatically for you behind the scenes into S3. That's one of the differences between Aurora Postgres and RDS Postgres: with RDS Postgres you have to give it a backup window; Aurora Postgres is always backing up. So when you do an individual read request, it pulls back the data block, which has been coalesced from the changes pushed in there.

Just to show some of the differences and how you get additional throughput: we created a test table with a bunch of different types of data in it — nine different columns, just random things — and we indexed every column. That might not be something you'd ordinarily do, but you might have 50 or 75 columns and nine indexes, or thirty indexes; it's not uncommon to see many indexes on wider tables. So we created a large number of indexes on a fairly straightforward table and ran an insert workload test. The y-axis is inserts per second, and the x-axis is time. With base Postgres it starts out running pretty fast, around 25,000 inserts per second, but it drops off radically, very quickly — a very steep drop. What's happening is that as your indexes get bigger and bigger, you're doing more full-page writes as you're checkpointing, and you have to do more and more work to maintain them — not just for the main table, but for every index you're maintaining. One thing DBAs typically do when they're churning through a lot of transaction log is increase the transaction log size to spread out those checkpoints. Increasing the write-ahead log size to 16 gigabytes from the default of two, it chugged along fine for about 20 minutes at around 25,000 TPS, and then you saw that same drop-off again. Aurora is across the top: a little bit higher, about 27,000 TPS or so, but consistent — you're not doing those full-page writes, so you're not getting that jitter or massive drop-off from having to write all that additional work out to storage, because Aurora doesn't need to.
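The insert test described above can be roughly reconstructed like this. It is a simplified, single-connection sketch rather than the multi-client benchmark from the talk, and the table and column names are made up for illustration: a table with several columns, an index on every column, and a timed stream of inserts.

import time, random, string
import psycopg2

conn = psycopg2.connect(host="mydb.example.com", dbname="postgres",
                        user="postgres", password="...")
conn.autocommit = True
cur = conn.cursor()

cur.execute("DROP TABLE IF EXISTS insert_test")
cur.execute("""
    CREATE TABLE insert_test (
        id bigint, col_int integer, col_float double precision,
        col_text text, col_ts timestamptz
    )
""")
# Index every column, as in the talk's test table.
for col in ("id", "col_int", "col_float", "col_text", "col_ts"):
    cur.execute(f"CREATE INDEX ON insert_test ({col})")

start, rows = time.time(), 0
while time.time() - start < 60:          # run for one minute
    cur.execute(
        "INSERT INTO insert_test VALUES (%s, %s, %s, %s, now())",
        (rows, random.randint(0, 10**6), random.random(),
         "".join(random.choices(string.ascii_lowercase, k=32))))
    rows += 1

print(f"{rows / (time.time() - start):.0f} inserts/second from one session")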
Looking at the same thing for updates: if you're updating two columns, you see standard RDS Postgres at 3,700 TPS whereas Aurora Postgres is at 17,000. You see the same effect on updates because, again, an update is a logical deletion plus an insert of a new version of the row in Postgres, so every update also has to insert new entries into each index.

Like I mentioned, one of the standard things DBAs do to get higher throughput is increase the transaction log size in order to spread out those checkpoints, but that comes with a trade-off: recovery time. In the event the server crashes for whatever reason, or even if you simply restart the server because you want to change parameters that require a restart, you have to shut everything down and checkpoint everything out, and the more redo there is, the longer that takes. In a simple test here with three gigabytes of redo (WAL), it goes through about 18,000 writes per second — that's along the x-axis — and has a recovery time of 19 seconds. If your SLA for getting back up after a crash is really tight, great, but you're giving up a lot of performance at only about 18,000 writes per second. You can increase that size to about ten gigabytes, and you'll get around 37,000 writes per second, at the cost of recovery time: it went from 19 seconds to 50 seconds. Go even further, to 30 gigabytes, and you get even more performance, over 40,000 writes per second, but now your recovery time is up around two minutes. That's one of the unfortunate things with databases that checkpoint: you're trying to balance performance against your SLAs to your customers. You're not making a technical decision, you're making a business decision about how to maintain your database. But that little dot out in the corner that's hard to see is Aurora: because there's no redo, recovery time only takes three seconds — there's nothing to go through in recovery — and you're doing 135,000 transactions per second, because without the checkpointing and the full-page writes you get that higher throughput, and you don't have to trade your business rules against your recovery times.
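The trade-off just described — larger WAL means fewer checkpoints and higher throughput, but longer crash recovery — is controlled on RDS and community PostgreSQL by max_wal_size. On RDS it's changed through a DB parameter group rather than ALTER SYSTEM; the group name and value below are placeholders, and the assumption that the value is expressed in megabytes is worth verifying for your engine version.

import boto3

rds = boto3.client("rds", region_name="us-east-1")
rds.modify_db_parameter_group(
    DBParameterGroupName="my-postgres-params",      # placeholder group name
    Parameters=[{
        "ParameterName": "max_wal_size",
        "ParameterValue": "16384",                   # roughly 16 GB, in MB
        "ApplyMethod": "pending-reboot",
    }],
)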
So let's talk a little more about the base architecture. We've talked about storage nodes and the four-of-six quorum — what does that mean? Whenever you create a new instance of Aurora, it's spread across three availability zones. When you're talking about Aurora storage, think of it as essentially clustered storage that's automatically spread across three availability zones for you, with multiple storage nodes in each one. When you create your cluster, it creates those storage nodes, and as you ask for space it creates ten-gigabyte segments in each of those storage nodes; that automatically scales for you up to the 64 terabytes Kevin mentioned. Then, with your applications across multiple availability zones all connecting to the read/write node, it just writes the transaction log records out to storage — no need to flush the actual data pages; Aurora storage handles that for you — and it writes them out to those six individual storage nodes, as we were discussing. For reads, it pulls blocks back, trying to get them from the local availability zone where it's faster, and it keeps track of the highest durable transactions so you always get the right versions back.

In the event that something happens to one of those storage nodes — say one segment didn't get the last updates: we got four of six for the quorum, one of the six never actually made it, a network blip, whatever; in the world of computing nothing's perfect — that's where the peer-to-peer gossip comes into play. The segment automatically gets the data from one of the other storage nodes that has the latest version. The Postgres engine itself doesn't have to repair it; Aurora storage self-repairs for you, so Postgres doesn't need to worry about it — again, less work, Aurora storage deals with it. If a whole storage node goes down — some of your data is on it and the whole thing goes away — Aurora storage will automatically take the segments that were on that node and place them onto other storage nodes, completely transparently to you and the application.

And when you spin up a read-only replica — a read-only node — in RDS Postgres you need to make a whole copy of the data. In Aurora Postgres you have shared storage, so there's no need to create that copy: the replica just pulls data back from the shared storage, with a little bit of communication between the read/write node and the read-only node to sync up — really, to invalidate — the caches. And you can create many of them, not just one to three: you can create up to 15 read-only nodes, so in the event of a failure of the read/write node, it will automatically promote one of your read-only nodes.
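Adding a read-only node to an existing Aurora PostgreSQL cluster is just a matter of creating another instance in the cluster; because of the shared storage described above, it attaches to the same volume rather than copying the data. The identifiers and instance class below are placeholders, sketched with boto3.

import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance(
    DBClusterIdentifier="my-aurora-cluster",      # existing cluster (placeholder)
    DBInstanceIdentifier="my-aurora-reader-1",
    DBInstanceClass="db.r4.2xlarge",
    Engine="aurora-postgresql",
)

# An explicit failover promotes one of the readers to read/write:
# rds.failover_db_cluster(DBClusterIdentifier="my-aurora-cluster")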
Going a little further into why a four-of-six quorum — why did we go down that path? With RDS Postgres we have Multi-AZ: synchronous replication from one availability zone to a second one. Why didn't we just do the same thing with Aurora? Inside RDS Postgres, we do a commit, it gets flushed out to an EBS volume, it comes back — great. [Question from the back — we'll get to that; we have slides for it.] When you're talking about a single availability zone, under the covers you're really making two copies: EBS isn't just writing out a single copy, it's also synchronously writing it out to a secondary, which has to come back and acknowledge before returning. So you're already getting multiple copies of your data with EBS. But in the event the entire availability zone goes down, or that instance goes down, you want it in a second availability zone, so you go ahead and create a second one — great. But really, for something like Aurora, you want even greater availability across multiple availability zones, so why not just add a third one to what was there?

With three availability zones, a commit has to get synchronously flushed out to the secondary and the tertiary. The local availability zone is writing things out to its local EBS volumes and goes ahead; the secondary is progressing a little behind; maybe the tertiary is a little further away geographically and it took an extra fraction of a millisecond to get there, so it's a little slower starting the work of flushing everything out to its EBS volumes. We get the acknowledgement back from the local availability zone, then finally from the secondary, but we're not done yet: we have to wait for the tertiary to finish so that all three availability zones are synced up synchronously. And what happens if you actually lose an availability zone, or that tertiary instance goes down? You have to fence it off so that no writes go to it while it's repairing itself; all the writes go to the primary and secondary, and when the tertiary comes back up it has to catch back up again, and Postgres would have to do all the work of doing that.

We did testing of what that looks like under a write workload with two copies versus three copies, the two copies being what RDS Postgres does today. At the 50th percentile, the difference between two and three replicas is six milliseconds versus seven — really not that big a difference; if you wanted that extra availability, great, you're giving up one millisecond, big deal. Even at p90 or p99 you might make the trade of 22 versus 28 milliseconds. But at p99.9 — one in a thousand transactions — it's four times longer; you get a lot of jitter. So that model doesn't work: as you add more and more synchronous replicas, you get more and more jitter and a higher chance of one of those transactions going slow. That's why the four-of-six quorum smooths out the jitter. When you do a commit, Aurora pushes it out to all six nodes at the same time, and it gets responses back, most likely from the local availability zone first; then you wait, you get to two, then four of six — we're done. We don't have to wait for anything from that third availability zone; we have four of six already committed, which gives us the acknowledgement back, and finally the rest come back. In the event that one of them missed the write, Aurora storage will resync it; the engine doesn't have to wait for it.
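A back-of-the-envelope illustration of why waiting for the fourth fastest of six writes smooths the tail compared with waiting for every synchronous copy: the commit is gated on the 4th-smallest latency of 6, instead of the slowest of 3. The latency distribution below is invented purely for illustration.

import random

def node_latency():
    # Mostly fast, occasionally very slow (a stall, a slow disk, a long GC pause).
    return random.uniform(1, 3) if random.random() < 0.99 else random.uniform(20, 60)

def percentile(samples, p):
    return sorted(samples)[int(len(samples) * p)]

trials = 100_000
sync_3 = [max(node_latency() for _ in range(3)) for _ in range(trials)]
quorum_4of6 = [sorted(node_latency() for _ in range(6))[3] for _ in range(trials)]

for label, data in (("3 sync copies (wait for all)", sync_3),
                    ("6 copies, 4-of-6 quorum", quorum_4of6)):
    print(f"{label:30s} p50={percentile(data, 0.50):5.1f} ms  "
          f"p99.9={percentile(data, 0.999):5.1f} ms")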
To give you an idea of getting rid of some of that jitter, we did another test using sysbench over the course of 20 minutes — not a very long test — looking at response time in milliseconds. Aurora gives us a nice even flow across the bottom there, with very low jitter, whereas Postgres with a single AZ — not even going Multi-AZ — gets all these spikes. If you look at the bottom of that blue line, it's fairly close to Aurora, not that big a difference, but the jitter going up and down all over the place — those are checkpoints. Periodically, every couple of minutes, you have to flush out all your data pages and do a lot more work, starving the rest of the system from flushing out those individual transactions. Whenever you do a load test on a traditional database you'll see that wave action, whereas with Aurora you'll see a nice flat line. Even at the low points, you're not going to see that big a difference between RDS Postgres and Aurora Postgres. A typical thing I see when people benchmark RDS Postgres against Aurora: "let me do a single AZ on RDS Postgres, spread out my checkpoints to 60 minutes, and run it — hey, look, they're about the same, why would I use Aurora?" It's because they haven't done any checkpoints yet. Once you start doing checkpoints, all of a sudden you start seeing that behavior and the averages go way off. [Question: same availability zone?] Yeah, the client was in the same availability zone, so we don't get latency between the driver and the main database server itself. Aurora is architected around pushing a lot of the work the Postgres engine would normally do — the background writer flushing out data pages, and checkpoints — down into Aurora storage. So instead of one giant checkpoint, all the work of coalescing those data pages happens across many storage nodes, in parallel, without impacting the main engine itself. You're only writing out the transaction logs, and Aurora storage takes everything Postgres would normally have to do and does it in parallel across many storage nodes.

So, talking about replicas and clones, which we had a question about. In RDS Postgres, when you need to create a replica, you ask for one, we take a snapshot of the EBS volume, move it over to a new EBS volume, spin up a Postgres read-only node that attaches to that EBS volume, and then it has to catch up. That might take hours depending on the size of your database and how many transactions you're pushing through your read/write node; if it's a very heavily loaded system, it might take a long time to catch back up. As you do updates in Postgres — whether community Postgres or RDS Postgres — when you do an update you need to write it out to the EBS volumes, but you might have to read the block into memory first in order to update it, and then, asynchronously, it sends that log record out to the read-only node, and that replica has to do the same amount of work: it might have to pull that block off of storage in order to apply the update and flush it back out to disk again. In Aurora, the read/write node flushes things out to Aurora storage and the read-only node reads things off of Aurora storage. So when you do that update, the read/write node does the same thing as regular Postgres — it might have to read the block from storage in order to update it — and it then sends the asynchronous replication request, but the replica doesn't have to do anything to storage.
If that block is in memory on the read-only node, it just invalidates the block; it doesn't have to make any changes, because you have shared storage between them. It doesn't have to duplicate the change the way it does with traditional Postgres.

As an example, pgbench is one of the common benchmarking tools the Postgres community uses: four simple tables meant to represent a banking application, with accounts, tellers, branches, and history. [Question:] The question is, if we're getting everything out of Aurora storage, why do we have to send that asynchronous message to the replica? Because the replica may have the block cached in memory, so we have to invalidate it so it gets the latest version out of storage — we send that invalidation message so the replica knows the block is no longer valid. One of the nice things about pgbench is that we can run a read/write test that touches all the tables, or a read-only test that only touches the accounts table. So if you're running that, the replica is just running a lot of selects across the accounts table, and we do asynchronous replication from the read/write node sending all its changes out to the read-only node, which applies them. With RDS Postgres, when it does that, it has to pull all four tables into memory — it has to, in order to apply the updates and write those changes back to its storage — whereas with Aurora Postgres, since we're only sending invalidation messages, the replica just does the in-memory updates for the things that happen to be in its cache for the accounts table.

Here's a graph that illustrates that. The orange line is transactions, and at about 23 or 24 minutes we did an update of pgbench_history — a backfill updating all the rows, populating a value — which happens sometimes in production: you roll out some new functionality, you have to update everything, you add a new default value of some sort. When that happens with RDS Postgres, all of it has to be replicated out to the read-only node. The green line is the replication lag. Things were going along great, it was keeping up, and then when you did that giant update, all of a sudden things started lagging quite a bit. This is in seconds — up to 600 seconds — so your lag went up to about ten minutes. Instead of being basically up to date within milliseconds, that giant update made your replica wait for the whole thing to be consumed, because all your updates have to happen in order — it's a transactional, ACID database, everything happens in the same order — so nothing else could go through until that giant update had been processed by the read-only node. With Aurora we did the same backfill and you can't even tell the difference. If we look at this graph — this is in milliseconds — the lag is going along at about 10 milliseconds, and even when we did the backfill it's down around 1 millisecond, because pgbench_history isn't being accessed on that read-only node: the messages go over there and just get discarded, so it didn't have to do any processing.
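One way to watch the kind of replica lag shown in those graphs, on a community or RDS PostgreSQL streaming replica, is to poll the WAL replay timestamp on the standby. (Aurora reports replica lag through CloudWatch and its own status views rather than WAL replay timestamps.) Connection details are placeholders.

import time
import psycopg2

replica = psycopg2.connect(host="my-replica.example.com", dbname="postgres",
                           user="postgres", password="...")
replica.autocommit = True
cur = replica.cursor()

for _ in range(10):
    # How far behind the primary this standby's replay is, as an interval.
    cur.execute("SELECT now() - pg_last_xact_replay_timestamp()")
    print("replay lag:", cur.fetchone()[0])
    time.sleep(5)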
One of the other cool features is cloning — creating a fast clone. Whereas a replica is something you create so that it keeps up to date with the primary read/write node, a clone is something completely separate. You might have a reporting application that every night at midnight has to kick off reports on yesterday's data; instead of having a read-only node trying to keep up over the course of the day, spin up a clone at midnight. What that does is create its own read/write node — a completely different instance — but it accesses clone storage. So if you have a 20-terabyte database, you don't have to make an entire copy of it: the clone just creates pointers back to the main data itself, so you're not paying for an extra 20 terabytes, you're just creating that clone storage layer. That 20-terabyte clone comes up really quickly — really, as quickly as it takes to spin up the instance and attach to the clone storage. When the application starts interacting with the node and asks for a block, it goes through the clone storage pointer and really pulls the block from the main storage.

[Question — you'll see that in a second.] The question is what happens when we make changes. If we make a change, it's copy-on-write: the block is copied from the main storage down to the clone storage, so you have two distinct copies, and the pointer link is broken. Depending on how things diverge, you could potentially end up duplicating the entire database if you update every row, but it only happens as you actually write and make changes. If you insert new rows on the clone, they just get written to the clone storage area; if the main application writes new data, it just goes to the primary storage area; and if the primary changes an existing block, it does the same copy-on-write — the old version goes out to the clone storage and the new version is written in the primary storage area. So at the instant you take the clone, all the pointers point to primary storage, but as you make changes to either the primary or the clone, they start to diverge. That's why you're able to make these clones very quickly. If it's a read-only application, the clone is exactly what the database was at the instant you took it — say, midnight every day — so you don't have to worry about all the changes happening over the course of the day; your database is good as of midnight.

One of the great things about this is that if you want to test things against your primary database — maybe an update, rolling out some new DDL, or timing how long it will take to create a new index — do it on a clone. You're basically doing it on production, without the overhead of restoring from backup. You can take essentially an instant copy of your production database and do whatever you will with it, as opposed to restoring from a backup or taking a snapshot and restoring from that, which takes a long time duplicating the entire storage; with Aurora storage we have intelligent storage. So yes, it's basically a near-instant copy of your production database, and it'll run the same as your production database.
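Aurora fast clones can be requested through the RDS API; the new cluster points at the same storage and only copies pages as they diverge (copy-on-write). The identifiers below are placeholders, and this is a sketch of the call shape rather than a full provisioning script.

import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.restore_db_cluster_to_point_in_time(
    SourceDBClusterIdentifier="my-aurora-cluster",
    DBClusterIdentifier="my-aurora-clone",
    RestoreType="copy-on-write",          # clone; don't copy the volume
    UseLatestRestorableTime=True,
)

# The clone still needs at least one instance before you can connect to it.
rds.create_db_instance(
    DBClusterIdentifier="my-aurora-clone",
    DBInstanceIdentifier="my-aurora-clone-instance",
    DBInstanceClass="db.r4.2xlarge",
    Engine="aurora-postgresql",
)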
That's illustrated here: running pgbench, throttled at 20,000 TPS — that's what the orange line is, 20,000 pgbench transactions per second — and then we requested a clone at about 20 minutes. Grant, who ran this test, walked away and didn't keep track of exactly when the clone came up, but when he came back to his desk he kicked off the same test on the clone about 20 minutes or so later, and it's doing exactly the same TPS. If you start up the same instance size for the clone, it gives you the same performance; it is essentially your production database — an exact copy that comes back very quickly.

Going a little deeper on replication. With logical replication, as Kevin mentioned and as has been talked about before, we take the physical changes in the WAL — the transaction log — and convert them essentially into SQL statements, and with that we can do a lot of cool things. There are logical decoding plugins that support different output formats, so instead of getting straight insert, update, and delete statements out of your logical stream, you can also get those changes as JSON and process them in different ways. A lot of the Postgres drivers — the Python driver, the JDBC driver — support reading the Postgres logical replication stream, so you can write a consumer in those languages and access your changes, feeding directly off that stream by essentially making a JDBC connection. The way logical decoding works is by pulling from the main server itself; then you can take that application you've written in Java or Python and push the changes into something like Kinesis, or you can use DMS, which uses the same mechanism, to send them to another RDS Postgres instance, to S3, to DynamoDB — change data capture off your main database, pushed wherever you'd like — or out to Redshift, which a lot of people do: your main OLTP database runs on Aurora, and deeper reporting goes out to a data warehouse running on Redshift.

One of the nice things in Postgres 10 is the ability to do that without something else in between — we had an EC2 instance, we had DMS sitting there — because Postgres 10 has a publish/subscribe mechanism, so a Postgres database can talk directly to the publishing server and pull the changes itself. When this is released in Aurora — it's out in some private previews now; you can talk to the gentleman back there in the corner if you want access to it, the better-looking one — what you'll be able to do is have your changes in Aurora flow out into a community Postgres database, maybe on-premises, maybe on EC2. So if you want to leverage Aurora but still have an instance running PL/R or PL/Python, you can have your main transactions go through Aurora and run some of your reporting through PL/R on an EC2 instance, because you can subscribe off the main Aurora instance — or it can go out to an RDS Postgres instance or another Aurora instance.
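As mentioned above, the Python driver can consume the logical replication stream directly. Here is a minimal sketch with psycopg2; the slot name and connection details are placeholders, and the output plugin ('test_decoding' here, or 'wal2json' if you want JSON) must be available on the server and the user must have replication privileges.

import psycopg2
import psycopg2.extras

conn = psycopg2.connect(
    host="mydb.example.com", dbname="postgres", user="replicator",
    password="...",
    connection_factory=psycopg2.extras.LogicalReplicationConnection)
cur = conn.cursor()

cur.create_replication_slot("demo_slot", output_plugin="test_decoding")
cur.start_replication(slot_name="demo_slot", decode=True)

def consume(msg):
    print(msg.payload)                                    # decoded change record
    msg.cursor.send_feedback(flush_lsn=msg.data_start)    # acknowledge it

cur.consume_stream(consume)                               # blocks, streaming changes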
[Question about cross-region replication and pglogical.] Yeah, that's absolutely true. The folks who originally wrote pglogical, a company called 2ndQuadrant, pushed it into the community, and the community didn't take all of pglogical — they took pieces of it. So yes, you can use pglogical to do other things, like replicating DDL, for instance, and sequences; there are other aspects to it. When Aurora Postgres supports logical replication, it will support things like pglogical the way we do with RDS Postgres. [Follow-up question.] That's definitely something we can talk about offline — there are folks here, whether it's Kevin or Dennis back there in the corner, who can get you involved in some of the previews, so you'd be able to take a look. The best timeline I can give you is "soon," but those folks can give you better timelines afterward.

So, cross-region replication. You're running in us-east, and you work for a company that requires a DR plan, so you want everything running in another region as well. Cross-region replication introduces two new things — a replication server and a replication agent — sitting at the storage level. When a transaction comes in, it gets pushed out to Aurora storage and invalidates the read-only nodes in the same region, and it also gets pushed out to that replication server. The replication server pushes the changes to the other region, to the replication agent, which does the same sort of work the read/write node does: it pushes everything out to Aurora storage and invalidates the caches of the read-only nodes in the other region. You can have multiple read-only nodes in that other region, so if you want reporting and other read-only queries running in a different region, closer to your customers, you can have several of those. And in the event that some transactions get lost along the way, Postgres itself again doesn't have to do the work to catch up: the replication server pulls things out of Aurora storage in the local region and repairs the remote region, so it doesn't affect your transaction throughput at all.

One of the use cases is the region going down: how does your business survive if a region goes down? You can promote the other region's cluster — one of its read-only nodes — and it will happily serve your applications from that new instance, and away you go. Other customers use it as we roll out new regions: you're running in us-east, but if there's a new region that's more advantageous to your customers, you can replicate out to the other region so your switchover time is much smaller. You replicate, take a little bit of downtime to wait for things to catch up, and do the switchover in a fraction of the time it would take to re-establish the new region from a snapshot. As we add more and more regions — which we're constantly doing — you can take advantage of moving to a more advantageous region for your business.
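For the DR scenario just described, promoting the secondary region's cluster to a standalone read/write cluster is an API call. This assumes the cross-region replica is exposed as its own DB cluster, which may differ depending on how the feature is offered; treat this as a sketch of the shape of the operation rather than a recipe.

import boto3

rds_west = boto3.client("rds", region_name="us-west-2")
rds_west.promote_read_replica_db_cluster(
    DBClusterIdentifier="my-aurora-cluster-replica")   # placeholder identifier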
So, caching — we're almost out of time here. With RDS Postgres, we allocate about 25% of memory for Postgres itself and the OS, and then for its main cache, shared buffers, we allocate about 25%, which is pretty standard practice in the Postgres world if you read up on it; the rest, about 50%, goes to the Linux page cache. When you're looking for something, Postgres tries to pull it out of shared buffers; if it's not there, it tries the Linux page cache; and if it's not there either, it pulls it back from EBS storage and has to go all the way back up the stack repopulating both layers. So you get duplicate buffers — double buffering — for the 25 percent that's in shared buffers. With Aurora, we keep the same 25 percent for the OS and Postgres, but we allocate 75% to shared buffers. You don't get that same double buffering, because we're writing everything out to Aurora storage rather than the local file system. One of the advantages of having the Linux page cache, though, is that if you have to restart Postgres — because you changed max_connections, say — the Linux page cache isn't thrown away; things are still cached, so when you start back up you still have a fairly warm cache. In Aurora Postgres we didn't have that same benefit, so we created a new feature called survivable cache: if you restart Postgres, your shared buffers stay in place and you don't have to repopulate everything.

With that survivable cache — trying to speed things up a little here — we ran a read-only pgbench at a scale of 22,000, which is a working set of about 350 GB, on an r4.16xlarge instance with 488 GB of RAM. That 75 percent cache means essentially everything is in shared buffers, and it gives us, on a read-only node, 689,000 TPS — a lot of transactions; really, selects you're churning through very quickly. When we did the same thing on Postgres with a 25% cache, we got 417,000 TPS — still a really impressive number, but 1.6 times slower — because we were limited by IOPS: due to the double buffering, not everything was in RAM. If you read some of the Postgres community lists, they say to reduce your shared buffers size and lean on the Linux page cache; when you do that, down to 10%, it gets even slower, because Postgres has to do more work checking shared buffers and then checking the Linux page cache to pull everything back out — there's no double buffering there, but it costs you. And just to prove that Postgres does about the same when you set shared buffers up to about 75%: you get 682,000 TPS, essentially the same number, so you get the same throughput pulling everything from RAM — there were no changes in Aurora Postgres for that particular use case — but you don't get the survivable cache.
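A quick way to look at the double-buffering discussion above on your own instance: the shared_buffers setting and the buffer-cache hit ratio from pg_stat_database. Connection details are placeholders; the query works the same on RDS, Aurora, or community Postgres.

import psycopg2

conn = psycopg2.connect(host="mydb.example.com", dbname="postgres",
                        user="postgres", password="...")
cur = conn.cursor()

cur.execute("SHOW shared_buffers")
print("shared_buffers =", cur.fetchone()[0])

cur.execute("""
    SELECT datname,
           round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2)
               AS cache_hit_pct
      FROM pg_stat_database
     WHERE datname = current_database()
""")
print(cur.fetchone())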
I'll go for another couple of minutes, and when I get to the performance section I'll just stop, because Jim is going to go deeper on that. One of the other features is cluster cache management. What do you do on a failover, when you have to move to a new server, whether planned or unplanned? Things are going along, you're running pgbench at 350,000 TPS, and then you have a failover at about 600 seconds into the test. Aurora Postgres fails over pretty quickly — 33 seconds — but the problem is that after the failover you're on an essentially cold cache. This line here is the p90: the time to get back to 90% of your transaction throughput, to reach the same cache level, is 340 seconds — that's how long it takes to warm the cache back up and hit the same performance number. What cluster cache management does is let you take your read-only nodes and assign a priority to each one: priority zero is the highest priority, your preferred failover target, and all the others are one, so failover will prefer the node at priority zero. If you enable it with the cluster cache management parameter, the read-only node sends a Bloom-filter request saying "here's everything that's in my shared buffers," the read/write node responds with "okay, here are the addresses you need to load," and the replica goes ahead and starts loading those blocks into memory, into its shared buffers, until it's fairly close to what the master has. The net effect is that when a failover happens with CCM enabled, that 340 seconds goes down to 32 seconds, because the cache is already warm, matching what the master had.

With that, I'm going to jump past all of this; the last thing is performance, and Jim will cover all of that. So with that — questions? This was a lot longer a talk than 50 minutes. Thank you. [Laughter]
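As a footnote to the cluster cache management feature described above: it is driven by a cluster-level parameter and by the failover priority (promotion tier) of each instance. The parameter name used here, apg_ccm_enabled, is my reading of the feature's documentation and worth double-checking for your engine version; the identifiers are placeholders.

import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Turn the feature on in the cluster parameter group.
rds.modify_db_cluster_parameter_group(
    DBClusterParameterGroupName="my-aurora-cluster-params",
    Parameters=[{"ParameterName": "apg_ccm_enabled",
                 "ParameterValue": "1",
                 "ApplyMethod": "immediate"}],
)

# Tier 0 marks the preferred failover target whose cache is kept warm.
rds.modify_db_instance(DBInstanceIdentifier="my-aurora-reader-1",
                       PromotionTier=0)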
Info
Channel: Amazon Web Services
Views: 2,868
Rating: 4.8709679 out of 5
Keywords: AWS, Amazon Web Services, Cloud, cloud computing, AWS Cloud, PostgreSQL, Amazon, Aurora, RDS
Id: cMJV2vvNsts
Length: 53min 56sec (3236 seconds)
Published: Mon Apr 01 2019