PostgreSQL Streaming Replication

Captions
Hi, I'm Creston, and in this tutorial we're going to cover PostgreSQL streaming replication. We'll go over a few slides that explain some concepts, and then we'll jump into the command line to set up database replication.

All right, let's get started with PostgreSQL streaming replication. In terms of replication in general, I'm talking about physical replication, where your entire database cluster is replicated to another server, typically called a replica. It does this by using the WAL files, the write-ahead logs, to stay in sync. Those are constantly produced on a master that's in archive mode, and they can be transferred to replicas and replayed there to keep them up to date and in sync with the master. It replicates everything in the cluster, so you cannot replicate only certain databases, tables, or data; if you're looking to do that, you want logical replication as opposed to physical replication.

Replicas can allow reading of data and act as a hot standby to replace the master, so they can be up and ready to jump into the master's place in case of a failure, and you can also run SELECTs against those replicas. Replication can be synchronous, but by default it is asynchronous, meaning a change is committed on the master, written to the write-ahead log, streamed to the replicas, and then applied and committed there. You can make it synchronous so that before a change is committed it must have been applied on one or more replicas. However, there are caveats with this: you want a very good network connection and close proximity between the servers when doing synchronous replication, because otherwise, if the master can't contact the replicas, it cannot commit transactions.

In addition, replication can also cascade, so you can essentially have tiers of replication: a master can synchronize to, say, a first tier of replicas, and another tier of replicas can sync from those tier-1 replicas. Now, you'll hear the terminology log shipping and streaming. Log shipping is taking the WAL files and transferring them to the replicas to be replayed; that is one option. Or you can stream, where the replica connects to the master (or, if you're cascading, to another replica) and the changes are streamed over the network to keep the databases in sync; that is streaming. And as a caveat, all replicas must use the same major version, so be sure to keep that in mind when you're using physical replication.

All right, let's jump into the live examples. I am using Ubuntu for this example, and I have already installed Postgres version 10, which is the latest version as of this video. The first thing we're going to do is use an Ubuntu-specific command, pg_lsclusters, which lists the clusters available on the system. By default, when you install Postgres 10 on Ubuntu it creates a cluster called main; you can see it's version 10 and on the default port. This is important because the replica I'm going to set up will be on the same machine, on a different port and with a different cluster name. This is also where the Ubuntu installer places the data directory and the log files by default, and the configuration files live under a similar path in /etc/postgresql. All right, so now we want to create a replica cluster. Again, I'm going to use an Ubuntu-specific command; on other distributions there are other commands you can use. I'm going to create a version 10 replica cluster.
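As a rough sketch of the Ubuntu-specific cluster tooling referenced here (the pg_lsclusters, pg_createcluster, and pg_ctlcluster wrappers from the postgresql-common package), assuming a second cluster named "replica" as in this walkthrough:

# List the clusters on the system (name, version, port, status, data directory)
pg_lsclusters

# Create a second version 10 cluster to act as the replica;
# it is created stopped and gets the next free port (typically 5433 when main is on 5432)
sudo pg_createcluster 10 replica

# Check the status of an individual cluster, or use systemd
sudo pg_ctlcluster 10 replica status
systemctl status postgresql@10-main
systemctl status postgresql@10-replica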
Okay, so that has been created, but it has not been started; you can see the status is down. Just to confirm, you can use pg_ctlcluster to look at the status, and it says no server is running there. You can also use the systemd commands to check the status as well: the main cluster is running, and I can check the replica, and we can see that it is inactive (dead).

As the next step, I want to create a separate archive directory where the WAL files are going to be archived for each of these clusters, both main and replica. I'm going to put it at the base of the Postgres home directory, in a pg_log_archive folder with a subdirectory for main and one for the replica, so I'll go ahead and create those two directories.

Now I want to configure the main cluster for replication. Again, the configuration files are under this path on Ubuntu; they may be somewhere else in your distribution. I'm going to look for wal_level: we want to make sure this is at least replica, so that's good. I'm going to turn on wal_log_hints as well; this is important if you want to use pg_rewind. You don't specifically need it for this example, but it's something I suggest doing if you may want to use pg_rewind in the future. Make sure archive_mode is on; by default it's off, so turn it on, and set the archive_command. The archive_command specifies where the WAL files are going to be archived to. Remember that these will be produced constantly, so you need a way to manage them and archive them off the system or to a different location so they don't fill up the disk. Also, since your replica is usually a different server, you may at some point need to transfer those WAL files over to that separate server, so keep that in mind for the log management portion of this.

Next, let's look at max_wal_senders. You'll need to set how many processes you want available to stream information. These senders are also used for backups, so you want to take into account any backups you're going to be conducting as well, especially if you're streaming the WAL; you just need to keep this at a sufficiently high level. For this example the default of 10 is more than enough. Now, I'm not going to be using replication slots in this example, so I'm going to increase wal_keep_segments. This ensures that the master server retains a certain number of WAL segments, so that if there's a network disconnect between the master and replicas, it will keep enough of the WAL files for the replica to catch up once the network is back online; otherwise you'll have to transfer the WAL files to the replica server before it can catch up again with streaming replication. And lastly, make sure that hot_standby is on; that's in the standby server section, and you can see that hot_standby is on by default. So I'm going to go ahead and write out that file.
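As a minimal sketch of the settings just described, here is roughly what the relevant lines in /etc/postgresql/10/main/postgresql.conf could look like, along with creating the archive directories; the archive paths and the wal_keep_segments value are placeholders, not the exact values from the video:

# Archive directories under the postgres user's home directory (illustrative paths)
sudo -u postgres mkdir -p /var/lib/postgresql/pg_log_archive/main
sudo -u postgres mkdir -p /var/lib/postgresql/pg_log_archive/replica

# /etc/postgresql/10/main/postgresql.conf
wal_level = replica                  # minimum level required for physical replication
wal_log_hints = on                   # recommended if you may use pg_rewind later
archive_mode = on
archive_command = 'cp %p /var/lib/postgresql/pg_log_archive/main/%f'   # archived WAL must be managed/offloaded by you
max_wal_senders = 10                 # WAL sender processes; also used by pg_basebackup (default is 10 in version 10)
wal_keep_segments = 64               # retain extra WAL since no replication slots are used (example value)
hot_standby = on                     # allow read-only queries when running as a standby (default on in version 10)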
Next, we want to edit the host-based access file, pg_hba.conf, which configures the master to accept connections for replication purposes. I'm going to edit it and put this configuration above the other entries, so it's used first. You can set up replication authentication a number of different ways; the example I'm using you definitely shouldn't use in production, because it's local and it's trusted. Ideally you want to set up password-based authentication, preferably using hostssl. In this example I'm just going to set up a user called rep_user with trust authentication so that I don't have to worry about passwords, but it's not in the scope of this tutorial to show how to set up your host-based access file for a multi-server setup, so you'll have to look into how to do that.

Okay, that's been set, and now we want to create that rep_user we referenced and give it the replication role. Aha, that user already exists on my machine; on your system it probably won't. Now we want to restart the main cluster. Okay, that's been restarted.

Now we want to do essentially the same thing with the replica cluster that we created. It is stopped, as we confirmed earlier; if not, make sure you stop the replica before continuing. The configuration is identical between them for both postgresql.conf and the host-based access file, so I'm going to do that off camera and I'll be right back.

Okay, we're back. The next step is to remove all of the replica's database files, because we want to start with a clean directory into which we'll restore the database and then stream all the changes from the master, replicating the data fresh. When this new replica cluster was created at the very start, it did initialize a database cluster, so we need to clean out those files. I'm going to become the postgres user and remove all the files in the replica cluster's data directory. Before you run this command, make sure you're targeting the correct directory and that you're on the correct server, because otherwise this will blow away your entire database, so be very careful.

Okay, that has been removed, and now we can run pg_basebackup, which backs up all the files and streams the WAL changes so we're ready to start syncing up the data. If you're using version 9, you should add the option to stream the WAL files while the base backup is happening, because it allows you to get in sync with the master sooner; in version 10 this is already the default, so you don't need it. The -R switch gives you an initial recovery.conf file; we're going to write our own below, so we don't need to include that switch, but you're welcome to if you find it easier. I'm asking for progress, so it will give me a progress report during the backup procedure, and I'm also telling it not to prompt for a password. Ideally, if this is a two-server setup, you would be connecting across the network using a password, so you wouldn't use that switch, but since both clusters are on the same local system for this tutorial, I'm adding the -w switch. -U specifies the user and -D the directory into which the backup will place the files. All right, let's go ahead and run this base backup. Okay, it backed up very quickly and gave us a bit of a progress report.
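Pulling the authentication, user creation, cleanup, and base backup steps together, a sketch of what the commands might look like; the pg_hba.conf entry, addresses, ports, and data directory path are assumptions for a single-machine demo like this one (trust authentication is for local demos only):

# /etc/postgresql/10/main/pg_hba.conf -- placed above the default entries so it matches first
host    replication     rep_user        127.0.0.1/32        trust

# Create the replication user and restart the main cluster
sudo -u postgres createuser --replication rep_user
sudo systemctl restart postgresql@10-main

# With the replica cluster stopped, clear out its data directory -- double-check the path first!
sudo -u postgres rm -rf /var/lib/postgresql/10/replica/*

# Take the base backup from the master into the replica's data directory
# (-P shows progress, -w never prompts for a password, -U user, -D target directory;
#  on 9.x add -X stream, and -R would write an initial recovery.conf for you)
sudo -u postgres pg_basebackup -h 127.0.0.1 -p 5432 -U rep_user -w -P \
  -D /var/lib/postgresql/10/replica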
Now we want to set up our recovery.conf file in the directory we just restored into. You'll see a restore_command like the one we used in a previous tutorial; it basically tells the replica where to find the archived WAL files for the replica cluster. In addition, we're specifying some other configuration settings. First is recovery_target_timeline, set to 'latest', which means that if the replica switches to a different WAL timeline, it will keep following whatever the latest one is. So if you promote a replica to be a new master, there's a timeline switch on the WAL files, and other replicas that start following it as their master will automatically change to that new timeline and stay in sync; usually you're going to want this. The next option is standby_mode, which keeps the server running in recovery as a standby; with hot_standby on, that allows read-only connections so you can run SELECTs on the replica database. primary_conninfo is a connection string used to connect to your master server; it can specify the user, a password file if one exists, the host you're connecting to, the port it's on, and some SSL settings. And lastly, there's archive_cleanup_command. You don't need this, but I've found it useful because, while the replica is running, it keeps the pg_log_archive directory clean by removing any WAL files that appear there that have already been applied, because ideally you're archiving them from the master. Your setup may be different and maybe you don't want this command, but let's go ahead and create this file.

Okay, now we can start up the replica cluster. It's been started, and we can verify: this is the log from the replica, showing "consistent recovery state reached", "database system is ready to accept read-only connections", and that it started streaming WAL from the primary on that timeline, so we are now in sync with the master.

Now that the replica is in sync, let's create a database, create a table, and insert some data into it on the master server. Okay, we created the database and inserted some data; now let's see if the data has been replicated. We'll connect using psql to the test database we just created, select all the columns from the posts table, and specify the replica cluster's port, and we can see that all the data is there.

Okay, let's simulate a failure by stopping the main cluster. Looking at the log files of the replica, you can see that it's complaining because it could not connect to the primary server, since it's stopped. So let's promote the replica using this command. Again, this is Ubuntu-specific; on other distributions I know you can use pg_ctl to do it, but I'm going to go ahead and do the promotion. We can look at the logs again and see that it received the promote request, then it selected a new WAL timeline because it's a new master, archive recovery completed, and the database system is ready to accept connections. Now we can try inserting some additional data, this time on the replica; because it's no longer in a read-only state, it accepts those two inserts, and when we query it we see it now has 4 rows. It has become the new master.
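As a sketch of the recovery.conf described above plus the verification and failover commands, assuming the main cluster on port 5432, the replica on 5433, the illustrative archive paths from earlier, and placeholder database and table names:

# /var/lib/postgresql/10/replica/recovery.conf (in the replica's data directory, owned by postgres)
standby_mode = 'on'
recovery_target_timeline = 'latest'
primary_conninfo = 'host=127.0.0.1 port=5432 user=rep_user'
restore_command = 'cp /var/lib/postgresql/pg_log_archive/replica/%f %p'
archive_cleanup_command = 'pg_archivecleanup /var/lib/postgresql/pg_log_archive/replica %r'

# Start the replica and create some test data on the master
sudo systemctl start postgresql@10-replica
sudo -u postgres psql -p 5432 -c "CREATE DATABASE test;"
sudo -u postgres psql -p 5432 -d test -c "CREATE TABLE posts (id int); INSERT INTO posts VALUES (1), (2);"

# Verify it replicated by querying the replica's port
sudo -u postgres psql -p 5433 -d test -c "SELECT * FROM posts;"

# Simulate a failure and promote the replica
sudo systemctl stop postgresql@10-main
sudo pg_ctlcluster 10 replica promote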
All right, that's how you set up PostgreSQL streaming replication. We just set up a single replica, but you can set up as many as you would like, as long as you adjust the configuration settings so that the master can handle the load, things like having enough WAL sender processes. Please let me know if you have any questions; I hope that was helpful. If you want the commands used in this tutorial, be sure to visit the link in the description below, and if you want to receive additional content and tutorials, please visit scalingpostgres.com. Thanks.
Info
Channel: Scaling Postgres
Views: 41,629
Rating: 4.9536681 out of 5
Keywords: PostgreSQL, PostgreSQL Replication, PostgreSQL Physical Replication, PostgreSQL Streaming Replication
Id: NaPnYQBBdyU
Length: 20min 52sec (1252 seconds)
Published: Tue Mar 13 2018