pglogical: Logical Replication for PostgreSQL

Captions
So, hello, my name is Petr Jelínek, I work at 2ndQuadrant, and I want to talk a bit about logical replication in Postgres, with a focus on a tool I'm developing called pglogical.

First a bit about me. As I said, I work at 2ndQuadrant as a Postgres developer and consultant. I've developed quite a few patches for Postgres over the last decade, and in my free time I also co-maintain a couple of other tools related to Postgres. You can mail me or find me on GitHub; I don't really do social networks much.

So, about today's topic: logical replication. Why do we want logical replication in the first place? Postgres already has something for replicating data, physical streaming replication, which basically synchronizes the data in real time to a different server, a different virtual machine, or whatever you want. But it has some limitations, because the way it works is that it creates an exact bit-for-bit copy of the master on your standby machine, and that brings restrictions: the two nodes have to be exactly the same, you can't create additional indexes or tables, you can't have different security access defined, you can't do any kind of data transformation, and you can't replicate across major versions, because Postgres can only read data files produced by the same version that wrote them. What we want is something that lifts these limitations. You sometimes want to replicate just part of your database, or replicate multiple databases into a single target, or do online upgrades, or have the data replicated to an analytical database so you can add more indexes there to make your queries faster, or create temp tables, things like that. That's why we actually started doing this logical replication work, and why you might want logical replication yourself.

Before I go into details, let me start with some quick history of how logical replication in Postgres evolved. Logical replication is obviously not a new concept, not in the database world and not even in the PostgreSQL world; there are solutions that have existed for more than a decade, like Londiste or Slony. But those are trigger based, which brings many limitations of its own. You have some external application that has to run outside of Postgres, you have to monitor it separately, you have to make sure it starts correctly when PostgreSQL starts, and so on. They also have to write everything into their own queue, which means that everything you write on your master you write again into the queue, so your I/O basically doubles, and since they usually use tables as queues, your write-ahead log also doubles, because you write everything into your table and then again into the queue table. They work reasonably well, but again, lots of limitations. The other problem, specifically with those two I mentioned, Londiste and Slony, is that they are not really well maintained anymore; the last feature release of each of them was several years ago, which for me personally is a bit worrying as well.

So we at 2ndQuadrant started thinking about doing something about this: bringing something better and more integrated, and eventually getting it into Postgres. I assume everybody was at the keynote where Simon talked about Postgres-BDR; everything I will talk about actually has its roots in that same project.
As Simon said, BDR currently runs on 9.4, with 9.6 support coming soon, and it does much more than just logical replication: it does multi-master with transparent DDL and so on. But that's not the topic of this talk. The topic of this talk is mostly pglogical, which is basically a simplified version of that. It doesn't require any patching of Postgres; it's a simple extension that you just install into Postgres and it works directly from there, on anything 9.4 and higher. Its main purpose is one-way replication rather than multi-master, although it can actually replicate in a bi-directional manner if you set it up that way; that's just not the main reason it exists. The main goal was to replace trigger-based solutions like Londiste and Slony, and also to serve as a base, or code base, for what will be in PostgreSQL core as a replication solution.

A bit about architecture. The architecture is basically an evolution of physical streaming replication. With physical streaming replication you have applications that write to your master database, where an executor runs the SQL statements, reads from and writes to the heap, and also writes to the transaction log. The writes to the transaction log are then read by something called the WAL sender, a special process that runs within your PostgreSQL instance, and this process sends them to another machine, the standby, where the opposite process, the WAL receiver, writes them into the write-ahead log there; from that write-ahead log the data is replayed into the database, and the executor can read it from there. It's a relatively simple process, and that's how current replication in Postgres works.

Logical replication builds on top of this by adding some additional APIs to the WAL sender process, which do two things. One is decoding the transaction log: turning binary data like "change these bits in this file to these bits" into actual logical changes, meaning "change this row in this table to these values". The other is exposing a low-level C API which can be used by something called an output plugin to generate its own data stream based on those logical changes. That's the base technology behind logical replication. Then obviously you need to implement a remote side that can read this stream of changes and apply it. The apply side, both in pglogical and in whatever will be in PostgreSQL core, just runs an apply process that looks mostly like another client connected to your Postgres database, except that it uses low-level APIs to achieve better write performance. And since this writes the changes directly into the database rather than into the transaction log, the subscriber database can just be a normal Postgres instance without any special limitations: it can be a master for a physical standby, it can be a provider for another logical subscriber, you can write to it, you can have different structures on it, and so on. It just moves the logical changes around, not the physical ones.

So what pglogical does is basically implement this output plugin API that I was talking about, plus the apply worker on the other side. That's all the magic there is; the rest is the standard, unmodified Postgres streaming replication architecture.
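For a feel of what logical decoding actually produces, you don't need pglogical at all: the test_decoding output plugin that ships with Postgres 9.4 and later prints the decoded changes as text. This is just an illustration of the mechanism described above, not something pglogical requires; it assumes wal_level = logical and a free replication slot, and the table and slot names are made up:

    -- create a logical replication slot that uses the test_decoding output plugin
    SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');

    -- make some changes
    CREATE TABLE decoding_demo (id int primary key, val text);
    INSERT INTO decoding_demo VALUES (1, 'hello');

    -- read the decoded stream: row-level INSERT/UPDATE/DELETE, not binary WAL
    SELECT * FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);

    -- clean up, otherwise the slot keeps WAL around
    SELECT pg_drop_replication_slot('demo_slot');

An output plugin such as pglogical's does the same job, except that it emits a format the pglogical apply worker understands instead of the human-readable text that test_decoding prints.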
So the goals for the pglogical project were: to provide selective replication, so that you can replicate just part of your database; to be a way to do online upgrades, replicating between different major versions of Postgres, mainly from a lower version to a higher one, although the other direction is also supported, it's just not the main goal; to have some kind of basic data transformations; and to allow data integration, streaming multiple databases into a single database, or having shared tables where you have one write master for a few tables, like configuration tables that are shared across many databases, and you replicate those few tables from a single point to all the databases, so everything stays synchronized without having to do it in the application, things like that.

What we also wanted, and this is something the other logical replication solutions don't have, is support for synchronous apply, so you can do synchronous replication if you need to. That's one of the reasons it all works very similarly to physical replication: because we built it on top of the same architecture, we get optional synchronous replication for free, and it works the same way as normal physical replication, it's per transaction, you can specify which nodes should be synchronous, and so on.

We also wanted simple installation. It's just an extension that runs as part of Postgres; all of the configuration is inside the database, there are no config files telling you where all the nodes are, it's all Postgres tables. It starts automatically with Postgres, shuts down automatically with Postgres, and is monitored via SQL from within Postgres; nothing external is needed. It uses something called logical decoding, which is the API I just described, where the transaction log is read and turned into logical changes; that feature was added in Postgres 9.4, which is why we only work on 9.4 and higher. It also executes triggers marked ENABLE REPLICA, which means you can do simple data transformations on the subscriber side, because the incoming data goes through the trigger and the trigger can modify it. That has some limitations I'll talk about, but basic transformations, basic triggers, work. It deliberately doesn't fire normal triggers: if it did, and you had some post-processing trigger on insert on the provider, the post-processing would happen again on the subscriber, which is not what you want. But there is a special class of triggers you can mark ENABLE REPLICA, and those will be executed by the apply process. It's very similar to what Slony or Londiste do, if you've ever used those.

So how do you use this thing? As I said, you install the extension, and you also need to load it via postgresql.conf. The basic operation is that you install the extension and define something we call a node. A node is basically a name plus a connection string, and the connection string you specify is the connection string to that node itself; so the "provider" node in this example should be listening on host 10.10.1.1 and you should execute this in the database foo. Then on the subscriber you again create a node, and then you create a subscription.
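As a rough sketch of that flow, based on the pglogical documentation of the time: the settings below go in postgresql.conf (a restart is needed), and the host addresses, database name and node/subscription names are placeholders; exact values depend on your setup and pglogical version.

    # postgresql.conf on both nodes
    wal_level = 'logical'
    shared_preload_libraries = 'pglogical'
    max_worker_processes = 10      # pglogical runs background workers
    max_replication_slots = 10     # matters on the provider
    max_wal_senders = 10           # matters on the provider

    -- on the provider database (e.g. database foo on 10.10.1.1)
    CREATE EXTENSION pglogical;
    SELECT pglogical.create_node(
        node_name := 'provider1',
        dsn := 'host=10.10.1.1 port=5432 dbname=foo'
    );

    -- on the subscriber database
    CREATE EXTENSION pglogical;
    SELECT pglogical.create_node(
        node_name := 'subscriber1',
        dsn := 'host=10.10.2.1 port=5432 dbname=foo'
    );
    SELECT pglogical.create_subscription(
        subscription_name := 'subscription1',
        provider_dsn := 'host=10.10.1.1 port=5432 dbname=foo'
    );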
In the subscription you name the subscription and specify where it should replicate from. The reason you specify connection strings when creating nodes is that the initial data synchronization of the tables, and not just the initial one but also when you add tables later, needs to know how to connect to the node itself; basically the subscriber needs to connect to the provider to be able to copy the data, and since Postgres doesn't internally provide any API to get a "local connection string", you have to provide it yourself. That's why we require the connection string. The subscription name is quite important for monitoring later, because that's the application name that will be visible in pg_stat_replication, and it's what you put into synchronous_standby_names if you want synchronous replication, and so on. So the subscription name matters for monitoring, and you should always use a different name for each subscription.

How do we achieve selective replication? Through something we call replication sets, a term we borrowed from Slony, I believe. A replication set is basically just a named group of tables that defines what is replicated; it doesn't have any special meaning beyond that, it's just a named group, so you can replicate different subsets of tables to different places. An important thing to note is that, unlike physical replication, this doesn't replicate anything until you actually add it to a replication set. And if you put a table into three replication sets and then subscribe to all three of those sets with one subscription, you don't have to worry about the data arriving multiple times: the replication still happens at the table level, the replication set is really just used for grouping, and the union of those groups is what gets replicated, so each change always travels only once.

We also provide some filtering at the replication-set level. You can filter by operation: by default a replication set replicates everything, but you can specify that a particular set shouldn't replicate, for example, DELETEs. That can be useful if you roll your data up into a data warehouse or rollup database and want to be able to delete operational data in your transactional database without those deletes propagating to the warehouse; you just say "don't replicate deletes" and it will skip the deletions and replicate only the inserts and updates, so you keep all the historical data there and can remove it later if you want to.

Yes, so the question is whether the target database can have different tables than the ones being replicated. Yes, it can: you can have additional tables, you can even have additional columns in the tables you are replicating, and you can definitely have additional schemas. The replicated tables themselves are still writable too; we don't limit what's writable, so even if you replicate a table to a different node, the table on that node can still be written to, and pglogical has a mechanism to detect whether a change was made locally or arrived via replication. If a local change conflicts with what's coming from replication, there are conflict resolution methods you can choose in the configuration. I won't talk about that much here, but by default the replication always wins, so if incoming replicated data conflicts with local changes, the changes from replication get applied; it's also possible to configure it so that the local change always wins, or so that replication stops on a conflict and it's left to the user to resolve, or so that the latest change by commit timestamp is picked. So even if you write locally, you can choose how it behaves when it detects writes that are not compatible with what's coming from replication.
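For reference, the conflict behaviour just described is, as the pglogical documentation presents it, controlled by a setting on the subscriber; this is a sketch, and the availability of individual options may vary by pglogical and Postgres version:

    # postgresql.conf on the subscriber
    pglogical.conflict_resolution = 'apply_remote'   # default: the incoming change wins
    # other documented options: 'keep_local', 'error',
    # 'last_update_wins', 'first_update_wins'

    # the commit-timestamp based options additionally need (PostgreSQL 9.5+):
    track_commit_timestamp = on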
OK, so back to the filtering. There are some predefined replication sets: "default", "default_insert_only" and "ddl_sql". The ddl_sql one I'll talk about when I get to DDL replication. The "default" set simply replicates everything; it's just predefined for you, it has no special meaning. The "default_insert_only" set filters out updates and deletes and only replicates inserts. The reason for a predefined insert-only set is that in the current version of pglogical, if you want to replicate updates or deletes, your table has to have a primary key, because we can only look up rows by primary key, or by a unique index over NOT NULL columns, which is effectively the same thing. So we provide this replication set for tables without a primary key: put them in this set and they will replicate just the inserts and truncates.

How do you add a table to the actual replication? There is an interface called replication_set_add_table: you give it the name of the set and the name of the relation, and you specify whether all subscribers should immediately synchronize the data or not. It's also possible to resynchronize a table later; there is a separate interface for that on the individual subscribers. For example, if some subscribers already have the data for the table, you can say you don't want to synchronize everything immediately, and then resynchronize on each subscriber as needed via that separate interface. One thing to note is that, unlike the initial synchronization at subscription time, adding a table afterwards works slightly differently. If you define the replication sets before you subscribe, we can synchronize the structure for you as well; if you add tables later, we can't synchronize the structure, so the table has to already exist on the subscriber if you want it to be replicated. That's a limitation that probably won't exist for long, but for now it's there.

We also support replication of sequences, with a similar interface, except that it's not called replication_set_add_table but replication_set_add_sequence; you just add the sequence there and that's it. Sequences are replicated somewhat differently than tables, though, because the logical decoding infrastructure in Postgres doesn't support decoding individual sequence changes, so we had to invent a slightly different way of doing it. We do it very similarly to what Slony or Londiste do, for example: we periodically read all the replicated sequences, see how fast each sequence is changing, and tell the subscriber to stay far enough ahead that it never falls behind the master.
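A sketch of the interfaces just mentioned, with function names as in the pglogical documentation; the set, table and sequence names are made up for illustration:

    -- on the provider: add a table to the default set and have subscribers copy its data
    SELECT pglogical.replication_set_add_table(
        set_name := 'default',
        relation := 'public.orders',
        synchronize_data := true
    );

    -- a custom set that skips DELETEs, e.g. for feeding a data warehouse
    SELECT pglogical.create_replication_set(
        set_name := 'warehouse_feed',
        replicate_insert := true,
        replicate_update := true,
        replicate_delete := false,
        replicate_truncate := false
    );
    SELECT pglogical.replication_set_add_table('warehouse_feed', 'public.order_lines');

    -- a table without a primary key can go into the insert-only set
    SELECT pglogical.replication_set_add_table('default_insert_only', 'public.event_log');

    -- sequences use their own call
    SELECT pglogical.replication_set_add_sequence('default', 'public.orders_id_seq');

    -- on a subscriber: re-copy one table later if needed
    SELECT pglogical.alter_subscription_resynchronize_table('subscription1', 'public.orders');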
That's it, really, but you get an interesting artifact from this: the subscriber is basically always ahead of the provider in terms of the sequence's last value, so the provider can be at something like 1000 while the subscriber is already at something like 2000. Yes, there's a question about the sequence's CACHE value: that doesn't really play into this, we work at a lower level than that, we just care about the last value. The algorithm is dynamic: if we detect that you are consuming the sequence very fast, it will increase the buffer and also increase the rate at which we update; if you are not consuming it very fast, it will use a smaller buffer and update more slowly. So the CACHE value doesn't matter here at all; you can use whatever you want.

Yes, on transformations: you basically have triggers on the subscriber, and you can transform the data as it comes in. You define an insert trigger, mark it ENABLE REPLICA, and when an insert comes in the trigger is executed and can do something with it. There are caveats with that. If your insert trigger, for example, inserts into a completely different table, like an INSTEAD OF trigger or a BEFORE trigger that returns NULL, then you will have no data in the original target table, so when an update comes along it won't find any row and no update trigger will be executed. So there are some caveats. Eventually we want to provide proper custom transformations, not a Postgres trigger but a function you specify to be called on incoming data, so you can do transformations on updates as well; but currently the transformations are limited mostly to inserts, because if you move the data somewhere else, the updates will not work properly.

How do you monitor this? Since it's based on streaming replication, it's basically the same as monitoring physical replication: the pg_stat_replication view shows how far along the subscriber is, what its name is, and all the information you would normally expect for physical replication. The nice part is that if you have a monitoring solution that supports physical replication, it doesn't need any changes to support this as well; you will automatically see the pglogical replication there too. One thing to note is that with logical replication you always want to monitor your replication slots, and even with physical replication, if you use slots, you should monitor them too, because a replication slot will keep your transaction log around forever if your subscriber disconnects; that's true for physical as well as logical replication, and it means your disk will eventually run out of space if replication isn't happening and the slot is still defined. So always remember to monitor the replication slots if you are using them, and logical replication always uses them. We also provide some additional monitoring functions. There is show_subscription_status, which shows the status of the replication worker on the subscriber, meaning whether it's running, what stage it is in, information like that; it's very similar to the WAL receiver status view that's in 9.6, I believe.
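A minimal monitoring sketch along those lines; the subscription and table names are placeholders, and some pg_stat_replication column names changed between releases, so SELECT * is used here:

    -- on the provider: logical subscribers appear next to physical standbys;
    -- application_name is the subscription name
    SELECT * FROM pg_stat_replication;

    -- on the provider: watch the slots; an abandoned slot retains WAL indefinitely
    SELECT slot_name, plugin, active, restart_lsn FROM pg_replication_slots;

    -- on the subscriber: pglogical's own status functions
    SELECT * FROM pglogical.show_subscription_status();
    SELECT * FROM pglogical.show_subscription_table('subscription1', 'public.orders');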
We also provide information about the status of the initial copy of the tables, because the initial data copy can run for a while, so it makes sense to monitor what state it's in. You can see that a table is being created, or being synchronized in terms of data, or that constraints are being created, or that it's catching up with the main worker, or that it's ready and replicating.

There was another question: how does it behave when you add a table while replication is already running? We basically use COPY, standard Postgres COPY, from the provider to the subscriber; we create a snapshot from which we copy the data, it runs in a single transaction so you see it all at once when it's copied, and then the table catches up with the rest of the replication, at which point the main replication takes over. So no, it's not completely consistent while it's being added; that's why we added the monitoring function, because it shows you all the steps that happen, and once show_subscription_table says "synchronized", the table is synchronized with the rest of the replication stream and all the data is transactionally and isolation-wise consistent with the rest of the database. Yes, yes, exactly.

OK, so the next thing is DDL replication, and here we are slowly getting to the caveats. We don't do transparent DDL replication, where whatever you execute is automatically synchronized; BDR does that, but pglogical doesn't do that yet. We can synchronize the initial schema if you choose to, but only the whole schema or nothing; again, a current limitation that will probably be lifted at some point, but that's how it is. We provide a function where you can say "replicate this statement to all the subscribers", or to specific ones. So if you want to create or alter a table, you just wrap that command in this function, and it will execute it locally and also put it into the replication stream, so it's executed at the right moment remotely as well. This is where the predefined ddl_sql replication set comes into play, because it's the default one that will be used. The reason for that is that when you subscribe to a provider you specify which replication sets you want, so if a DDL statement should only affect some subscribers, you can say that it goes to a particular replication set, and only the subscribers subscribed to that set will execute the statement; the others will just ignore it. So we basically let you filter the DDL statements manually, or kind of manually, if you want to. And that's about it for DDL replication, except one more limitation: CREATE INDEX CONCURRENTLY. In PostgreSQL, CREATE INDEX CONCURRENTLY cannot run inside a transaction, and when you execute a function you are always inside a transaction, so you can't use CREATE INDEX CONCURRENTLY with this function; you need to execute it manually if you want it.
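A sketch of that DDL function as documented for pglogical; the statements and set names are placeholders, and the documentation suggests schema-qualifying object names inside the wrapped command:

    -- run the DDL locally and queue it via the ddl_sql replication set (the default)
    SELECT pglogical.replicate_ddl_command(
        'ALTER TABLE public.orders ADD COLUMN note text'
    );

    -- target only subscribers of a particular replication set
    SELECT pglogical.replicate_ddl_command(
        'CREATE TABLE public.order_notes (order_id int primary key, note text)',
        '{warehouse_feed}'
    );

    -- a newly created table still has to be added to a set if it should replicate
    SELECT pglogical.replication_set_add_table('warehouse_feed', 'public.order_notes');

    -- CREATE INDEX CONCURRENTLY cannot go through this function (it cannot run
    -- inside a transaction); run it by hand on each node instead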
Now a bit about performance. This is just a standard pgbench run on some AWS machines, done by a colleague, comparing streaming replication, pglogical, Slony and Londiste. We are about five times faster than the trigger-based replication solutions, but a bit slower than streaming replication. This is very dependent on your access patterns, though, mainly the write patterns: if you have a lot of updates and a lot of vacuum, or you generate a lot of write-ahead log in general, say you have many indexes, it can actually be faster than streaming replication, because we don't replicate all the bloat you generate, only the changes to the tables. So there are use cases where pglogical is actually faster than physical replication, and there are also use cases where we are much slower than what's shown there, for example if you have really large transactions that change millions of rows.

Yes? Yes, I think that's an artifact of the benchmark, to be honest, because you can also see there is a cap of around 10,000 TPS, or eight thousand actually; I have custom benchmarks that do five times that, so I think it's an artifact of the benchmark. I didn't run this benchmark myself, so I'm not really sure why it looks like that, but we can definitely do more transactions than is shown here. Again, this depends a lot on your workload; the main purpose was to verify that we are faster than the trigger-based replication mechanisms, to be honest. But yes, the case where we are really slow is a big transaction that changes a lot of rows, because most logical replication solutions that I know of, not just in Postgres but in other databases as well, are transaction based: they load one transaction, move it over to the other side, and apply it. So if you have a transaction that changes a million rows, all the smaller transactions written after it have to wait for that big chunk of data to be applied, which means increased lag for the period while that transaction is being applied, and that's where the TPS, in terms of replication, can drop very fast; the actual TPS on the origin database won't really be affected.

Sadly, it still doesn't play very well with physical replication. The problem is that logical decoding, in the current releases of Postgres, can't follow the promotion of a standby to master, which means you can't really use the standard HA tools with this; you have to manually switch the replication over to the other database. We do provide some interfaces for that, we allow you to change the connection string the replication comes from and things like that, but it doesn't work automatically yet. The alternative is obviously to use just logical replication for HA purposes as well, but there isn't that much tooling for that yet. So the failover part is slightly complicated right now. There are patches to address this in Postgres; they were actually already around for 9.6, but they came a bit late in the release cycle so they didn't get in, and they are in the current commitfest for Postgres 10, so hopefully it will be fixed there.
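The "interfaces to change where the replication comes from" appear, as far as I can tell from the pglogical documentation, to be the node-interface functions; treat this as an assumption about names and availability, which may differ by pglogical version, and the address is a placeholder:

    -- register an alternative address for the provider node
    SELECT pglogical.alter_node_add_interface(
        node_name := 'provider1',
        interface_name := 'provider1_new',
        dsn := 'host=10.10.1.2 port=5432 dbname=foo'
    );

    -- point the existing subscription at that interface
    SELECT pglogical.alter_subscription_interface(
        subscription_name := 'subscription1',
        interface_name := 'provider1_new'
    );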
Another limitation is that all the administrative operations with pglogical currently require superuser. If you want to create a node, create a subscription, add tables to a replication set and so on, you have to be a superuser. It's not really delegatable; you can obviously create SECURITY DEFINER functions and things like that, but you can't grant a role to users to do this, the user that administers this has to be a superuser.

Let me also tell you something about the future. We will be releasing pglogical 2.0 in a couple of weeks, and it adds a couple of interesting features. One is column filtering. Currently the table on the subscriber side has to have the same columns the provider has; it can have additional columns with default values, and those will be filled with the defaults, but it can't have fewer columns, because if it did we wouldn't know whether that's by accident and you are losing data, or whether you actually wanted that. So we added another parameter to replication_set_add_table where you can specify which columns should be replicated; that way you can have 500 columns and only replicate a hundred of them. The only columns that are required are the replica identity columns, which is basically the primary key: the primary key columns always have to be replicated because of the row lookups, but any other column can be left out as you please, and the table on the subscriber then doesn't have to have those columns at all. This will actually make things faster if you use it, because it removes some work for the columns that are not replicated.

Yes, a question about synchronous replication: as I said at the beginning, we use the same method as physical replication, so you just set synchronous_commit to remote_apply or remote_write and it works automatically, per transaction, just like physical replication, so that works fine. Yes, there is an interface called replication_set_add_all_tables in schema, or something like that, and you can specify an array of schemas, not just a single one. Also, the relation parameter doesn't have to be a name, it can be an OID, so you can do a join against pg_class, or something like that, and call the function with more complex filters if you want. Well, there is no interface to add absolutely all tables; as I said, you can join against pg_class and add everything that way, but you have to write the query yourself, or you can pass the list of schemas, which will also add everything. The main reason for this is tables without primary keys: you have to think a bit about what to do with those, whether you are okay with replicating just inserts, or whether you maybe need to add a primary key. We didn't want to make it so easy that you don't have to think about it at all, because you actually do.

The other feature in pglogical 2.0, and this one actually adds some overhead, is row-based filtering on the provider side, where you specify an expression, similar to a WHERE clause or a check constraint, saying for example "only replicate rows where the values in these columns equal something, or are smaller than something", and so on.
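A sketch of how those two 2.0 features are exposed, with parameter names as documented for pglogical 2; the table, columns and filter expression are made up for illustration:

    -- replicate only some columns, and only rows matching a filter expression
    -- that is evaluated on the provider
    SELECT pglogical.replication_set_add_table(
        set_name := 'warehouse_feed',
        relation := 'public.orders',
        synchronize_data := true,
        columns := ARRAY['id', 'customer_id', 'total'],   -- must include the primary key
        row_filter := 'total > 100 AND region = ''EU'''
    );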
The expression itself is basically the same as what you could put inside a check constraint, and it has the same limitations, which means we do allow volatile functions, but you should be careful with them, and we don't allow sub-selects and things like that. One thing to keep in mind is that this expression is evaluated during replication. So if you say "only replicate rows where user_id equals current_user", then the current user is your replication user, not the user that actually changed the table; and if you say "replicate everything where the timestamp is smaller than now()", it replicates everything, because now() is when the replication happens, not when the write happened. And to the question: only the data matching the filter is sent, and it doesn't matter whether it's an update or an insert; if you do an update that doesn't match the expression, the update will not be replicated at all, only rows that are affected and match the expression will be. Whatever doesn't match the row filter is not sent to the remote side at all. This is all based on the transaction log, and it's executed only on the provider, so if you want more complex filtering you have to do it on the subscriber using those triggers; that's still possible. But a lot of people want to replicate a small part of a table, at least among our customers we have quite a few of those, and they don't want to send the data over the network, because the network is the limiting factor, so they want the filtering executed on the provider; that's what this is for. Yes, you can have as many as you want, it's many-to-many, you can even have two between the same pair of databases.

OK, so some other plans after 2.0: we want to add purely logical failover, so that you don't have to use physical replication even for HA; we want to improve the transformations, because as I said triggers have limitations; we want to add Postgres-XL support, so you can stream the changes into your sharded database, if your sharded database is XL of course; we want to parallelize some of the apply because of performance; we want to add transparent DDL replication; and obviously there are always performance improvements. I expect the XL part will probably happen at the beginning of next year; the rest, we'll see.

While I have a couple of minutes left I also want to talk about Postgres 10, because most of this work is the basis for what we are trying to get into Postgres 10. I am currently working on a patch that adds basic logical replication to core. It will not have the row filtering and some of the other advanced features we are adding now, but the basic replication should be in Postgres 10. It's in the current commitfest, I have been working on it for the past couple of months, and some people who are not here are reviewing it, Andres Freund and Peter Eisentraut, so it looks like Postgres 10 should have the basic feature built in from the start.
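For orientation only: the basic in-core feature described here did eventually land in PostgreSQL 10, and the syntax that shipped looks roughly like this (this is the core feature, not pglogical, and the publication, subscription and table names are placeholders):

    -- PostgreSQL 10+, on the provider
    CREATE PUBLICATION my_pub FOR TABLE public.orders, public.order_lines;

    -- PostgreSQL 10+, on the subscriber
    CREATE SUBSCRIPTION my_sub
        CONNECTION 'host=10.10.1.1 port=5432 dbname=foo'
        PUBLICATION my_pub;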
And that's it from me. If you have questions about either BDR or pglogical, there is a mailing list, there is a support channel at info@2ndquadrant.com, and there is a GitHub repository and a website with the documentation and all of that.

Questions? Well, aggregations: that was again meant as "you write the aggregation yourself as a trigger". There is a possibility we might add expressions later so that you can do aggregations directly, but currently you have to write the code; triggers are the way. And well, that depends on what you actually want to do; it really depends on the use case.

Yes, so BDR was the original project that implemented all the infrastructure needed for this, but BDR itself was made for multi-master, and there are lots of assumptions in the architecture of BDR that are specific to multi-master. That's actually not what most people need: most people need to move data from one place to another, maybe do some transformations, maybe move data from multiple places into the same place, things like that. So we wanted to make something simpler than BDR, easier to maintain, easier to install, with a smaller code base and without all those assumptions in the code, and also something that can relatively easily be put back into Postgres core. That's why we basically simplified BDR; you can think of this as a simplification of the initial BDR concept.

Yes, the question is whether we plan to support other, non-Postgres targets for the replicated data. Not with pglogical itself, but the logical decoding infrastructure I talked about, which is the basis for this on the provider side, is pluggable, so you can write C plugins, extensions to Postgres, that transform the data and send it somewhere else. And since you asked about messaging systems: there is actually already a plugin called Bottled Water that sends this data to Kafka, and it uses the same infrastructure that was developed as part of BDR and as part of what's in Postgres now. There are also plugins that just spit out the changes in JSON format and things like that, so you can use those for integrations. So not this project per se, but we thought about it, which is why we made it pluggable, and other people have already built other integrations.

Sorry — this is how the synchronous replication works: the same as with physical replication, the mechanism is exactly the same; we didn't even write the synchronous replication code, we just reuse the one that's in Postgres, because we use the same underlying architecture. It will wait forever unless you specify some other standbys that should be used instead. Cascading works, yes.

I think I'm out of time, so if you have more questions please find me in the hall, or something like that.
Info
Channel: Citus Data
Views: 6,249
Rating: 4.7818184 out of 5
Id: ttWhmbTOokA
Length: 49min 34sec (2974 seconds)
Published: Mon Dec 19 2016