Everything You Need to Know About #MySQL Group Replication - #MySQL Tutorial - #Percona Live 2017

Video Statistics and Information

Captions
today we'll be talking about Group Replication. I won't be doing most of the talk, nor will Nuno be doing all of it, but before anything else let me introduce us both. My name is Luís, I'm the MySQL replication lead, so I basically handle mostly everything related to replication in MySQL, high availability and whatnot, except for MySQL Cluster. And Nuno here is leading part of my team, the Group Replication part of things, so he is the one that handles most of the things related to Group Replication and reports to me. As I said, today we'll be talking mostly about Group Replication, and this is the proposed set of topics for this session: we'll briefly go through the background for replication at MySQL in general, talk a little bit about the use cases for Group Replication, detail how you can deploy the MySQL Group Replication plugin, talk about its features, talk about its performance, and if we have the time, go a little bit into the details of how the plugin came to be, or how the plugin is structured architecturally within the MySQL context, and then conclude the session. So, background. What we'll be talking about today is replication, replication in general and then more specifically Group Replication. Replication is something more or less like this: we have some change that is performed on some server, and then we need to replicate, move, propagate this change to a different server; in this case we see server A and then server B as the replicas. It's pretty simple, but the devil lies in the details. This is the traditional, or high-level, overview of the MySQL replication architecture, both for asynchronous replication and for Group Replication if you will: we have a server that executes transactions, the changes of these transactions are captured in a log, we call it the binary log, and then there's some communication framework to
propagate the changes from this server to another server. There is a persistent buffer on the replica, which we call the relay log, and eventually these changes are applied to that server and that server is brought in sync with the other server. So the communication framework is really what differs the most between MySQL asynchronous replication and MySQL Group Replication. In asynchronous replication this communication framework is the MySQL replication protocol: it's peer-to-peer, it's asynchronous, and so on and so forth. In Group Replication we have a communication framework that relies on a Paxos implementation to secure synchronization between servers, with properties like total ordering of messages, safe delivery, and all these things working together. Just a little bit on the history side: MySQL asynchronous replication has existed for a very long time, since 3.23, natively inside the server, so it's been around a while; it's pretty much battle tested, it's pretty widely deployed, and so on. In 5.5 we introduced the semi-synchronous replication plugin, which is basically asynchronous replication plus an extra communication round between the master and the slave, so that the slave can inform the master that the changes it has received have already been written to the relay log, its persistent buffer on the slave. In late December last year came the Group Replication release for 5.7, as a plugin as well, so it's been out there since 5.7.17, and as of two weeks ago we released MySQL 8.0.1, the first MySQL 8 release that includes the plugin as well. So both 5.7 and 8.0 have the plugin in their releases. So what is MySQL Group Replication? It's a single/multi-primary update-everywhere replication plugin for MySQL, built with built-in automatic distributed recovery, conflict detection and
group membership. This is a very long definition, but basically it means that in addition to doing data replication, there are some other properties that the plugin provides the server with, like which servers exist in the group, whether a server has failed, and so on and so forth. This is all managed automatically by the plugin. So in the end, from the user's perspective, it removes the need for handling server failover, provides fault tolerance, enables update-everywhere while doing conflict detection and resolution, and automates group reconfiguration: you can just remove a server and the group will detect this and automatically reconfigure itself. Ultimately, what this tries to provide is a base for implementing a highly available MySQL database service. And yes, this is the overview picture: we have client applications engaging servers in the group, issuing transactions on the different servers, and then you have some coordination to check for conflicts and to commit or abort transactions. You commit transactions that don't conflict, and if they conflict, you basically resolve the conflict by rolling back the transaction that is ordered later. So basically the first committer wins, and the second transaction, the one ordered after that one, will just roll back. Just a little bit of description of the roadmap. We can see here that it all started a long time ago when we released 5.6 back then, and then we started working on changing the replication infrastructure in the MySQL server to be able to do this kind of more advanced and pluggable replication framework. We took this opportunity to refactor, to re-engineer, to start building the infrastructure that would allow us to later release MySQL Group Replication as a plugin. In this process we did a lot of labs releases, we engaged the community a lot, we got a lot of feedback, we adopted some of that feedback and changed the product, so it's
slightly morphed from what it was in the beginning to what it came to be when we released it as a GA plugin. We added a lot of features, but the most interesting thing as well, like I said earlier, is that we took this opportunity to do some refactoring and restructuring of the MySQL replication code, and we built infrastructure, and in the end we can reuse these bits and pieces, these components that we were building, to do something else. For instance, there were a couple of talks about writeset-based parallelization for asynchronous replication; all of this comes from the work that was done to enable the Group Replication plugin, but it also feeds other components in the regular asynchronous replication framework. And this month, on the 11th of April, we released 8.0.1 as I said, but we also released InnoDB Cluster, which is a framework that builds on top of Group Replication to provide a simple user experience, an easy-to-use system on top of Group Replication, so it's easier to deploy, easier to start, easier to manage a Group Replication cluster. Frédéric and Matt Lord demonstrated it at the beginning of this conference in a whole-day tutorial. So, use cases. There are several use cases for Group Replication. For instance, elastic applications: given its ability to auto-reconfigure, to manage itself, given the fact that it has a group membership service built in, it's easy to add and remove servers from the group. The group will reconfigure itself, it will do automatic full or incremental state transfers if need be; if deployed in single-primary mode it will automatically select the next primary if the primary goes away, and so on and so forth. Also, if you're doing sharding, you could probably take this approach to build highly available shards: each shard is a group, and then you basically route
the queries to the shard that you want, while keeping the shard highly available because it's a group. It's also an alternative setup for master-slave replication, with the additional guarantees, the additional properties, that I enumerated earlier on: the group membership, the total ordering of transactions within the group, the guarantee that you don't lose transactions if servers fail, and so on and so forth. All these things are provided by Group Replication. And I think I should stop now and give Nuno a chance to talk; he'll be focusing on the features. — Thank you, Luís, for doing the introduction. Let's go down now to how we can deploy Group Replication. We have two modes of deploying Group Replication. The first one is using a single primary, in which we have the group but only one member is allowed to do writes; the other ones are what we call secondaries, which are hot backups and can obviously receive updates through replication, but only one member is able to write. This is our default mode, and it is the closest we have to a classic asynchronous replication setup, and that is on purpose: it's easy for you to migrate from one to the other, and it also removes the complexity that is attached to multi-master. Since we only have one primary, everything is much, much simpler, and you don't need to change your applications. This also brings automatic primary election: when the group is bootstrapped, the primary is elected, and if the primary fails, the group itself elects another primary. So this is out-of-the-box automatic failover for writes; there is no need for a DBA to code, implement, or do manual failover, it's built into the system. Also, you can see which member is the primary by selecting a global status variable, group_replication_primary_member; it will display the member ID of the member that is currently the primary.
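What he describes can be checked with a plain status query; a minimal sketch against the MySQL 5.7 performance_schema tables:

```sql
-- Which member is currently the primary? (single-primary mode)
SELECT VARIABLE_VALUE AS primary_member_id
  FROM performance_schema.global_status
 WHERE VARIABLE_NAME = 'group_replication_primary_member';

-- Match the returned UUID against the member list to get host and port
SELECT MEMBER_ID, MEMBER_HOST, MEMBER_PORT
  FROM performance_schema.replication_group_members;
```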
Then we have, of course, the multi-primary mode. In this mode each member can do writes. It is made for highly available setups, and of course, if you have multi-primary, things become a little more complex and you can have conflicts: if you have transactions happening in parallel on different servers touching the same row, then you have conflicts. If the transactions are not conflicting — like the ones on the slide, touching different rows — then those can go in parallel and commit; but if they touch the same row, the rule is that the first one to commit wins, and the second one needs to roll back, to avoid inconsistency. This is one of the main differences between multi-primary and single-primary, but you have multiple writers and HA within this mode. Now we'll do a small walk through the features. The first one is automatic distributed server recovery. When you have created the group and everything is running, but you want to add another server to the group, the system itself knows how to handle that: you join the member, the member contacts the group, it goes to RECOVERING state, and while it's in that state it is fetching the missing data that the group has and the new server doesn't, and while it's doing that it does not allow writes on itself, so it's protected. When it catches up — yes, it can start from scratch, but that will take more time, of course, so we advise people to provision it beforehand and then just do a small recovery of a few seconds; if you start from the beginning it will take more time. And no, the member that is donating the data is still online, but it needs to handle the writes and also donate the data. Then, when recovery is over, it becomes part of the group, it is ONLINE, and everything is normal again. So there is no need to implement or code anything: just join the server and the group will take care of this.
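The join step he walks through looks roughly like this in SQL (the recovery account name and password here are placeholders, not from the talk):

```sql
-- On the joining member: credentials used by distributed recovery
CHANGE MASTER TO
  MASTER_USER     = 'rpl_user',   -- hypothetical recovery account
  MASTER_PASSWORD = 'rpl_pass'
  FOR CHANNEL 'group_replication_recovery';

-- Join the existing group; the member goes to RECOVERING,
-- fetches the missing transactions, then turns ONLINE
START GROUP_REPLICATION;
```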
Every action on the group, a server joining or a server leaving, updates the membership; the group always knows which members are in the group. That membership is tagged with a version that we call the view ID, so any time a server joins or leaves, the view ID is incremented, and you can query the performance_schema tables to see which view ID you are on and how the system is progressing. So if you have a group running and a server crashes, you were on view ID number four, now you are on view ID number five; it joins again, goes to RECOVERING, goes to ONLINE, and it's back. The system knows how to handle this; it just works. But more important than this is that even when you use a multi-master setup, this is still MySQL, so you don't need to change your application for this to work. We are a MySQL plugin, nothing is new: we use InnoDB, you have the same APIs, the same guarantees. We did some optimizations to InnoDB to make the system perform better, but they're built in, you don't need to care about that. Also, all the monitoring that we provide on Group Replication is based on performance_schema tables, and that already happens with normal replication, so again, nothing new; we're just extending what we already have. So what is the outcome of this? We are no alien component: we share the same look and feel as the server. If you know how to work with MySQL, you know how to work with Group Replication; it's out of the box for existing users, it's the same, no big change, and new users that want to use Group Replication only need to learn MySQL, not MySQL plus some external library. It's all built in and it all works. Also, since we are the upstream, we do support GTIDs, on 5.7 and on 8.0 and on 5.6 as well. The only difference we have here is that instead of having a UUID for each server, we have one for the full group, so the full group behaves like a single instance; but this instance, the group, supports fault tolerance.
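The membership, state, and view ID he mentions are all exposed in performance_schema; for example:

```sql
-- Current members and their states (ONLINE, RECOVERING, ...)
SELECT MEMBER_ID, MEMBER_HOST, MEMBER_STATE
  FROM performance_schema.replication_group_members;

-- Per-member statistics, including the current view ID
SELECT MEMBER_ID, VIEW_ID, COUNT_TRANSACTIONS_IN_QUEUE
  FROM performance_schema.replication_group_member_stats;
```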
So when the group commits, the UUID used to tag that commit in the binary log is the group name, and it takes care of incrementing the sequence. If you do inserts like these, one and two, the GTID will be automatically assigned; but since we are the upstream, if you want to inject a specific GTID you can: there is the command SET gtid_next with the GTID that you want, with the number, and it works. So, full support. This also allows us to replicate from an external server. Imagine that you have your old master and you want to migrate from it to a group. What is the best, or simplest, way to do it? You set up your master, set up a group, and you replicate from the master to the group, and that is supported; GTIDs are allowed and everything works. Of course there will be conflict detection, because the data stream from the master to the group needs to be validated so it does not conflict with the data on the group. And you can also do it the other way around: I want slaves to scale out reads from the group, so you can hook up slaves to the group and replicate from there, as if it were a single master. Again, full support out of the box, no changes needed. But since we are in a multi-master, or multi-primary, environment, what happens if we all use the same auto-increment values? If we have parallel transactions with the same auto-increment seed and offset, then two transactions will have the same auto ID, they will run into conflicts, and this will be really, really bad for your performance and for your application. So to avoid that kind of situation, we establish that each member has its own offset, which by default is its server ID — you can customize it, of course — and each member uses an increment of 7. This ensures that no member will ever conflict with the others, even with parallel transactions using auto-increment.
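As a small illustration of both points — injecting a GTID under the group's UUID, and inspecting the auto-increment defaults — the group UUID below is made up for the example:

```sql
-- GTIDs are tagged with the group name (one UUID for the whole group);
-- to inject a specific GTID manually:
SET gtid_next = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:42';  -- hypothetical group UUID
BEGIN; COMMIT;                 -- the (empty) transaction gets that GTID
SET gtid_next = 'AUTOMATIC';

-- Auto-increment defaults set up by the plugin: increment 7,
-- offset derived from server_id, so parallel inserts never collide
SHOW VARIABLES LIKE 'auto_increment%';
```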
Why did we pick 7 and not, for instance, the group size? Because if you start with a group size of three and it then grows to four, and you forget, or the server takes too much time to update that value, then you have a window in which conflicts can happen. That's why we established the default of 7. But if you are sure that you'll never have more than a given number of members, like three, you can set it to three and maximize the auto-increment range for your case. The goal here is to never have conflicts that we can avoid. I don't have the link here, but there is a really nice blog post about this — I will give the link on the last slide — where we explain why 7 is the magic number; there is math behind it, you can read the blog post later. Then, of course, everyone asks: what about upgrades? How can I upgrade a server, or upgrade a member within a group? This was built with that in mind. If you have a group and you want to join a member with the same major version, like 5.7.15 and 5.7.12, it is allowed to join without any restrictions: they are the same major version, it must work, and it does work. But if you have an 8.0 group — which you can make now, since two weeks ago — and you want to add a 5.7 member, it will not be allowed. Why? Because that member may not be able to understand the messages from the 8.0 members. Actually, our protocol does support that — 5.7 knows how to talk to 8.0 on Group Replication — but there may be behavior changes in 8.0 that 5.7 is not aware of, or that are different. If we allowed a 5.7, or 5.6, or an older 8.0 member to join in that kind of situation, those behavior changes could create data inconsistencies on that server. So by default we don't allow members with a lower version to join a group with a higher version. If you really want to — in a panic-mode situation, my cluster is down, I need to add someone to keep this alive — there is an
option to force the join, to make the group available again, but you need to be aware that it's a compromise: I want to keep it alive, but that may cause issues because of the different versions. In the opposite case, if we have a 5.7 group and you want to join an 8.0 member, you can, it is allowed, but the 8.0 member is by default read-only; it is not able to do writes. Why? Because otherwise we'd be in the same situation as before: 8.0 could send messages that 5.7 would not understand, given the behavior differences between 8.0 and 5.7. [Applause] [In answer to a question:] Yes, because this behaves like a single instance, but it is a group with fault tolerance. Of course, if you replicate into it, you need to be as careful as you would be with a single instance, but yes, we support here everything that a single instance supports; that's the main goal, or main trade-off, of being the upstream. Also, we have a full built-in stack made by us: we have a built-in communication engine, which is based on Paxos; it has compression already implemented; it is multi-platform, Windows or Linux — whatever MySQL supports, we support. It has dynamic membership, it has distributed agreement, it delivers messages in a total-order fashion, plus SSL and IP whitelisting. All of this comes out of the box; we do not require any third-party software, it's all made by us. Also, we do not require network multicast — why? because that would not work on cloud installations — so we don't have that requirement, and you can run Group Replication on any network you want. Then our safety measures, like the read-only modes I mentioned before: when a server, a member, is in recovery, it is automatically read-only, so no one can make writes on it. Also, in the unlikely event — the unlikely event — of a member failure, read-only mode is set on that member to avoid inconsistency.
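The whitelisting and encryption he lists are driven by a few plugin options; a sketch with example values (the network ranges here are invented, and these are the MySQL 5.7 option names):

```sql
-- Restrict which hosts may connect to this member's group
-- communication port (default AUTOMATIC: active private networks)
SET GLOBAL group_replication_ip_whitelist = '192.168.1.0/24,10.0.0.0/8';

-- Encrypt the distributed recovery connections between members
SET GLOBAL group_replication_recovery_use_ssl = ON;
```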
So if you have a split-brain, read-only mode is set on the minority side and no one can go there and make writes; that change is signaled with a state change to ERROR state, and that's why we have the red box there. [Applause] And then the usual question: is this secure? Yes, we have a full stack of secure connections; we encrypt every connection. We have encryption between each member at the Paxos layer, and we also have encryption on the client connections on top of that. We also have IP whitelisting, so we can restrict which hosts can connect to each member. If you keep the default value, which is AUTOMATIC, what we allow is the private networks that are active on each host, so if you have a class C network, it will be allowed by default; or you can state explicitly what you want to allow to connect to that member. So by default, a public IP on that machine will not be allowed — safe by default, to avoid intrusions on our machines. And then, of course, we have parallel applier support. This is a really simple change for us; you don't need to change anything, it is configured the same way as for normal replication: you just set the parallel type to LOGICAL_CLOCK; commit order must be preserved — it's required for us, you need to preserve the commit order on the group — and then you set the number of parallel workers, and it works out of the box perfectly. If you were in Luís's session before, or Jean-François's this morning, you know there is a new mode in 8.0, which is writeset-based transaction dependencies. This was made for Group Replication and has now been back-ported to normal replication, so on Group Replication, even if you do not set this special mode, it's already there, it's already used internally. So we have writeset parallelization, it's really fast, and there is almost no slave lag, though of course it depends on the workload.
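The parallel applier setup he describes uses the same options as ordinary replication; something along these lines, with an example worker count:

```sql
-- Required by Group Replication: preserve the group's commit order
SET GLOBAL slave_parallel_type       = 'LOGICAL_CLOCK';
SET GLOBAL slave_preserve_commit_order = ON;

-- Then pick how many applier threads to use
SET GLOBAL slave_parallel_workers    = 8;
```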
But if you want to speed up recovery time, you should set this up; it will make distributed recovery even faster. And if you saw Jean-François's session, or if you read the blog post later today, you'll see that you can sometimes get a 600 percent improvement in performance, which is a really, really good result. Also, a new feature we have in 8.0.1: we now support savepoints on Group Replication. This was missing, now it's there; nothing big to say about it, but now we support it. And what are the requirements? Some are by design, some are not. By design, we require the InnoDB storage engine. Why InnoDB? Actually, what we require is a transactional engine, but we need special hooks to allow what we call hybrid transactions, and currently that is only implemented in InnoDB; it's there out of the box, you don't need to patch anything. We require a primary key on every table — this makes our life easier and performance greater, and it's not a big change for applications. We require GTIDs, global transaction identifiers, to be turned on. We require the binary log, with binary log row format set. And then there is the optimistic execution: if you are using multi-primary with parallel transactions, there may be conflicts, and due to that there may be rollbacks. Also, we have a soft limit of nine servers per group. Why nine? We do support more, but in our tests, in our performance testing, we saw that from three to nine there was no performance degradation, or it was very small, so we support nine, which is more than enough for the usual cases that customers have. [Applause] Things that are forbidden when you use multi-primary: we don't have SERIALIZABLE isolation in multi-primary mode — that is mostly impossible; I will not go into details, but offline I can explain why, if you want. We don't have cascading foreign key support in multi-primary, and we don't support binary log event checksums.
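The server-side requirements he lists can be verified with queries like the following; the no-primary-key query is an illustration written for this summary, not from the talk:

```sql
-- Check that the prerequisites are met on a candidate member:
-- ROW binlog format, GTIDs on, binary log enabled
SHOW VARIABLES WHERE Variable_name IN
  ('binlog_format', 'gtid_mode', 'enforce_gtid_consistency',
   'log_bin', 'log_slave_updates');

-- Find user tables without a primary key (these must be fixed
-- before joining the group)
SELECT t.table_schema, t.table_name
  FROM information_schema.tables t
  LEFT JOIN information_schema.table_constraints c
    ON  c.table_schema    = t.table_schema
    AND c.table_name      = t.table_name
    AND c.constraint_type = 'PRIMARY KEY'
 WHERE c.constraint_name IS NULL
   AND t.table_type = 'BASE TABLE'
   AND t.table_schema NOT IN
       ('mysql','information_schema','performance_schema','sys');
```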
We do have checksums on our communication channels, though, so one cannot replace the other. Then a few warnings. DDL is always the black sheep in the room: if you issue concurrent DDL in a multi-primary setup, you may end up with problems, so be careful, or issue it on a single primary and then you won't have issues. Also, statements like FLUSH TABLES WITH READ LOCK do not take the group into account: you do not take global locks, you take the locks on the member where the statement is running, but not on the whole group. So if you make that assumption, you may get different results than you expect. This is something you need to be aware of when you use a multi-primary setup. And so, how does it perform? That's what matters, right? Here are our results. The grey line is the single server — it's the same result throughout because it doesn't change — and then you can see in blue the results for single-primary mode, which is slightly slower than a single instance because we need to pay the Paxos trade-off; but going from two up to nine servers, it's the same performance, so we don't see degradation as more members are added to the group, which is really good. Then if we go to multi-primary mode, we have what we call sustained and peak throughput. Sustained is the maximum load the system can handle without building up slave lag, so if you want to have the group almost in sync, you are looking at the sustained throughput, and you see that it starts a little below a single instance and then degrades, but it degrades linearly, so it's a gradual, predictable degradation that you can estimate for your workload. But if you have a spike — okay, I have my usual workload, but at 6:00 p.m.
I have a really big workload, like people getting off work and going to Facebook to say something — then, for the case of that peak workload, you can see that the system is even better than a single instance. But I need to emphasize that Group Replication is not a write-scalability solution; it is a shared-nothing solution. You see better write results here because the writes are being spread across the members, and since they interleave with each other, we have more headroom, but scaling out writes is not the goal of Group Replication. It does happen, a little, because the system is well designed and well implemented, but that is not the goal, and you'll never get really huge write scale-out; the scale-out you see is mostly a consequence, not the goal. And this slide is what I just said. Now going to the architecture: how did we design and split the components that make up Group Replication? This is built on top of proven technology: it is based on MySQL replication, we reuse most of the components, and we provide a multi-primary approach to replication, or single primary with fault tolerance, based on a Paxos implementation. We have components with clean interfaces between them; it is interface-oriented, so since we made the writeset parallelization for Group Replication, it was easy to make it also useful for asynchronous replication, because we use APIs for that. It is decoupled from the server core: we are a plugin, we register ourselves as a listener for server events, and we also reuse the capture process that already exists for semi-sync rather than duplicating it. And we couple everything together with a new network infrastructure, which allows us to make more guarantees and tolerate failures. If you look inside each member of the group, you will see that we have the MySQL server, the APIs, our plugin, the communication API, and then the communication engine. If you have followed Group Replication since the
beginning, from two years ago, you know we started with Corosync, and then in the middle we realized we needed something better, and we changed from Corosync to our Paxos implementation without a nightmare, because we have really good APIs defined. So this structure that we have had since day zero paid off right in the middle of development, and it will pay off even more going forward. If you zoom into each component, you can see that we have the server, the performance_schema tables that we use for monitoring, then our APIs — what we call lifecycle, capture, and applier: we intercept transactions as they execute to get the writesets, we use the capture to get the data they are trying to change, and then on the other members we apply those changes, so you need an applier API. Then there are our plugin components, which we call capture, conflict handling, applier, and recovery; below that we have our Group Replication protocol; then the group communication components: the communication API, the binding, the engine, and then the network. These last components are what allowed us to swap out Corosync so easily. And then, of course, we have the storage engine, which is InnoDB. So, to sum up what Group Replication is: it is a cloud-friendly solution; it is great technology for deployments where you need elasticity, such as cloud-based infrastructure — you need to add servers, remove servers, and that is allowed from day zero, it's built in. It is well integrated with the server: we support the APIs, GTIDs, all the replication performance_schema tables; if you have your own monitoring solution for MySQL replication, you can easily use it for Group Replication, since we use the same interfaces. It also introduces automation, operations that are friendly to DBAs: it is self-healing, so they don't need to do manual failover or implement other things. What happens when my master fails? The group itself will take care of that. It provides fault tolerance,
automatic failover, it enables multi-primary update-everywhere, and it provides what we call, finally, a dependable MySQL service that was never available out of the box before — you could use multiple servers, add your own pieces and build that yourself, but now we have it available out of the box. It went GA at the end of last year, on 5.7.17, and is now also available on 8.0.1. But this is only the first step, not our goal. What we want is to give you a rich, mature HA solution, of which Group Replication is one of the base components. Where is my pointer? It ran out of battery, okay. Here we have Group Replication in these three-member groups, which are the core of the design; each group can be extended with slaves to provide read scale-out, and each one is a replica set. Then on top of that we have the router and the applications, and you can make shards and whatever you want, and this will be available out of the box — not yet, but we are working on that. We released the Group Replication plugin at the end of last year; two weeks ago we released the GA MySQL Shell and MySQL Router. This is the first step towards this picture, and we will release more steps periodically, so soon you can expect more news about this, and we will eventually have a MySQL HA out-of-the-box solution. — Can I just say one thing? — Yes, please. — So this is the roadmap and the strategy, our strategy if you will, and this is our vision; it's what I wanted to say, and it is all about what I also mentioned earlier about InnoDB Cluster and having native HA in the MySQL server, being able to deploy it in a simple, easy-to-use manner. At the heart of each replica set there's a group of MySQL Group Replication servers, and then you have HA and read scalability, and eventually we will get write scalability through sharding built into all this picture, and so on and so forth. The interesting and exciting
thing to me is that this very same slide, or some variation of it, was presented — I think it was a year ago, or a year and a half ago — by Thomas at his keynote at Oracle OpenWorld, and the exciting part for me is that we see this being executed and delivered at a very interesting and fast pace. For me, as an engineer and a person who works daily on this kind of project, these are really exciting times, and for the end user I guess it's also very exciting to see all of these things being put together. The key point in all this is to provide a better user experience, to provide a better story altogether: all these components are tested together, all these components are thought through together, and all these components ship together. We're doing this in steps, right? Originally we delivered the Group Replication plugin in 5.7; now, two weeks ago, we released the InnoDB Cluster tool set around Group Replication, with some assurance that all these things were tested together and should work very well together. So I guess what I'm trying to say is that this is the vision, we're executing on it as you can see, and these are really exciting times as an engineer working on this. Right, so, yeah, that's pretty much it. Do we have another slide? — You have one more slide. — So before that, if you want to get a look and feel for InnoDB Cluster, Frédéric gave the tutorial on Monday — of course, you cannot go back in time, but you can go to this blog, get the slides, and try it at home, or just read it, and you will understand how it works, because it's easy, and it really was made easy to work with from the beginning. And then, just to finish: you can grab our packages — Group Replication is GA, and the other pieces are also GA, so just go to the usual place and get them. We have our documentation in place, and the last one is our blog: go there, get the news and
our fancy blog posts, and give us your feedback. And that's it, thank you. — Thank you. — So, questions? [Q about schema checks] Yes, we do have that — not in the server, but, Matt, you can correct me, in the Shell, right? We have those checks online with InnoDB Cluster. But for a human to parse, there's room for improvement there, yeah. If you mean runtime checks, we have runtime checks, yes. What Simon is asking is that he wants checks at the very beginning, when you issue the DDL, to prevent it if it's not compliant. [Music] Okay, there was a hand here somewhere — yes, please. [Q about primary election] The group acts as a fault-tolerant system, and once it detects that the primary is down, to elect the new primary you need a majority to be alive to make the election, and everyone will make the same computation. The promotion policy nowadays is very simple: it's just based on the UUID — let's say the lexicographically smallest server UUID of the remaining servers; they will all sort them, take the smallest one, and promote that one. There's nothing like a weight or whatever; there's no more advanced or complex policy to drive the election process. Not yet — we have plans for that, but this is our first release; we are focusing on the local area network case, but yes, we have plans for that.
Info
Channel: Percona
Views: 8,954
Rating: 5 out of 5
Keywords: group replication in mysql, group replication in mysql 5.7, mysql tutorial, database tutorial, percona tutorial, percona server for mysql, mysql server performance tuning, mysql replication, mysql replication tutorial, mysql master slave replication, mysql master master replication, replication mysql, replication mysql database, mysql group replication, mysql group replication step by step, mysql replication types, mysql replication master master, mysql replication workbench
Id: IfZK-Up03Mw
Length: 45min 48sec (2748 seconds)
Published: Wed Nov 15 2017