Postgres on Kubernetes

Captions
Welcome, everybody. I'm Josh Berkus; I work for Red Hat. My goal is to make deploying PostgreSQL on Kubernetes, or frankly deploying PostgreSQL in any form, boring. You shouldn't be boasting about deploying PostgreSQL on your internal cloud or on your public cloud, because it should be something everyone does with a single command.

Here's where "boring" comes from: Matt Asay posted an article a couple of days ago talking about how, despite being 27 years old at this point, PostgreSQL is one of the hot, hip databases to use with all of your new technology tools. One of the reasons people like using PostgreSQL, even with new languages and new servers and new technologies, is that it is boring. What he means by boring is that it is not something you have to think about: you deploy Postgres, you save your data to Postgres, and when Postgres says your data is saved, it's saved; when Postgres says your data is replicated, it's replicated. You don't really have to think about whether that's true, or about sharding strategies, or about all kinds of other stuff you have to think about with some of the more interesting scale-out databases. It's a good statement: it's not hip, it takes a bit to set up, but after you're done you have a reliable workhorse.

What I want to do is take out one part of that statement, "it takes a bit to set up," and that's what this project is about. Kubernetes is a big help with that. So I'm going to explain a few things here. I'll explain the Patroni project and the tool in general, for anybody who hasn't seen one of the earlier presentations on earlier versions of the tool; I'll explain what we changed recently, for anybody who has seen my previous presentations; and then I'll actually show you the Kubernetes-native version of it and demo that, along with a brief demo of the Zalando Postgres operator, and then talk a little bit about another Postgres operator.

For anybody who has seen this before, or read any of my stuff about Patroni: what's changed since the last KubeCon is, first, a lot tighter Kubernetes integration, in order to rely more on Kubernetes for all of the infrastructure pieces, and I'll show you that. The second thing is that we have an operator now — it's fairly alpha, but it's out there, for anybody familiar with the operator pattern — and I'll show you that too. And the third thing, of course, is that Patroni on Kubernetes is now being used in production in a couple of different places, and as a result we've learned a lot by having it deployed in production and relied on; those improvements have gone into Patroni.

Some terminology. Postgres, or PostgreSQL, is the database we're talking about: the one that's been around since the 80s and yet continues to add new things. Patroni is a high-availability daemon service that runs inside a container; I'll explain a little more about that in a moment. Spilo is a distribution that wraps Patroni, PostgreSQL, and some other common utilities in a container, to be your container of Postgres goodness. You'll see those words again in this presentation. In particular, I'm going to talk a lot about Patroni, the high-availability daemon.
Patroni does a few things. It runs in your container as PID 1; it controls whether or not Postgres is running in that container; it automates failover and replication for your cluster of whatever defined size; and it supplies a management API for the things you can't do through the normal Postgres interface.

Now, what's the idea, the design behind this? Specifically, why are we doing Patroni and not using some other system, and what are our goals? There are three basic ones: simplicity, usability, and availability.

What I mean by simplicity: the majority of us, people who are Postgres geeks, think of Postgres as this huge constellation of services that does all these different things — you've got foreign data wrappers, you've got monitoring tools, you've got all these things. For us, having Postgres be big and complex is actually a virtue. For everyone else, Postgres is one thing: it's an endpoint, and they want it to be one endpoint and not sixteen different things. So part of our goal with Patroni and Spilo is to make Postgres one endpoint; to rely on Kubernetes as much as possible, so that the Patroni install does not have to be complicated and we're simply relying on the tools that ship with, or that we can enable with, Kubernetes; and to supply a set of defaults that we believe just work for most users, instead of requiring anybody to go through any kind of complicated configuration for a simple database that's going to support, say, a web application.

For usability, we've got a couple of things. There are some things in terms of controlling PostgreSQL that you can't do through the port-5432 SQL interface, and that includes things like changing security settings, starting and stopping, initializing replication, et cetera, so we supply access to those over a port with a RESTful interface. There's also configuration and plugins to support a variety of cloud environments; for example, there's a whole set of plugins for different backup tools, for continuous backup to cold storage, just in case you lose your entire cluster and still need your data. And then, of course, unlike a lot of public cloud providers, we want you to be able to run something that places no limitation on which Postgres plugins or configuration options you use — because maybe you want some obscure plugin for doing, I don't know, biological data in Postgres that you can't get on Amazon RDS. We want you to be able to install that.

And then, of course, availability: fully automated replication and failover. It should just work. You've got three Postgres nodes, one of them is the master at any given time, and if the master goes away, you get a new master.
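To make that concrete, here is a minimal sketch of what a Patroni configuration file can look like. This is an illustration rather than anything shown in the talk: the scope, addresses, and password are placeholders, and a real file has more sections (see the Patroni documentation for the full format).

    # minimal illustrative patroni.yml sketch (placeholder names and addresses)
    scope: patronidemo            # cluster name, shared by all members
    name: patronidemo-0           # this member's unique name
    restapi:
      listen: 0.0.0.0:8008        # the management REST API mentioned above
      connect_address: 10.0.0.5:8008
    bootstrap:
      dcs:
        ttl: 30                   # how long the leader key lives before a new election
        loop_wait: 10
        maximum_lag_on_failover: 1048576
    postgresql:
      listen: 0.0.0.0:5432
      connect_address: 10.0.0.5:5432
      data_dir: /home/postgres/pgdata
      authentication:
        superuser:
          username: postgres
          password: example-password   # placeholder; keep real ones in a Secret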
Now let's talk a little about what's changed recently — and by recently I mean this is still a feature branch. Our Patroni design started out before Kubernetes was really usable for this, but we had etcd, so we designed it so that you spin up an etcd cluster to provide a single source of truth. I think everybody here knows what etcd is and what it does; at least by this point in the conference, you ought to. To supply that single source of truth, you run a separate etcd cluster on your Kubernetes, the Postgres nodes communicate with and update the etcd cluster, and at the same time the Kubernetes cluster maintains things like "I should have three nodes in this particular Postgres set, or five nodes, or whatever, and they have this storage," et cetera. That works pretty well, and it's actually what's in production in several places.

However, we looked at this and said: you know what, that's still more complicated than it needs to be. Why do I need this separate etcd when I already have a single source of truth, which is called the Kubernetes API? So what we've done recently is get rid of that separate etcd as the default. There are actually always going to be reasons why you might want a separate etcd for some clusters, but again, simple defaults: by default, instead of having a separate DCS, everything goes to the Kubernetes API. Leader elections go through the Kubernetes API, as do the other structures and everything else. This is as simple as possible, because the idea is that now I can deploy a high-availability Postgres cluster that consists of three and only three pods, instead of needing several different services.

Now, because most people in the room have not seen any Patroni presentations before, let me go over how this works in terms of high availability with an animation, and then I'll actually demo it in action. The basic idea is that we have our container with Postgres inside, and the Patroni daemon running as the container's PID 1. We tell Kubernetes to create a StatefulSet, and it creates a whole bunch of these, so we've got a bunch up and running that are currently empty, because we're deploying a new cluster. They all communicate with the Kubernetes API and say "we want to have a leader election" — by updating, actually, a ConfigMap and an Endpoint. They have that leader election, one of the nodes wins, that node becomes the master, and the other two nodes start replicating from it.

Now, things can happen to the master, particularly in a container cloud environment: if nothing else, we can run out of resources on that node, and Kubernetes can decide it needs to migrate that container. If something happens to the master, the master key in the ConfigMap times out, the remaining replicas do another leader election, one of them wins and becomes the new master, and the other one starts replicating from that new master. This all happens in the background. Then, of course, Kubernetes notices that it does not have enough nodes in its StatefulSet and brings up another Postgres, which then replicates from the existing master, so you always have a full cluster. And via Kubernetes Services and Endpoints, we make sure that there's always a connection to the master available.

Let me go a little more into detail on that failover, because it is a little more complicated than just making a call to the API. What happens is the master vanishes. We've got a timestamp on its key, so we know the master hasn't checked in in a while. The replicas, who are checking in, realize the master's not there, and they grab a lock. Currently, what we do for that is have them redefine the master Endpoint, because in the way Kubernetes currently works, that's an operation that only one of them can win. The winner then checks its own replication status, to make sure it didn't already have broken replication before the master went down. If the replication status is okay, it updates the ConfigMap indicating that it is the master, and the other replicas, who were not able to grab the Endpoint, remaster off of that new master, and you have a new cluster. So it's a two-stage failover, which can affect you if you have a really crappy network: if your network is really crappy, it's going to be a little slow.
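As an illustration, the leader lock that the members race for looks roughly like the following ConfigMap. The object name and annotation keys here are a hedged sketch of how the Kubernetes-native branch stores cluster state, and the values are placeholders:

    # rough sketch of the leader ConfigMap (name and annotation keys assumed)
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: patronidemo-leader
      annotations:
        leader: patronidemo-0              # current master
        ttl: "30"                          # seconds before the key times out
        renewTime: "2017-12-07T10:15:00Z"  # hypothetical last check-in timestamp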
So let's actually do that. Well, actually, let me show you what's in the manifest before I run it, because the cluster creates very fast.

We've got here a sample manifest; it's just a slight modification of the one given as an example in the Kubernetes branch of the Patroni repo, so you can take a look at it yourself later. Basically, we define a cluster name, we have application labels, and we're telling it to do three replicas. For high-availability purposes I recommend always running at least three, because there are a lot of things that can cause two to fail. Then we have our container. I'm using a slightly modified image, because this is a feature branch and I've been debugging, but it will be merged before the end of the year. Then we just have some definition of the container and everything else, and your usual volumes. And then we define a whole bunch of environment variables that get passed through to the Patroni daemon in order to configure the behavior of Postgres: the usual things, passwords, ports. If we were going to put in any performance modifications from the Postgres defaults, like how much memory to use, we would also pass them here. I've got the volumes commented out because I'm running this in Minikube and there are no actual volumes.

Then we have to define a couple of other things. We actually want to define a separate Endpoint. If you're not familiar with Endpoints: normally, when you create a Kubernetes Service and give that Service a selector, Kubernetes creates something called an Endpoint in the background that actually allows things to connect. However, you can create a Service without a selector, in which case there is no Endpoint and you can define your own. Most of the time people use this to connect to services outside of Kubernetes, but in our case we're using it to point the connection specifically at the master node, so that we update only the Endpoint. So we create this sort of headless Service with no selector. I also want a separate Service that load-balances across all nodes for read-only connections, so I create that Service as well. And then, of course, we have to have passwords, and Secrets for them.
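A condensed sketch of what such a manifest contains is below. This is not the actual file from the Patroni repo: the image, names, and the exact set of environment variables are placeholders, and the real example is considerably longer.

    # condensed, illustrative version of the demo manifest (placeholder names)
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: patronidemo
    spec:
      serviceName: patronidemo
      replicas: 3                          # at least three, as recommended above
      selector:
        matchLabels:
          application: patroni
          cluster-name: patronidemo
      template:
        metadata:
          labels:
            application: patroni
            cluster-name: patronidemo
        spec:
          containers:
          - name: patronidemo
            image: spilo:demo              # placeholder image
            env:
            - name: PATRONI_SCOPE
              value: patronidemo
            - name: PATRONI_SUPERUSER_PASSWORD
              valueFrom:                   # pulled from the Secret mentioned above
                secretKeyRef:
                  name: patronidemo
                  key: superuser-password
    ---
    # master Service with NO selector; Patroni itself maintains the matching
    # Endpoints object so that it always points at the current master
    apiVersion: v1
    kind: Service
    metadata:
      name: patronidemo
    spec:
      ports:
      - port: 5432
    ---
    # read-only Service that load-balances across all members
    apiVersion: v1
    kind: Service
    metadata:
      name: patronidemo-repl
    spec:
      selector:
        application: patroni
        cluster-name: patronidemo
      ports:
      - port: 5432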
Okay, so you can see this here: we've got this extra label called role. I'm going to shrink this just a little, and you can see first a master comes up and then the replicas come up. (That other cluster is a remnant of running through this demo earlier; sorry about that.) So we've got a master and two replicas right there.

Let me show you this in the console, because it's actually an easier way to see it. We've got our StatefulSet, patronidemo, right here, and here are all of the nodes. These are actually sort of fun: we send all of the activity to the log output, so you'll get the leader talking about being the leader and the replicas talking about being replicas, and of course we also funnel the Postgres log output there. In the process we create a couple of ConfigMaps (those extras should also not be there). One of them actually holds the configuration of the cluster, which would show any non-default items. I don't have a lot of non-default settings in here; one that I do have enables a tool called pg_rewind, which can bring an individual Postgres back into sync if it has gotten a little ahead during a failover. But the more important one is the leader ConfigMap, which gives you one of the several ways we offer to get information about who the current leader is.

This is all a lot more interesting if I show you the failover, so let's kill the current master. As you can see, when network latency is negligible, as it is when you're running this on Minikube, it picks up the fact that the master has gone really fast. When it's slower than that, it's slower specifically because of network round-trip time to detect that things are gone; obviously, when you lose physical nodes from your network, it can take a little longer for the network to time out. Then Kubernetes spins up a new pod, and we have a replica again.

And let me show you those Endpoints here. We've got two: one is the master-only Endpoint, which points at whichever node is currently the master, and then there's the load-balancing Endpoint that gets created for the read-only Service.
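To make that mechanism concrete, the master-only Endpoints object that Patroni rewrites during failover looks roughly like this sketch (the name and IP are placeholders, not taken from the demo):

    # rough sketch of the master-only Endpoints object Patroni maintains
    apiVersion: v1
    kind: Endpoints
    metadata:
      name: patronidemo              # matches the selector-less master Service
    subsets:
    - addresses:
      - ip: 172.17.0.4               # placeholder pod IP of the current master
      ports:
      - port: 5432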
Now, one thing you might say here is: wow, that's nifty and really super useful, but among other things, passing all this configuration via environment variables in Kubernetes manifests is really irritating, and it really doesn't help me when I need to manage dozens or hundreds of Postgres clusters. It would be nice if there were another way. Well, there is, and it's called operators. How many people here are familiar with the operator pattern, the whole operator idea? Yeah. CoreOS created the operator idea around a database, etcd, and it turns out it's useful for databases in general — probably for other things too, but where I've seen it used is databases.

There are a few reasons why you want to use an operator. One is that it makes it easier to keep the clusters you're supposed to have documented in a Git repo, because you reduce the information down to what's distinct about each cluster, rather than having a lot of scaffolding mixed in with your cluster configuration files. For that matter, if you have divided-up teams, you can pass around a much simpler set of information to define a cluster, so that people don't have to understand Kubernetes YAML structure and how to define Kubernetes objects, and don't get the opportunity to screw up your Kubernetes cluster in the process. The other thing is that an operator is an active thing: it runs on your Kubernetes cluster, and that's important because databases require scheduled work. If you have continuous backup running, something has to take a snapshot periodically, and sometimes you want to run other forms of maintenance; with simply a stateful deployment, you're not done. And the big thing is: hey, I'm effectively creating a manifest anyway, so if I'm doing that, why not have that drive everything? Why not just create a manifest and have everything else be automatic?

So I'm going to show you one operator. Like I said, Spilo is the full packaging of Patroni plus the other stuff, and there's an operator for Spilo, which is what Zalando currently uses in their staging cluster; it hasn't gone to the production cluster yet. What you do is install the operator and create a manifest for each cluster; if you want to modify clusters, you do that by modifying the manifest.

So let's actually do that, although it'll take me a minute, because apparently I have a lot of remnants of the previous uninstall. Pardon me while I delete stuff left over from the previous run, because otherwise it will refuse to build. Okay. I need to install a ConfigMap for the operator itself, because the operator has a configuration; then I need to create a ServiceAccount for it, because that's required for creating operators; and then I create the operator. You'll see the operator log in and register itself — that's the Spilo operator there.

Once I've created the operator, I can tell it to create a cluster simply by defining a manifest and passing it to the operator. This is what a manifest looks like for the operator. It's still YAML, but you don't have to understand the Kubernetes infrastructure to write it, which is really important if you're going to have, say, development teams creating databases for their own applications. Instead, you define some users and some Postgres settings. I don't have a lot in this particular one, because I'm trying to show you a simple one, but you can define a whole bunch of Postgres settings within that manifest.
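For reference, a cluster manifest for the Zalando operator looks roughly like this. The sketch follows the postgres-operator examples as I understand them; the cluster name, team, users, and settings are placeholders, and the exact schema may differ between operator versions.

    # illustrative cluster manifest for the Zalando postgres-operator
    apiVersion: "acid.zalan.do/v1"
    kind: postgresql
    metadata:
      name: acid-minimal-cluster       # placeholder cluster name
    spec:
      teamId: "acid"                   # placeholder team
      numberOfInstances: 3
      volume:
        size: 1Gi
      users:
        zalando:                       # a user and its role flags
        - superuser
        - createdb
      databases:
        foo: zalando                   # database name: owner
      postgresql:
        version: "10"
        parameters:                    # any non-default Postgres settings
          shared_buffers: "256MB"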
So let's go down here and... hold on a moment, I still think I have remnants left over from other stuff, so let me spin up a new cluster. Hey, there we go. One of the outstanding bugs: if you go to the postgres-operator repo, you'll see there's an open issue for it failing to clean up when you delete clusters, so there is actually an issue there, and it's being worked on. But we go ahead and install that. And why are we not seeing the Spilo role... oh, there we are. So again, just like with bare Patroni, one of them gets elected master, and then the other two replicate from it in just a second. If we refresh, we've now got our StatefulSet running our three nodes, and some of the same ConfigMaps that you would see when deploying Patroni directly. It's the same thing, but you've simplified it even further, and that has a tremendous advantage: you can now hand this off to development or testing teams, who can deploy Postgres clusters as they need them without needing to understand how Kubernetes works.

Now, the Spilo operator is not the only operator out there. There's a whole other project for Postgres on Kubernetes that comes from Crunchy Data — hi there, one of the Crunchy Data staff is here. They have their own operator that's been out for a little bit longer, so it's a little bit more mature. Their architecture is very different from the Patroni architecture: they were very much coming from the enterprise side of things, so they're really looking at a very full-service Postgres offering with a lot of tooling, which is what their Crunchy Container Suite is. The design of their operator is also a little different. The design of our operator, the Spilo operator, is extremely manifest-driven, and there really aren't other ways to interact with it. The Crunchy Postgres operator is very command-and-control driven, and what I mean by that is there's a client tool called pgo that talks to the operator; if you're using the Crunchy operator, the main way to deploy things is to use this client to communicate with the operator. They have a whole demo and everything else around that. It's one of those trade-offs in how you want to do things; there are advantages both ways. (That image did not pop in, sorry — I had the whole architecture diagram of the Container Suite there. The Container Suite has a lot of things in it. It's also, like I said, a very different setup for high-availability Postgres than the Patroni setup.)

A quick comparison, from actually going through and trying out both operators. The biggest thing is that the Spilo operator is still kind of alpha, because it was created recently — things move fast, we'll say that forever. One of the main reasons it's still kind of alpha is that we haven't really written any documentation for it, which will change. There's not a specific CLI for the Spilo operator; there very much is one for the Crunchy operator, a very sophisticated CLI, so it depends on whether you like CLIs or not. The Crunchy operator is OpenShift-compatible now; the Spilo operator is not, really. Both support rolling upgrades of Postgres, which is another important thing the operator does for you — you could do that by specifying a lot of things in your upgrade strategy for new containers in a Kubernetes deployment, but you'd rather not have to write that, since the operator already knows what to do.

So, some future work plans, because this is active. Number one is obviously to merge the Kubernetes-native branch into mainline Patroni. The second thing is to integrate the operator and the Kubernetes-native branch, since the operator currently uses the older version of Patroni.
I'm also working on OpenShift compatibility, because I work for Red Hat and I have to work on OpenShift compatibility — and besides, there are lots of people who use OpenShift and run Postgres on OpenShift. Prometheus integration is a big interest of mine, because I like metrics, and I also want to package some other things up with it, like PoWA, which is a Postgres performance-analysis tool. And then one of the other things that's come up immediately is support for more complex replication topologies.

Some other, much more hypothetical things have come up, partly from being here at the conference. Istio integration, in order to have more sophisticated routing of traffic to the Postgres nodes. There actually is a command-line tool for Patroni that's used for a few things, and right now it's completely separate from the operator; maybe it shouldn't be, and we're discussing that. I was also interested in Brendan Burns's Metaparticle lock CRD, so we're going to look at that, and potentially have a defined lock object rather than overloading Endpoints to perform that function for us. Support for the new logical replication is obviously a big hypothetical item. And there's the question of whether we could customize the workloads API and maybe be less reliant on operator custom-controller behavior.

Now, there are a few things we run into in Kubernetes that are still limitations that make life hard for us. There really aren't very many — mind you, I've been working with the team that does StatefulSets for quite a while, so a lot of the things on my issue list got done — but: a multi-data-center, multi-availability-zone cluster is hard to do with StatefulSets right now. Kubernetes should have a built-in leader-election facility; that's been discussed before and will be discussed again. Upgrading StatefulSets pretty much requires an operator right now, because the StatefulSet has no intelligence about what order it should upgrade things in — and it could, so again, that's up for discussion. And it would be really nice to be able to do more extending of kubectl, so that we could stop having all of these separate CLIs, one per operator, each completely separate from kubectl.

If you're interested in this, we really would be happy to have more contributors; it's an open project. With Patroni, I'll admit that we do most of our communication via GitHub issues — there's not a mailing list or anything — so if you have a question or something like that, file an issue, or find one of us on the Postgres IRC. And let's make Postgres boring. I'll take questions in just a second; here are some resource links, and now, questions.

So the question was: what's it written in? The bulk of Patroni is in Python, and the operators are in Go, because operators are in Go.

The next question was about replication lag. The default is asynchronous replication, which means there is indeed replication lag, and if you have an unexpected master failure, you will lose some transactions if they're in flight. However, Patroni is completely compatible with turning synchronous replication on — there are examples in the documentation of what that would look like — so if you want to run it with synchronous replication, you can turn that on.
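As a hedged illustration of what that can look like: Patroni exposes a synchronous_mode switch in its bootstrap configuration. The snippet below is a sketch based on the Patroni documentation rather than the speaker's slides, and exact keys may vary by version.

    # illustrative sketch: turning on synchronous replication in Patroni
    bootstrap:
      dcs:
        synchronous_mode: true        # Patroni manages synchronous_standby_names
        postgresql:
          parameters:
            synchronous_commit: "on"  # placeholder; tune to your durability needs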
And particularly if you're going to do that, I would recommend using Postgres 10 and running it with synchronous quorum replication, so that you don't get a replica failure blocking your write transactions.

Right — this question was that what you missed most was user management, and creating users via ConfigMaps. The operator manifest has a section devoted to users, and if you modify the manifest, it should create the users, if you're using the operator. If you're using Patroni without the operator, you can either create the users when you create the cluster, because that goes in the Patroni config, or, if you want to create a user later on, obviously log in as a superuser and create the other user. That would happen at deployment time; if you're trying to do it in a Git-repo-managed way, then the operator is the way to go.

Let me just see if there was someone else... yeah: what about storage, how do you specify storage? Honestly, the same way you'd specify it anywhere: for Patroni in general, you're specifying storage through the Kubernetes API. There's a section in the operator example where you tell it the storage selector, and that's the limit of the discrimination it has.
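For the plain-Patroni StatefulSet, that typically means a volumeClaimTemplate, something like the hedged sketch below; the claim name and storage class are placeholders. The operator manifest's volume section shown earlier serves the same purpose for operator-managed clusters.

    # illustrative storage stanza for the StatefulSet (placeholder values)
    volumeClaimTemplates:
    - metadata:
        name: pgdata
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard     # assumed storage class
        resources:
          requests:
            storage: 10Gi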