Postgres on Kubernetes

Captions
Welcome, everybody. I'm Josh Berkus; I work for Red Hat. My goal is to make deploying PostgreSQL on Kubernetes, or frankly deploying PostgreSQL in any form, boring. You shouldn't be boasting about deploying PostgreSQL on your internal cloud or on your public cloud, because it should be something everyone does with a single command.

Here's where "boring" comes from: Matt Asay posted an article a couple of days ago talking about how, despite being 27 years old at this point, PostgreSQL is one of the hot, hip databases to use with all of your new technology tools. One of the reasons people like using PostgreSQL, even with new languages and new servers and new technologies, is that it is boring. What he means by boring is that it is not something you have to think about: you deploy Postgres, you save your data to Postgres, and when Postgres says your data is saved, it's saved; when Postgres says your data is replicated, it's replicated. You don't really have to think about whether that's true, or about sharding strategies, or about all kinds of other stuff you have to think about with some of the more interesting scale-out databases. It's a good statement: it's not hip, it takes a bit to set up, but after you're done you have a reliable workhorse.

What I want to do is take out one part of that statement, "it takes a bit to set up," and that's what this project is about. Kubernetes is a big help with that. So I'm going to explain a few things here. I'll explain the Patroni project and the tool in general, for anybody who hasn't seen one of the earlier presentations on earlier versions of the tool; I'll explain what we changed recently, for anybody who has seen my previous presentations; and then I'll actually show you the Kubernetes-native version of it and demo that, along with a brief demo of the Zalando Postgres operator, and then talk a little bit about another Postgres operator.

For anybody who has seen this before, or read any of my stuff about Patroni: what's changed since the last KubeCon is, first, a lot tighter Kubernetes integration, in order to rely more on Kubernetes for all of the infrastructure pieces, and I'll show you that. The second thing is that we have an operator now — it's fairly alpha, but it's out there, for anybody familiar with the operator pattern — and I'll show you that too. And the third thing, of course, is that Patroni on Kubernetes is now being used in production in a couple of different places, and as a result we've learned a lot by having it deployed in production and relied on; those improvements have gone into Patroni.

Some terminology. Postgres, or PostgreSQL, is the database we're talking about: the one that's been around since the 80s and yet continues to add new things. Patroni is a high-availability daemon service that runs inside a container; I'll explain a little more about that in a moment. Spilo is a distribution that wraps Patroni, PostgreSQL, and some other common utilities in a container, to be your container of Postgres goodness. You'll see those words again in this presentation. In particular, I'm going to talk a lot about Patroni, the high-availability daemon.
Patroni does a few things. It runs in your container as PID 1; it controls whether or not Postgres is running in that container; it automates failover and replication for your cluster of whatever defined size; and it supplies a management API for the things you can't do through the normal Postgres interface.

Now, what's the idea, the design behind this? Specifically, why are we doing Patroni and not using some other system, and what are our goals? There are three basic ones: simplicity, usability, and availability.

What I mean by simplicity: the majority of us, people who are Postgres geeks, think of Postgres as this huge constellation of services that does all these different things — you've got foreign data wrappers, you've got monitoring tools, you've got all these things. For us, having Postgres be big and complex is actually a virtue. For everyone else, Postgres is one thing: it's an endpoint, and they want it to be one endpoint and not sixteen different things. So part of our goal with Patroni and Spilo is to make Postgres one endpoint; to rely on Kubernetes as much as possible, so that the Patroni install does not have to be complicated and we're simply relying on the tools that ship with, or that we can enable with, Kubernetes; and to supply a set of defaults that we believe just work for most users, instead of requiring anybody to go through any kind of complicated configuration for a simple database that's going to support, say, a web application.

For usability, we've got a couple of things. There are some things in terms of controlling PostgreSQL that you can't do through the port-5432 SQL interface, and that includes things like changing security settings, starting and stopping, initializing replication, et cetera, so we supply access to those over a port with a RESTful interface. There's also configuration and plugins to support a variety of cloud environments; for example, there's a whole set of plugins for different backup tools, for continuous backup to cold storage, just in case you lose your entire cluster and still need your data. And then, of course, unlike a lot of public cloud providers, we want you to be able to run something that places no limitation on which Postgres plugins or configuration options you use — because maybe you want some obscure plugin for doing, I don't know, biological data in Postgres that you can't get on Amazon RDS. We want you to be able to install that.

And then, of course, availability: fully automated replication and failover. It should just work. You've got three Postgres nodes, one of them is the master at any given time, and if the master goes away, you get a new master.
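To make that concrete, here is a minimal sketch of what a Patroni configuration file can look like. This is an illustration rather than anything shown in the talk: the scope, addresses, and password are placeholders, and a real file has more sections (see the Patroni documentation for the full format).

    # minimal illustrative patroni.yml sketch (placeholder names and addresses)
    scope: patronidemo            # cluster name, shared by all members
    name: patronidemo-0           # this member's unique name
    restapi:
      listen: 0.0.0.0:8008        # the management REST API mentioned above
      connect_address: 10.0.0.5:8008
    bootstrap:
      dcs:
        ttl: 30                   # how long the leader key lives before a new election
        loop_wait: 10
        maximum_lag_on_failover: 1048576
    postgresql:
      listen: 0.0.0.0:5432
      connect_address: 10.0.0.5:5432
      data_dir: /home/postgres/pgdata
      authentication:
        superuser:
          username: postgres
          password: example-password   # placeholder; keep real ones in a Secret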
Now let's talk a little about what's changed recently — and by recently I mean this is still a feature branch. Our Patroni design started out before Kubernetes was really usable for this, but we had etcd, so we designed it so that you spin up an etcd cluster to provide a single source of truth. I think everybody here knows what etcd is and what it does; at least by this point in the conference, you ought to. To supply that single source of truth, you run a separate etcd cluster on your Kubernetes, the Postgres nodes communicate with and update the etcd cluster, and at the same time the Kubernetes cluster maintains things like "I should have three nodes in this particular Postgres set, or five nodes, or whatever, and they have this storage," et cetera. That works pretty well, and it's actually what's in production in several places.

However, we looked at this and said: you know what, that's still more complicated than it needs to be. Why do I need this separate etcd when I already have a single source of truth, which is called the Kubernetes API? So what we've done recently is get rid of that separate etcd as the default. There are actually always going to be reasons why you might want a separate etcd for some clusters, but again, simple defaults: by default, instead of having a separate DCS, everything goes to the Kubernetes API. Leader elections go through the Kubernetes API, as do the other structures and everything else. This is as simple as possible, because the idea is that now I can deploy a high-availability Postgres cluster that consists of three and only three pods, instead of needing several different services.

Now, because most people in the room have not seen any Patroni presentations before, let me go over how this works in terms of high availability with an animation, and then I'll actually demo it in action. The basic idea is that we have our container with Postgres inside, and the Patroni daemon running as the container's PID 1. We tell Kubernetes to create a StatefulSet, and it creates a whole bunch of these, so we've got a bunch up and running that are currently empty, because we're deploying a new cluster. They all communicate with the Kubernetes API and say "we want to have a leader election" — by updating, actually, a ConfigMap and an Endpoint. They have that leader election, one of the nodes wins, that node becomes the master, and the other two nodes start replicating from it.

Now, things can happen to the master, particularly in a container cloud environment: if nothing else, we can run out of resources on that node, and Kubernetes can decide it needs to migrate that container. If something happens to the master, the master key in the ConfigMap times out, the remaining replicas do another leader election, one of them wins and becomes the new master, and the other one starts replicating from that new master. This all happens in the background. Then, of course, Kubernetes notices that it does not have enough nodes in its StatefulSet and brings up another Postgres, which then replicates from the existing master, so you always have a full cluster. And via Kubernetes Services and Endpoints, we make sure that there's always a connection to the master available.

Let me go a little more into detail on that failover, because it is a little more complicated than just making a call to the API. What happens is the master vanishes. We've got a timestamp on its key, so we know the master hasn't checked in in a while. The replicas, who are checking in, realize the master's not there, and they grab a lock. Currently, what we do for that is have them redefine the master Endpoint, because in the way Kubernetes currently works, that's an operation that only one of them can win. The winner then checks its own replication status, to make sure it didn't already have broken replication before the master went down. If the replication status is okay, it updates the ConfigMap indicating that it is the master, and the other replicas, who were not able to grab the Endpoint, remaster off of that new master, and you have a new cluster. So it's a two-stage failover, which can affect you if you have a really crappy network: if your network is really crappy, it's going to be a little slow.
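As an illustration, the leader lock that the members race for looks roughly like the following ConfigMap. The object name and annotation keys here are a hedged sketch of how the Kubernetes-native branch stores cluster state, and the values are placeholders:

    # rough sketch of the leader ConfigMap (name and annotation keys assumed)
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: patronidemo-leader
      annotations:
        leader: patronidemo-0              # current master
        ttl: "30"                          # seconds before the key times out
        renewTime: "2017-12-07T10:15:00Z"  # hypothetical last check-in timestamp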
So let's actually do that. Well, actually, let me show you what's in the manifest before I run it, because the cluster creates very fast.

We've got here a sample manifest; it's just a slight modification of the one given as an example in the Kubernetes branch of the Patroni repo, so you can take a look at it yourself later. Basically, we define a cluster name, we have application labels, and we're telling it to do three replicas. For high-availability purposes I recommend always running at least three, because there are a lot of things that can cause two to fail. Then we have our container. I'm using a slightly modified image, because this is a feature branch and I've been debugging, but it will be merged before the end of the year. Then we just have some definition of the container and everything else, and your usual volumes. And then we define a whole bunch of environment variables that get passed through to the Patroni daemon in order to configure the behavior of Postgres: the usual things, passwords, ports. If we were going to put in any performance modifications from the Postgres defaults, like how much memory to use, we would also pass them here. I've got the volumes commented out because I'm running this in Minikube and there are no actual volumes.

Then we have to define a couple of other things. We actually want to define a separate Endpoint. If you're not familiar with Endpoints: normally, when you create a Kubernetes Service and give that Service a selector, Kubernetes creates something called an Endpoint in the background that actually allows things to connect. However, you can create a Service without a selector, in which case there is no Endpoint and you can define your own. Most of the time people use this to connect to services outside of Kubernetes, but in our case we're using it to point the connection specifically at the master node, so that we update only the Endpoint. So we create this sort of headless Service with no selector. I also want a separate Service that load-balances across all nodes for read-only connections, so I create that Service as well. And then, of course, we have to have passwords, and Secrets for them.
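A condensed sketch of what such a manifest contains is below. This is not the actual file from the Patroni repo: the image, names, and the exact set of environment variables are placeholders, and the real example is considerably longer.

    # condensed, illustrative version of the demo manifest (placeholder names)
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: patronidemo
    spec:
      serviceName: patronidemo
      replicas: 3                          # at least three, as recommended above
      selector:
        matchLabels:
          application: patroni
          cluster-name: patronidemo
      template:
        metadata:
          labels:
            application: patroni
            cluster-name: patronidemo
        spec:
          containers:
          - name: patronidemo
            image: spilo:demo              # placeholder image
            env:
            - name: PATRONI_SCOPE
              value: patronidemo
            - name: PATRONI_SUPERUSER_PASSWORD
              valueFrom:                   # pulled from the Secret mentioned above
                secretKeyRef:
                  name: patronidemo
                  key: superuser-password
    ---
    # master Service with NO selector; Patroni itself maintains the matching
    # Endpoints object so that it always points at the current master
    apiVersion: v1
    kind: Service
    metadata:
      name: patronidemo
    spec:
      ports:
      - port: 5432
    ---
    # read-only Service that load-balances across all members
    apiVersion: v1
    kind: Service
    metadata:
      name: patronidemo-repl
    spec:
      selector:
        application: patroni
        cluster-name: patronidemo
      ports:
      - port: 5432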
Okay, so you can see this here: we've got this extra label called role. I'm going to shrink this just a little, and you can see first a master comes up and then the replicas come up. (That other cluster is a remnant of running through this demo earlier; sorry about that.) So we've got a master and two replicas right there.

Let me show you this in the console, because it's actually an easier way to see it. We've got our StatefulSet, patronidemo, right here, and here are all of the nodes. These are actually sort of fun: we send all of the activity to the log output, so you'll get the leader talking about being the leader and the replicas talking about being replicas, and of course we also funnel the Postgres log output there. In the process we create a couple of ConfigMaps (those extras should also not be there). One of them actually holds the configuration of the cluster, which would show any non-default items. I don't have a lot of non-default settings in here; one that I do have enables a tool called pg_rewind, which can bring an individual Postgres back into sync if it has gotten a little ahead during a failover. But the more important one is the leader ConfigMap, which gives you one of the several ways we offer to get information about who the current leader is.

This is all a lot more interesting if I show you the failover, so let's kill the current master. As you can see, when network latency is negligible, as it is when you're running this on Minikube, it picks up the fact that the master has gone really fast. When it's slower than that, it's slower specifically because of network round-trip time to detect that things are gone; obviously, when you lose physical nodes from your network, it can take a little longer for the network to time out. Then Kubernetes spins up a new pod, and we have a replica again.

And let me show you those Endpoints here. We've got two: one is the master-only Endpoint, which points at whichever node is currently the master, and then there's the load-balancing Endpoint that gets created for the read-only Service.
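To make that mechanism concrete, the master-only Endpoints object that Patroni rewrites during failover looks roughly like this sketch (the name and IP are placeholders, not taken from the demo):

    # rough sketch of the master-only Endpoints object Patroni maintains
    apiVersion: v1
    kind: Endpoints
    metadata:
      name: patronidemo              # matches the selector-less master Service
    subsets:
    - addresses:
      - ip: 172.17.0.4               # placeholder pod IP of the current master
      ports:
      - port: 5432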
Now, one thing you might say here is: wow, that's nifty and really super useful, but among other things, passing all this configuration via environment variables in Kubernetes manifests is really irritating, and it really doesn't help me when I need to manage dozens or hundreds of Postgres clusters. It would be nice if there were another way. Well, there is, and it's called operators. How many people here are familiar with the operator pattern, the whole operator idea? Yeah. CoreOS created the operator idea around a database, etcd, and it turns out it's useful for databases in general — probably for other things too, but where I've seen it used is databases.

There are a few reasons why you want to use an operator. One is that it makes it easier to keep the clusters you're supposed to have documented in a Git repo, because you reduce the information down to what's distinct about each cluster, rather than having a lot of scaffolding mixed in with your cluster configuration files. For that matter, if you have divided-up teams, you can pass around a much simpler set of information to define a cluster, so that people don't have to understand Kubernetes YAML structure and how to define Kubernetes objects, and don't get the opportunity to screw up your Kubernetes cluster in the process. The other thing is that an operator is an active thing: it runs on your Kubernetes cluster, and that's important because databases require scheduled work. If you have continuous backup running, something has to take a snapshot periodically, and sometimes you want to run other forms of maintenance; with simply a stateful deployment, you're not done. And the big thing is: hey, I'm effectively creating a manifest anyway, so if I'm doing that, why not have that drive everything? Why not just create a manifest and have everything else be automatic?

So I'm going to show you one operator. Like I said, Spilo is the full packaging of Patroni plus the other stuff, and there's an operator for Spilo, which is what Zalando currently uses in their staging cluster; it hasn't gone to the production cluster yet. What you do is install the operator and create a manifest for each cluster; if you want to modify clusters, you do that by modifying the manifest.

So let's actually do that, although it'll take me a minute, because apparently I have a lot of remnants of the previous uninstall. Pardon me while I delete stuff left over from the previous run, because otherwise it will refuse to build. Okay. I need to install a ConfigMap for the operator itself, because the operator has a configuration; then I need to create a ServiceAccount for it, because that's required for creating operators; and then I create the operator. You'll see the operator log in and register itself — that's the Spilo operator there.

Once I've created the operator, I can tell it to create a cluster simply by defining a manifest and passing it to the operator. This is what a manifest looks like for the operator. It's still YAML, but you don't have to understand the Kubernetes infrastructure to write it, which is really important if you're going to have, say, development teams creating databases for their own applications. Instead, you define some users and some Postgres settings. I don't have a lot in this particular one, because I'm trying to show you a simple one, but you can define a whole bunch of Postgres settings within that manifest.
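For reference, a cluster manifest for the Zalando operator looks roughly like this. The sketch follows the postgres-operator examples as I understand them; the cluster name, team, users, and settings are placeholders, and the exact schema may differ between operator versions.

    # illustrative cluster manifest for the Zalando postgres-operator
    apiVersion: "acid.zalan.do/v1"
    kind: postgresql
    metadata:
      name: acid-minimal-cluster       # placeholder cluster name
    spec:
      teamId: "acid"                   # placeholder team
      numberOfInstances: 3
      volume:
        size: 1Gi
      users:
        zalando:                       # a user and its role flags
        - superuser
        - createdb
      databases:
        foo: zalando                   # database name: owner
      postgresql:
        version: "10"
        parameters:                    # any non-default Postgres settings
          shared_buffers: "256MB"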
So let's go down here and... hold on a moment, I still think I have remnants left over from other stuff, so let me spin up a new cluster. Hey, there we go. One of the outstanding bugs: if you go to the postgres-operator repo, you'll see there's an open issue for it failing to clean up when you delete clusters, so there is actually an issue there, and it's being worked on. But we go ahead and install that. And why are we not seeing the Spilo role... oh, there we are. So again, just like with bare Patroni, one of them gets elected master, and then the other two replicate from it in just a second. If we refresh, we've now got our StatefulSet running our three nodes, and some of the same ConfigMaps that you would see when deploying Patroni directly. It's the same thing, but you've simplified it even further, and that has a tremendous advantage: you can now hand this off to development or testing teams, who can deploy Postgres clusters as they need them without needing to understand how Kubernetes works.

Now, the Spilo operator is not the only operator out there. There's a whole other project for Postgres on Kubernetes that comes from Crunchy Data — hi there, one of the Crunchy Data staff is here. They have their own operator that's been out for a little bit longer, so it's a little bit more mature. Their architecture is very different from the Patroni architecture: they were very much coming from the enterprise side of things, so they're really looking at a very full-service Postgres offering with a lot of tooling, which is what their Crunchy Container Suite is. The design of their operator is also a little different. The design of our operator, the Spilo operator, is extremely manifest-driven, and there really aren't other ways to interact with it. The Crunchy Postgres operator is very command-and-control driven, and what I mean by that is there's a client tool called pgo that talks to the operator; if you're using the Crunchy operator, the main way to deploy things is to use this client to communicate with the operator. They have a whole demo and everything else around that. It's one of those trade-offs in how you want to do things; there are advantages both ways. (That image did not pop in, sorry — I had the whole architecture diagram of the Container Suite there. The Container Suite has a lot of things in it. It's also, like I said, a very different setup for high-availability Postgres than the Patroni setup.)

A quick comparison, from actually going through and trying out both operators. The biggest thing is that the Spilo operator is still kind of alpha, because it was created recently — things move fast, we'll say that forever. One of the main reasons it's still kind of alpha is that we haven't really written any documentation for it, which will change. There's not a specific CLI for the Spilo operator; there very much is one for the Crunchy operator, a very sophisticated CLI, so it depends on whether you like CLIs or not. The Crunchy operator is OpenShift-compatible now; the Spilo operator is not, really. Both support rolling upgrades of Postgres, which is another important thing the operator does for you — you could do that by specifying a lot of things in your upgrade strategy for new containers in a Kubernetes deployment, but you'd rather not have to write that, since the operator already knows what to do.

So, some future work plans, because this is active. Number one is obviously to merge the Kubernetes-native branch into mainline Patroni. The second thing is to integrate the operator and the Kubernetes-native branch, since the operator currently uses the older version of Patroni.
I'm also working on OpenShift compatibility, because I work for Red Hat and I have to work on OpenShift compatibility — and besides, there are lots of people who use OpenShift and run Postgres on OpenShift. Prometheus integration is a big interest of mine, because I like metrics, and I also want to package some other things up with it, like PoWA, which is a Postgres performance-analysis tool. And then one of the other things that's come up immediately is support for more complex replication topologies.

Some other, much more hypothetical things have come up, partly from being here at the conference. Istio integration, in order to have more sophisticated routing of traffic to the Postgres nodes. There actually is a command-line tool for Patroni that's used for a few things, and right now it's completely separate from the operator; maybe it shouldn't be, and we're discussing that. I was also interested in Brendan Burns's Metaparticle lock CRD, so we're going to look at that, and potentially have a defined lock object rather than overloading Endpoints to perform that function for us. Support for the new logical replication is obviously a big hypothetical item. And there's the question of whether we could customize the workloads API and maybe be less reliant on operator custom-controller behavior.

Now, there are a few things we run into in Kubernetes that are still limitations that make life hard for us. There really aren't very many — mind you, I've been working with the team that does StatefulSets for quite a while, so a lot of the things on my issue list got done — but: a multi-data-center, multi-availability-zone cluster is hard to do with StatefulSets right now. Kubernetes should have a built-in leader-election facility; that's been discussed before and will be discussed again. Upgrading StatefulSets pretty much requires an operator right now, because the StatefulSet has no intelligence about what order it should upgrade things in — and it could, so again, that's up for discussion. And it would be really nice to be able to do more extending of kubectl, so that we could stop having all of these separate CLIs, one per operator, each completely separate from kubectl.

If you're interested in this, we really would be happy to have more contributors; it's an open project. With Patroni, I'll admit that we do most of our communication via GitHub issues — there's not a mailing list or anything — so if you have a question or something like that, file an issue, or find one of us on the Postgres IRC. And let's make Postgres boring. I'll take questions in just a second; here are some resource links, and now, questions.

So the question was: what's it written in? The bulk of Patroni is in Python, and the operators are in Go, because operators are in Go.

The next question was about replication lag. The default is asynchronous replication, which means there is indeed replication lag, and if you have an unexpected master failure, you will lose some transactions if they're in flight. However, Patroni is completely compatible with turning synchronous replication on — there are examples in the documentation of what that would look like — so if you want to run it with synchronous replication, you can turn that on.
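As a hedged illustration of what that can look like: Patroni exposes a synchronous_mode switch in its bootstrap configuration. The snippet below is a sketch based on the Patroni documentation rather than the speaker's slides, and exact keys may vary by version.

    # illustrative sketch: turning on synchronous replication in Patroni
    bootstrap:
      dcs:
        synchronous_mode: true        # Patroni manages synchronous_standby_names
        postgresql:
          parameters:
            synchronous_commit: "on"  # placeholder; tune to your durability needs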
And particularly if you're going to do that, I would recommend using Postgres 10 and running it with synchronous quorum replication, so that you don't get a replica failure blocking your write transactions.

Right — this question was that what you missed most was user management, and creating users via ConfigMaps. The operator manifest has a section devoted to users, and if you modify the manifest, it should create the users, if you're using the operator. If you're using Patroni without the operator, you can either create the users when you create the cluster, because that goes in the Patroni config, or, if you want to create a user later on, obviously log in as a superuser and create the other user. That would happen at deployment time; if you're trying to do it in a Git-repo-managed way, then the operator is the way to go.

Let me just see if there was someone else... yeah: what about storage, how do you specify storage? Honestly, the same way you'd specify it anywhere: for Patroni in general, you're specifying storage through the Kubernetes API. There's a section in the operator example where you tell it the storage selector, and that's the limit of the discrimination it has.
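For the plain-Patroni StatefulSet, that typically means a volumeClaimTemplate, something like the hedged sketch below; the claim name and storage class are placeholders. The operator manifest's volume section shown earlier serves the same purpose for operator-managed clusters.

    # illustrative storage stanza for the StatefulSet (placeholder values)
    volumeClaimTemplates:
    - metadata:
        name: pgdata
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard     # assumed storage class
        resources:
          requests:
            storage: 10Gi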