OCB: High Availability PostgreSQL and more on OpenShift - Jonathan Katz (Crunchy Data)

Captions
[Music] Hello, and welcome again to another OpenShift Commons briefing. Today is part of our series on operators, and the good folks from Crunchy Data have for a long time done great work around the operator pattern. Today we have Jonathan Katz here with us to talk about high availability Postgres, and a whole lot more, on OpenShift. We're going to have him give an overview of the topic, hopefully a little bit of a live demo, because we'll put him on the spot, and live Q&A. So wherever you are, whether you're on Twitch, YouTube, Facebook, or here in BlueJeans, ask your questions in the chat and we'll relay them to him, and we'll have a conversation at the end about the future of PostgreSQL and all kinds of good things. So Jonathan, take it away: introduce yourself, and let's learn a lot more.

Thank you, thanks Diane. Very happy to be here. It's been a while since I've participated in an OpenShift Commons, and I'm looking forward to the point where we can all get together again in person. Real quick, a little bit about Crunchy Data: all we do is Postgres, and all we do is open source, very similar to Red Hat's model. We started out historically focusing on how to securely deploy Postgres, which is very important in many enterprise environments, and then, as containers became a big deal and as Kubernetes and OpenShift became a big deal, we focused on how you deploy Postgres in those kinds of environments, which is a lot of what we're going to talk about today. Like I said, everything we do is open source; everything I'm going to demo and talk about today is open source. We believe the best Postgres solution is the upstream solution, so everything we do, we give back to the community.

Feel free to ask questions throughout. I'm very active in the Postgres community; I actually have a slide on that. I've been using Postgres for about 15 years and active in the community for a little over 10 now, mainly focused on advocacy. For instance, we have the Postgres 13 major release coming out next week, and a lot of what I do is geared towards getting that release out the door. Before I joined Crunchy Data I was involved in a few different startups in New York City, and of course we used Postgres as the solution there, but my background is actually in application development, and I think some of the things we'll talk about today will reflect that. For me it was a journey from application developer to understanding more of the systems administration side, always being the accidental systems administrator in the various organizations I worked for. Beyond that, I just love the open source ecosystem in general. I've been very fortunate in my career to be able to use open source exclusively, so it's great to give back in a variety of ways.

To take a step back: if we were in a regular conference setting, I would normally ask how many people are running Postgres, and then how many people are running Postgres in production. But I also like to just talk about what it takes to run a database in an organization. It's one thing to play around with a data system on your laptop and do some cool things with it, which arguably is what I'm going to be doing in the demo today, but there are more considerations when you're trying to run a database system in
production. It's very similar to what you might do with OpenShift, where you have multiple high availability nodes set up to make sure that if one goes down, you're able to move your workloads over to another until the original node heals. You need to consider this when you're running Postgres, or really any database. Databases are foundational in applications: they're storing your data, so you need to make sure your data is there, it's safe, it's stored securely, and you're able to retrieve it, and retrieve it efficiently.

But a lot of things can go wrong in production. Your data might become unavailable: there could be a network outage, there could be a rogue process that's starving the system of resources; it could be any combination of things. And there are many cases where I have committed a denial of service on my own production database, so there's human error as well. You want to make sure your database remains as available to your applications as possible. Other things can go wrong too. For instance, let's say I drop the users table. That's going to cause a very bad day, so we need to be able to restore the database back to the point where the users table existed.

Beyond that, you might be using databases in your development environments. You may have one production environment, but for development you want to be able to bring up databases for your team, so you want to be able to rapidly provision and maybe rapidly destroy them. You may also want to create production, or production-like, data or scenarios for your developers to work through, for instance to troubleshoot an issue that exists in your production environment, so being able to clone and copy that data is very important as well. And if you have various data regulatory concerns, you need to be able to do that in a way that is safe and secure.

You also want to anticipate problems before they happen, so you want to monitor. You might want to know "I'm running out of disk, I need to provision more space, I need to resize my PVC or PV." Maybe things are not performing as quickly as they could be, and you want to be able to troubleshoot and diagnose which of your queries are slow and try to optimize them. Of course, you need to manage who's able to access your data; in production that's typically your applications, or a database administrator who might come in and troubleshoot things. And you need to make sure you're securing the connections: are they over TLS, and all the other considerations around that. So there's a lot to consider, and I've tried to layer it: you do have your stock Postgres system, but you need to bring an ecosystem in around it.

Now, Postgres has a lot of these things built into the core system, and this comes from over 30 years of evolution of PostgreSQL. I like to joke, well, it's not really a joke, that I'm about the same age as Postgres; it's been around for a hot minute now, and it has adapted and changed with the times. One of the reasons Postgres has become so popular through the years is its license. Much like Linux, there's no single vendor; it's a very flexible license, and no one can actually own Postgres.
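[To make the slow-query diagnosis point above concrete: if the pg_stat_statements extension is loaded, Postgres will report its most expensive statements directly. This is a sketch, not part of the talk; note the timing column was renamed in Postgres 13, as flagged in the comment.]

```sql
-- Requires shared_preload_libraries = 'pg_stat_statements' in postgresql.conf
-- and CREATE EXTENSION pg_stat_statements; in the target database.
SELECT query,
       calls,
       mean_exec_time AS avg_ms   -- called mean_time on Postgres 12 and earlier
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 5;
```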
The community has gone to great lengths to make sure this remains the case. Because of this, Postgres has attracted a very healthy ecosystem of contributors, and through that, and through real-world use cases, recent Postgres releases have brought it to feature parity with other proprietary databases.

I'll take an example from about 10 years ago: streaming replication. Streaming replication is one of the foundational pieces for creating a high availability system: you take the data changes from your primary server and copy them over, in an ordered fashion, to your replica server, in a way that's fast and efficient. In its basic form it's essentially: as soon as I get the data changes, I send them over; I have no guarantee they arrive, but I'm going to send them as fast as I can. Then came synchronous replication, which is geared towards write-sensitive workloads: I really don't want to lose the transaction, so I want to make sure it's copied to at least one other place. That came roughly nine years ago, and I could probably go into a whole spiel about the difference between these two replication modes and their trade-offs; I'll save that for a later slide, or perhaps a question. As Postgres developed further, more was added. You got cascading replication, where you can have a replica of a replica, which lets you push data out to even more distributed environments; I can tell you that the Postgres Operator, which we'll talk about, supports that feature in one of its architectures. Eventually we got something called quorum commit, which is popular in the distributed consensus world: you can say that as long as my transaction gets to at least, let's say, two replicas, I'm going to consider it written, and if not, I'm going to hold the transaction until I know it's safely written to a certain number of replicas. We also got logical replication, and you can create some really awesome things around that, but one of the important ones is being able to upgrade online: going from Postgres, say, 10 to 12, I can keep my system online and basically flip it over, reducing the amount of downtime I need.

And that's just one feature area; this replication work has been very important to ensuring Postgres can stay available. Now, to properly use it, it does take some work to set things up, similar to writing an OpenShift manifest: the feature is there, but you need to fill everything in, and this is where automation patterns like the operator can help with that task. The other thing is Postgres's extensibility, beyond just extensions themselves, which we're going to talk about today, where you can add on to Postgres functionality while still maintaining the open source core. There's a whole library ecosystem that makes it easy to manage Postgres, to monitor it, for instance, or to handle HA in a really cool way. Beyond that, there's transaction safety and data integrity: ensuring that your data actually gets written to disk, and being able to detect if your data is corrupted; this is the page checksum feature in Postgres.
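[For a rough sense of what "filling it all in" means, the replication modes described above map to a handful of settings in postgresql.conf on the primary. This is an illustrative sketch, not from the talk; the standby names are placeholders and the values are not recommendations.]

```ini
wal_level = replica          # generate enough WAL for streaming replication
max_wal_senders = 10         # connection slots for replicas and backups

# Asynchronous (default when no synchronous standbys are listed):
# ship WAL as fast as possible, with no delivery guarantee.

# Synchronous: a commit does not return until one named standby confirms it.
#synchronous_standby_names = 'FIRST 1 (replica1, replica2)'

# Quorum commit: a commit waits until ANY two of the listed standbys confirm.
synchronous_standby_names = 'ANY 2 (replica1, replica2, replica3)'
```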
All of this has really made Postgres very trusted. When I first started in the Postgres community, the question I got most often was "what's Postgres?", and this was within the tech community; people didn't really know what it was, and it had already been around for 20 years. The question I get now is not "what's Postgres?" but "how do I do X, Y, Z with Postgres?": how do I deploy it in this manner, how can I use this feature? It really shows the evolution: not only does Postgres have name recognition, it's being deployed, and deployed in mission-critical ways. With my open source community hat on, it's just so cool to have seen that evolution, and the credit goes to the community, and to the feedback we keep folding back in to make sure we can continue improving the product.

So that's my long Postgres intro. I was joking with Diane that I could probably talk about Postgres itself for at least an hour and there would still be plenty more to cover, but today the goal is to talk about how we run Postgres on OpenShift or Kubernetes in a way that satisfies everything you need to run it in production, and this is where operators come in. Operators have really changed how you deploy stateful applications on Kubernetes. What do I mean by that? Stateful workloads really only have one job: maintain state. If I insert a record, let's say it's a financial transaction, I need to make sure it gets written to disk, and I need to make sure nothing is going to mutate that record in a way that invalidates or loses the transaction; that would be pretty bad. So we want to create a system that's resilient: let's say my disk crashes, or there are corrupt blocks on the PVC I'm using; I want to be able to restore my data and know I didn't lose any of it.

But to do that in Kubernetes, we need to apply more knowledge, and this is where operators help, because operators can encode domain-specific information into your environment so that you're deploying your stateful service in a way that matches how it actually works. Postgres does things one way, MySQL does things another way; from a high level they're similar, but the specifics of how they work are different, and the operator captures that information. Additionally, modifying state ranges from something very simple to something very tedious. Creating a Postgres replica is actually a bit of a tedious process, and it helps to have all of that automated, and automated in a way that's efficient. Adding a user, on the other hand, is very simple: I can add a user to a database with one line of SQL. But let's say I bring a new DBA onto the team and I want to give them an account on all the databases; that's a little more tedious. Granted, if you've been doing DevOps practices for years, you can write the appropriate Ansible playbook to set that up, but then you have to think about maintainability, because you're creating something that may or may not be standard practice, and you have to teach everyone how to use it. What operators do is create a consistent view of the world, so that adding a user is the same no matter where you install the operator.
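[To make the one-line-versus-tedious contrast concrete, here's a hedged sketch: the single CREATE ROLE is the easy part, and the loop is the per-database busywork an operator automates. The host, role name, and grants are all hypothetical.]

```shell
# One line: roles are cluster-wide in Postgres.
psql -h hippo.example.internal -U postgres \
     -c "CREATE ROLE new_dba LOGIN PASSWORD 'change-me' CREATEDB CREATEROLE;"

# The tedious part: touching every database to grant access.
for db in $(psql -h hippo.example.internal -U postgres -At \
            -c "SELECT datname FROM pg_database WHERE NOT datistemplate;"); do
  psql -h hippo.example.internal -U postgres -d "$db" \
       -c "GRANT ALL ON SCHEMA public TO new_dba;"
done
```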
This pattern is great because, with a healthy operator ecosystem, suddenly we're able to create standard ways of taking actions on the various stateful services out there. And this leads to the final point: allowing automated, managed workloads. Some of this starts with just being able to do the proverbial day-two operations: handling high availability, having monitoring in place, taking systematic backups, and so on and so forth. But those things continue to advance, and some of this is forward-looking: having smart systems, by which I mean systems that can auto-scale or auto-tune themselves. Some services can do that already, but to do things like auto-scaling with a database system like Postgres, you need a lot more knowledge, because what does that actually mean? Do I need to scale vertically? Do I need to add more RAM, and how does that affect my Postgres configuration? Or do I need to load balance more? Is my workload really read-heavy, and how do I determine that? What are my scaling thresholds? We have the ability to start amassing that information and building out systems like that, but that will take some time. That said, with the way the operator pattern works, we're well on our way to getting there.

So that concludes my spiel on Kubernetes operators; let me tie it into a topic near and dear to my heart, which is the Crunchy Postgres Operator. As I said at the beginning, at Crunchy Data everything we do is open source, and the operator is no exception. It's actually been open source since March of 2017, and it's one of the early examples of a stateful operator. We're currently on version 4.4, and we actually have 4.5 in beta right now; I'm going to demo a little bit from that today. This is the surprise demo I was just informed about, but surprise demos are all the more fun. It's level five on OperatorHub, and it's a certified operator, so you can go ahead and try it out today. Going back to that original slide, I do like the hippo wearing the Kube hat, but the idea with the operator is to support all the features you need to run Postgres in production. It could be a single Postgres cluster running in production, or it could be a thousand Postgres clusters being managed by the operator; we've seen both use cases, and of course everything in between, and to us it was important to support all of that.

I could read through the whole list, but I'd rather cover a couple of things I haven't touched on yet. Elasticity: being able to add and remove replicas. They could be for high availability purposes; it could be that you need a read-only replica because your business intelligence person comes in and wants to run some read-only queries; it could be for load balancing; it just depends on your use case. The idea was to make it easy: I can add an additional replica just by typing pgo scale hippo, or whatever my cluster is named. One thing that's important, too, is that we leverage Kubernetes and OpenShift native objects to create a very sound system. Kubernetes has this concept of pod anti-affinity, where you can assign rules that say: I want pods of these types to avoid landing on the same node. You can require it, and say don't schedule the pod unless the pods are placed on different nodes, or you can prefer it, and say try to schedule the pods on different nodes, but if you can't, it's okay to schedule them on the same node. The reason we went with that is to ensure there's a higher probability, or a guarantee, that your Postgres instances are scheduled to different nodes.

For disaster recovery, we use an open source tool called pgBackRest. It's something we support very heavily, and the reason is that it was designed for terabyte-scale databases: its authors saw that it was difficult to create an efficient backup solution for very large Postgres databases, and sure enough, they solved that problem. We've built our disaster recovery solution around it, and it's great; it's something I've looked to deploy wherever I've been employed since pgBackRest has existed, and we're very happy to help drive further use cases for it. In fact, one of the benefits of doing everything open source is that feedback from operator users helps drive features added to pgBackRest. One of the big ones was being able to expire backups on a time-based retention policy, saying "I want to keep this full backup for 21 days." That came from direct feedback from people using the Postgres Operator, and ultimately it was upstreamed into pgBackRest.

For administration, one thing I may or may not attempt to demo is using the popular pgAdmin tool, a graphical user interface for administering your Postgres instances.
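[The preferred-versus-required anti-affinity distinction described above corresponds to two fields in a Kubernetes pod spec. This is an illustrative fragment; the labels are hypothetical.]

```yaml
affinity:
  podAntiAffinity:
    # "Preferred": try to spread matching pods across nodes,
    # but still schedule the pod if no other node is available.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            pg-cluster: hippo
        topologyKey: kubernetes.io/hostname
    # "Required" would instead use
    # requiredDuringSchedulingIgnoredDuringExecution, which refuses to
    # schedule the pod at all unless it lands on a different node.
```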
There's also pgBouncer, which is a connection pooler. One of the known limitations in Postgres is that the number of connections you have can impact database performance, though it has to be a fairly large number. First off, there's actually a big improvement for that coming in Postgres 14, but that's at least a year away. Until then, there is pgBouncer: basically, you can have several hundred connections coming into pgBouncer, which scales them down to what actually goes into Postgres. I've deployed it multiple times in my career, and I was very happy that we were able to integrate it into the Postgres Operator. There are other nice benefits to pgBouncer as well, like connection state management: in a failover scenario, you can have pgBouncer hold connections until the failover is complete and then resume them. Last but not least, a big feature in 4.5 is full support for the open source pgMonitor, which includes lots of wonderful charts and graphs that are essential for monitoring Postgres clusters; we'll touch on that in a little bit.

So why should you use an operator? There are a lot of really good points on this slide, but I'm just going to brush over it, because I'd rather get into the architecture and the nitty-gritty details of how the Postgres Operator works: automation, standardization, ease of use. I tried to cover these as well as I could before, but I do want to emphasize ease of use: the idea that we have a fairly simple API, whether you manipulate things through the CRDs, a CLI, or even a UI. Our pgo command-line interface is very popular.
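[Going back to pgBouncer for a moment, the funneling it performs is visible in a minimal configuration. This is a sketch; the host and pool sizes are placeholders.]

```ini
[databases]
hippo = host=hippo-primary.example.svc port=5432 dbname=hippo

[pgbouncer]
listen_addr = *
listen_port = 6432
pool_mode = transaction    ; a server connection is reused per transaction
max_client_conn = 500      ; several hundred application connections in...
default_pool_size = 20     ; ...funneled into ~20 real Postgres connections
```

The hold-and-resume failover behavior mentioned above uses pgBouncer's admin console: PAUSE drains and holds client connections, and RESUME releases them once the new primary is up.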
I can just do pgo create cluster hippo, and boom, I have a Postgres cluster; pgo create cluster hippo with a replica count of three, and I have a high availability cluster. I've also been involved in developing a UI around the operator, and that's a longer story for another day, but by creating a standard interface, it becomes very easy to build even more robust applications around Postgres.

Scale: when I talk about scale, it's not just saying the Postgres Operator can manage a thousand Postgres clusters; it's also about scaling your workflow. This is something I always looked at as an engineering manager: how do I scale our processes across the team? Can I create something repeatable and standard that everyone can use and interface with? That's important to consider as well. We certainly do engineering work to ensure the operator itself can scale; for instance, our multi-namespace support. At one point we were hitting limitations on it, but thanks to one of our very smart engineers, we can easily handle northward of 100 namespaces in multi-namespace mode. We haven't really pushed the limits of it yet, or I should say, we haven't found the limits.

Security: the cool thing about running Postgres in containers is that you naturally get a certain level of security, in that you're in a sandboxed environment. That actually lets you do some cooler things if you want, like giving people superuser within the container, and by that I mean not PostgreSQL superuser but root within the container; yes, you can do that, and there's lower risk to it. I might have my own opinions on what you should be doing, but the idea is that you're operating in more of a sandbox environment.
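[As a sketch of what that CLI interaction looks like. These are the v4-era pgo commands and flags as I recall them, not verbatim from the talk; check pgo --help for your version.]

```shell
pgo create cluster hippo                    # boom: a Postgres cluster
pgo create cluster hippo --replica-count=2  # or create it highly available
pgo scale hippo --replica-count=1           # add another replica later
pgo backup hippo --backup-type=pgbackrest   # take an on-demand backup
pgo test hippo                              # check connectivity to the cluster
```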
That affords its own advantages, and just having process isolation built in certainly helps when you're running in a multi-tenant environment.

And the flexibility: this is one of the first things that really attracted me to what we were doing with the operator. As long as you have a node, you can run it anywhere. It doesn't matter where your OpenShift nodes are deployed; you can run a PostgreSQL cluster there, and that's really cool. I'll speak for the Crunchy operator, but operators in general create that unified layer: so long as I have something that speaks Kubernetes or OpenShift, I can deploy my workload. If I'm running OpenShift, as long as I have an OpenShift node somewhere, it doesn't matter what cloud provider I'm on or what hardware I'm running on; I can deploy there. When we test the Postgres Operator, we test it in all sorts of places: we'll run it on OpenShift in one environment and on Kubernetes in another, and for the most part, it just works. So that's my pitch for operators in general, let alone the Postgres Operator.

So how does it all work? I would say just look at the diagram, but to go into it a little bit: part of the operator pattern is that you have these things called custom resource definitions, and from my application-developer perspective, I consider them your database schema. Your custom resource definitions, or CRDs, basically store the infrastructure of what you're going to deploy. We have one called pgcluster; the pgcluster resource essentially stores the definition of your Postgres cluster: how many replicas you want, what port you want to run it on, whether you want pgBouncer with it, whether you want to collect metrics, and so on and so forth.
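[A pgcluster custom resource capturing that kind of definition might look something like the sketch below; the field names are illustrative rather than the operator's exact schema.]

```yaml
apiVersion: crunchydata.com/v1
kind: Pgcluster
metadata:
  name: hippo
spec:
  clustername: hippo
  replicas: 2        # how many streaming replicas to run
  port: "5432"       # which port Postgres listens on
  pgbouncer: true    # deploy a pgBouncer connection pooler alongside
  metrics: true      # collect metrics with the exporter sidecar
```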
From that pgcluster definition, you should be able to deploy a Postgres cluster anywhere. If somehow your whole Kubernetes environment gets wiped out (well, if your data gets wiped out, that's a different story, but let's say all your deployments get wiped out), I should be able to redeploy a Postgres cluster that looks exactly like what I had before, from that definition.

From there, what the operator does is react to what's been added to a custom resource definition, or any changes to it, and then apply those changes throughout your OpenShift environment. We've layered a few things on top of that. We have an API server that makes it a little easier to interface with the custom resource definitions, since it aggregates that information together, and that's actually what the pgo command-line client uses as well. Speaking of which, we have a command-line client called pgo; it works across different operating systems and makes it super simple to create Postgres clusters, manage them, modify them, take backups, whatever it may be, and we'll demonstrate that. We also have a scheduler, which is used to schedule backups, among other things. This is great because you should always be taking backups. If there's one takeaway from this talk: please take backups of your Postgres cluster. You never know when you'll need them. Maybe to further emphasize that: there was a time in my career, exactly one time, where I really had to do a point-in-time recovery, where we had to get the data back to a certain point in time.
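[For flavor, the two pgBackRest operations that story leans on look roughly like this when run by hand. This is a sketch; the stanza name, retention, and timestamp are placeholders.]

```shell
# Take a full backup, expiring full backups older than 21 days.
pgbackrest --stanza=hippo --type=full \
           --repo1-retention-full-type=time --repo1-retention-full=21 backup

# Point-in-time recovery: restore, then replay WAL up to a known-good moment.
pgbackrest --stanza=hippo --type=time \
           "--target=2020-09-01 12:00:00" restore
```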
We had to replay it and find certain things, and thankfully, because we had backups, we had our transaction history back to that point and were able to solve the issue. So please take backups; it's so important.

Just to finish up this diagram: the most important thing when you're running a database is your storage. No matter what it is, your storage is going to be your bottleneck. Storage has gotten much faster; SSDs were one of the greatest things that happened for database workloads. The one trick with SSDs, particularly back in the day, was reliability, so the recommendation was always to run in something like a RAID 10 mode. All that said, the interesting thing in the Kubernetes and OpenShift ecosystem is the variety of storage classes available. We even have cases where people just use hostPath storage and pin their Postgres clusters to a single node, ensuring everything gets written to that node. Storage selection is a very detailed, involved topic. The various storage classes have certainly improved, even since I started at Crunchy Data, and it's interesting to see what's happening there; our team stays on top of it for our various testing and compatibility purposes. You should always consider what level of storage you need based on your workload. Not everyone needs the latest, greatest, most expensive storage for their databases. Am I biased? Yes, of course; I love playing with the latest and fastest storage, but you might not necessarily need that. And if you do, you should know how to optimize Postgres with your various storage layers.
So let's talk about high availability, which was the original topic of this chat. This is a really important slide, because it explains how high availability works with the Postgres operator, and it also shows how cool it is. Whenever we get support tickets around high availability, one thing we always ask is how high availability is set up in the Kubernetes or OpenShift environment, because high availability with the Postgres operator is tied to your Kubernetes high availability — that's the feature, and that's why it's really cool. Kubernetes and OpenShift are backed by their own distributed consensus storage system — it could be etcd, and etcd has its own high availability system built in. The reason we leverage this is that it minimizes the number of Postgres nodes you need to deploy for high availability.

So let's take the Raft algorithm. Raft says you always need an odd number of nodes to be able to get high availability, and as I recall the recommended number is five. Running five Postgres databases — particularly, let's say, multi-terabyte ones — is a very large footprint. Even running three (and I like to recommend three as the number you should run), if you have 10 terabytes of data, that's at least 30 terabytes you need, plus backups, plus your various logs and WAL and other things you need to store. That's a lot of data floating around. By leveraging the OpenShift-backed consensus storage system, you actually only need to run two Postgres instances to get high availability — and that's what's really cool, because it helps you lower your footprint while still maintaining safe, distributed-consensus high availability.

What I showed here was three nodes — these are the actual OpenShift nodes — because the other thing you need is a pgBackRest repository. That's the third component of high availability, and there are two reasons we leverage it. One: of course, you need to take backups. Remember, the one takeaway from this talk is please back up your database. Two: pgBackRest is actually used in the self-healing process. Let's say my primary goes down and stays down for several minutes; the replica is promoted, and we want to bring the old primary back into the fold. It's going to become a replica, but we need to catch it up to where the new primary is so it can rejoin as a healthy streaming replica. We can leverage the pgBackRest delta restore feature to efficiently copy the information into the failed instance, effectively reprovisioning it, bringing it up to speed, and tying it back to the primary. So pgBackRest serves as part of the disaster recovery system, but it plays a role in high availability as well — which is really cool, because we can leverage all the efficiencies built into it to quickly heal components within the system.

The takeaway from this is that the Postgres operator does provide high availability, but it's leveraging it from Kubernetes and OpenShift, which, like I said, is really an advantage. The other thing I should point out is that the operator itself is not providing the high availability, because that would make it a single point of failure. The Postgres instances themselves provide the high availability: they communicate with Kubernetes to determine whether there needs to be a leader election because the primary is down, or unreachable, or whatever it may be.

So, disaster recovery — how does it work? I'd say it works in a pretty cool way, in terms of what we can support. We support a multi-repository setup, and what I mean by that is that you can push your archives and backups to a PVC somewhere within your local OpenShift environment, or you can push them to S3 or an S3-compatible storage system such as MinIO — and you can actually use both at once. That's cool, because you can guarantee your backups are being pushed to multiple places. Additionally, you're able to schedule backups, and the easiest way to make sure you keep taking backups is to have a schedule for them so they keep being taken. However, you should also make sure your backups are actually succeeding, and you should monitor everything else — which brings me to a feature I'm previewing for the upcoming operator release.
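The scheduled-backup idea described above could be sketched as something like the following. This is a hypothetical shape — schedule definitions differ between operator versions (in some releases they are created via the pgo client rather than written by hand), so treat the key names as placeholders:

```yaml
# Hypothetical backup schedule for the operator's scheduler component.
# The cron expression and the dual-repository idea are the substance here;
# the exact key names are illustrative, not a documented schema.
schedule:
  name: hippo-full-backup
  cluster: hippo
  type: pgbackrest
  backup-type: full
  cron: "0 1 * * *"        # take a full backup every night at 01:00
  storage: "local,s3"      # push to both the in-cluster PVC repo and S3
```

Pairing a local PVC repository with an S3 repository, as the talk describes, means losing the cluster's storage does not also mean losing the backups.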
It's coming out toward the end of the month: our integration with pgMonitor. What pgMonitor provides is a series of Grafana dashboards, driven from a Prometheus database, that show you the overall health of your database. It tracks a variety of things — going into all of them is probably a full talk in itself — but it's essentially a curated list of the key metrics you need to keep an eye on to try to anticipate issues with your system, or that give you a view of the overall health of the system. For instance, if you're supposed to take daily backups and a backup isn't taken, the top bar you see on the screen is going to go red. You can drill down to specific databases or specific pods within your cluster, but the idea is that you have all these key metrics in one place. Take another one: replication status. This helps you detect what your replication lag is. If you're using asynchronous replication, this could become an issue — if your lag is too high, you're at risk of data loss if there's a failover event — so you do want to keep an eye on that. The other thing is that this works with all the upstream components, so if you already have a Prometheus instance set up that you're using to aggregate your metrics, you can plug in the Grafana dashboards and pull from that Prometheus instance.

One thing I do want to show here — first, let me get a drink of water — is a dashboard related to getting pod-specific metrics. This was an issue we had run into for a long time: some of the key metrics when running a Postgres database are related to your actual pod utilization, and we had a hard time pulling those metrics out in a consistent way. One of the people at Crunchy Data, Joe Conway — he's a PostgreSQL committer and major contributor, and now a container enthusiast — wrote an extension to Postgres called pgnodemx that can reach inside the pod and pull information out of the various cgroups. It works for cgroups version one and version two, and it can pull out what we might call the OS-style metrics from that particular pod. So I can see, oh, this primary has this much disk activity, or its CPU is currently pegged at 100% — do I need to raise the limit? — and answer questions like that.

You're not supposed to pick favorite features, but this is probably the feature I'm most excited about for the upcoming release, because, particularly as an application developer — excuse me — this is where I always started when troubleshooting my systems, because to me these were the steps that made the most sense. If memory utilization was getting out of control, I knew there might be a runaway query and I should look into that. Same thing if CPU was at 100%: let me find the process doing that — oh, maybe it's a really poor recursive query I ran, which was often the case — and I could kill the process and then fix it. Or disk usage: oh, I'm at 80% disk usage — is there an unacknowledged replication slot somewhere causing too many WAL segments to be retained? Of course, it's nice to be able to look at the charts and get all of this information, but it's also good to be alerted to problems.
We've pulled together our favorite alerts, which will tell you what kinds of errors are going on. For instance, if high availability can't heal something — something's wrong, the cluster isn't accessible, and it's been inaccessible for a certain period of time — let's send a critical alert for that. The way I originally triggered this alert was by doing a very terrible thing: I removed the data directory from Postgres and made the instance totally unusable, and that was good enough to trigger the alert. I was actually not running it as a high availability setup, which is why it was impossible to heal. That being said, we've added a variety of alerts — is your disk filling up, is a replica lagging too far behind, and so on and so forth. Hopefully you find these useful, and I hope you never have to respond to one of those alerts.

Last but not least, in terms of the walkthrough — then we'll try a live demo — we've added the ability to administer your database from a user interface, in this case pgAdmin. We created an integration where we're able to tie the Postgres user accounts to pgAdmin and keep them all in sync, so that you can log into a pgAdmin instance and administer your Postgres database. We try to check all the boxes: not only having the day-two administration options available, but making it very easy for the people who do the daily Postgres work to interface with their database.

So let's do this: let's create this high availability cluster with monitoring and connection pooling and all these wonderful things with one command, because that's really the beauty of all of it. Of course, I need to figure out how to share my screen again in a moment. "Well, you're sharing your kitchen nicely, so don't worry about that." Thanks — this is your typical New York City kitchen; I think I measured it at about two feet wall-to-wall. Okay, can you see a terminal window? "Yes, we can indeed." Awesome.

So, first of all, I went ahead and created a Postgres cluster, because I think we're going to have some fun. Let's inspect it a bit. First off: pgo. pgo is what we call the command-line client for interfacing with the Postgres operator — you can interface with the custom resource definitions directly, but we find the command-line client very useful — and there's a bunch of different things it can do. For instance, I can test whether my cluster is up; I can see the current disk utilization; I can introspect the cluster; I can check on the health of my backups; I can take another backup, because why not; and I can scale it up. So, as you can see, the pgo client is a Swiss Army knife for handling all these different operations in your daily Postgres environment.

While that's going on — and to show there's nothing up my sleeve — I had also created a pgBouncer, which I've been connecting to through a port-forward I have set up (don't worry about reading the screen; it's just a port-forward). I also added some data to my Postgres database to bootstrap it, using a tool called pgbench. So what I'm going to do is start writing data to my database using pgbench — and I will say this is truly a live demo, because I'm going to do something I did not test beforehand, and I want to see if it works. "How often does a live demo actually break on you?"
Typically I rehearse them, so the probability is very small — this is uncharted territory — but in this case I'm going to try to purposely break it. What I'm going to do is kill the primary pod while pgbench is running, and we'll see what happens. First, let me find an available terminal window where I can do it — I realize the font is probably really small here — and I'm going to run a delete on the primary pod. Look at that: we clearly had the server connection crash, but it reconnected just like that. I'm going to put this into the background so it keeps running. So what's going on? Let's run a test on hippo. We can see that our primary and pgBouncer are up; the original primary was one hippo pod, and the new primary is a different one. So the high availability works: even though we deleted the pod — which is something that can happen in the Kubernetes and OpenShift world, a pod getting rescheduled or deleted — it came back up, and the interruption was very minimal. That's pretty cool; you can't get much better with high availability in that regard.

Again, as I mentioned earlier, if you want to avoid transaction loss, you need to have synchronous replication set up, which is something we do support in the Postgres operator. Let me stop pgbench, because those messages are getting tiring. If I wanted to create a cluster with synchronous replication, I would use the sync replication flag. Now, I mentioned the trade-off with that: the problem with synchronous replication, as it stands today, is that if your replica goes down — not your primary — technically your primary goes down too, because you need to guarantee that your writes are getting to the replica.
With Postgres 10 and beyond, there is the quorum commit. I haven't tried it myself, but if you do some advanced configuration with a Postgres operator instance, you can probably get the quorum commit working — your mileage may vary. The quorum commit can make synchronous replication a little safer to deal with, in the sense that there is still a performance penalty — you need to guarantee that your writes get to however many Postgres instances you've specified — but a single replica going down won't necessarily take your primary down with it. That said, there are certainly workloads that need synchronous replication, where you're willing to pay the performance hit because you're guaranteeing that your writes are in multiple places. I'd also say that leveraging pgBackRest the way we do helps ensure your writes get pushed out to the backup repository as well — there's always a little bit of a delay, because you need the full write-ahead log record written before it goes out there.

Some of the other things that are cool about this, other than showing that high availability actually works: let me create a pgAdmin — it'll take a few minutes to come up, but then we'll try logging directly into the database via pgAdmin, if I'm able to get all my commands correct, because again, uncharted territory. Maybe, to cover a couple of the other cool features while we wait: you're able to tune your memory settings — how much CPU or RAM you want for a particular node.
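The per-cluster tuning and the synchronous replication flag just described might map to spec fields along these lines. This is an illustrative fragment under the assumption that the operator exposes such fields — the key names are placeholders, not a documented schema, so check the reference for your operator version:

```yaml
# Illustrative cluster-spec fragment. "syncReplication" and "resources"
# are placeholder keys for the ideas above (synchronous replication,
# per-pod CPU/RAM tuning), not verbatim schema from any release.
spec:
  syncReplication: true    # a replica must acknowledge each write,
                           # trading write latency for zero data loss
  resources:               # requests/limits applied to each Postgres pod
    requests:
      cpu: "1"
      memory: 2Gi
    limits:
      memory: 2Gi
```

The trade-off in the talk shows up directly here: with a single synchronous replica, losing that replica blocks writes on the primary, which is why quorum-based synchronous commit is attractive when you can run more standbys.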
If you decide that what you originally deployed your cluster with isn't sufficient, you can update it later. What else can you do? You can add tablespaces. A tablespace is an additional storage volume that you attach to your Postgres instance, and there's a variety of reasons you'd want to use one. It might be a very large table, or a very large group of tables, or a very large database that you don't want in your main Postgres data directory. It could be that you want to take advantage of super-fast storage for a particular data set. It could be that you just have a lot of data and need to spread it out amongst multiple PVCs. The use cases do vary — but you're able to add tablespaces to operator-managed Postgres clusters, and you can set the size of the PVC you want, on a one-off basis. What other cool things? As you can see, there are a lot of flags, and I do encourage you to read the documentation. I've tried to write the documentation in a way that tells a story that gets you where you need to go — and as always, because it's open source, patches are welcome; people certainly do like to leave their opinions on the documentation. We also support an easy way of setting up TLS connections, which I only lightly touched on earlier. TLS is, of course, a very important part of encrypting communications and making sure people aren't eavesdropping. We try to make it as easy as possible: you just have to provide a key/cert pair and you're good to go — and you can also force all connections to be over TLS by using the TLS-only option. I've hopefully stalled long enough — there should be a pgAdmin deployed. There is; let me see if I can port-forward to it.
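The key/cert pair mentioned above is typically handed to Kubernetes as a standard TLS Secret. Assuming the operator consumes one (the Secret name here is made up, and how the operator is pointed at it — a flag or a spec field — depends on the version), it would look like:

```yaml
# A standard Kubernetes TLS Secret holding the server key/cert pair.
# "hippo-tls-keypair" is an example name; the data values below are
# truncated placeholders for base64-encoded PEM material.
apiVersion: v1
kind: Secret
metadata:
  name: hippo-tls-keypair
type: kubernetes.io/tls
data:
  tls.crt: LS0tLS1CRUdJTi...   # base64-encoded server certificate
  tls.key: LS0tLS1CRUdJTi...   # base64-encoded private key
```

The `kubernetes.io/tls` Secret type is the conventional way to package a key pair in-cluster, which is why "provide a key/cert pair and you're good to go" can be that simple.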
Cool. One moment — let me make sure I actually set up the port-forward to my computer correctly. Yep. Alrighty, now I'm going to shift my screen one more time, or maybe two more times. All right, so this is pgAdmin. As I mentioned, it's a popular user interface for working with Postgres. Let me create a new user — another user, hippo — and voilà. This is cool because it syncs with the administrative side: I can very easily log into my databases. I forget which one I ran the pgbench tool on — mostly I want to be able to run some queries against it — here we go. All right, so there's all of our data. We could try to query against it — probably not, since I don't remember the syntax offhand — but I can create a table right here, insert some data, and query it, just like that. And again, what's really nice is that I was able to run a command on the operator and have everything synchronized into the pgAdmin interface. These things are convenient, and they create a way of systematically managing a whole wide variety of different Postgres workloads and needs.

So I think with that, I will return to the slides. We actually saw this whole thing created, and that's really all I have. I could try to do more live demos and see if I can actually break things — if I try to deploy the monitoring suite, I will break it, because I found something incorrect in my configuration and haven't quite figured out what it is yet; if I could get that working, it's actually pretty cool to see all the charts and dials going. But with that, I'm happy to take questions.
"Hey Jonathan, it's Mike here from Red Hat — how are you?" I'm doing really well. "Interesting demo — it's always good to see the inner workings of something so cool. I manage relationships with lots and lots of software partners here at Red Hat, and it seems like there are database vendors coming out of the woodwork everywhere. Are they all the same? Meaning: how is Crunchy uniquely positioned to be better than the other ten database vendors that are popping up everywhere?"

Yeah — look, there are all sorts of different solutions out there for "how do I store my data," and it's certainly something I've seen through the years. And again, I'm biased: I've been using Postgres for 15 years, I've been deploying it successfully in production for 15 years, and I was running PostgreSQL even before there was replication or any high availability guarantees. One thing I love about it is that it's a very strong and healthy open source community — similar to the way Linux is, similar to the way Kubernetes is. And one thing I've liked at Crunchy is that we've adopted the Red Hat model: everything we do is open source, we're able to support open source, and we can make the upstream the "best stream," so to speak. What I like about Crunchy beyond that is how we focus on what I said in the original slide: beyond open source, it's adapting to the modern technologies that are out there, ensuring that we continue to make Postgres work efficiently on OpenShift, and the focus on security. A lot of what I've found about data security through the years is that everyone wants it, but they often only employ it to ensure they're complying with whatever regulations apply to them — and if you find that you're not doing things to keep your data safe and secure, you can run into a PR nightmare when data is breached. Data security is a whole topic in itself — you could spend several hours on it — and one thing we try to focus on is not only how to mitigate the risk of a threat, but how to deal with what happens should there be a breach, and ensuring you can minimize the overall damage. So that's what I like to say about Postgres and Crunchy: we love the upstream, and that's what we want to support.

"Thanks. I've got one other question, from the completely other side of the table. Crunchy Data — where did you guys come up with the name? We always hear interesting stories about how different companies select their names and the hidden meanings behind them. Why 'Crunchy'? Why 'Crunchy Data'?"

Interesting — I thought you would have asked about the hippo, because I think that's the bigger urban legend. "Well, that's part of it: there is a hippo, and there's 'Crunchy' — how does that work?" So there are many urban legends around this. I think "Crunchy Data" actually came out of a meeting where someone described what our founders were working on as very crunchy — so maybe the urban legend there is a little lesser. But the hippo has many stories.
The one that I choose to believe is this: hippos are fiercely protective of their watering hole, and given that Crunchy Data's roots are in the security space, having an animal that is very protective of your core asset — the watering hole, your data lake if you will — gives a strong sense of assurance that we're looking after the integrity and safety of your data. Another one I've heard is that when you see a hippo in the water, you only see its eyes, but there's a whole lot going on underneath the surface — and that's sort of a principle of security: you might only see one layer, but there are a lot of other layers underneath. There are at least five other urban legends I've heard around the name, but those are the two I particularly like.

"Gotcha, okay. So what's next? I know we're just about out of time — how do people get in touch with you? Do they just send an email to Jonathan, or call your house phone?"

Well, fortunately — as you can see, if you're still looking at my kitchen — that house phone is not actually connected to an outside line. But yeah: email me at jonathan.katz@crunchydata.com, tweet at me at @jkatz05, go to the website, crunchydata.com, and see what we're doing. Plenty of people come to the actual GitHub repo, crunchydata/postgres-operator — we certainly get a slew of questions there. That's the easiest way to find out what's next or to get in touch.

"Okay. And there was another question, Jonathan — sorry, I know we're over time. Shannon was asking: can you share your CR by any chance? I wasn't sure what a CR is, but Shannon explained."
"It's what you use to create the actual pods." Yes — so all of this is in the documentation; we have a section just on the custom resource. In our documentation we have examples of how you can create custom resources — I can show what one looks like real quick, one that's already created. "If you have just the YAML, that's fine — for learning purposes it's easy to have a sample to spin up. You mentioned the documentation is at access.crunchydata.com, right?" It is. "Oh, the GitHub one?" It's in the access.crunchydata.com documentation. "Okay — I found the GitHub one and it pointed to Crunchy Data, but I just wanted a quick start: grab your CR, pop it in, and try it." Yep — so this is an example CR; this is the cluster I was demoing from. "And it's in GitHub, correct?" Well, it's in our documentation — we have an example of how to create a Postgres cluster using a custom resource. One of the reasons we lead with the command line is that it makes things a little easier: it's much easier to type `pgo create cluster hippo` than to fill out the CR. That said, we do provide examples for how to do it using just the custom resource. "Okay, cool, thank you." You're welcome.

All right then — I think we've come to the end of our hour, and it was a bit of a tour de force; I really appreciate it. Thank you so much, Jonathan. We'll have Crunchy back again in a little bit of time — I think next month we'll have some of the folks doing spatial GIS — but this has been an awesome session, so thanks, Mike, for arranging this, and Jonathan for sharing your kitchen with us. Yeah — happy to invite people into our kitchen; we can only fit so many.
Info
Channel: OpenShift
Views: 1,190
Rating: 5 out of 5
Id: 9jbR9lZuSU0
Length: 63min 16sec (3796 seconds)
Published: Wed Sep 16 2020