Connecting Apache Cassandra™ and Kubernetes

Captions
There we go — can you see that one? Yes? All right, we're going to try this again. Here is the first slide, talking about what's coming.

So, as Cedric was saying, this is part of a three-week series, and we are in the second week: connecting Kubernetes and Apache Cassandra. Welcome to this one. Next week we'll take the application to the cloud, deploying it up in GKE on Google along with Cassandra, so we'll go to the next level.

As far as the housekeeping we've been trying to do for the last five minutes: if you look at the top left-hand corner, the first thing is our YouTube channel. Most of you watching today are probably watching us there, at DataStax Devs. If you're curious about how to get to future content, keep an eye there, go subscribe, and if you ring the bell on the channel you'll be notified any time we have upcoming workshops — and we do multiple a week; we have multiple things going on at any given time. On the bottom left-hand side you'll see the bit.ly link, cassandra-workshop, which brings you to our Discord channel. For those of you with us on YouTube today, we're monitoring that chat and we'll interact with you there, but know that once we stop the stream, that stops — you won't be able to talk to us there anymore. That's where Discord really comes in: the conversations persist, and you become part of a community of Cassandra folks that's around seven thousand strong right now. So we really do encourage you to join the Discord channel; that way, if you ever have questions later on, you can totally talk to us there — you can't do that on YouTube.

Up on the top right-hand side we have the runtime. Cedric, do you want to talk to what we're doing with the demo piece today? Yes, sure. Today it will be mostly me doing the demo, because there is a fair amount of software to install to make the Kubernetes cluster work. But after the session, if you want to do the demo on your own, all the instructions are available on GitHub — the link is in the description below. Or we can spawn a cloud instance for you if you send an email to Jack or Alex; those emails were part of today's invite, and Alex will also provide his email in the YouTube chat. With the cloud instance you'll have most things already installed. But we don't want to spawn a thousand instances for nothing, so today it will generally be me doing the demo and explaining things, you asking questions, and after the session you'll have plenty of time to do the demo yourself and download all the tools needed. That's another reason we encourage you to register on our Discord channel: if you're doing this homework after the fact and have questions, you can hit us up on Discord and we can help you out.

Let's get into the actual content. Cedric, you mentioned this earlier — for the folks who want to request a cloud instance for homework, they should send an email to Alex? Yes, I asked you to send an email to Alex; that's how to get an instance. There's also a link in the GitHub repo below to request an instance when you start doing the exercises, but asking Alex is better. We'll generate instances based on how many requests we have tonight and send one to everyone, and those will be available for 24 hours, because as we told you, there's a lot going on on those instances — something like a 2xlarge on AWS — so it's a pretty big machine, to make all of that work.

Something to point out: I just noticed in the YouTube chat that Alex is having a discussion with someone about Docker. This material does take some Docker knowledge, and Cedric's going to explain some of that as well, but as Alex is pointing out in the chat, we actually have a full series on Docker alone, and Alex has dropped the playlist link there. Alex, if you wouldn't mind dropping that in Discord as well — it's a wonderful series that Alex did in our Monday learning series that steps through Docker and everything. So if you're new to Docker and don't have much knowledge there, go check it out; I think it will be worth your time. Thank you, Alex.

Okay, and then we have our hackathon. This one's awesome — this is huge. Cedric, what do you want to say on this one? Yes, so the link is also in the description below. Starting at the end of December we organized a hackathon — we call it the DataStax-led hackathon — and it ends at the end of next week, so you still have time to go there, register, and build a cool app on top of Astra. The first prize for this hackathon is ten thousand dollars. Ten thousand dollars — that's the prize, and you still have one week to enter. It's free to participate: build a small app or a bigger app — a bigger one gives you a bigger chance to win — and fill in all the boxes. And it's not only the one ten-thousand-dollar prize; there are dozens of smaller prizes, five hundred dollars each — best documentation, most innovative, most interesting ideas, best code quality — all those small prizes are there as well. So yes, this hackathon is all about cash. You still have time to register; the link is there, also in the YouTube description, and please submit some code for us to review.

And per the question from Ram that I see over on YouTube — can the hackathon be extended? No, the 18th is the deadline. But since we're getting closer and closer to the end, you might still be able to catch one of those smaller prizes — and you never know, maybe you knock something out of the park and win the grand prize — so I would definitely take a look. But no, the hackathon cannot be extended; that deadline is absolute.

And Cedric, here's a question from Grish: does this workshop help with the Cassandra ops certification? I wonder if they mean the admin certification. Okay, yes — we'll discuss certification at the end, but as of today there are two certifications: the developer certification and the admin certification. Soon, at the beginning of March, we'll launch a third certification, Cassandra on Kubernetes, and this workshop will be mostly useful for that third one. We'll give you a voucher at the end of the three-week series, and you can redeem the code to try the third certification when it's available in March. So, is it useful for the admin certification? Yes and no: we'll cover some topics that are part of the admin cert, like repair and compaction, but it's very focused on Kubernetes, which is not part of the admin certification yet. All right, thank you, sir. Moving right along.

Okay, so, jumping into some Cassandra reminders. We understand there are folks with varying levels of Cassandra here — from no prior Cassandra knowledge to folks who've been using Cassandra for a while — but we like to start with a few reminders of what it is. The first thing: Cassandra is a NoSQL distributed database. So what does that actually mean? Well, compare that to, say, a relational database or other databases
— they are generally a single instance. If I have Oracle or Postgres or MySQL, yes, you can have a master/slave setup and replication that way, but generally it's going to be a single instance, a single machine, powering the database. Something like Cassandra, by contrast, is built to be a distributed database, and that works through nodes: individual nodes that power your Cassandra database. In this particular case you see just one node here. I could run Cassandra like this if I wanted to — it would kind of defeat the purpose of a distributed database, but you could. Really, what happens is you build up a set of nodes, and each of these nodes has Cassandra running on it, with its own capability.

Now, per node — as you saw in the little text that popped up in the top right-hand corner — we're generally talking a capacity of maybe two to four terabytes of data. Throughput-wise it's a lot: thousands of transactions per second per core. The reason we say "thousands" rather than one particular hard number is that it depends — on the hardware, on how each node is set up — but generally speaking, thousands of transactions per second per core.

These nodes all communicate through a protocol called gossip. This is how they understand what their peers are doing: is a node up or down, did a new node join my cluster? Cassandra is by nature a peer-to-peer system, so as you add and remove nodes, the system responds automatically: you just configure them in, and it handles the addition or removal — or, again, nodes going up and down — and all of that happens over gossip.

Now, when you combine a set of nodes like the ones you see here, that's called a data center, or a ring. Most of the time when you're working with Cassandra, you've got the cluster itself, the cluster has a set of data centers in it, and you work with, let's say, a particular data center. So this ring you see here comprises my database. I don't have a single entry point — if I'm making requests in and out, I can make them to any particular node, and those requests will automatically be replicated to the other relevant nodes. What I want you to take away is that the circle you see, that ring, is my database, regardless of how the underlying nodes are distributed.

Then, how is data distributed around? We get this question a lot: "Wait, if I put some data on one node, does it just go to all my nodes? How does it know which nodes to go to?" What happens is that in Cassandra, data is split up into — well, they're called partitions. Any particular table is made of partitions. How do I partition the data? When you create your tables, you set something called a partition key. If you look at the table on the right — the small text in the lower right-hand corner — you see we have country, city, and population, and at the very bottom, under country, it says "partition key". All we're saying is that for this table we've decided to partition by country. So what does that mean? As data is written into the system, it is automatically distributed around my nodes based on which partition it belongs to — and data in the same partition is stored in the same place.

Now, to be clear, this is a very simplistic view — this is what I would call a replication factor of one: I'm only making one copy of any particular partition. Usually in Cassandra your standard default would be more like a replication factor of three — I'll make three copies — and that has everything to do with robustness (I can lose multiple nodes and still have my data) and with performance (multiple nodes can now serve that data). But that's getting further than we need right now. What's important to understand is that as data comes in, it is automatically distributed around my cluster per its partition. So at the top you see I have two rows of data for the USA partition, and moving out to the right, two rows of data for the France partition, and so on.

All right, before I move on, I saw some questions come through — let me take a look. Okay, it looks like they got answers and they weren't anything technical. I saw them kicking off in my peripheral vision and thought I needed to answer something. But even from the previous slide, you can immediately understand that SELECT * from a table is a very bad idea. Imagine you have a thousand nodes — SELECT * FROM the table will take you minutes. Imagine that. To Cedric's point: if you had a thousand nodes — or, I like to use the petabyte example, because Cassandra is totally a petabyte-scale database — imagine you've got that much data in there and you do a SELECT * with no filters at all, nothing in your WHERE clause. You're literally grabbing the data from all the partitions on all the nodes across your cluster. That's generally considered an anti-pattern in Cassandra.

Okay, so now let's look at how Cassandra replicates data. This, to me, is actually one of the coolest
features of Cassandra. Cassandra has inherent replication. When you create your keyspace — which, by the way, is the equivalent of a database or a schema in a relational database, the thing that holds your tables — one of the things you do is set your replication. All the replication settings say is: how many data centers do I have (one or more), and what's my replication factor (the standard is three)? Once I've done that one thing, data replication throughout my whole cluster and all my data centers is handled automatically.

So what does that mean? Look at the left-hand side, where it says geographic distribution: I have three rings connected by those arrows. Those are three different data centers, but all in a single cluster, and that one cluster is my database. If I write some data over on the west coast, to any one of those nodes, the data is automatically replicated to all the other data centers, and my users in India or China can get that data at the speed of the wire — as fast as the data travels from the initial node to the others, they have it available. There's no ETL, no manual file copies; the data is replicated automatically, and that's handled by that one thing you do when you create your keyspace. It's awesome because it's so simple, and Cassandra does all the hard work for you. And that applies to reads and writes: if I write data to any of those little blue nodes, any other node will be able to see it, and if I go to read data, regardless of where it was written, I can just read it as well.

Now, on the right-hand side, where you see hybrid cloud and multi-cloud, this is another key thing about Cassandra: it is deployment agnostic. It doesn't care where you put it. You can install it on-prem, put it in a cloud provider, or run it on a mix of on-prem and cloud providers — what we call hybrid or multi-cloud, depending on the setup. If I had on-premises plus Google Cloud, that would be hybrid; if I had Google Cloud, Azure, and AWS, that's multi-cloud. And I can do that interchangeably.

You might ask: why would I do that? Maybe I need to burst up. Maybe I have an on-premises installation, but it's a Black Friday event or something like that, and I need to burst up for a period of time and then come back down, and I don't want to physically procure all of that on-prem hardware. That's a great use for bursting up into, say, Google Cloud or AWS and then scaling back down. Or maybe there's a capability I want to leverage — say, machine learning services on one cloud provider that don't exist on the others. I can just add a data center, update my replication to include it, and now the data is available there on that system. We've even seen folks do it purely for negotiating leverage with the major cloud providers: if I can easily move from, say, Google Cloud to Azure at the flip of a switch, that gives me leverage. Or maybe I'm hedging against potential failures. The cloud is a wonderful resource, but if you've watched the last, what, three years, I think every single one of the major cloud providers has had a major outage at some point. So we've seen a lot of cases where folks spread their cluster over multiple cloud providers as protection: if there's a major outage on one, they can just move over to the other. This opens up all sorts of possibilities, and again, it's all powered by that inherent replication you set, literally, when you set the replication on your keyspace.

All right, Cedric, I don't know if there's anything you want to add; I saw a set of questions come through that I'm going to look at. No, I think you nailed it: one single technology, and you can have a database crossing multiple clouds and multiple regions without having to worry about any replication. So if you're working on a mobile app, people can have the data close to them, reducing latency. The same thing you may do with your code — using a CDN, or the Jamstack in the JavaScript world, to bring the code close to the user — we can do with the data, using Cassandra. It looks like Alex is all over the questions on YouTube; I think he's answered everything I saw. He's on fire today. Wonderful — Alex, thank you so much.

All right, moving on a bit. If you look at some of the use cases for Cassandra, scalability is definitely one of the big ones. Cassandra was purpose-built to horizontally scale indefinitely. Compare that to vertical scaling — and by the way, I'm going to reference some relational databases here, and I'm not trying to knock them at all; it's a different tool for a different need. Relational databases were born in a totally different era; they're general-purpose databases, and you can vertically scale them — add more CPU, faster disks, more RAM — but, as we like to joke in the Oracle world, you can scale up until you run out of money. At some point you hit a ceiling and you just can't get any further, whereas Cassandra was built to horizontally scale out on commodity
hardware. So scalability is definitely one of the strong use cases. What this translates into: let's say you're doing event streaming, or IoT — the Internet of Things. Sensor networks are a great example; we see a lot of Cassandra usage for various sensors, whether those are weather sensors or whatever. Imagine the amount of data coming in from all those sensors all over the place — Cassandra makes a really good system for this. And what's cool, again, is that as your throughput needs increase, you can just increase the size of your cluster to handle the increased throughput.

Availability is another big one. Cassandra was built originally — over ten years ago now — to handle failures well and gracefully when they happen, because they are guaranteed to happen. As a matter of fact, in any distributed system, as you increase the number of nodes, you increase the number of potential failures — not just Cassandra, any system at all. So Cassandra was built with a ton of robustness and defense mechanisms to handle what happens if a node goes down, what happens if the network is severed, all sorts of things. If you have a mission-critical app and need to ensure your database is always available, that's a really good fit for Cassandra. It's even called the always-on database, because Cassandra is built such that in some configurations you could lose two-thirds of your capacity and still have an available database — still able to serve reads, writes, and requests. It's amazing. We're not doing it today, but it's really fun: for those of you who've played with multi-data-center setups, if you're replicating across data centers, you can very clearly see cases where you could take down a whole data center and the other data center is still there and can still serve the traffic. So it's really good for those types of use cases.

Then of course there's the distributed nature of things — I see some questions about that which Alex is already on top of in YouTube. From the geographic graphic we showed a minute ago, you could totally have a global database with multiple data centers in different places in the world. Where this really comes into play: today, almost by de facto standard, nearly all of us have to build global apps. Why? Because our users aren't just near one data center in the US — they're in China, Australia, EMEA, all over the place — and there's too much latency if I have one database in a US data center; my users on the other side of the world would incur far too much latency in their response times. So what do you do? You put data centers out in those different places, you put the data where the users are, and again you use Cassandra's inherent replication to push that data around for you.

And where this really matters, you see there in the middle: compliance, or GDPR. That's a really good example. I might have a data center in the US that doesn't need to worry about GDPR, but one in Europe sure does. So I can have multiple data centers with different security configs on each. It really opens up a lot of possibilities. And especially for the customer experience, on the far right-hand side: again, putting the data close to the actual users, so they get what feels like an instantaneous response rather than huge lag times from latency over the wire across the world.

And finally, the cloud-native piece: it's just a good fit. The fact that Cassandra itself is cloud agnostic — you can put it on any cloud provider, you're not locked into a particular vendor — automatically gives you the flexibility to deploy in multiple configurations, across multiple cloud providers, or even with a single one, or in your on-prem install. All right, let me take a look at the questions... okay, we're good.

So, just a couple of things on vocabulary — we've talked about this a little already. At the outer edge you see the cluster; this is essentially your database, at the cluster level. Within the cluster you have your data centers, and within your data centers you have nodes, which we've already talked about, and racks. Racks are subsets of nodes within a data center. They come into play in some more specialized use cases — they offer even more options in how you handle replication — but we're not really going to get into that today. What's important is just to understand: the cluster is your database; the data centers are the individual rings of nodes, if you will; and then you have the nodes within them.

All right, with that, that's our primer on Cassandra. I'm checking the questions, and I see that Alex has again been all over them, so I think we're good. Good to go on, Cedric? Yes, sure. This time let me try to share my screen. You want to try to share again? Oh no — let's skip that, no more troubleshooting; you'll move through the slides with me.

Okay, so that was it for the Cassandra part. The important thing is for you to understand that Cassandra is a distributed database, so to deploy Cassandra you'll have a setup with multiple parts to deploy to make it work — it's a distributed system. All right, moving on to containers.

Okay. What we used to do in the past — and
maybe some are still using virtual machines — is run multiple operating systems on a single machine, and the idea is always to do as much as possible with the same infrastructure. Virtual machines are pretty cool: a VM is a huge file with an embedded operating system where you can deploy your app, and it's easy to copy — replicating the environment is as easy as replicating a file — with a single hypervisor driving the virtual machines. On a single physical server you can have multiple virtual machines; that's the beauty of virtualization. But the issue you might notice immediately: look at the stack — how big it is. Each virtual machine contains the operating system, the big square in the middle, and you want to avoid having multiple copies of, I don't know, Windows or Linux running on the same machine, because each one takes CPU, resources, and RAM. If you could have a single operating system with applications running on top of it, that would be cool, right? And this is where containers came in.

Containers are simply Linux (or Unix) processes — it's very simple; it's based on LXC, but never mind that. The idea is that now you have only a single operating system, and on top of it you have the blue rectangle here — it's labeled Docker, but this is a container engine. It could be Docker, or newer container engines like containerd or Podman, which are more and more relevant solutions. In the engine you deploy instances, and those instances are what we call containers. It's easy to scale: copy containers multiple times, replicate, scale out. And the container engine still gives you resource limits and enough security to prevent containers from exchanging data or security information where they shouldn't have that authorization. So it's all about doing more with the same physical infrastructure, and no doubt that's why it has had so much success. Next slide, please, David.

So Docker is still the leading implementation of a container engine — what we call the Docker engine. Inside, the container engine is the runtime, with a Docker daemon, and with the Docker command line you interact through a CLI (command line interface) and send commands to the Docker engine — to the Docker daemon. Your docker command line is simply a client sending requests to the engine, and the engine's interface is all based on a REST API. Simply put, no more than that: you type commands, the commands go over the REST API to the container engine, and the engine starts, stops, or interacts with running containers. Okay, next slide.

So what is the lifecycle? I saw in the questions at the beginning that some of you are not really familiar with Docker, so again, some vocabulary. When you have your app and you're happy with the code and the tools needed to run it, you create a snapshot — a photo — of your environment, called a Docker image. To do so, you use what we call a Dockerfile, which is a set of instructions, a recipe, where you say: okay, I'll start with a Linux; then I'll install, I don't know, Python, because I need Python to make my app run; then I'll copy my code inside this box; and I'll add a command line to start my app. When I'm happy with that, docker build runs this recipe to create an image. The image is immutable — it won't change. If you're a developer working with an object-oriented programming language, the image is like the class: something static that won't change, just there as a template. You can push this image to a registry — a repository where everybody else can use the same image — or you can pull, meaning you download the image locally so you can start an environment based on it.

So when you execute docker run, you use that image to create a container, and a container is like an instance. Now I think you see it: images, which are templates, and containers, which you start based on an image. And you have the client, the host, and the registry, where all the Docker images are stored. Again, this is a short-form introduction to Docker, just for you to get the keywords like image and container — but on this channel we have, I think, four weeks' worth, so eight hours, on Docker. Alex dropped the link earlier; Alex, if you've got it handy, would you mind dropping the link to your Docker series again? That'd be really useful.

And for those of you who haven't used Docker before: from a development standpoint, Docker is actually pretty awesome. Cassandra on Docker is an example — instead of installing it fully on your own and having to configure all your paths and that kind of stuff, you can spin up a Docker container in minutes. There are so many different applications and services available through Docker, so if you haven't checked it out, I think it's a good idea. Exactly — you don't need to install the tools on your machine anymore; it downloads the image and you can start working right away. And the beauty of an image is that everything is pre-packaged and ready to start — you don't have to struggle with start commands or anything. Okay, next slide.

So these are the images we can use to create containers, and Cassandra is no exception: the Cassandra developer team created a Cassandra Docker image and pushed it to what we call the Docker registry, or Docker Hub — docker.io — and there you will see the public images, available to everyone.
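As a concrete sketch of the recipe just described — start from a base image, install a runtime, copy the code in, declare the start command — a minimal Dockerfile might look like this. The base image tag, file names, and app name are illustrative assumptions, not from the workshop:

```
# Hypothetical Dockerfile for a small Python app (illustrative only)
FROM python:3.11-slim                 # start from a Linux image that ships with Python
WORKDIR /app
COPY . /app                           # copy the application code into the image
RUN pip install -r requirements.txt   # install the tools the app needs
CMD ["python", "app.py"]              # the command that starts the app
```

Running `docker build -t myapp .` executes this recipe and produces the immutable image; `docker push` and `docker pull` then move it through a registry, and `docker run myapp` starts a container — an instance — from it.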
So if you run the command you see here, docker run cassandra, it will pull the image, so it will download Cassandra, and it will start Cassandra, just like that. Now you might say: okay, but I want to provide some configuration to this image, and the image is static, it's a template, so how does that work? In the docker run command we use a set of parameters to tell Docker how to run this image. You have --name, which just provides a name for the container, but hey, you cannot have two containers running with the same name, so it's not something you use very often; in production you let Docker pick a random name because, I mean, you don't care about the name. -d tells Docker: run this as a daemon, I don't want a command line into this container, just run it in the background. -e is for environment variables, key=value pairs; here is one key available in the Cassandra configuration, but if you go to the official Cassandra image on Docker Hub you'll see dozens and dozens of keys you can provide to override the default configuration. -p is for ports: which port do you want to open? By default a running container is locked down, you cannot reach it, for security reasons. So if you say -p 7000:7000, when you run this container on your host and you go to localhost:7000, it forwards to port 7000 in the container, and that's how you access services in a container: 3000 for Node, 8080 for Java most of the time. For Cassandra, 7000 is for inter-node communication and 9042 is for a driver to connect to Cassandra, which is exactly what we did last week. And -v is for volumes. I didn't tell you, but when you start a container it's a stateless object, so if you kill that container, it's done, nothing remains. For a database that's not really what you want, right? So you mount a volume, some disk you have on the
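Putting those flags together, a sketch of such a docker run invocation might look like this; the container name, cluster name, and host path are placeholder values (CASSANDRA_CLUSTER_NAME is one of the keys documented for the official image):

```shell
# --name : explicit container name (usually omitted in production)
# -d     : run detached, as a daemon in the background
# -e     : override default Cassandra configuration via environment variables
# -p     : map host port to container port (9042 is the CQL driver port)
# -v     : mount a host directory so data survives container restarts
docker run --name my-cassandra -d \
  -e CASSANDRA_CLUSTER_NAME=demo_cluster \
  -p 9042:9042 \
  -v /home/me/cassandra-data:/var/lib/cassandra \
  cassandra:3.11.6
```

With the volume mounted, killing and re-running the container brings the node back with its data intact.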
host, that you mount into the running container. Now if you stop or kill the container the data is still available on disk, and when you restart the container the data comes back into it. So the first important thing here: to work with stateful applications, applications that have state, that need to save data, where if you stop and restart you want your data back, in Docker you use what we call volumes. All right, so you might say: okay, that docker command is already pretty long; now if I add, I don't know, 10 volumes and 20 environment variables, it's just a nightmare, right? It becomes a command so long it's hard to maintain, hard to work with. So Docker provides, on top of the default Docker CLI, Docker Compose. David, if you move to the next slide: with Docker Compose you have a YAML file where you define, oh, I would like to run these containers. For instance, cassandra-seed here in green uses the image cassandra:3.11.6, okay, this is the template to use; you provide the container name if you want, the port mapping, environment variables, and volume mounts. But now you define everything as YAML, and you can also introduce dependencies between containers. In Cassandra there are different flavors of nodes: one node is used as a seed, and when you want to scale your Cassandra ring out, you need to provide the seed IP. This is how you do that with Docker Compose, and using this file you start a two-node cluster, one seed and one regular node. But even with that, okay, you can introduce some dependency between the nodes, but what happens when one node fails? How do I recover, or how do I scale if I now want two instances of the Cassandra node? Well, that's technically possible with docker-compose --scale as shown in the top left-hand corner, but what about the disk? How do you tell that second node it should not use
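A minimal sketch of the kind of Compose file being described, assuming two services named cassandra-seed and cassandra-node; the service names and host paths are placeholders:

```yaml
# docker-compose.yml sketch: one seed node plus one regular node
version: "3"
services:
  cassandra-seed:
    image: cassandra:3.11.6
    container_name: cassandra-seed
    ports:
      - "9042:9042"            # expose the CQL driver port on the host
    volumes:
      - ./data/seed:/var/lib/cassandra
  cassandra-node:
    image: cassandra:3.11.6
    environment:
      # point the new node at the seed so it can join the ring
      - CASSANDRA_SEEDS=cassandra-seed
    depends_on:
      - cassandra-seed         # start order dependency, nothing more
    volumes:
      - ./data/node:/var/lib/cassandra
```

Note how the fixed volume path illustrates the scaling problem raised above: a second scaled-out cassandra-node instance would try to reuse the same ./data/node folder.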
the same folder to save its data? It's not easy, it's really not easy. So Docker Compose is super cool for development, to run multiple containers on your laptop and make them talk to each other, but, if you go to the next slide, David, you might say: okay, for Cassandra this Docker Compose approach seems pretty appealing. Cassandra in Docker: use the Cassandra Docker image, provide the proper parameters, and you're good to go. But the reality, if you click next, is that it's simply not enough, because you cannot handle all the situations: if one node goes down, how do you restart it? What if you need to scale out? There are multiple situations where Docker Compose cannot orchestrate the containers. All right, so moving on: what we need is a container orchestrator, and that container orchestrator, no surprise, will be Kubernetes. So next slide, the definition of Kubernetes. If you go to kubernetes.io: Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. With Docker Compose you can define some dependencies between containers, but it's not enough; you really want to introduce some management. Okay, how long do I make container number two wait, just to be sure that container number one has been properly started and is ready to take my requests? So, more advanced dependencies between the containers, and handling a container going down: that is what Kubernetes is all about. If you go to the next slide you see a landscape of features provided by Kubernetes. We already talked about horizontal scaling: I want this container to scale as much as needed. But hey, now, I don't know, you have your microservice scaled out and available three times; how do you do load balancing between your three instances of the service? Well, Kubernetes already provides load balancing, and you can give your clients a single load-balanced URL, and if you need to scale out and add
multiple instances of your app, clients don't care, they simply use the load-balanced URL. If a node goes down, if your application goes down, Kubernetes will automatically detect that the container is down and it will retry and restart the container. I use the word container a lot on purpose; we just came from Docker, but Kubernetes has its own vocabulary, so let's see what it is. We have the vocabulary for Cassandra; let's discuss the vocabulary for Kubernetes and make those two worlds match. Okay, guess what: Kubernetes is a distributed system as well, but unlike Cassandra, there is a master, and this master is called the Kubernetes control plane. In a Kubernetes cluster you have a master, which can be replicated for resilience reasons, and multiple nodes, also called workers, the slaves. And when you interact with Kubernetes, like with Docker, you use a CLI: instead of docker run you use kubectl, the Kubernetes command. You give kubectl a command, and the command is passed to the REST API that is part of the Kubernetes control plane. When you set up Kubernetes on your machine, you map your kubectl command to an existing running Kubernetes cluster; it's as simple as that. All right, on the next slide, if you now open up the master: I don't want to go too much into the details, but the master is the brain of the system. First, it has a controller manager, a control loop: I need to check all the pieces to see if everybody is healthy, and if somebody is not healthy, if I detect that one container is down, then I talk to my buddy the scheduler: this one is down, please create another one. Okay. The API server, I told you, is the one receiving the commands, and there is some storage, some way to store the state of the system: that's etcd, a key-value database. Okay, that's enough detail on the master in Kubernetes; let's look at the slave, or the minion, or the worker. First, they have a
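As a quick illustration of that client-to-control-plane relationship, a few basic kubectl commands; each one is translated into a REST call to the API server:

```shell
# which control plane is my kubectl currently mapped to?
kubectl cluster-info

# list the workers (and masters) that make up the cluster
kubectl get nodes

# list all running pods, across every namespace
kubectl get pods --all-namespaces
```

The same pattern holds for everything that follows in the talk: kubectl is only ever a client, and the control plane does the real work.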
kube-proxy; this proxy helps interact with the master, to get all the commands, along with the kubelet, which is the agent running on each worker. What is important to know here is that in this worker you will have multiple pods, those are in orange. So now we stop talking about containers based on images; we talk about pods. The pod is the orange box over here, and inside a pod you can have one, two, multiple containers, the ones we just spoke about until now: Docker containers, containerd containers, Podman containers. In the pod you can have multiple containers, not only one. And especially with a tool that is not HTTP-ready, or does not expose any interface to interact with it, you may want another container inside the same pod just to help that guy expose to the outside what's happening. So, I don't know, you have an old technology only exposing JMX, the old Java way of exposing metrics, and Kubernetes is all about HTTP web servers; you would create what we call a sidecar, a running container that plugs into JMX and exposes an HTTP interface. Okay, so inside the pod, multiple containers, and what you scale with Kubernetes are pods. So you put in the same pod the components, the containers, that need to scale together and work tightly with each other. Okay, I mean, it's a lot of words, it's a lot of vocabulary, but at the same time we're giving you only the vocabulary you need to understand what's coming: how to deploy Cassandra in Kubernetes. So after this infrastructure and all the architecture of a Kubernetes cluster, let's dig into what you actually create in Kubernetes. A pod is one thing, but it's not the only thing you create. Everything you create in Kubernetes is called a resource, and I will show you that there are multiple flavors of resources. The first is what we call a namespace, and as the name states, it's a way to isolate a bunch of resources
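The JMX sidecar idea can be sketched as a pod manifest; the image names below are hypothetical placeholders, not real images:

```yaml
# Sketch: one pod, two containers that scale together
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app
spec:
  containers:
    - name: app                  # the legacy app, exposing metrics only over JMX
      image: example/legacy-java-app:1.0
    - name: metrics-sidecar      # plugs into JMX, re-exposes metrics over HTTP
      image: example/jmx-http-exporter:1.0
      ports:
        - containerPort: 8080    # the HTTP interface Kubernetes can talk to
```

Because both containers live in the same pod, they share a network namespace: the sidecar can reach the app's JMX port on localhost.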
inside the same namespace. The best practice is, if you build an application that needs multiple pods, you tend to put them in the same namespace, and like that you can interact with them by providing the namespace and do group operations, for instance. And if you don't need your app anymore, you can simply drop the namespace, and all the pods related to this namespace will be dropped as well. You might notice that these namespaces are really logical: if I create a namespace, it lives on all the workers, all the slaves, all the minions. Everything we create as a resource will be available, and you don't need to care about which worker it's running on. Okay, namespaces are one thing; what do we have next? Oh, you know, I'm going one resource after the other; do we have any questions that Alex is not able to keep up with? He's keeping up so far, but actually there was one here, from Dinaharan on YouTube: is it possible to build a Cassandra cluster to spread across multiple Kubernetes clusters? It's not an easy one. Actually, no, not as of today, and very few stateful applications will be able to work across multiple Kubernetes clusters, because you need to federate those clusters, and the technologies to federate clusters are not mature yet. Some just released last year; I didn't see any article saying it's production-ready, so I would say be careful. You need to have the central brain, you know; you can have the same Kubernetes cluster running in multiple locations, having workers in multiple locations, but it still needs to be part of a single cluster, as far as I know. Yeah, as a matter of fact, Alex is making a comment here about that. Alex just said in YouTube it's possible without the cass-operator, so it sounds like there is a configuration point there. I don't know how much work that entails, maybe Alex can give us some insight into that, if we're talking about going outside of the cass-operator and, you
know, expanding across multiple Kubernetes clusters. We'll wait and see if he says something; let me know. Yeah, you might say Cassandra is peer-to-peer communication, and you don't have a brain that tells you: I need to spawn nodes here or there. But anyway, moving on. So you know about pods, you know about namespaces. If you have a stateless application, like a microservice, no database, nothing, you simply start a bunch of pods and you're good to go. But what if you do have state, if you need to save data, as with Docker volumes? Well, in Kubernetes those are called persistent volumes. Persistent volumes are mounted, or associated, with pods, and for the pod to be able to access a persistent volume there is a persistent volume claim, which is the access request that lets the pod use the volume. What kind of volume? Well, what's pretty cool with Kubernetes is that it's declarative: you can define, oh, I have some disks with SSDs, very fast, and I will create something called a storage class, saying okay, this storage class is the disk for Cassandra, it will be one terabyte large and use SSD technology, because you want it big enough and as fast as possible. But for other components in the same Kubernetes cluster you may only need some slower disks, mechanical drives, and so you can have another storage class. So when you start something that needs to be stateful, with data storage, in Kubernetes you define persistent volumes, and when you do that you say which storage class you want to use. And something I just want to add here, I think this is actually one of the coolest features, especially when using the cass-operator, which we're going to get to, right? With persistent volumes within Kubernetes, you could lose the pod that contains your Cassandra instance, just spin up another one, and it can just
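A hedged sketch of the two pieces just described: a fast storage class and a claim against it. The provisioner shown is the GCE persistent-disk one and would differ per cloud; names and sizes are placeholders:

```yaml
# A storage class backed by SSDs, intended for Cassandra data
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: disk-for-cassandra
provisioner: kubernetes.io/gce-pd   # cloud-specific; GCE shown as an example
parameters:
  type: pd-ssd
---
# The claim a pod uses to get a volume from that class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cassandra-data
spec:
  storageClassName: disk-for-cassandra
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Ti                  # "one terabyte large", as in the talk
```

A second storage class pointing at mechanical drives would follow the same shape, just with a different provisioner type.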
continue on from where you were before, like almost nothing happened, because you have that persistent volume and the data for your database, the node, is still there, right? I think that's really powerful, because when you're doing updates, or if for some reason you lose a pod or something like that, it just heals itself, right? I've always thought that was a really cool feature of this whole setup. Yes, and this is a strength of Cassandra: the data is available on multiple nodes, and if you lose any one node it's not a big deal, the service is still available, and you win. Oh, I'm sorry, no, go ahead. No, no, there's a question from Satya on YouTube: should we be using a static PV, or is a dynamic PVC fine? That's a very good question, and I will come to that just a slide after this, because as of today, in the way we propose to deploy Cassandra in Kubernetes, we are using a stateful set. So David, you can move to the next slide; STS is just the next slide. Maybe not this one, next, let's see. Okay, stateful sets. A stateful set is simply a way to say: remember my pod and the disk needed for this pod? I will create a stateful set just to map the two of them together. So when you create a Cassandra datacenter you may have multiple nodes, so you start multiple pods, and each time you add a persistent volume. The way it is coded as of now, it's a dynamic PV: it's provisioned when you initiate the stateful set, and as the persistent volume is considered tied to the pod, if you lose the pod you can consider the disk lost as well, because that is a valid way to work with Cassandra. If you lose a node, okay, that's not a big deal, the service is still there, and if you're not able to restore the lost node, you can simply add a new node to the Cassandra cluster, and the data will be reshuffled among all the nodes. And because
Cassandra out of the box provides these data-distribution features, we are using ephemeral storage and dynamic persistent volumes. But that's for today; later this year we are working on, trying to partner with providers like MayaData and OpenEBS, to do static persistent volumes, you know, pre-provision some disks just to be able to recover without letting Cassandra do its reshuffling, and maybe gain some time if a node goes down and you want to restart that node very quickly. So as of now: ephemeral and dynamic, but more coming soon. Here's a really neat question, I think you're going to like this one, Cedric, from a user actually named None (just go for null or empty, right?): if a pod goes down and a new pod kicks in, hinted handoff and gossip still happen, I suppose? The new node would have a new IP. Probably a new IP, so it will be considered a new pod, so it will bootstrap, and we will have a reshuffle across all the nodes, because IPs are allocated dynamically. I don't think the hinted handoff will be there if you totally shut down the node. If the node goes down for, say, a network issue, something like that, the node is still living but you cannot contact it; when the network comes back the gossiping is still there, it will reconcile, and the hinted handoff will happen. But with pods and Kubernetes, the logic is: oh, it's not working, I kill it. Sometimes we use the analogy of pets versus cattle: a pet, you give it a name, you cherish your pet, you love your pet; cattle, you don't care, you just give them numbers, and if one dies you just create a new one. It's the same logic here: if the pod is really not healthy, you kill it and create another one; if it was a matter of a network issue, it will come back and simply do the hinted handoff and continue. And I've got one more here for you, from Panagiotis: is it possible to resize the Cassandra storage of a node after deployment? Uh-huh, let me think,
that's a good one. Yeah, I don't know if you could just change the size and apply the configuration, right? I wouldn't reduce it. No, no, you'd want to expand. Yes, yes, the storage class, that's right. You know what, maybe I would back up, create a new, bigger one, and restore. I'm not sure you can edit a storage class at runtime, because what you would do is edit your Kubernetes YAML with the new, bigger storage class, then reapply your settings; it will spawn new persistent volumes with the larger size, and then what do you do? You need some extra tool like Medusa to do the restore of your node, or it will be considered a new node and start doing all the data reshuffling. But, you know, I also count on Alex on YouTube somewhere to confirm that behavior; we can take that question offline. Well, and Diana over on YouTube is saying the most recent version of Kubernetes allows resizing your PVC. Yeah, and Panagiotis is saying they'll look it up. Cool, all right, thank you. I didn't say anything stupid, right? Yeah, okay, I still have some credit; if I say something stupid now, everything falls apart. It's funny, this guy just moving his head on the screen. Okay, so you now have the terms for Cassandra, what a cluster, rack, and node are, and you have the same for Kubernetes, so our mission is to make that distributed system fit into Kubernetes as-is. Just think about what you would do to do that. Okay, we might say that a node could be a pod, I need some persistent volume to mount my data, I probably need some configuration I could share between multiple pods, open the network, maybe provide some secrets for username, password, stuff like that, and you'd be good to go, right? Well, not quite, and if you move to the next slide, David, this is the small print. To put Cassandra in place, there are some
rules for administering a cluster; those are part of cluster administration. For instance, there is the two-minute rule: you should not add multiple nodes to a Cassandra cluster at the same time. The proper way to do it is: I create my first node, I wait the two minutes until this node is healthy and has joined the cluster, and only then do I create a new one. And that's really one rule among dozens and dozens. So the default resources in Kubernetes are not enough to handle a distributed Cassandra; you still have those rules to apply. And how do we react when a node goes down, or becomes unstable? You have multiple procedures to recover, and you need to execute some commands, see what's working and what's not, and based on that act on your cluster. You also need to schedule some commands for backup, restore, and maintenance, what we call repair, or compaction of your tables, if you need to. So with only pods, persistent volumes, and all the default resources I gave you, it's unfortunately not enough. So what can we do? Well, moving on, we do have the solution for you: if the existing definitions in Kubernetes do not fit your needs, you can create new ones. This is what we call a custom resource definition, and that's kind of the template; when you instantiate it, it becomes a custom resource. The custom resource definition is the CRD, and it's defined in YAML; if you're not aware, Kubernetes is simply YAML as a service, okay, it's all about YAML. So you define your own resources, and there is a part of the definition, the spec, the specification, where you define some keys and dedicated configuration. So there, as a sample, you define your own configuration, default values, properties. And those resources, like every resource in a Kubernetes cluster, have a state. So let's say I have defined the shape of my
resources, providing some state and some properties. What do I do with that? How do I tell, remember, my little scheduler and control loop, the controller: okay, how do I behave with this new guy, which is totally exotic? You need some component in between the Kubernetes master and this resource that knows how it behaves, and to do that we use what we call an operator. If you move to the next slide, you will see that an operator is simply a type of component, a resource inside the Kubernetes world, that is there to help the default controllers listen to and change the state of a CRD, a custom resource definition. This guy listens to all the events coming from the custom resource and acts on them, executes some commands, and makes it work. So if you want to install an application that needs an operator, what do you do? First, you define what the operator looks like, the pattern; then you instantiate this operator, a running process that listens for and sends events. Okay: define the operator, start the operator, but with that alone nothing happens. Third operation: you define your CRD and create an instance of it, and now the magic happens. I create my custom resource and its state is initializing, so it sends an event saying I'm initializing; the operator knows: okay, because you are initializing, the next step is to do this and do that, and step by step the operator drives the workflow to start all the components in your app, the components you want to start. And so what we did at DataStax: we created a CRD matching a Cassandra datacenter, or a ring, remember the ring from before. We made the CRD for a ring, and when we instantiate this CRD, what we want the operator to do is start the first node, which is the seed node, you know, the one we want to start first; when that guy is online, move to the second one, and the third one, and so forth, until I have my datacenter running with the proper number of
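The define-then-instantiate sequence described above might look like this on the command line; the manifest file names are hypothetical:

```shell
# 1. define what the custom resource looks like (install the CRD)
kubectl apply -f cassandradatacenter-crd.yaml

# 2. start the operator, a running process that listens for events
kubectl apply -f cass-operator-deployment.yaml

# 3. instantiate the custom resource; the operator reacts to its
#    "initializing" state and builds the ring node by node
kubectl apply -f my-datacenter.yaml
```

Nothing happens until step 3: the CRD and the operator are only the template and the listener; the instance is what triggers the workflow.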
Cassandra nodes I asked for. And each time, it provisions not only the pod and the disk for each node, but also some extra resources, like a secret for the login and password, and configuration. Remember the -e flags with all the settings you provide in Docker? Here you have a dedicated resource called a ConfigMap in the Kubernetes world, which is injected into the running containers to provide the settings to all the running pods. And as easy as it sounds, you may have read tons of documentation around Kubernetes operators, but it's just that: a new guy listening for events and acting on the CRDs that have been defined. All right, moving on to our cass-operator, which we open-sourced. Created by DataStax, but not only us; members of the community work on the operator, and there are still multiple operators in the open source world to interact with Cassandra. Cass-operator is the one proposed by DataStax to the community. This guy can do ring initialization, the datacenter, or ring, as I told you; seed management, start with the seed and add new nodes based on that; and rack management. In the Cassandra vocabulary, you may remember racks: a rack is just a hint to tell Cassandra, oh, by the way, those two nodes use the same disk, or the same IT resources, so we tell Cassandra they are part of the same rack. When Cassandra detects that two nodes are part of the same rack, it tends not to put the same data on both nodes, because that would increase the risk of losing data. If two nodes rely on the same disk and Cassandra still wrote two replicas of the same data to those two nodes, you would run the risk that, if you lose the disk, you lose both replicas of your data at the same time; you want to avoid that. So cass-operator is aware of racks and will create a stateful set per rack. All the settings for your ring or datacenter are in the Cassandra data
center resource definition, the CRD. And here comes the magic: if you lose a node, it will try to reboot it, and if you need to expand your cluster, you simply go to the YAML for your CRD, change the number of nodes from x to y, apply the new config, and the operator will spawn new nodes one after the other and do the scale-up for you. Also, if you have some settings to change, say you have 1,000 nodes in your datacenter and for some reason you decided that the JVM heap should be moved from 500 megabytes to 700 megabytes, well, by simply changing one YAML file, the operator will start changing the config for every node, one after the other, and do a rolling restart for you. That's also pretty cool for rolling out a new config, or if you want to upgrade the Cassandra version, why not? Boom, this guy will start rolling the nodes and make that work for you. Yeah, and I just want to jump in here too on that scale-up capability, to just change the YAML and apply it. For anyone here who's actually managed their own Cassandra clusters before and added a node: it's not too bad once you know the pattern, it's actually quite easy, but the cass-operator in Kubernetes, in my opinion, makes it even easier, because once you have your cluster running with the cass-operator, as Cedric just said, if you want to scale that thing up, or scale it back, all you have to do is change the YAML and reapply; it handles all the rest for you, all the base configuration and everything. So it's again one of those really cool features that once you start to play with it and see how it works, you're like, oh, that's nice, that kind of reduces some of the administration overhead. Exactly, yeah. And so, earlier we talked about multiple Kubernetes clusters,
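So a scale-up, as described, is nothing more than editing the size and re-applying; a sketch of the workflow, with a hypothetical file name:

```shell
# 1. edit the CassandraDatacenter yaml: change spec.size, e.g. from 3 to 6
# 2. re-apply it; the operator sees the difference and adds nodes one at a time
kubectl apply -f my-datacenter.yaml

# 3. watch the new Cassandra pods join the ring, one after the other
kubectl get pods -w
```

The same apply-and-reconcile loop covers config changes and version upgrades: the operator performs the rolling restart for you.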
which was a bit wild, but here with the cass-operator you can have multiple DCs: you can create multiple CRD instances that are part of the same cluster, simply by providing the same cluster name, and when the nodes are spawned they will join the same Cassandra cluster, which is pretty neat, right? And if you have a migration to a new DC to do, the operator is enabled for that as well. So I do have a question from our friend Dinaharan, and this time on Discord, switching over to Discord, hello: where can I find more information about the cass-operator? I do know that there's a repo out there, and boom, Alex dropped it right when I said that, github.com, that's the one. Yeah, that's exactly it; we tend to open source what we do. Okay, let's do it like that: here is just a sample definition of a Cassandra datacenter, our CRD. Let's look at the right part of the settings: you define your cluster name, server type, server version. The cass-operator can work with Cassandra 3.x and with DataStax Enterprise, the enterprise distribution provided by DataStax with Cassandra at the core but also providing Spark, Graph, and Solr to do some search; so it works both with OSS Cassandra and DataStax Enterprise. Here we define a datacenter with size nine, and you can define some racks: I would like to create a rack based on the availability zone, here in AWS. We are getting into cloud settings a little bit here; we will do a deeper, cloud-focused dive next week, moving everything we are presenting today to the cloud. So this is a sample config saying: okay, those nodes will be in the same region but different availability zones. And when Cassandra spawns some nodes and distributes data among those nodes, especially when defining the token ranges, how to distribute the data, it
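A sketch of a CassandraDatacenter definition along the lines of the one on the slide; field names follow the cass-operator examples from around this period (check the repo for the current schema), and the AWS zone names, storage class, and sizes are placeholders:

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cluster1        # same clusterName across CRDs = same Cassandra cluster
  serverType: cassandra        # or "dse" for DataStax Enterprise
  serverVersion: "3.11.6"
  size: 9                      # total nodes; the operator adds them one by one
  racks:                       # one rack per availability zone
    - name: rack1
      zone: us-east-1a
    - name: rack2
      zone: us-east-1b
    - name: rack3
      zone: us-east-1c
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage   # which kind of disk to use
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 5Gi
  config:
    jvm-options:               # keys to override at Cassandra startup
      initial_heap_size: "800M"
      max_heap_size: "800M"
```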
will know that those two pods are part of the same availability zone, so if it can, it will not put the same data on both, just to be as reliable as possible. What else do we have? Yeah, we have the storage class, okay, which kind of disk we should use, and we can also have some Cassandra config: which keys should I override when I start Cassandra? So, Cedric, I've got a question here for you from Yurta on YouTube: is it possible to have different versions of the operator within the same cluster, say, upgrade the versions only per namespace? That, I don't know. I don't know either. I wonder if Alex has a comment on that one; yeah, we can reach out to the team. Hey Alex, he's thinking about it too. Yeah, we'll have to think about that one and come back to you. The operator itself has its own versioning, so you would have your instance of the operator listening for some kinds of events, and when we release a new version of the operator, this guy should be backward compatible, should handle and read new events. But we need validation from the engineering team here. Yeah, and Alex's second response was: I'd avoid the situation. And I kind of feel the same, right? Because you're adding an extra layer of complexity into your configuration at that point, and that could lead to some unintended consequences. But we'll have to see what the engineering team says on that. At the same time, I understand the question very well: Kubernetes is moving so fast, you know, 1.15, then 1.16, now 1.19;
we get a new Kubernetes version every few months, so at some point you might consider: okay, how do I migrate my own Kubernetes, should I use another master with the new version, right? And you might ask the same question regarding the operator and everything, so yeah, it's a good question, a proper question that we will probably put on the community site. Everybody's writing it down, yeah, and I'll go ahead and log that one to make sure we don't lose it. Okay, cool. I prefer to say I don't know. Okay, so, boom, and you don't need to understand everything here. Again, you'll be quizzed on this, go. Okay, yeah, try to find the blue guy named PVC, you know, try to spot it. Even with a single YAML definition, these are all the resources that are created for you: not only the DC but also some disks, the PV, the PVC, the superuser secret. And you can see here it's only a single-node cluster, and this is everything that has been done for you. If you make your cluster scale out, you will simply add new pods at the very right of the screen, the one named cluster1-dc1-default-sts-0; adding a new node simply adds new lines with new pods. We have a pod for each Cassandra node, we have a stateful set for each Cassandra rack, we have a dedicated custom resource definition for a datacenter, or ring, and the same operator can be used to manage multiple Cassandra clusters. Now you have the mapping from Kubernetes to Cassandra. If we're talking to operators and developers, for a developer it might not be that easy; it's a lot of info, I get that, but as usual we provide all the resources, and when you do the hands-on after this session, you'll be able to invoke kubectl to list all the resources that have been created and interact with them to see how things work. All right, so now, with the cass-operator, you're pretty confident you can get your Cassandra cluster running, and if
something goes wrong, you expect this operator to catch up and replace what needs to be replaced and handle all the failover for you. And if you need to use it in your app, just interacting with it, you expect as well to have services, DNS, everything needed for you to connect to all the nodes in a transparent manner. Okay, so let's move on, and here comes the big one. Okay, this one I will move fast. Remember where I told you that in a single pod you can have multiple containers? I was really thinking about this. Cassandra is a Java application — pretty cool — but when it comes to interacting with Kubernetes, yeah, it's... blah. Interacting with Kubernetes is all about APIs, REST APIs, because the controller really needs to know if your system has started and if it thinks everything is okay. So inside your container you need to have what we call probes — a readiness probe and a liveness probe — just to say: okay, the component has started, and, the component is ready to take some requests. And there is some bootstrapping and loading time — for Cassandra it's about 20 seconds. So Cassandra itself is not enough, and so we implemented what we call a sidecar, which is another container running in the same pod, exposing a REST API to everyone, just to see, okay, this is how my Cassandra node behaves and what I can get from it. And again, it's open source — here is a link — and you can use it in your own Kubernetes cluster even if you don't use cass-operator at all. Okay, moving on. Okay, so we explained Cassandra in Docker — yeah, but that's not what we want to do: a single-node cluster on localhost, that's it. We did also docker-compose — not enough; we know we need to orchestrate our containers. Then we introduced Kubernetes — yeah, but the default resources in Kubernetes are not enough, so you need an operator, and with the operator you're good to go. Now you can start and work with the
Cassandra cluster. But now I do have a question for you. Okay, on your Kubernetes cluster you do have your Cassandra system running — how do you monitor your system? How do you back up? How do you do the maintenance? How do you plug an application on top of these containers? Well, to answer all these questions and make it as easy as possible, we created — implemented — K8ssandra. So K-8-s-s-a-n-d-r-a: "k8s" for Kubernetes, plus Cassandra. And in there, if you go to the next slide, you will see that at the heart of this distribution — so k8ssandra.io, again open source — you will find Cassandra and cass-operator, but also extra pods: Reaper — Cassandra Reaper, open source as well, created by The Last Pickle company, which has joined DataStax — and Cassandra Medusa to do the backups, same principle, and even more: some tools to connect Cassandra to Grafana and Prometheus, which are monitoring tools. Okay, next. And now, instead of defining a YAML file for every single resource — remember, already with a single YAML provided to the operator, our custom resource definition, we were able to create tons of resources on Kubernetes, remember all these pods and all the definitions — with Helm it's even simpler. I'm a Java developer, so I like to call Helm the Maven of Kubernetes-slash-Docker, or it's like npm or pip: pip install, npm install, mvn install — helm is the same. You do have some recipe with ready-to-go manifests — a manifest is a YAML where you define all the resources you want to create in Kubernetes — and now, with Helm, you can simply say helm install, and this guy will define all the resources, install the operator, and from that you can also say, okay, now Helm, start me an instance of K8ssandra, and boom — you do have a lot of pods and resources created for you. The idea is to make that very, very easy. And I should add, too, that the Grafana and Prometheus implementations here are not just basic Grafana and Prometheus with nothing else added, right,
especially the Grafana — it's already set up with all sorts of metrics and dashboards and charts that are relevant to Cassandra, right? So you're getting all sorts of knowledge already baked into that. And for anybody who's ever done that before, you know it's not too hard to hook up Prometheus and Grafana — Cassandra has the plumbing for all that — but yet again it's something else that you'd need to go off and configure on your own, where here it's all pre-baked and pre-configured per Cassandra node for you, with all of those metrics: relevant metrics, ones that you care about when you're actually running real Cassandra clusters. So there is a ton of knowledge from the community and the people who do this that is actually baked into this stack. That's actually really important. True. Okay, okay, so let's see the next slide — all the features of K8ssandra; I told you already what it looks like. Next. Okay, so this is the project, k8ssandra.io, and this is where you can find the source code, and now I would like to show you what it looks like, at least, so I will try to share my screen again. And we're gonna do this again. All right, let me go back. So — are you writing down everything? — let me share my screen now. All right, start sharing. Okay, and tell me if I'm okay. All right — oh, I think I see it. All right, we see your screen this time. Oh, so cool. Yeah, okay. So everything that I'm showing you is on GitHub — the link is github.com slash datastax-academy slash kubecon-2020, because K8ssandra was announced at KubeCon 2020.
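The Helm flow Cedric describes — one install pulling in the whole pre-wired stack — can be sketched roughly like this; the repo URL and chart name follow the public K8ssandra Helm repository, but treat the release name and exact chart coordinates as assumptions rather than the workshop's verbatim commands:

```shell
# Register the K8ssandra chart repository and refresh the local index
helm repo add k8ssandra https://helm.k8ssandra.io/stable
helm repo update

# A single install brings up cass-operator plus the extras:
# Reaper (repairs), Medusa (backups), Prometheus and Grafana
helm install my-k8ssandra k8ssandra/k8ssandra
```

This is the "Maven/npm/pip for Kubernetes" idea in practice: the chart is the recipe, and Helm expands it into all the underlying manifests for you.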
Here you do have the slides again — we did not lie to you — and here are all the instructions and commands to run, one after the other, to get the demo working: having a Cassandra cluster running, plus Prometheus and Grafana instances, and seeing what's happening. So you can either run this locally or use a cloud-based environment, and those cloud-based environments will be given to you: send an email to Alex — he'll provide his email — or Jack Fryer, or use the link provided here, and we will send instances to people once a day, and those instances will be available for 24 hours. Or you can work locally, and if you do it locally you need at least four cores and eight gigabytes of RAM — and to install locally, here we do have a setup-local section. Okay, so to work locally you need a couple of tools. You need Docker — what a surprise — and you need a way to start Kubernetes on the laptop. Multiple tools are available; you may already have heard about minikube or kind. Here we provide you the instructions to run with kind, and so, having this small Kubernetes cluster on your laptop, you use the same commands as set out here. Now, it can be slow, because each time you need to pull — meaning download — all the images, and doing that in two hours was simply too short. So what I did: I have my instance in the cloud, okay, and I have the same YAML as in the repo over here, and I simply executed the commands first. When you get the instance, this is the web page we provide, kind of a home page for you. So if I go to my home page, this is what it looks like, and all the links to the tools are already provided, so you can use the SSH console — but hey, I like to use my own SSH, and it's better because I can make things bigger on screen. So what I did, I simply used the ssh command to connect to the instance — ssh ec2-user at the URL provided — and I changed the password for my demo, okay, so I'm the only one able to connect to this IP. Don't try to hack my box.
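For the local path with kind that Cedric mentions, the setup comes down to a couple of standard commands — the cluster name here is arbitrary, not one from the workshop repo:

```shell
# Start a single-node Kubernetes cluster running inside Docker
kind create cluster --name cassandra-workshop

# Verify kubectl is now pointing at the kind cluster
kubectl cluster-info --context kind-cassandra-workshop
```

kind stands up the whole control plane as a Docker container, which is why Docker is a prerequisite and why the first image pulls can be slow.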
Okay, so first, what you do: you tell Helm — remember, the Maven/npm/pip-style guy — oh, you will need to download charts from helm.k8ssandra.io, so add this URL to your repo. Then you update the repo, so now you've downloaded all the chart metadata. Then all these components need to talk to each other. By default, all the network ports are closed for security reasons, and so when you want your pods to interact with each other, you can simply open just a port with what we call a NodePort — but you can also open the port to the outside of the Kubernetes cluster, and to do so you use yet another set of resources in the Kubernetes world, namely Services and Ingress, and Traefik is just one of the implementations of this port opening. So what you do: you add the repo and you update the repo — so now you update, okay, as usual — and now you install Traefik, providing all these settings. So if I go to the settings — the Traefik values — it's easier to look at this file on GitHub. In the Traefik values, what I simply say is: okay, I will have multiple namespaces and pods, and what I would like, for Traefik, is: I want you to expose this port for this named service — I do want to expose this guy, and this guy. Okay, and doing that, you can immediately start interacting with Traefik, and now you have all the routing rules that have been set, so when you invoke the cloud instance we provided you, using this URL, it will be routed to the proper pod. Okay, moving on — this is Traefik. Then, when we have Traefik, we simply say helm install with the K8ssandra charts, that's it. Now Helm is ready to accept the command to start, namely: helm install a Cassandra cluster using the K8ssandra cluster chart, and we can set some extra parameters — enable Traefik, of course — and provide some configuration keys, namely the address of your Linux box; you know, we cannot know in advance what would be the IP of your Linux
box, so we simply define some address environment variables, and those are simply mapped. And, you know, all these commands — I've run them on my Linux box, and so if I now do kubectl get pods — and make that maybe smaller, like that — you see that simply by doing kubectl get pods, these are all the pods, everything that has been started already for me. So here you do have the dc1-default-sts pods — remember, this is our StatefulSet for our data center named dc1 — and not only Cassandra, managed by this guy, cass-operator; you also have some tools like Reaper for doing comp— well, repair administration tasks, and also Prometheus and Grafana. So let's see, if I keep going — let's see, let's see — yeah, it's pretty simple; let's not go through all the steps, but what you will have is, first, Prometheus. Prometheus is a time-series database, something that listens to metrics coming from multiple systems, and it has what we call a ServiceMonitor to pull the metrics from an external system. So here we do have an endpoint — a Prometheus ServiceMonitor will poll the Cassandra cluster running inside Kubernetes and extract all these keys, the k8ssandra dc1 Prometheus endpoint — and then it will be available in Prometheus. So maybe I can do something like that, and here I do have some events coming from Cassandra, stored in the time-series database. But like that, yeah, it's not really speaking to you; so when working with Prometheus there is a web UI named Grafana, and you plug Grafana into Prometheus using what we call a data source. So let me see — data sources — and again, just helm install K8ssandra and everything is set up for you: all these interconnections between Grafana and Prometheus, and between Prometheus and the Cassandra cluster, are done for you. So here is our data source, okay. We will use this data source to create dashboards, and so again, when you install K8ssandra you have three dashboards available. One is: okay, what
are the system metrics in my cluster — okay, is it busy or not, what are the metrics: RAM, CPU, network traffic. Okay, and of course you can use your own Prometheus — your own Prometheus/Grafana stack; you simply have to import the dashboards we gave you, and the ServiceMonitor, just to push the data into your own Grafana. So this is one dashboard, but not the only one. I do have the Cassandra overview: now I do have read throughput, write throughput, slow requests, the status of the nodes, the status history. Here I'm running a single node, but I could simply tell Helm to scale to three nodes and I would see it scale — but that would take a minute or two, so let's move on over here. And the last dashboard available here — no, no, I don't want to save my dashboard; I don't know what I clicked — is what we call the Cassandra condensed dashboard, and this is again some coordinator metrics — what about the p95 latencies — per namespace, more data related to the Cassandra JVM and to the way Cassandra works, not only metrics regarding the environment. So again, that would be your homework: go over here and do the exercise, either on your laptop or using an instance we will provide you if you fill in the form. And something, just to belabor the point, too — I can't help myself with it, because, you know, I've watched all of this capability grow over the last so many years, and coming from the history of my own career, doing a lot of these things before Kubernetes, before the operators, before K8ssandra and such: imagine how many folks — between the metrics, the configuration, creating the dashboards, hooking up the plumbing, managing the cluster, all that stuff — that at some point someone, or a team of people, was doing, right, for your organization or whatever, and that might even be for a single website or a database or something like that. All of that is being encapsulated into this setup for you, right? Just what Cedric has been showing you, alone — I mean, again, hooking
up Prometheus, using Grafana — it's not that hard, but there are enough little configuration parameters, enough knobs; getting all of the knowledge that's been put into those graphs, hooking up Reaper, hooking up all the other management tools — that takes work, right? That actually takes a lot of work, and it's all being done for you here. And it's cool — it's free, right? You don't have to pay for this thing; you just go get it and use it. So I can't belabor that enough, because, coming from an ops background and having had to do a lot of this stuff manually, then seeing this work all being done for you is just like... yeah. Yeah. And, you know, you see that we had a compaction, because this guy has been running for multiple hours already, and immediately you see my latency has some jumps here and there. But with that, I'm done with the small demo, and I guess it will be time for you to take over. If you take a look at the DataStax Academy organization — I will drop that down here, here you go — you'll notice in here there is BattleStax, the one I was just talking about, the Spring Data Cassandra workshop, Spring Reactive, right — those are the ones that Cedric was talking about. All of our workshops are self-service; they're meant so you can actually work through them on your own, and most of them have accompanying videos that are on the YouTube channel, so again, make sure you go and subscribe. And with that, Cedric, I think that's it, unless you have anything else. No, let's wrap up. A reminder about the hackathon: tomorrow we will run this same session one more time, still live, still with you, still interacting with you, regarding Kubernetes; and also tomorrow, at the same time as today, we will have an ask-me-anything for questions regarding the hackathon. So if you've actually started on the hackathon, come join us live here and we will answer your questions. We're getting there. Awesome. Well, with that, thank you, Cedric, thank you,
everybody, and we will see you next time. Take care, bye-bye. Take care. And as always, don't forget to click that subscribe button and ring that bell to get notifications for all of our future upcoming workshops. Imagine a being gifted with powers from the goddess Cassandra, who grew those powers until she could multiply them — able to move with limitless speed and unmask hidden knowledge. With those powers she was able to fully understand the connectedness of the world. What she saw was a world in need of understanding. From that day forward she sought to bestow her powers on all who came into contact with her, empowering them to achieve wondrous feats.
Info
Channel: DataStax Developers
Views: 1,660
Rating: 5 out of 5
Id: PCA-aERrcU4
Length: 102min 17sec (6137 seconds)
Published: Wed Jan 13 2021