Running the Elastic Stack on Kubernetes with ECK

Captions
All right, let's get started; the rest will join us in a few seconds, I guess. Welcome, everybody! Hi, welcome to today's meetup. My name is Dalia and I'm from Elastic's community team. Today we're here to hear from Sebastien, a senior software engineer at Elastic, who is going to talk to us about running the Elastic Stack on Kubernetes with ECK. We'll take all the questions at the end. And if you ever want to present at one of our awesome meetups, please let us know; we're always looking for speakers who want to share their stories with us, so if you want to be one of them, get in touch. Sebastien, please go ahead.

Sure, I'm going to share my screen. Hi everyone. Is that the right screen? Yeah, like that, maybe. So as Dalia said, my name is Sebastien, I live in Toulouse in the south of France. If you've never been there, you should; it's a nice city, probably a bit colder than Italy but still nice. I'm a software engineer at Elastic, I work in the Cloud team, and more specifically I work on a product called ECK, Elastic Cloud on Kubernetes, and that's what we're going to talk about today. Let me go to the next slide... apparently it doesn't want to... all right, let's try again. Okay.

So at Elastic we have a bit of history with Kubernetes. You may or may not know, but for the past few years we've been working very closely with Kubernetes and the Kubernetes community. One of the first things we did was release official Helm charts for the Elastic Stack on Kubernetes; they're free on GitHub. We also worked a lot on integrating Beats with the Kubernetes ecosystem: for example, you can deploy Filebeat or Metricbeat on your Kubernetes cluster to grab all the logs from your Kubernetes pods, or all the metrics from Kubernetes itself, and send them automatically to Elasticsearch. But over the past few years we realized that running the entire Elastic Stack on Kubernetes, including Elasticsearch itself, is not that straightforward. So today we have Baby Yoda with us to help understand how you can easily run the Elastic Stack on Kubernetes and what that means exactly.

If you've tried doing this yourself, you know it's not that easy. The first thing is that there is a certain number of resources you need to manage on your Kubernetes cluster: pods, secrets, services, StatefulSets, persistent volumes — all the resources you need to set up for your Elasticsearch cluster to run properly on Kubernetes. Once that's done and your Elasticsearch cluster is running, you have to handle what I like to call day-two operations: scaling the cluster up by adding more nodes, scaling it down, maybe changing the configuration of the cluster and doing a rolling upgrade, or handling version upgrades of the stack from one version to another. All these operations are not that easy to do in practice: it's one thing to get the cluster running for the first time, and a different thing to keep it in production for several years after that. And then, because we're talking about Elasticsearch, we're talking about stateful workloads. Elasticsearch is basically a search engine, but also some sort of distributed database that runs on multiple servers, so you need to deal with availability and consistency whenever you make changes to that topology. When you add new nodes, remove nodes and so on, you want the Elasticsearch cluster to stay available at all times, with strong consistency of the data.
You also need to deal with volumes: how do you handle a rolling upgrade and make sure that the node that restarts reuses the same volume as the previous one? How do you deal with terabytes of data on your cluster? Those are many things that are not so easy to do. Lucky for you, we have a project called ECK, Elastic Cloud on Kubernetes. You can check it out on GitHub; the code is entirely open. ECK allows you to deploy the entire Elastic Stack — Elasticsearch, Kibana, APM Server, Enterprise Search and Beats — on your own Kubernetes cluster. We're compatible with most Kubernetes distributions, including OpenShift and the biggest cloud providers: AWS, Google, Azure. The goal of this project is really to integrate tightly with the Kubernetes ecosystem, so when using ECK you'll use the usual Kubernetes tools, like kubectl and all the various tooling you can have around that. ECK supports orchestrating Elasticsearch with advanced topologies; I'm talking about hot-warm-cold deployments, where part of your nodes — the ones we call hot nodes — have very fast underlying storage, but then you progressively move that data onto the warm nodes with slower storage, eventually to the cold nodes, and maybe in the future to frozen nodes with very cheap and distant storage like S3, for example. And we want this to support very smooth operations — scaling the cluster up, scaling it down, rolling upgrades, et cetera; this should be very easy to do with ECK.

To give you an idea of what it looks like, I'm going to jump straight to a small demo. The first thing I'm going to do is deploy ECK itself. That's pretty straightforward; it's just about applying a Kubernetes manifest, a YAML file, to my Kubernetes cluster. In this example I'm using a GKE cluster on Google Cloud Platform. With this single command you install ECK on your Kubernetes cluster, and we see that it's actually installing a bunch of Kubernetes resources: the first few lines are about installing custom resource definitions for Elasticsearch, Kibana and so on, and the other ones are about all the RBAC access that the ECK operator needs, plus the operator itself.

Then I have a sample YAML file here that describes the Elasticsearch cluster I would like to deploy. I have one cluster that I called my-cluster, I want this Elasticsearch cluster to be version 7.9.0, and in that cluster I want two groups of nodes, which we call node sets. The first node set is called master-nodes and specifies that I want three master nodes — I know those are master nodes because their configuration sets node.master to true. The other group is a group of data nodes: here I want three nodes that are data nodes and not master nodes. So we have dedicated masters in one group and dedicated data nodes in the other group. Then, along with Elasticsearch, we specify another resource, Kibana. The sort of magic here is that we don't specify any complicated configuration for Kibana to connect to Elasticsearch, which would normally require a URL, a password, a bunch of TLS certificates, et cetera. Here we just specify a reference to the Elasticsearch resource we declared right above, the one called my-cluster, and that's it. With this sample YAML manifest we've specified an Elasticsearch cluster of six nodes and a single Kibana instance that should be automatically connected to that cluster.
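A manifest along the lines of the one described here might look roughly like this. This is a sketch based on the ECK v1 API, using the cluster name, version and node counts from the demo; the Kibana resource name is just an example:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster
spec:
  version: 7.9.0
  nodeSets:
  - name: master-nodes
    count: 3
    config:
      node.master: true
      node.data: false
  - name: data-nodes
    count: 3
    config:
      node.master: false
      node.data: true
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: my-kibana        # illustrative name
spec:
  version: 7.9.0
  count: 1
  # Instead of a URL, password and CA certificates, just reference
  # the Elasticsearch resource declared above.
  elasticsearchRef:
    name: my-cluster
```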
To deploy that on Kubernetes, I just apply that YAML file using the regular kubectl tool, and what we'll see in a few seconds in the upper-left corner of the screen is several Kubernetes pods starting to appear: the three data nodes, the three master nodes, and Kibana itself. I think we temporarily have two Kibana instances here, but it's going to get back to one instance pretty soon; that's just Kubernetes rolling out the change. So we have this running, and what happened is that ECK noticed this specification was applied and did whatever was necessary to make sure the cluster is correctly deployed on Kubernetes. There are a bunch of pods here; each pod represents an Elasticsearch instance. But ECK also deploys a bunch of other resources. For example, there's a type of resource in Kubernetes called a Secret, and we have a secret that was created automatically here for the elastic user. If I get that secret and look at what's inside, we see it actually contains the password of a user called elastic, which is the default user created automatically by ECK. I can retrieve that password — it's base64-encoded in Kubernetes — and use it later on to connect to the cluster. Along with that secret we also have a bunch of services; services are abstract resources that perform some sort of round robin across multiple pods, and here we have one important service called my-cluster-es-http, which is the main service I can use to reach that Elasticsearch cluster. So if I do a local port-forward to that service, I should be able to reach the cluster. We want to use port 9200, so I curl localhost on that port, using the password we just retrieved, and I skip certificate validation, because by default ECK generates a bunch of self-signed certificates for all nodes so that TLS is on by default — but you can tweak that and provide your own certificates, for example. And when I curl, I can see that I successfully reach the Elasticsearch cluster, in version 7.9.0, and I reached a random node, the master node number two.
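For reference, the commands used in this part of the demo look roughly like the following. The secret and service names follow ECK's `<cluster-name>-es-elastic-user` and `<cluster-name>-es-http` conventions for a cluster named my-cluster; the manifest file name is just a placeholder:

```sh
# Apply (and later re-apply after edits) the cluster specification
kubectl apply -f elasticsearch-kibana.yaml

# Read the auto-generated password of the built-in "elastic" user
PASSWORD=$(kubectl get secret my-cluster-es-elastic-user \
  -o go-template='{{.data.elastic | base64decode}}')

# Forward the main HTTP service to localhost:9200
kubectl port-forward service/my-cluster-es-http 9200 &

# Query the cluster; -k skips validation of the default self-signed certificate
curl -u "elastic:$PASSWORD" -k "https://localhost:9200"
```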
A very nice thing we can do now is, for example, add two additional data nodes. It's just about changing the number of data nodes in that file and applying the file again, and we'll see two additional data nodes pop up in the list of pods; again, that's because ECK picked up the change and does whatever is necessary for it to happen. Something slightly more complicated, and also interesting, is changing the version of Elasticsearch — say a version upgrade to 7.9.1. Doing that is pretty straightforward too: I change the YAML specification, I apply the file again, then ECK notices that a rolling upgrade is required for the cluster to use the new version, and it actually performs it. Here we see this node getting terminated, and pretty soon it will be restarted with the newer version of Elasticsearch. So you see, it's all about editing that YAML specification with whatever you actually want for your cluster. It could be anything: I could change the configuration here, add a new configuration item for that node set, and apply the file again, and that's it — ECK will just deal with it, while making sure that in the meantime you can still access the cluster. There's no downtime, and it also ensures there's no data loss or data unavailability while those changes are being performed.

All right, so Baby Yoda is slightly curious about what we're doing here. It feels like we're just changing a few lines in a YAML file, and he's a bit curious about what actually happens behind the scenes. That's what we're going to see now. So what's ECK exactly? ECK is what we call a Kubernetes operator. Operators are a concept in the Kubernetes ecosystem; basically, we could say that operators are clients of the Kubernetes API, meaning they communicate with the Kubernetes API, and they also act as controllers in the Kubernetes sense. Controllers in Kubernetes are programs that ensure the Kubernetes cluster runs correctly, and operators are a bit special compared to the built-in Kubernetes controllers because they act on a particular custom resource — and that custom resource here is Elasticsearch. That's a bit abstract, so let's get into some of the details so you better understand what's going on.

The first thing we did when we installed ECK was deploy a bunch of custom resource definitions. You may know that in the Kubernetes API there are different resource types; we've seen that there are pods, which are like containers, and there are secrets, services and so on. Those are built-in resources. What we can easily do, which is pretty nice, is extend the Kubernetes API with additional custom resources, and here we define our own custom resource whose name is Elasticsearch. In this custom resource definition we specify that the resource kind Elasticsearch has a field named spec, and in that field there's another field called version, then another one called nodeSets, which is actually an array, et cetera. As a user, you just have to write a custom resource — basically the YAML file on the right — that respects the schema from the custom resource definition of the Elasticsearch resource.

So there's this custom resource side, which is on the API side; then there's the operator side of things. The ECK operator is basically a pod running in the Kubernetes cluster, and that pod registers a few watches on the API server, meaning it gets notified whenever an interesting change happens on the API server. This interesting change could be, for example, that the user created a new Elasticsearch resource; the operator automatically gets notified of that and does whatever is necessary for that cluster to be deployed. That's what we call reconciliation: whenever the operator receives a notification, it retrieves the specification of the Elasticsearch cluster and makes sure it creates or updates every expected Kubernetes resource for that cluster to be created successfully. For example, if we take this cluster on the left, with three master nodes and two data nodes, once the operator retrieves this YAML file it needs to ensure that there will be three pods for the master nodes and two pods for the data nodes. But it's also going to do a bit more than that: it creates a ConfigMap for the Elasticsearch configuration, it generates a bunch of TLS certificates so the nodes can trust each other, it creates a secret for the user and the password, and another secret for the Elasticsearch keystore if you decide to use it for some of your own configuration secrets. And even more, because in front of all that we want to create a bunch of services — mostly one main service, which is the one you use to reach the Elasticsearch cluster over HTTP.
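As a small illustration, you can list all of those generated objects at once by label; the cluster-name label shown here is the one ECK is expected to put on the resources it manages, assuming the my-cluster example:

```sh
# Pods, secrets, services and config maps created by the operator for "my-cluster"
kubectl get pods,secrets,services,configmaps \
  -l elasticsearch.k8s.elastic.co/cluster-name=my-cluster
```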
So all those boxes on the right are different Kubernetes resources that the operator is managing. If we do the exercise of, for example, removing that ConfigMap or removing that pod, the operator will make sure it recreates them eventually. You can break whatever you want; the idea is that the operator will always repair things behind you, because it's always running in the background. Again, that's what we call the reconciliation loop: we get the resource, and then we create or update every expected resource that needs to exist based on the specification we received.

But the operator also interacts with the Elasticsearch API. Say you deploy a resource that specifies an Elasticsearch cluster, and later on you edit that resource because you don't want ten nodes anymore, you just want a single node. As you can imagine, if we just removed nine Elasticsearch nodes all of a sudden, we would basically be breaking your cluster: there would be no consistency in the master nodes anymore, because you moved from, say, three master nodes to a single one, so there's no leader anymore, and you would probably lose all the data that was on those nine nodes, because you removed them all and only a single one remains. That's why the operator needs to interact with Elasticsearch: it notices that it needs to downscale the cluster from, for example, ten nodes to one, and does whatever is necessary so that this downscale can be performed in a safe way. It migrates data away from the nodes to remove and makes sure it is replicated on the nodes to keep first, and it also tweaks the cluster settings to account for, say, a single master node instead of three. Those are all operations that you would otherwise have to do yourself — calling the Elasticsearch API before changing the topology — and that are now automated by the operator, based on the topology changes you specified.

One thing we noticed early on in this project is that this very light and minimal YAML specification is super nice to get started, but if you are a Kubernetes power user you may want to rely on Kubernetes features you already know, and it would be sort of a shame for you not to be able to bring your own Kubernetes settings into this Elasticsearch world. That's why, in the specification, we try as much as possible for the operator to provide all the default settings you need for your cluster to run in production, while also allowing you to override those settings or specify your own additional ones if you know Kubernetes well. For example, in this node set we have a section called podTemplate. You can tweak the labels of the pods, and you can give particular affinity settings — saying, for example, that I want all the pods of this node set to be scheduled on the Kubernetes nodes of my e2e or production environment — and this is all using the standard Kubernetes syntax. I can also override the environment variables of the Elasticsearch container; here I'm changing the JVM options. This is all optional, but basically you can override all the defaults.
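A sketch of what such overrides look like inside a node set. The label, nodeSelector value and heap size shown here are only examples, not defaults, and the environment constraint is expressed with a simple nodeSelector; a full affinity block works the same way:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster
spec:
  version: 7.9.0
  nodeSets:
  - name: data-nodes
    count: 3
    podTemplate:
      metadata:
        labels:
          environment: production          # example custom label
      spec:
        # Example constraint: only schedule these pods on nodes labelled for production
        nodeSelector:
          environment: production
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms4g -Xmx4g"          # override the JVM heap size
          resources:
            requests:
              memory: 8Gi
            limits:
              memory: 8Gi
```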
Like we said, this other example is about volumes. For each node set we create, by default, one volume per Elasticsearch pod, using the Kubernetes concept of persistent volumes. Persistent volumes are very nice because their lifecycle is tied to a particular pod identity, and each pod is part of a StatefulSet for its group of nodes. Here we have one StatefulSet for the master nodes and one StatefulSet for the data nodes, and each pod in a StatefulSet has a persistent volume assigned to it for the entire lifecycle of that StatefulSet. That's very nice, because it means that if your pod suddenly crashes for whatever reason — maybe the host died, maybe Elasticsearch ran out of memory — Kubernetes is going to recreate that pod automatically and reattach the exact same persistent volume. From Elasticsearch's point of view, this is just equivalent to an Elasticsearch node restarting with the same data. It's actually how ECK handles rolling upgrades behind the scenes: to roll the cluster, we just need to delete pods one by one, and at every deletion the pod automatically gets recreated and reattaches the same volume.

Of course, those volumes depend a lot on the underlying storage that you have. The default is to deploy volumes of one gigabyte, which is obviously not great for production; you may want volumes of several hundred gigabytes or terabytes, so you can override that. Here you would say that you want the elasticsearch-data volume to have 100 gigabytes of storage, for example, so each pod now gets 100 gigabytes. The storage class specified here represents the underlying storage mechanism for that volume, and there are many; it depends on your Kubernetes provider. For example, if you use GKE on GCP you may want a dedicated SSD volume attached to each pod; if you use AWS you may want an EBS volume attached to each pod. You can also use local storage to use the local disks available on each node, and so on. The performance of these volumes depends a lot on the nature of the volume. Basically, ECK is compatible with the whole Kubernetes volume story, and it's up to you to choose the volumes that best fit your Elasticsearch use case. In many cases, network-attached volumes — like the fastest EBS volumes on AWS — give you decent performance, plus the advantage that they can be reattached to pods even if those pods get recreated on different hosts, whereas local storage volumes, such as local SSDs on your hosts, will likely give you the best performance, with the main drawback that you cannot move the pod around across different Kubernetes hosts; it has to live where the volume is.
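The volume override discussed here looks roughly like this in the node set. The 100Gi size is the one from the example; the claim name elasticsearch-data is the one ECK expects for the data volume, and the storage class name is provider-specific (shown here as a hypothetical ssd class):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster
spec:
  version: 7.9.0
  nodeSets:
  - name: data-nodes
    count: 3
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data      # keep this name so it is mounted as the data volume
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi            # instead of the 1Gi default
        storageClassName: ssd         # e.g. an SSD-backed class on your provider
```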
To try to summarize, I think one of the biggest added values of ECK over handling the Kubernetes resources yourself is making sure that your data stays safe and that your cluster stays available, whatever changes you make to the Elasticsearch topology. Additionally, it handles TLS by default, it gives you a simplified configuration for Elasticsearch, and it handles data migration, rolling upgrades and complex topologies. You can also deal with availability zone awareness by reusing the Kubernetes affinity concept — spreading your cluster across three different zones, for example — and you can use cross-cluster search and replication between different clusters, maybe running on the same Kubernetes cluster, or with one cluster deployed on-prem and another one on Kubernetes, et cetera. So it's pretty useful, and you can give it a try really quickly; I think you've seen that I was able to deploy this cluster really quickly. You can read the GitHub page, and we have a link to the quick start documentation. If you have a Kubernetes cluster available — or even if you don't, you can install Minikube locally — you'll see that in maybe five minutes you'll have your Elasticsearch cluster running; it's actually fairly straightforward.

That's basically all I wanted to say for today — a quick overview of what ECK is, how easy it is to use and what it can do — and I think we have some time for questions now. Before we do that, there's a survey that we ask people to fill in about our meetups; Dalia, maybe you can paste that in the chat? Yeah, all good, thanks. If you can take a few minutes — a few seconds, actually — now or after the meetup to fill in this survey, that would be very nice. And then I'm open to answering any of your questions. I'm one of the engineers on the team, so they can be very technical questions, product questions or general questions, as you like. I'm going to look at what we already have in the Q&A pane.

"ELK 7 can be configured without an explicit distinction between master and data nodes — any specific reason they are created separately here?" No particular reason. I think you mean Elasticsearch version 7, but you can actually do that on earlier versions of Elasticsearch as well. You can of course deploy a fifteen-node cluster with no particular split between master and data nodes, and that's going to work fine. For production environments, though, you may want, for example, three dedicated master nodes whose load is dedicated to handling the cluster state, and separate them from your data nodes, because your data nodes may have a much bigger RAM capacity, much bigger storage and different CPU constraints. So you don't have to split the node types, but in big production use cases it usually makes sense to do so. I hope that makes sense.

The next question was: "How did you do the persistent volume configuration?" That was the slide that came a little bit later, I think. This is how you do it: you specify your persistent volume claim constraints here — the storage class name, the storage size, et cetera — and it's up to you to pick the best storage class.

"Are all these features of ECK equally obtainable with the Elastic Stack community edition?" ECK comes with two different licensing models. The first one is the basic license of ECK, which you can use for free, similar to how you can use the basic license of the stack for free; and then there's the Enterprise license, which you pay for. Most of the features are in the basic license already, but if you want to use licensed features of Elasticsearch — machine learning, for example — then you need an Enterprise license for ECK. We're also working on a few more features that will probably only exist in the Enterprise version, but really, most of the ECK features are already in the basic edition; the big difference is in the features of the stack itself, like machine learning.

"What is the recommendation for a production-scale setup?" That's hard to answer, because it really depends on your use case. How much data are you going to store? Is your use case search-heavy, in which case you read data a lot, or index-heavy, in which case you index data a lot? If you ingest logs, for example, your workload will be more ingest-heavy. In general, what we recommend for large deployments — by large I mean you likely need more than five nodes — is a group of dedicated master nodes, at least three, so that there can be a quorum with an elected leader, and a group of data nodes.
Among those nodes, you likely want to split them across different regions and availability zones, so that if there's a failure in, say, the AWS data center where the virtual machines run, the entire cluster isn't impacted — only a third of the data nodes are, and you know you have replicas of the data in the other availability zones. That's another aspect. And then I think the last important thing is the hot-warm-cold topology; you can search for it on Google, but basically the idea is that you may want to split your data nodes into several groups with different constraints and different performance requirements. The hot nodes are where you query and ingest data a lot; then, progressively, you move the older data to the warm nodes automatically using ILM policies, and that data can again be moved to the cold nodes, for example. So I think that's a good picture of a large, production-grade system: dedicated master nodes, data nodes, a split across multiple availability zones, and maybe a split of the data nodes into hot and warm nodes.

"Do we have a tutorial page, step by step, for starters?" Yes, let me show it to you. If you search for "ECK quickstart" — let me make that full screen — you quickly end up on this page, probably the first result on Google, and that's really the entry point of the documentation on the Elastic website. It basically goes through what we just did in the demo: deploy ECK itself, which is a one-line command; deploy Elasticsearch, which is the YAML specification you want to apply; then we give a few commands so you can get more details on the health of your cluster, see that it's green, and get the logs of the nodes — by the way, that's fairly easy since we're using Kubernetes, we can just grab the logs of a particular pod, and that's how you get the Elasticsearch logs — and so on. That's the quickstart, and as you move on to understanding how the system works, you can go to the more advanced sections. There we have more details on how to run Elasticsearch on Kubernetes: how to configure storage, TLS settings, secure settings, custom plugins, et cetera. Everything is there; I think the documentation is pretty nice, we try to make it easy to digest, so I hope you can find everything you need in there.

The next question was: "Does the ECK operator handle automatic rotation of the created TLS certificates?" Yes, it does. By default we create certificates that are valid for one year, and if I remember correctly, a few days before that date is reached we regenerate new certificates automatically, we propagate them to the pods automatically, and we update the secret that contains those certificates automatically. So basically you have nothing to do; we handle certificate rotation automatically, which is great. Unless, of course, you decide to provide your own TLS certificate for the main HTTP service, in which case it's your job to deal with the rotation of that particular certificate.

"Is the hot/warm role for an Elasticsearch node also defined in the startup manifest?" It is not, because we try to stay as close as possible to Elasticsearch. There's an example in the documentation — in the Elasticsearch section we have "advanced Elasticsearch node scheduling", which covers availability zone awareness and hot-warm topologies. The way you would likely achieve it is by setting custom node attributes in the Elasticsearch configuration: you specify that this group of nodes is the group of hot nodes, and then you say that you want those hot nodes to run on Kubernetes instance types with high I/O and very fast storage — in this example, each of those nodes has one terabyte of local SSD attached. Then you define the warm nodes by setting a different node attribute in the configuration and making sure those pods are scheduled onto different Kubernetes nodes — here, all the Kubernetes nodes that have the high-storage tag, which for the warm nodes is likely a machine with something like ten terabytes of storage. So it's up to you to define your own split into groups of nodes like that. We're working with the Elasticsearch teams to maybe make that a little easier; in future versions of Elasticsearch there may be a different place where you can specify whether a node is a hot or a warm node, but so far that's the way to do it.
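A sketch of that hot/warm split, in the spirit of the "advanced node scheduling" example mentioned here; the attribute name and the Kubernetes node labels (storage-type: hot / warm) are illustrative and must match labels you actually put on your nodes:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster
spec:
  version: 7.9.0
  nodeSets:
  - name: hot-nodes
    count: 3
    config:
      node.attr.data: hot              # custom Elasticsearch node attribute, used by ILM allocation rules
    podTemplate:
      spec:
        nodeSelector:
          storage-type: hot            # schedule onto nodes with fast local SSDs
  - name: warm-nodes
    count: 3
    config:
      node.attr.data: warm
    podTemplate:
      spec:
        nodeSelector:
          storage-type: warm           # schedule onto nodes with large, slower disks
```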
"Are the nodes drained before they are restarted? What if I run an index without replicas?" That's a good question. I think there are two different cases to consider when we say we restart or remove a node. The first one: imagine you have three data nodes and you change the specification so that instead of three data nodes you only want one. In practice, what you want to do is remove the two pods at the bottom here. What ECK will do is make sure that all the data living on the two Elasticsearch nodes to remove is first moved to the remaining Elasticsearch node, and only once that's the case are the two pods actually removed. So even if you have indices with no replicas, doing a downscale is fine, because we make sure the data ends up safely on the nodes that remain after the downscale.

Now, for rolling upgrades the situation is a bit different. If we want to do a rolling upgrade of the cluster — which in practice is sort of a restart of every single node, one after the other — we just restart the pod itself; we don't move the data first, but once the pod restarts, it restarts with the same volume. So if you have indices with no replicas, what's going to happen is that while the pod is restarting, that index is unavailable, because its single copy of the data is temporarily gone; once the pod is back, you can access the index again. Obviously, if you don't want that to happen, the best thing to do is to configure your indices with at least one replica, so that while a pod is getting restarted you can still access the replica on another node. Technically we could move data around before doing the rolling upgrade, but it feels a bit overkill: if you don't have replicas, that's probably because you don't really care about the availability of that data, so it's better to make the rolling upgrade super fast and not do all those operations. And if you do care about the availability of the index, you probably want replicas anyway, because the chance of losing a pod for whatever reason — because a server crashes — still exists. I hope that makes sense.
"Does the pod template need to be modified when the default CRD has been created by running the default setup command?" I'm not sure I understand the question, actually. Just to make sure about the vocabulary here: the CRD is the custom resource definition, so it's not an Elasticsearch specification, it's the specification of the Elasticsearch specifications — it's a bit meta. What we have here on the left is the CR, the custom resource, which respects that CRD. When you install ECK itself, you don't have any cluster deployed; right after installing ECK, you need to apply this custom resource on the left. Then it's up to you whether you want to modify the pod template or not: if you don't modify the pod template, you'll get sort of a default cluster with default settings, and if you do modify it, you can adapt it to whatever you need. I hope that answers the question; I'm not sure I understood it, so if not, please ask another one with maybe more details.

"Where does ECK store its logs?" Nowhere — ECK really is just a Kubernetes pod. If we look at the pods on my system, I have a namespace called elastic-system here, and that's the elastic-operator pod. If I want to retrieve the logs of that pod, I can grab them like that — in the right namespace, of course. For production cases you may want to persist those logs somewhere, and a nice thing to do is actually to deploy ECK, then deploy an Elasticsearch cluster through ECK, and deploy Filebeat — we also have support for Beats. You'll have Filebeat running on every single Kubernetes node, Filebeat will grab the logs of every single container running in Kubernetes, and you can configure it to send those logs automatically to the Elasticsearch cluster you deployed, so that in the end you have the logs of ECK itself in an Elasticsearch cluster. Of course, it can be a different Elasticsearch cluster — you may want to use a target Elasticsearch cluster in the cloud to store all the logs of that Kubernetes system — but you can also use one running right here. So really, we do nothing special by default for the logs; they're just written to standard output like any regular container, but it's fairly straightforward to configure Filebeat and Elasticsearch to grab all the Kubernetes logs and store them in Elasticsearch — which is a bit of an inception, right, but it makes sense; I'd say that's the right thing to do.

"What's the easiest way to move from one storage class to another for the persistent volumes?" Ha, that's a very good question — I guess you may have some experience with ECK already. Let's take this example: imagine we have those three data nodes, and we configured 100 gigabytes of storage with, say, a GCE persistent disk storage class. Currently there's a pretty big limitation in Kubernetes: once we've created the StatefulSet that corresponds to that group of nodes — the data nodes StatefulSet on the right here — we cannot modify its volume section, because the volume specification of a StatefulSet is immutable. So if you want to increase the storage size or change the storage class, you cannot do it directly, and ECK will reject the update, which is a bummer, kind of. One way to deal with that is, instead of modifying that node set — which in practice translates to a StatefulSet — you create a different one, with the new storage size and storage class that you want, and you remove the old one. In practice that's just about renaming the node set: if you rename data-nodes to data-nodes-2, for example, and change the storage settings, it translates into a new StatefulSet being created with your new storage requirements and the old StatefulSet being progressively removed by ECK, with ECK handling all the data migration steps from one StatefulSet to the other. So that's how you can deal with it: it will move the data from the old storage to the new storage. I think we have that explained in the documentation.
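In practice, that rename looks something like this — a sketch in which data-nodes becomes data-nodes-2 with new storage settings, so the operator creates the new StatefulSet, migrates the data, and then removes the old one. The new storage class name is hypothetical, and any other node sets (the dedicated masters, for instance) stay unchanged and are omitted here:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster
spec:
  version: 7.9.0
  nodeSets:
  # Previously this node set was called "data-nodes" with 100Gi volumes.
  # Renaming it creates a fresh StatefulSet with the new volume claim;
  # the old StatefulSet is drained and removed progressively by the operator.
  - name: data-nodes-2
    count: 3
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 200Gi
        storageClassName: faster-ssd   # hypothetical new storage class
```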
It's nice that you ask this question, because I'm actually working on a pull request to handle storage resizing, and it was just merged a few hours ago. Let me find it — that's this one, "Support Elasticsearch volumes expansion". It's a new feature that's going to land in version 1.3, and the nice thing is that we now support — basically this little GIF here — increasing the storage size directly, without the trick of renaming the node set, and we do an in-place resize of the filesystem within the pods if the storage class supports it. See, here we increase the storage from 100 gigabytes to 200 gigabytes and it was done in place; we didn't even need to restart the Elasticsearch pods, which is pretty nice. So that's coming in 1.3.

"Does ECK work with the Kubernetes topology spread feature, to spread nodes evenly across availability zones?" Yes — I think it's a feature of Kubernetes 1.18, if I remember correctly. If you remember this example where we used affinity settings, you can use the topology spread feature directly in that pod template spec, and it's just going to work as you would expect with Kubernetes. It's actually pretty good, because before that, to deploy things in multiple zones you had to define multiple groups of nodes, with one zone per group; now you can define a single group of nodes for the data nodes and say that you want to spread those pods across multiple zones, so you don't need to specify multiple groups anymore.
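With a recent enough Kubernetes version, that could look something like the following inside the pod template. The label selector shown here assumes ECK's cluster-name pod label and the my-cluster example; adjust it to whatever labels your pods actually carry:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster
spec:
  version: 7.9.0
  nodeSets:
  - name: data-nodes
    count: 6
    podTemplate:
      spec:
        topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # spread evenly across availability zones
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              elasticsearch.k8s.elastic.co/cluster-name: my-cluster
```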
"I was wondering if ECK automatically monitors the health of nodes and takes corrective action if necessary — if so, how is it implemented? Does the operator use Kubernetes probes?" That's a good question. There are two — actually three — different mechanisms in place for that. The first is that ECK monitors all the pods of the Elasticsearch cluster in Kubernetes; if anything changes in those pods — for example, if I delete this data node... don't do it, but you can, it works... I delete that pod, and what happens is that the status of the pod changes: it goes to Terminating and then it gets removed. Every time there's a change in the status of a pod, ECK gets notified, so it can do whatever is necessary to fix things. Here ECK notices that the pod is being terminated, and in maybe twenty seconds it will notice that a pod is missing, so it can recreate that pod automatically, along with everything required for it. You also see that we have this sort of Ready flag here — this pod is not ready anymore — and that's because with each pod we deploy a readiness probe; the readiness probe basically checks that the Elasticsearch API responds for that particular node, and it's deployed within the pod. The third mechanism is that, within the operator itself, we regularly poll the Elasticsearch API to see whether the cluster is healthy; basically, we call the cluster health API here — oh, and I need to restart the port-forward because it was probably connected to an old pod — and it's telling us that the cluster is currently yellow. It's yellow because one node is still starting — this new node we just created — so it's likely that in a few seconds the cluster will be green again, because the replica will be back. Yes, here it is. So we do monitor this status regularly, we poll this endpoint from within the operator, and if there's any change in the status — say we notice it going from green to yellow — we'll try to repair things; but in most cases we only need to repair something when something happens to a pod, and we actually detect that from the pod resource itself directly. Hope that makes sense — we're going into the details, but that's nice.

"In the case of local disks, does ECK do all the magic for you, or is some additional configuration required?" That's a super good question. I actually wrote some documentation for that recently; if you go there — maybe it's in the 1.2 documentation already — there's a section about storage recommendations. It's pretty useful to read, because it talks about storage in general and specifically focuses on the difference between network-attached and local persistent volumes. In your case, if you want to use local persistent volumes, the thing is that you need a local volume provisioner available on your Kubernetes cluster. One thing you can do is manually create the PersistentVolume resources yourself: here I'm saying, for example, that I have 500 gigabytes of storage available in a particular directory on a particular node. You create that YAML file, and if you specify that you want an Elasticsearch cluster using that storage class, ECK will automatically match your Elasticsearch pods with the existing volume resources you created. Maybe you don't want to go through the pain of creating those yourself, and that's why there are several provisioners available. A good example is the local static provisioner: basically it's a program you deploy that automatically discovers all the disks that exist on your different nodes and creates the PersistentVolume resources for them automatically. A third good example, I think, is TopoLVM. TopoLVM is a dynamic provisioner: compared to the static provisioner, which creates persistent volumes whose size depends on the disks you happen to have, TopoLVM dynamically provisions a volume as you need it. Say you want 10 gigabytes of storage: TopoLVM notices there's a request for 10 gigabytes, creates an LVM volume on one host with exactly 10 gigabytes, and provisions it on demand. It's a bit more complicated, but it's very powerful. So the short answer to your question is: yes, you probably need to look into one of those three solutions and decide what your strategy for local volumes is going to be. Nothing comes bundled with ECK for local volumes; we rely on whatever is possible with Kubernetes local volumes.
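To give an idea of the manual route mentioned first, a hand-written local PersistentVolume plus its storage class could look roughly like this; the node name, path and sizes are placeholders:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # no dynamic provisioning; volumes are pre-created
volumeBindingMode: WaitForFirstConsumer     # bind only once the pod is scheduled
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-node-1
spec:
  capacity:
    storage: 500Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1                   # directory or disk available on that host
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-node-1                   # the Kubernetes node that owns this disk
```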
"Does ECK work with the Kubernetes service topology feature, for example to prefer network traffic to stay within the same availability zone and not pay for cross-zone traffic?" That's a good question, actually. I think there's nothing special to ECK here, in the sense that whatever you can configure in Elasticsearch to define the different zones of each group of nodes can be configured with ECK; so it's up to you to configure Elasticsearch the right way, according to your Kubernetes topology constraints, so that it fits your needs. An important thing to remember is that there's this service that ECK creates automatically — the one that can direct requests to any node in the cluster — and you can easily create your own service on top of that. In the service specification, you can restrict which set of pods traffic should be routed to, so if you have a label set for a particular availability zone, you could make sure the service only directs requests to pods of that particular availability zone. That's one way. But then, of course, if you have replicas in multiple zones to deal with failures, I don't think there's anything you can do about Elasticsearch needing to replicate data from a node in one zone to a node in another zone. If you don't want that to happen, you either have no replicas, which is probably bad, or you have all the replicas in the same zone, which is also a bit dangerous, right? I hope that answered the question.

"Do you have any plans to implement a Kubernetes autoscaler, to scale the cluster automatically?" Yes, that's a very good question too, and it's a work in progress — very much a work in progress, we're still in the design phase, it's not there yet. By the way, there's an event called ElasticON Global next week; ElasticON is our big Elastic event with a lot of presentations, and it's free this time — you can register for free on the Elastic website — and we have a whole session dedicated to autoscaling. Basically, what we discovered is that autoscaling in Kubernetes itself is nice mostly for stateless services; for stateful workloads like Elasticsearch it doesn't work that well — it's a bit more complicated than what Kubernetes offers. What we're actually working on is implementing some autoscaling features in Elasticsearch itself, so that Elasticsearch can give us hints about what should be autoscaled — data nodes, master nodes, more RAM, more CPU, more storage — and then ECK would react to that signal and automatically scale the cluster accordingly. So we're working on it; it's not there yet, but pretty soon, I hope.

"Is the recording shared?" Oh yeah, I think this session is recorded — Dalia, if I remember correctly it's going to be on YouTube? Exactly, the YouTube link is in the chat, so it's going to be uploaded there in a few days. Okay, cool, that's nice — so I can say things like the YouTubers say: please give a thumbs up and subscribe to my channel.

"What about custom plugins?" A good question as well. We have this covered in the documentation, I think. The idea, basically, is that when you customize the pod template you can add your own init containers, and you can add an init container that installs any plugin you want before Elasticsearch runs. We take care of all the underlying tricks with the volume mounts so that the plugin is preserved between the init container and the actual Elasticsearch container. That's the way to go: if you want to install your own plugin, you specify a URL to that plugin there, and basically that's it.
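The init-container approach described here is the one documented for ECK; a sketch, with analysis-icu used only as an example plugin:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster
spec:
  version: 7.9.0
  nodeSets:
  - name: data-nodes
    count: 3
    podTemplate:
      spec:
        initContainers:
        - name: install-plugins
          command:
          - sh
          - -c
          - |
            # Runs before Elasticsearch starts; the plugin directory is preserved
            # for the main container through the volume mounts ECK sets up.
            bin/elasticsearch-plugin install --batch analysis-icu
            # A custom plugin distributed as a zip could be installed from a URL instead:
            # bin/elasticsearch-plugin install --batch https://example.com/my-plugin.zip
```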
"What metrics are exposed by default by all cluster nodes, and in which format — is the Prometheus format supported?" Oh, by the way, we're close to the end of the hour — can we go on for like five more minutes, or are we restricted on timing? Yeah? Cool. So, I think the Prometheus format is not supported, but it's an interesting question. Elasticsearch by default exposes almost everything; if you call the right API — if I remember correctly it's the _cat API... maybe... I don't remember, let me find it again — I think it's something about node stats, maybe just _nodes, like that. Yeah, so if you do that, you get a lot of detail about each individual node: you see settings about the thread pools, and if you change a few things in the query parameters you can expose the RAM usage, CPU usage, the JVM heap usage, garbage collector information and so on. It's all in the Elasticsearch API, and so far those metrics are only exposed in the Elasticsearch format — this custom JSON from the Elasticsearch API. But what you can do, which is super interesting, is deploy Beats alongside Elasticsearch on your cluster. We have some examples here for running Beats, and in this configuration examples section there's one about Elasticsearch and Kibana stack monitoring. If you deploy this manifest, it deploys Beats on your Kubernetes cluster, and they will automatically grab the metrics I just showed you from the Elasticsearch clusters that are available and store them in another Elasticsearch cluster that you dedicate to metrics, so that you can then visualize those metrics in Kibana, for example. I think we have a blog post that covers that, actually, released like yesterday or so: yes, "Elastic Stack monitoring with Elastic Cloud on Kubernetes". It gives you an example where you have a production system and a monitoring system, and you want to send the metrics of the production system to your monitoring system. In the end, in Kibana, you see things like this: the health of each individual Elasticsearch or Kibana cluster, and you can drill down into more specific metrics — CPU, RAM, number of indices, shards, garbage collection and so on.

"What is responsible for the time counting in tmux, or is it the prompt?" I think you mean this one here — that's actually a watch command that I'm running, and that watch command is giving me the time. I used to have the time reported by tmux, but I noticed it was refreshing the terminal too often and it was actually killing my battery, so I stopped doing that; no per-second counting anymore. Or do you mean this, the age here? That's a Kubernetes feature. Or maybe you mean this, the time for which the command ran — I think that's the zsh plugin that does that. Yeah, I'm using this nice prompt which is called Pure, and I think it comes bundled with that; it's pretty nice.

I think that was the last question. Thank you, everyone, for listening. Thanks for answering all the questions, and thanks, everyone, for asking such amazing questions. The recording will be up in a few days on our channel, and you have the link in the chat where you can see it. Cool, thank you. Thanks, everyone; thanks, Sebastien. Thank you. Usually this is the moment where I would ask for the best pub to go to and have beers at afterwards, but hey — hopefully everybody's welcome at my place if you want to share a beer. Thanks, all, and have an amazing weekend!
Info
Channel: Official Elastic Community
Views: 5,225
Rating: 4.8297873 out of 5
Keywords: kubernetes, elastic, elasticstack, community, elastic stack, elastic stack on kubernetes, eck
Id: Wf6E3vkvEFM
Length: 61min 20sec (3680 seconds)
Published: Mon Oct 05 2020