Kubernetes StatefulSet simply explained | Deployment vs StatefulSet

Captions
In this video we're going to talk about what StatefulSet is in Kubernetes and what purpose it has. So what is a StatefulSet? It's a Kubernetes component that is used specifically for stateful applications, so in order to understand that, you first need to understand what a stateful application is. Examples of stateful applications are all databases, like MySQL, Elasticsearch, MongoDB, etc., or any application that stores data to keep track of its state. In other words, these are applications that track state by saving that information in some storage. Stateless applications, on the other hand, do not keep records of previous interactions. Each request or interaction is handled as a completely new, isolated interaction based entirely on the information that comes with it, and sometimes stateless applications connect to a stateful application to forward those requests.

So imagine a simple setup of a Node.js application that is connected to a MongoDB database. When a request comes in to the Node.js application, it doesn't depend on any previous data to handle this incoming request; it can handle it based on the payload in the request itself. Now, a typical such request will additionally need to update some data in the database or query the data, and that's where MongoDB comes in. So when Node.js forwards that request to MongoDB, MongoDB will update the data based on its previous state or query the data from its storage. So for each request it needs to handle data, and obviously it always depends on the most up-to-date data or state being available, while Node.js is just a pass-through for data updates or queries and it just processes code.

Now, because of this difference between stateful and stateless applications, they're deployed in different ways using different components in Kubernetes. Stateless applications are deployed using the Deployment component. A Deployment is an abstraction of pods and allows you to replicate that application, meaning run two, five, ten identical pods of the same stateless
application in the cluster. If you want to know exactly how Deployments manage pods and why this abstraction is needed, check out my other video about that, where I already covered this in detail. Also, I make these kinds of videos every week about Kubernetes and other DevOps technologies, so if you don't want to miss out on those, you can subscribe and click the notification bell to be notified when the next video is out.

So while stateless applications are deployed using Deployment, stateful applications in Kubernetes are deployed using the StatefulSet component. And just like Deployment, StatefulSet makes it possible to replicate the stateful app pods, or to run multiple replicas of it. In other words, they both manage pods that are based on an identical container specification, and you can also configure storage with both of them in the same way. So if both manage the replication of pods and also the configuration of data persistence in the same way, the question that a lot of people ask, and are also often confused about, is: what is the difference between those two components, and why do we use different ones for each type of application? So in the next section we're going to talk about the differences.

Now, replicating stateful applications is more difficult and has a couple of requirements that stateless applications do not have. So let's look at this first with the example of a MySQL database. Let's say you have one MySQL database pod that handles requests from a Java application, which is deployed using a Deployment component, and let's say you scale the Java application to three pods so they can handle more client requests in parallel. You want to scale the MySQL app so it can handle more Java requests as well. Scaling your Java application here is pretty straightforward: the Java application's replica pods will be identical and interchangeable, so you can scale it using a Deployment pretty easily. The Deployment will create the pods in any random order, they will get random
hashes at the end of the pod name, they will get one Service that load balances to any one of the replica pods for any request, and also when you delete them, they get deleted in a random order or at the same time. Or when you scale them down from three to two replicas, for example, one random replica pod gets chosen to be deleted. So no complications there.

On the other hand, MySQL pod replicas cannot be created and deleted at the same time or in any order, and they can't be randomly addressed. And the reason for that is that the replica pods are not identical. In fact, they each have their own additional identity on top of the common blueprint of the pod that they get created from. And giving each pod its own required individual identity is actually what StatefulSet does differently from Deployment: it maintains a sticky identity for each of its pods. And as I said, these pods are created from the same specification, but they're not interchangeable. Each has a persistent identifier that it maintains across any rescheduling, meaning when a pod dies and gets replaced by a new pod, it keeps that identity.

So the question you may be asking now is: why do these pods need their own identities? Why can't they be interchangeable, just like with Deployment? This is a concept that you need to understand about scaling database applications in general. When you start with a single MySQL pod, it will be used for both reading and writing data. But when you add a second one, it cannot act the same way, because if you allow two independent instances of MySQL to change the same data, you will end up with data inconsistency. So instead there is a mechanism that decides that only one pod is allowed to write or change the data, which is shared. Reading from the same data at the same time by multiple MySQL pod instances is completely fine. The pod that is allowed to update the data is called the master; the others are called slaves. So this is the first thing that
differentiates these pods from each other. So not all pods are the same or identical, but there is a master pod and there are the slave pods. And there is also a difference between those slave pods in terms of storage, which is the next point.

So the thing is that these pods do not have access to the same physical storage. Even though they use the same data, they're not using the same physical storage of the data. They each have their own replica of the storage that each one of them can access for itself. And this means that each pod replica must have the same data as the other ones at any time, and in order to achieve that they have to continuously synchronize their data. Since the master is the only one allowed to change data, and the slaves need to take care of their own data storage, obviously the slaves must know about each such change so they can update their own data storage to be up to date for the next query requests. And there is a mechanism in such a clustered database setup that allows for continuous data synchronization: the master changes data, and all slaves update their own data storage to keep in sync and to make sure that each pod has the same state.

Now let's say you have one master and two slave pods of MySQL. What happens when a new pod replica joins the existing setup? Because now that new pod also needs to create its own storage and then take care of synchronizing it. What happens is that it first clones the data from the previous pod, not just any pod in the setup but always from the previous pod, and once it has the up-to-date data cloned, it starts continuous synchronization as well, to listen for any updates by the master pod.

And this also means, and I want to point this out since it's a pretty interesting point, that you can actually have temporary storage for a stateful application and not persist the data at all, since the data gets replicated between the pods. So theoretically it is possible to just rely on data replication between the pods, but
this will also mean that all the data will be lost when all the pods die. So for example, if the StatefulSet gets deleted, or the cluster crashes, or all the nodes where these pod replicas are running crash and every pod dies at the same time, the data will be gone. And therefore it's still a best practice to use data persistence for stateful applications if losing the data would be unacceptable, which is the case in most database applications. With persistent storage, data will survive even if all the pods of the StatefulSet die, or even if you delete the complete StatefulSet component and all the pods get wiped out as well. The persistent storage and the data will still remain, because the persistent volume lifecycle isn't tied to the lifecycle of other components like Deployment or StatefulSet.

And the way to do this is by configuring persistent volumes for your StatefulSet. Each pod has its own data storage, meaning its own persistent volume that is then backed by its own physical storage, which includes the synchronized data, or the replicated database data, but also the state of the pod. So each pod has its own state, which has information about whether it's a master pod or a slave, or other individual characteristics, and all of this gets stored in the pod's own storage. And that means when a pod dies and gets replaced, the persistent pod identifiers make sure that the storage volume gets reattached to the replacement pod, as I said, because that storage has the state of the pod in addition to that replicated data. I mean, you can clone the data again, there will be no problem, but it shouldn't lose its state or identity, so to say. And for this reattachment to work, it's important to use remote storage, because if the pod gets rescheduled from one node to another node, the previous storage must be available on the other node as well, and you cannot do that using local volume storage, because local volumes are usually tied to a specific node.

And the last
difference between Deployment and StatefulSet is something that I mentioned before: the pod identifier, meaning that every pod has its own identifier. So unlike Deployment, where pods get random hashes at the end, StatefulSet pods get fixed, ordered names, which are made up of the StatefulSet name and an ordinal. It starts from zero, and each additional pod will get the next numeral. So if we create a StatefulSet called mysql with three replicas, you'll have pods with the names mysql-0, mysql-1, and mysql-2. The first one is the master, and then come the slaves in the order of startup.

An important note here is that the StatefulSet will not create the next pod in the replicas if the previous one isn't already up and running. If the first pod creation, for example, failed, or if it was pending, the next one won't get created at all; it will just wait. And the same order holds for deletion, but in reverse. So for example, if you delete the StatefulSet, or if you scale it down to one from three, for example, the deletion will start from the last pod. So mysql-2 will get deleted first, it will wait until that pod is successfully deleted, and then it will delete mysql-1, and then it will delete mysql-0. And again, all these mechanisms are in place in order to protect the data and the state that the stateful application depends on.

In addition to these fixed, predictable names, each pod in a StatefulSet gets its own DNS endpoint from a Service. So there's a Service name for the stateful application, just like for Deployment, for example, that will address any replica pod, and in addition to that there is an individual DNS name for each pod, which Deployment pods do not have. The individual DNS names are made up of the pod name and the managing or governing Service name, which is basically a Service name that you define inside the StatefulSet. So these two characteristics, meaning having a predictable or fixed name as well as a fixed individual DNS name, mean that when a pod restarts, the IP
address will change, but the name and endpoint will stay the same. That's why I said pods get sticky identities: the identity stays stuck to the pod even between restarts. And the sticky identity makes sure that each replica pod can retain its state and its role even when it dies and gets recreated.

And finally, I want to mention an important point here. You see, replicating stateful apps like databases with their persistent storage requires a complex mechanism, and Kubernetes helps you and supports you in setting this whole thing up, but you still need to do a lot by yourself, where Kubernetes doesn't actually help you or doesn't provide out-of-the-box solutions. For example, you need to configure the cloning and data synchronization inside the StatefulSet, and also make the remote storage available, as well as take care of managing and backing it up. All of this you have to do yourself. And the reason is that stateful applications are not a perfect candidate for containerized environments. In fact, Docker, Kubernetes, and containerization in general are a perfect fit for stateless applications that do not have any state or data dependency and only process code, so scaling and replicating them in containers is super easy.

So this covers all the main concepts you need in order to understand StatefulSets and how to use them. In later videos I will show you how to create a StatefulSet, so we'll go through the StatefulSet configuration file in detail, see what additional attributes are specific to StatefulSet, and we'll also see all the other stuff that I mentioned here in practice. So again, click the notification bell if you don't want to miss out on the next videos. Thank you for watching, and see you in the next video.
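
The naming and ordering rules described in the captions can be sketched in a few lines of Python. This is only an illustration of the rules, not how the StatefulSet controller is implemented. The function names, the Service name `mysql-svc`, the namespace `default`, and the volume claim template name `data` are all assumptions made up for the example; `cluster.local` is the usual default cluster domain but may differ in a given cluster.

```python
from __future__ import annotations


def pod_names(statefulset_name: str, replicas: int) -> list[str]:
    """Fixed, ordered pod names: <StatefulSet name>-<ordinal>, starting at 0."""
    return [f"{statefulset_name}-{ordinal}" for ordinal in range(replicas)]


def pod_dns_name(pod_name: str, service_name: str,
                 namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Per-pod DNS name built from the pod name and the governing Service name."""
    return f"{pod_name}.{service_name}.{namespace}.svc.{cluster_domain}"


def pvc_name(template_name: str, pod_name: str) -> str:
    """Each pod's claim is named after the pod, so a replacement pod with the
    same name finds the same storage again ('data' below is a made-up name)."""
    return f"{template_name}-{pod_name}"


def scale_steps(statefulset_name: str, current: int, desired: int) -> list[tuple[str, str]]:
    """Ordered steps for scaling: create upward one by one, delete from the last pod."""
    if desired >= current:
        return [("create", f"{statefulset_name}-{i}") for i in range(current, desired)]
    return [("delete", f"{statefulset_name}-{i}")
            for i in range(current - 1, desired - 1, -1)]


if __name__ == "__main__":
    for name in pod_names("mysql", 3):
        print(name, pod_dns_name(name, "mysql-svc"), pvc_name("data", name))
    # Scaling from 3 replicas down to 1 removes the highest ordinals first.
    print(scale_steps("mysql", 3, 1))
```

In a real cluster each step in `scale_steps` would also wait for the previous pod to be running (or fully deleted) before moving on, which is the waiting behavior mentioned in the captions; the sketch only captures the order.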
Info
Channel: TechWorld with Nana
Views: 161,242
Keywords: statefulset kubernetes, statefulset, statefulset tutorial, kubernetes statefulset, kubernetes statefulset vs deployment, kubernetes stateful vs stateless, kubernetes statefulset tutorial, how statefulset works, statefulset in kubernetes, why statefulset in kubernetes, deployment vs statefulset kubernetes, kubernetes deployment vs statefulset, what is statefulset in kubernetes, why statefulset, kubernetes statefulset example, kubernetes tutorial, kubernetes, techworld with nana
Id: pPQKAR1pA9U
Length: 15min 59sec (959 seconds)
Published: Sat Jun 06 2020