Zookeeper

Captions
Let's start with our next discussion. There's something very important I would like to cover today: ZooKeeper. We all talk about Hadoop, HBase, HDFS and so on, but ZooKeeper is actually one of those tools which is very, very important. It's quite independent, in the sense that while everything from Pig and Hive to HBase depends on HDFS, HDFS itself depends only on ZooKeeper — you will see how.

Before that, there is an assignment for HBase, and I would like to go through it before it gets too late. It's a subjective assignment. Yesterday we discussed how to store crawled data with relationships — which page is pointing to which page, and so on — so we emulated a one-to-many relationship in HBase. Today's assignment is this: design a simple Gmail-like email storage system in HBase. The only requirements are that it should handle many users, it should handle huge data, and whenever a user comes in, it should be able to return all of their emails easily. You have to design the HBase database such that it can handle a humongous load — as in, petabytes of data. The questions you need to answer are: what will be the tables; for each table, what will be the row keys; what will be the column families; and what will be the columns under each column family. It's up to you to imagine what features your email system provides, but from my side all I want is normal, simple email — I don't want labels, I don't want archiving.
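One way to start thinking about this assignment (this is only a sketch, not the official answer — the function name, the alice user, and the single-table layout are all my own illustration): keep one emails table whose row key is the user id plus a reversed timestamp, so that a prefix scan on the user id returns that user's newest mail first.

```python
LONG_MAX = 2**63 - 1  # matches the range of HBase's 64-bit timestamps

def email_row_key(user_id: str, sent_at_ms: int) -> str:
    """Row key = user id + reversed, zero-padded timestamp: rows for one
    user are contiguous, and newer emails sort before older ones."""
    reversed_ts = LONG_MAX - sent_at_ms
    return f"{user_id}:{reversed_ts:020d}"

earlier = email_row_key("alice", 1_700_000_000_000)
later = email_row_key("alice", 1_700_000_100_000)  # sent 100 seconds later
print(later < earlier)  # True: the newer email sorts first under 'alice:'
```

Zero-padding matters: without a fixed width, the lexicographic order of the keys would not match the numeric order of the timestamps.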
All I want is that when a user comes in, they should be able to query and get their email details. That is the assignment for today.

Now, the next question I want to ask you. Suppose you are writing something called an email processor — it could be something that builds an index of the emails — and you are not supposed to modify the emails. A single machine contains the emails of all the users, and the processors run on multiple machines. So there are multiple machines running processors which can read emails from this inbox, and the only constraint is that the email processors cannot modify the inbox; the inbox is read-only for them. The other thing to note is that each email has a unique identifier. The question is: how would you make sure that no two email processors process the same email? If one email processor is working on an email, it should not be processed again. How would you ensure that?

Please note that when you are running multiple threads on the same machine — that is, inside the same process — threads can take locks so that one thread does not step on another thread's feet, and multiple threads can coordinate that way. But across multiple processes, those locks are useless: threads of one process cannot see a lock held by threads of another process. In those cases, what we used to do was create a local file to coordinate. But here the email processors are on separate machines, so what we need is a central database that everyone can write to in order to coordinate. In that database you could actually keep something like a Bloom filter — a Bloom filter is a great concept for situations like these.
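To make the Bloom-filter idea concrete, here is a tiny sketch (my own toy implementation, not a production library): a Bloom filter answers "definitely not seen" or "probably seen" using a fixed-size bit array, so a processor can cheaply skip email ids that were very likely already handled.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: set k hash-derived bits per item; an item is
    'probably present' only if all of its bits are set."""
    def __init__(self, size_bits: int = 1024, hashes: int = 3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0  # the whole bit array packed into one big int

    def _positions(self, item: str):
        # Derive k independent bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))

bf = BloomFilter()
bf.add("email-42")
print(bf.might_contain("email-42"))  # True: already processed, skip it
```

Note the trade-off: false positives are possible (an unprocessed email might be skipped), but false negatives are not — which is why, as discussed next, you still need a real coordination service for correctness.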
The idea I wanted to put forward is that you need an external service for coordination between the email processors. When we talk about distributed computing, the idea is mainly around this coordination among distributed machines. Whether it is HBase talking to its many region servers, or those region servers coordinating among themselves, or the DataNodes coordinating among themselves, or the primary NameNode and secondary NameNode coordinating between themselves — there are many services that require coordination. Similarly, for CloudxLab right now we are building our own system so that our users' processes can coordinate well with each other.

So basically, whenever there are multiple servers running in a network and they need to coordinate among themselves, it is very difficult to use thread-based or process-based locks, because these are different machines. The general idea used to be: let's create a central database. But again, this central database can become the bottleneck. If you have a thousand processes running in parallel in the cluster, and these thousand processes require coordination at a really fast pace, then the central database to which all these processes are talking can easily become the bottleneck. And that's where ZooKeeper comes in.

ZooKeeper is a distributed coordination service for distributed applications. It exposes a very simple set of primitives, such as get and put. Just like in a file system you have a folder structure — you create a folder there so that nobody else can create a folder with that name — ZooKeeper comes with a nice tree structure, so that many processes can create nodes inside this tree.
Through this tree, all the processes can happily coordinate with each other. And ZooKeeper itself is distributed: it is a coordination and locking system which looks like one robust single machine, but inside it runs on many machines. We will talk about that today, because it is a very important idea. So ZooKeeper is used for synchronization, configuration maintenance, and coordination, and it is a service that does not suffer from race conditions or deadlocks. Since we are talking about big data, the data has to be handled by many machines, and all those machines need to coordinate with each other at some point or another — in those cases, ZooKeeper is your good friend.

Now, why are we talking about ZooKeeper in such detail? Because its structure is going to teach us something very, very important. So far we have had situations where we, the administrators, nominated who is the master and who is the slave: this machine is the HBase master and the rest are the slaves; similarly, this one is the NameNode and the others are DataNodes; and so on. In the case of ZooKeeper, this is done through election. Just like in a democracy, master and followers are decided by elections, and that's why ZooKeeper is a truly distributed service. We will go into the details today. ZooKeeper's style of architecture is also used by MongoDB and some other systems, so it is very important to learn this aspect.

As for installing ZooKeeper: you can just extract it and start it, but you don't need to worry about installation in the case of CloudxLab, and it also comes pre-installed with Ambari if you are using Ambari for your cluster. I've just checked mine — it says the ZooKeeper server is up and running.
Once you log in to the console, you type the command zookeeper-client, and that way you get into ZooKeeper. Let me explain what ZooKeeper actually is — let's go with the practical first and the theory later.

The data model is like a tree structure. It contains nodes, one inside another. The top level is /; in this example, / contains one node called zoo, and inside zoo you have duck, goat, and cow. Each node is called a znode. A znode can have children, and a znode can have data — every znode can have both. Znodes form a hierarchical namespace, and certain kinds of names are illegal. Every znode knows who its parent is, what its own name is, and what data it is storing. When you change the data on a znode, you cannot append — you can only overwrite the data. In very simple words, that is the data model.

Before jumping into theory, let me show you. We are in the ZooKeeper client right now. Just like in Unix, you say "ls /" and you can see all the children of /. Right now there are many nodes — why so many? Because the whole system is actually using ZooKeeper: under different znodes, different systems keep their data. Here we have the Kafka brokers, Storm is keeping its data, and HBase is keeping its data under hbase-unsecure. Under hbase-unsecure you can see all of HBase's information inside ZooKeeper — which table locks are enabled, which region servers are up. So you can see that inside ZooKeeper, everybody keeps its data.
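The data model described above — a tree of named nodes where every node carries both children and data, and data can only be replaced wholesale — can be sketched as a small in-memory model (this is my own toy model for illustration, not ZooKeeper's API):

```python
class ZNode:
    """Minimal model of ZooKeeper's data model: a hierarchical tree of
    znodes, each holding a name, a data blob, and named children."""
    def __init__(self, name: str, data: bytes = b""):
        self.name = name
        self.data = data        # replaced wholesale, never appended to
        self.children = {}      # child name -> ZNode

    def _walk(self, parts):
        node = self
        for part in parts:
            node = node.children[part]
        return node

    def create(self, path: str, data: bytes = b"") -> None:
        *parents, leaf = path.strip("/").split("/")
        self._walk(parents).children[leaf] = ZNode(leaf, data)

    def get(self, path: str) -> bytes:
        return self._walk(path.strip("/").split("/")).data

    def set(self, path: str, data: bytes) -> None:
        # The only write ZooKeeper allows: overwrite the whole value.
        self._walk(path.strip("/").split("/")).data = data

root = ZNode("/")
root.create("/zoo")
root.create("/zoo/duck", b"quack")
print(root.get("/zoo/duck"))  # b'quack'
```

A real znode additionally carries version numbers and ACLs, but the tree-plus-data shape is exactly what you navigate with ls and get in the client.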
For example, under the region-server znode there are only two entries right now, because only two region servers are running; as and when more region servers come up, they will make their entries here. So "ls /" shows all the znodes which are children of /, and you can go further into each one in the same way. At any node you can say get, and it will show you the data. This one's data is blank — and it's okay for a znode's data to be blank. Let me look at something more meaningful: if I say get /12Dec, asking for the data of that node, it says its data is "initial". You can also look under config — I think that one belongs to Kafka — it has topics inside it, and a topic contains JSON data: its version number, its configuration, and so on.

So every znode has a name, optionally has data, and optionally has children. For example, Storm has several children, and under credentials there are more details, and so on. If you want to see the data inside any of them, you can. That is the data model.

Now let's go one step further: there are different kinds of znodes. By default, every znode is persistent — you make a change to it, the change is persisted, and the node remains there until you explicitly delete it. The way you create a znode is very simple: you say create, giving the parent path and the new name.
Let's say we want to create a znode called /6feb and put our secret data inside it. We use the create command, and the node /6feb is created. If you want to see the data of /6feb, you say get, and you can see the data we stored. So we have been able to create a node, get its data, and change its data. We can also create another node inside /6feb called sandeep, and put some information in it saying he is an instructor. So now we have created a node, created a node inside it, updated a node, and read a node's data.

The nodes we created just now were persistent nodes. We can delete them by saying delete /6feb/sandeep, and you can see the node is gone. If you are also logged in to the ZooKeeper client, you will be able to see exactly what I am seeing. So this is how you create, delete, inspect, and list the children of a particular znode inside ZooKeeper.

Now, there are some very interesting kinds of znodes. You can create an ephemeral node, which will be deleted by ZooKeeper as soon as the session ends. I am connected right now; if I create an ephemeral node, it will disappear as soon as I disconnect. An ephemeral node also cannot have children. Let me show you: I create an ephemeral node by saying create -e under /6feb, named sandeep-is-online, with some data. So I have created this znode, and right now you can see there is a node called sandeep-is-online. If I exit, the session which created that node has ended.
Therefore that node will disappear after some time, because its session has ended. These are ephemeral nodes: an ephemeral node exists as long as your session is active. It cannot have children, not even ephemeral ones. Though it is tied to the client session, it is visible to everyone. Now let's check — let me log in again and look at /6feb: the node has disappeared. This is like building an attendance system: every client needs to keep sending pings to the server so that the server keeps its session active, and thereby keeps showing its presence.

Then there is another kind of znode called a sequential node. What is a sequential node? In the case of a sequential node, ZooKeeper generates a sequence number, and the sequence number starts from zero for every parent. When you create a sequential node, you just provide the prefix, and the sequence number is appended to the prefix by ZooKeeper. Let me show you. How many nodes have we created so far inside /6feb? We created one earlier which we deleted, then we created an ephemeral node — so two child nodes so far. Now I create a sequential node inside /6feb: I just provide my prefix, mypref, and some data, and ZooKeeper automatically creates the node. You see the name of the node is mypref followed by the number 2. The counter started from zero: zero went to the first node we created, called sandeep; one went to the node which disappeared; and two goes to this sequential node. This running number keeps going up for every child created under the parent.
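The numbering behaviour just demonstrated can be sketched like this (a toy model, not ZooKeeper itself; ZooKeeper actually derives the number from the parent's child-change counter, and pads it to ten digits):

```python
class ParentZNode:
    """Toy model of per-parent sequence numbers: the counter ticks on
    EVERY child creation, but only sequential nodes show it in their
    name, as a zero-padded 10-digit suffix."""
    def __init__(self):
        self.counter = 0
        self.children = []

    def create(self, prefix: str, sequential: bool = False) -> str:
        name = f"{prefix}{self.counter:010d}" if sequential else prefix
        self.counter += 1  # ticks for plain and ephemeral children too
        self.children.append(name)
        return name

p = ParentZNode()
p.create("sandeep")                    # plain node: silently uses slot 0
p.create("sandeep-is-online")          # ephemeral node: slot 1
print(p.create("mypref", sequential=True))  # mypref0000000002
```

This is why the first sequential node in the demo came out numbered 2 even though it was the first node created with the sequential flag: the two earlier (non-sequential) children had already consumed slots 0 and 1.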
It's just that the counter is only used in the name when you are creating a sequential node. So every parent keeps count of its children by way of this sequence number, and based on it you can always figure out how many children have been created under a particular node. The counter keeps monotonically increasing: if I create another sequential node here, it increases by one — you can see it has increased by one. Even if I change the prefix, the counter continues in the same way, and if I provide an empty prefix, the name of the znode will be only the number. So that is another kind of znode: the sequential node.

Now, the architecture — this is very important, and I hope everybody is up and running with ZooKeeper. Can you see the ZooKeeper service in the UI? Yes, you can see ZooKeeper here, and it also shows the current status. You can see the machines listed: after the colon is the port number, which is 2181, and before the colon is the IP address. You can see there are five machines on which ZooKeeper is running — so it is a cluster. One of them is marked as leader, and the rest are followers. You can also browse the folder structure from here. Now let's move on to the next point.

Let me talk about the architecture; this is very, very important, so everybody should pay attention. ZooKeeper's design is based on the Paxos algorithm, and if you read the paper, it's actually quite unnecessarily complex — I'll try to make it very, very simple, and if you have a question, just let me know. If you are interested in politics, you will definitely enjoy it.
It's basically a very interesting way to look at how systems talk to each other — and if you like the Indian saas-bahu kind of TV serials, you will also like the Paxos algorithm. All right, let me start.

ZooKeeper has two modes. One is called standalone mode, which is just for testing: it has no high availability, so it's not that interesting — that's for people who are running it in a virtual machine. The real thing about ZooKeeper is replicated mode. In replicated mode, ZooKeeper runs on a cluster of machines, and we give this cluster a nice name: an ensemble. A cluster of machines is simply many computers connected over a network. In this mode ZooKeeper is highly available — I'll show you how — and it keeps running happily as long as a majority of the computers are up. Majority means more than half. If there are three computers, majority means two; with two computers, majority means two; with one computer, majority is one; with twenty computers, majority is eleven. So the ensemble will run happily as long as a majority of the computers are available.

Now the architecture. Initially, you install a ZooKeeper server on every machine in your ensemble. Say there are three machines: A, B, and C. All three machines run the ZooKeeper server — a program listening on port 2181, as you saw just now — and all of them run in parallel and talk to each other. All three machines know about every other server running ZooKeeper.
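The majority arithmetic above can be written down in two lines (a small sketch of the rule as stated in the lecture):

```python
def majority(n: int) -> int:
    """Smallest number of servers that is strictly more than half of n."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """How many servers may die while the ensemble still has a quorum."""
    return n - majority(n)

# The examples from the lecture: 3 -> 2, 2 -> 2, 1 -> 1, 20 -> 11.
for n in (1, 2, 3, 20):
    print(n, "->", majority(n))
```

One consequence worth noticing: a 2-node ensemble tolerates zero failures, which is why ensembles are almost always deployed with an odd number of servers — adding a fourth node to a 3-node ensemble raises the quorum without raising the failures you can survive.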
So we configure the servers, switch them on, and then something magical starts happening: they all start communicating with each other, and the first thing they do is leader election. They gossip among themselves to select a leader. Every machine forms an opinion about the other machines — based on things like how many pings each machine responded to — then all the machines reconcile their votes, the machine designated as the leader wins, and the rest become followers.

What is the significance of being the leader? All clients who want to change something — like when we created a znode — are redirected to the leader. Followers are used only for reads: give me the list of children, give me a znode's data. So for every read request the followers can serve you, while all the write requests — the requests that involve change — are forwarded to the leader.

So as soon as the network is up, the nodes talk to each other, vote and gossip, the election happens, and somebody becomes the leader; the moment that happens, the rest of the followers synchronize their state with the leader. If the leader fails at any point in time, somebody else becomes the leader based on a fresh election — as soon as a leader fails or steps down, an election happens and the same process repeats automatically. An election takes around 200 milliseconds.
If a majority is not available at any point of time, the leader will step down. It's like democracy: if 50 percent or fewer of the voters are available, the election is invalid and there is no leader. In that state the whole system remains read-only — no change can happen anywhere in the system. Even if there is a leader, once the majority is gone, he himself will step down.

Once the election is successful and a leader has been elected, every write request is forwarded to the leader, and the leader broadcasts the update to the followers. So when we said "create a node", that change went out to all the nodes. Which brings us to one of the very important things you need to understand: the data on every node is the same. Any change you make is propagated to the whole cluster — which also means you should always store only small data on a ZooKeeper cluster. When a majority of the servers have saved, that is persisted, the change, only then does the leader say "commit", and only then does the client get the response that the change was written successfully. Majority is again the keyword: if I have five nodes in the cluster, three is the majority, and unless three nodes successfully write the change, I will not get the response. Luckily, all these writes happen in parallel, so it does not take much time. The protocol for achieving this consensus is atomic, like a two-phase commit: when we write data, it is basically a commit. One more important aspect of ZooKeeper is that it saves the data to disk before memory.

So, to recap: an ensemble is a bunch of machines, and on every machine we started a ZooKeeper server. The ZooKeeper server is nothing but a process listening on a port.
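The commit rule described above — the leader acknowledges the client only after a quorum of servers has persisted the change — reduces to a one-line check (a sketch of the rule, not ZooKeeper's actual Zab implementation):

```python
def leader_commits(acks: int, ensemble_size: int) -> bool:
    """The leader tells the client 'written' only once a majority of the
    ensemble (itself included) has persisted the change to disk."""
    return acks >= ensemble_size // 2 + 1

# Five-node ensemble, as in the lecture: three acks are the majority.
print(leader_commits(3, 5))  # True  -> client is told the write succeeded
print(leader_commits(2, 5))  # False -> leader keeps waiting for more acks
```

Because the same majority is required both to commit a write and to elect a leader, any new leader's quorum must overlap the old one — so at least one server in every election already holds the latest committed change.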
And it talks to the other machines: every machine knows who the ZooKeeper servers in the network are. The servers talk to each other and elect a leader — based on who has the latest data and who responded to pings really well — then they synchronize their state with the leader, and the story starts. All the write requests — any change, delete, or set — go to the leader, while read requests can be served by anybody. That is the architecture in a nutshell.

Now a question for all of you. You have three nodes, A, B, and C, with A being the leader, and A dies — say it gets disconnected from the network. Will somebody become the leader? Yes, that's right — because two out of three is still a majority. As soon as A is dead, the machines automatically talk among themselves and somebody becomes the leader — either B or C, based on circumstances. Now the next question: let's say C got elected as leader, and after a while B also dies, so only C, the leader, is left. Will C step down from being the leader, or remain the leader?

Let me explain again. When A died, two nodes were left; two out of three is still a majority, since 3 divided by 2 is 1.5 and 2 is greater than 1.5 — therefore an election happened. But now only one node is left, so the leader will step down, because with only one node there is no majority in this system. This is something that took me a while to figure out. At first I thought: there is still one computer available, it should keep running and serving requests. But no — C steps down from being the leader.
C becomes a follower, and no one is the leader. Since a majority is not available — one is less than 1.5, which means less than 50% of 3 — there is no leader, and even if there were a leader in these circumstances, he would step down. And as soon as there is no leader, no writes are served: the whole ZooKeeper cluster becomes read-only. So, any ideas why C behaves this way? Why do we need a majority at all?

Let's come to the actual answer — I think you will appreciate this part. Imagine there is a network with two data centers, data center 1 and data center 2, with a total of six nodes. A is the leader, and the others are followers. Please note that out of six nodes, if three nodes are dead, the leader will step down even though three nodes are still available. Let us understand the reasoning behind it. The two data centers are connected by a wire: data center 1 contains A, B, and C, and data center 2 contains D, E, and F. A is the leader, and everyone is happy: B, C, D, E, and F are followers, serving all the read requests while forwarding the changes to A — any change that comes to D is forwarded to A, and so on.

Now something strange happens in the network, and the connectivity between data center 1 and data center 2 dies. So on one island there are A, B, and C, and on the other there are D, E, and F. In these circumstances, imagine that the majority rule was not there — say ZooKeeper simply elected whoever got the maximum votes. What could have happened is that both islands would choose their own leaders, and both would start accepting write requests.
Any write request coming to, say, F would now be forwarded to D. Imagine this had happened — it would have been disastrous. The cable got disconnected between the networks, data center 1 kept its leader A, and data center 2 chose another leader, D. Had the majority rule not been there in the architecture, this is exactly what would have happened — two leaders in the same country, everyone running the system their own way. And why is that disastrous? Think about the information: the same tree of data evolving in two separate variations — in one, some nodes are being added; in the other, some deletions are happening. It would be beyond reconciliation; you would not be able to merge them back. The data would effectively be corrupted if the majority rule was not there.

What actually happens in this scenario is that leader A notes that only the votes of B and C are available — meaning only three nodes out of six are reachable — so leader A steps down, and in data center 2 nobody becomes the leader either.

Let me repeat quickly. The majority rule says that for an election, more than 50 percent of the ensemble must be available to vote. In this scenario, when the network got disconnected, there were three voters on one side and three voters on the other. If there were no majority rule, both data centers would choose their leaders, and both leaders would start operating at the same time, taking write requests — which means the consistency of the data is completely lost. Two leaders means two streams of write requests being served in parallel by two different
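The split-brain argument above has a neat arithmetic core, which this small sketch makes explicit (my own illustration of the rule, using the 3+3 partition from the lecture):

```python
def quorum(n: int) -> int:
    return n // 2 + 1  # strictly more than half of the FULL ensemble

def partitions_that_can_elect(ensemble_size: int, partition_sizes):
    """After a network split, only a partition holding a quorum of the
    original ensemble may elect a leader. Since two disjoint groups
    cannot both exceed half, at most one partition ever qualifies."""
    q = quorum(ensemble_size)
    return [size for size in partition_sizes if size >= q]

print(partitions_that_can_elect(6, [3, 3]))  # []  -> no leader anywhere
print(partitions_that_can_elect(6, [4, 2]))  # [4] -> only one side elects
```

The 3+3 split is the painful case: the whole cluster goes read-only even though every machine is alive. That loss of availability is the price paid to guarantee there can never be two leaders.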
nodes — changes happening to the same data in parallel from two different sides — and the whole integrity of the system goes for a toss. Therefore, to ensure that there is always a single leader, we have this rule of majority: no partitioning of the network — that is what this situation is called, partitioning — should ever produce multiple leaders, and it cannot, because there can be only one majority in any topology. It's like saying: it's my birthday and I should get the biggest piece of cake — how can I make sure I get the biggest piece? By demanding a little more than 50% of it. Similarly, to ensure there is only one leader in the network, the rule is that a leader needs more than 50% of the ensemble behind it — and there cannot be two groups each holding more than 50% in any network topology.

A quick question for all of you: an ensemble of 10 nodes can tolerate the shutdown of how many nodes? It can only tolerate losing fewer than half — so the answer is 4.

There are some more details I need to give you. ZooKeeper provides the idea of sessions, just like web sessions or TCP/IP sessions. The client you use for connecting to the ZooKeeper servers has the list of available servers — it maintains all their IP addresses — and it keeps trying to connect to each until it succeeds. On the server side, a new session is then created for that client. Every session has a timeout period decided by the caller, i.e., the client, just like any other server. If the server has not received a request within the timeout period, it may expire the session — and if the session expires, the ephemeral nodes created in it are lost. To keep the session alive, the client keeps sending what are called heartbeats. It's like repeatedly refreshing a web page so that it doesn't time out.
on doing that. But the good news is that your program doesn't have to do it: the client library underneath, which you will be using, takes care of all the heartbeats. This client library also handles the failovers, and sessions remain valid on switching to another server. Still, the application can't remain completely agnostic of the server connection, because while your application is trying to create a znode, a real action may be in flight; you have to be aware that the connection might fail. So this is the simple connection lifecycle: you create a connection, it goes into the connecting state, then it gets connected, you receive updates, and then the connection gets closed.

Now, one of the big use cases of ZooKeeper is this. Suppose on the back end you have a load-balanced service, meaning a service being run on many machines in parallel, and there are thousands of clients who use this service in parallel. How do the clients discover which server is up and which server is down? One way would be for all the clients to maintain a list of servers and keep pinging every server to check which ones are up. The right way is to use ZooKeeper as the coordinator in the middle. Every server will create its own ephemeral znode, and that ephemeral znode will exist as long as the server is up; in case the server goes down, the ephemeral znode will disappear, and the clients will not talk to it. So on one end, all the servers create their ephemeral znodes, and on the other end, the clients talk to ZooKeeper, get the list of those ephemeral znodes, and connect only to the servers which are active right now.

Say there is a server which connects to ZooKeeper and creates an ephemeral znode called duck under /servers; then another server comes up and creates an ephemeral znode called cow under /servers. If at this point a client comes and asks for the list of available servers, ZooKeeper will say it's duck and cow. But if after some time the duck server gets killed, its ephemeral znode disappears, which means listing /servers will return only cow, not duck. So this is how the discovery of the servers by the clients happens.

ZooKeeper also gives you guarantees. Sequential consistency: updates from any particular client are applied in the order they were sent; unlike the eventual consistency we talked about earlier, here there is sequential consistency, because every change gets a number on it. Atomicity: a znode change either fails or succeeds; there is no midway. Single system image: a client will see a single system image, and a new server will not accept any connection until it has caught up. Durability: if a change is made, it is not lost; it is not only in memory, it is on disk. Timeliness: rather than allowing a client to see very stale data, a server will automatically shut down. This is a very important part: a server will kind of commit suicide if it has been stale for quite some time.

All right, now there are various kinds of operations, which we have already seen: create, delete, exists, getData, getACL and setACL. ACL means access control list, and access control is quite critical for us. When you have many programs on the network, they might all want access, so we might want to control who is modifying what; if you have many, many clients, you may want to secure one znode from another, and that's where access control lists come into play. All right, now an update could be multiple or single. The API for ZooKeeper is quite strong: it provides you a way to combine
multiple operations into a single one, just like database transactions. In a database transaction you can insert a row, update a row, or delete a row in a single transaction, and all of the changes happen in a single shot. In the same way, multi-update makes sure that all the changes, like creating one znode, deleting something else, or creating ten znodes, happen in a single shot; there is no midway state.

All right, the APIs which ZooKeeper provides are of two kinds: one in Java, the other in C; the contributed ones are for Python and REST. The interesting part is that there are two versions of every function, one synchronous and the other asynchronous. What do synchronous and asynchronous mean? Think about a hotel reception. You are staying in a hotel and you are expecting a visitor. The naive way is that you keep calling the reception again and again, asking whether a person called Sandeep Giri has come to visit you. Or the hotel reception says to you, "I will call you back when somebody named Sandeep Giri comes in." So in one case you are continuously pinging; in the other case you have told them once and they will call you back. You could also say: you call the receptionist and stay on the phone until Sandeep Giri comes in; that is a synchronous call, where you wait until the work is finished. The other way is asynchronous: your call finishes quickly, and you get a callback when the real event happens. Every publisher-subscriber model works in the same way.

So there are these APIs which provide you callbacks, and there is also something called watches. Sometimes you want to get notified when something happens, like the triggers and post-processors and pre-processors we had in HBase; here you register watches on znodes. Now, these watches are triggered only once, so you will have to re-register them every time: you cannot keep getting updates; you get one update, and then inside that update logic you subscribe again. So when does a watch trigger? The watches created by the exists function are triggered when the node is created, deleted, or updated. The watches of getData get triggered when the znode is deleted or its data gets updated. And because watches are configured when you call a read function, that is exists, getData, or getChildren, the watches of getChildren get invoked when the node is deleted or any of its children get modified. Those are called watches.

The other thing is ACLs, access control lists: you get authentication and authorization. The authentication could be based on login and password, or on something called Kerberos, or on the IP address. So an access control list is a function of what kind of authentication scheme you are using, the identity within that scheme, and the set of permissions. We have gone through the hands-on; we've already tried all of this, and you can try it by yourself. All right, the other use cases for ZooKeeper are building a reliable configuration service and building a distributed lock service, so that no two processes run into race conditions on the same thing. All right, now, when not to use ZooKeeper, and that's a very important part: do not store big data inside ZooKeeper. The total number of copies is equal to the total number
of nodes, and also all of the data that you store in znodes is loaded into the RAM. So ZooKeeper is blazingly fast, ZooKeeper is extremely reliable, and ZooKeeper is sequentially consistent. But the downside is that it keeps the data in RAM and it keeps many copies of the data: if you are trying to store one gigabyte of data and you have 10 nodes, it is going to consume 10 gigabytes overall, one gigabyte of RAM on every machine. Also, it provides you sequential consistency, not strong consistency: the consistency here is going to be great, but not as strong as using a single system like a database or a file system. So it's not extremely strong consistency, it's basically sequential consistency. So that was ZooKeeper.
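The majority rule and the fault-tolerance quiz above come down to a little arithmetic; here is a plain Python sketch of it (these helper names are my own, not part of any ZooKeeper API):

```python
def quorum_size(ensemble: int) -> int:
    # A quorum is the smallest group that is strictly more than half
    # of the ensemble; there can never be two such groups at once.
    return ensemble // 2 + 1

def max_tolerated_failures(ensemble: int) -> int:
    # Nodes that may go down while the survivors can still form a quorum.
    return ensemble - quorum_size(ensemble)

print(quorum_size(10))              # 6 votes needed in a 10-node ensemble
print(max_tolerated_failures(10))   # 4, as in the quiz above
print(max_tolerated_failures(5))    # 2: a typical 5-node ensemble
```

This also hints at why ensembles are usually sized odd: going from 5 to 6 nodes does not increase the number of tolerated failures; both tolerate 2.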
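The session-and-heartbeat idea can be mimicked with a toy class; the real bookkeeping lives inside the ZooKeeper server and client library, so treat this only as a sketch of the logic, with made-up names:

```python
import time

class ToySession:
    """Expires when no heartbeat has arrived within the timeout,
    which is chosen by the caller, that is, the client."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_heartbeat = time.monotonic()

    def heartbeat(self) -> None:
        # Like refreshing a web page so that it doesn't time out.
        self.last_heartbeat = time.monotonic()

    def expired(self) -> bool:
        return time.monotonic() - self.last_heartbeat > self.timeout_s

s = ToySession(timeout_s=0.05)
s.heartbeat()
print(s.expired())   # False: the heartbeat is fresh
time.sleep(0.1)
print(s.expired())   # True: the session is gone, and with it
                     # its ephemeral znodes would be lost
```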
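The duck-and-cow service-discovery flow can be simulated in memory. `MiniRegistry` below is a made-up stand-in for the ensemble, not the real client API; it only shows how ephemeral znodes vanish with their owning session:

```python
class MiniRegistry:
    """Toy model of an ensemble holding ephemeral znodes."""

    def __init__(self):
        self.ephemerals = {}          # session id -> znode path

    def register(self, session_id: str, path: str) -> None:
        # A server creates an ephemeral znode tied to its session.
        self.ephemerals[session_id] = path

    def session_closed(self, session_id: str) -> None:
        # Ephemeral znodes disappear together with their session.
        self.ephemerals.pop(session_id, None)

    def ls(self, prefix: str = "/servers/") -> list:
        return sorted(p for p in self.ephemerals.values()
                      if p.startswith(prefix))

reg = MiniRegistry()
reg.register("s1", "/servers/duck")
reg.register("s2", "/servers/cow")
print(reg.ls())           # ['/servers/cow', '/servers/duck']
reg.session_closed("s1")  # the duck server gets killed
print(reg.ls())           # ['/servers/cow'] -- clients now see only cow
```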
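The all-or-nothing multi-update can be illustrated by staging every operation on a copy and committing only if all of them succeed. The dict-of-paths tree and the op tuples are my own simplification of what `multi` guarantees:

```python
def multi(tree: dict, ops) -> None:
    """Apply every operation or none, like a database transaction."""
    staged = dict(tree)                    # work on a copy
    for op in ops:
        kind, path = op[0], op[1]
        if kind == "create":
            if path in staged:
                raise ValueError("node exists: " + path)
            staged[path] = op[2]
        elif kind == "delete":
            if path not in staged:
                raise KeyError("no such node: " + path)
            del staged[path]
    tree.clear()
    tree.update(staged)                    # commit only on full success

tree = {"/a": b"1"}
multi(tree, [("create", "/b", b"2"), ("delete", "/a")])
print(sorted(tree))        # ['/b']
try:
    # The second op fails, so the first must not be applied either.
    multi(tree, [("create", "/c", b"3"), ("delete", "/missing")])
except KeyError:
    pass
print(sorted(tree))        # still ['/b'] -- no midway state
```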
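The hotel-reception analogy for synchronous versus asynchronous calls might look like this in Python; the function names and the guest register are invented purely for the illustration:

```python
import threading

def ask_reception_sync(register: set, name: str) -> bool:
    # Synchronous: you stay on the line until the answer comes back.
    return name in register

def ask_reception_async(register: set, name: str, callback):
    # Asynchronous: the call returns immediately; the reception
    # "calls you back" with the answer on another thread.
    t = threading.Thread(target=lambda: callback(name in register))
    t.start()
    return t

guests = {"Sandeep Giri"}
print(ask_reception_sync(guests, "Sandeep Giri"))   # True, after waiting

answered = threading.Event()
t = ask_reception_async(guests, "Sandeep Giri",
                        lambda found: answered.set() if found else None)
t.join()
print(answered.is_set())                            # True, via callback
```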
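The one-shot nature of watches, and the re-register-inside-the-callback pattern, can be sketched like this; `WatchableNode` is a toy, not the real znode class:

```python
class WatchableNode:
    """A toy znode whose watches fire exactly once."""

    def __init__(self, data: bytes):
        self.data = data
        self._watchers = []

    def get_data(self, watch=None) -> bytes:
        if watch is not None:
            self._watchers.append(watch)      # one-shot registration
        return self.data

    def set_data(self, data: bytes) -> None:
        self.data = data
        watchers, self._watchers = self._watchers, []  # fire once, clear
        for w in watchers:
            w("NodeDataChanged")

events = []
node = WatchableNode(b"v1")

def on_change(event):
    events.append(event)
    node.get_data(on_change)   # re-subscribe inside the update logic

node.get_data(on_change)
node.set_data(b"v2")   # watch fires
node.set_data(b"v3")   # fires again only because we re-registered
print(len(events))     # 2
```

If `on_change` did not re-register itself, the second `set_data` would go unnoticed, which is exactly the "triggered only once" behaviour described above.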
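The memory math from the last paragraph as a one-liner; the helper name is made up, the point is just that every node keeps a full in-memory copy of the data tree:

```python
def cluster_ram_gb(data_gb: float, ensemble_nodes: int) -> float:
    # Each of the N nodes holds the entire tree in RAM, so the
    # cluster-wide footprint is the data size times N.
    return data_gb * ensemble_nodes

print(cluster_ram_gb(1, 10))   # 10.0 GB in total, 1 GB per machine
```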
Info
Channel: KnowBigData
Views: 5,851
Keywords: Apache Zookeeper, Zookeeper, Big Data, Hadoop
Id: XzypROJqDIM
Length: 61min 45sec (3705 seconds)
Published: Thu Sep 14 2017