Kafka Tutorial | Learn Kafka | Intellipaat

Captions
hey guys, welcome to this session by Intellipaat. Apache Kafka is used for building real-time data pipelines and streaming applications, and Uber, Spotify, and Slack are a few of the companies that use Kafka in their technology stack. In this session we'll be learning about Kafka comprehensively, and before moving on, please subscribe to our channel so that you don't miss our upcoming videos. Now let us take a quick glance at the agenda. We'll start off with a quick introduction to Kafka and its features, and after that we'll look into Kafka topics and partitions. Then we'll look into the workflow of pub/sub messaging. Moving on, we'll be looking at the various CLI tools in Kafka, after that we'll learn how to configure a single-node Kafka cluster, and finally we will do the multi-node cluster setup. Also guys, if you want to do an end-to-end certification training on Kafka, Intellipaat provides a complete certification training on Kafka and those details are available in the description. Now let us begin this session.

Let's start by understanding the need for Kafka. The industry today is generating lots of real-time data that needs to be processed in real time; this is the 21st century, and all of the top organizations out there are generating a lot of real-time data. Let us look at some examples. We've got sensor data, which is used to predict the failure of a system ahead of time: all of these sensors generate real-time data, and it's very important to understand and process it very quickly. Similarly, we've got real-time economic data, which is based on preliminary estimates and is frequently adjusted as better estimates become available; this is the financial data being generated from stocks and so on, and if we tap into this real-time data generated by the stock market, it could be a huge boon for our economy.

Now, organizations have multiple servers at the front end and the back end: at the front end we can have web or application servers for hosting websites or applications, and we can also have a lot of back-end servers. All of these servers need to communicate with the database server, so we end up with multiple data pipelines connecting them. Let's say we've got an organization with all of these servers: at the front end we've got a web server, a couple of application servers, and a chat server, and similarly at the back end we've got a database server, a security systems server, a real-time monitoring server, and a data warehouse. Now all of the servers at the front end want to interact not just with the database server but with all of the servers at the back end. So you see, these are all of the data pipelines; if we just take the case of this front-end server and the database server, these are the data pipelines connecting them, and similarly these are the data pipelines connecting the application server to the database server. With just four servers at the front end and five servers at the back end, we already end up with a whole mesh of data pipelines connecting the front end and the back end.
These are a huge number of data pipelines, and dealing with all of them can be a very cumbersome and time-consuming task. The data pipelines get more complex as the number of systems increases, and adding a new system or server requires more data pipelines, which makes the data flow even more complicated. Let's say I go ahead and add a couple more front-end servers and a couple more database servers; just imagine the number of data pipelines now. That is a very complex system, isn't it? Managing all of these data pipelines becomes very difficult, as each data pipeline has its own set of requirements; if you've got a thousand data pipelines between the front-end servers and the back-end servers, then managing all thousand of them is a very, very cumbersome task, and even adding or removing some of these pipelines is difficult in such cases. So this is where Kafka comes in to solve this problem.

Now, Kafka is basically a messaging system. What is a messaging system? Consider it as a system which sits between the front-end servers and the back-end servers and decouples all of the data pipelines. What happens is, you've got producers which produce all of the messages, or generate all of the data, and all of that data is stored in the form of a stream in the Kafka cluster; you can consider a Kafka cluster to be a group of servers which are known as brokers. All of this data is being generated in real time by the producers, and that data is stored as streams in the Kafka cluster. Ok guys, a quick info: if you want to do an end-to-end certification on Kafka, Intellipaat provides a complete certification training on Kafka and those details are available in the description. Now let us continue with this session. The consumer then generates a request and takes in, or consumes, the data from the Kafka cluster. So this is how the process flow goes: the producer generates all of the data, which is stored in the Kafka cluster, and from the Kafka cluster the consumer consumes it. And we see that the number of data pipelines has decreased.

Now let's understand what exactly Kafka is. Apache Kafka is an open-source distributed publish/subscribe messaging system that manages and maintains real-time streams of data from different applications and websites. This basically means it is an intermediate system between all of the producers and all of the consumers, or the front-end servers and the back-end servers, and it provides a proper system with which we can handle and maintain real-time streams of data. Apache Kafka originated at LinkedIn: they had a problem to solve where they were dealing with huge amounts of real-time data, so LinkedIn thought of using a publish/subscribe messaging system and came up with Kafka. Once they understood how important and valuable Kafka is, it became an open-source Apache project in 2011, and then it became a first-class Apache project in 2012. A simple fact: Apache Kafka is written in Scala and Java, and it is extremely fast, scalable, durable, fault tolerant, and distributed by design. These are some of the basic features of Apache Kafka.

Now let's properly understand the solution provided by Kafka. What is happening over here is that Apache Kafka reduces the complexity of the data pipelines.
It makes communication between systems simpler and manageable, and with Kafka it is very easy to establish remote communication and send data across a network; you can establish asynchronous communication and send messages with the help of Kafka. What do I mean by asynchronous communication? It basically means that the producers keep on sending the messages in the form of a stream to the Kafka cluster, and they do not have to wait for an acknowledgement from the Kafka cluster; the producers keep on sending messages, which are stored in the Kafka cluster, and they are consumed by the consumers. This ensures that there is reliable asynchronous communication between all of the producers and all of the consumers.

Now let's go ahead and look at the Kafka features. Kafka is highly scalable in nature because it is a distributed system with no downtime. What do I mean by a distributed system? Let's say you had just one server which takes care of all of the messages coming from the producers, and only this server sends all of the messages on to the consumers; dealing with a huge amount of data on a single server is very difficult. This is where the Kafka cluster comes in, where we've got multiple brokers, and all of these brokers simultaneously take care of the messages coming from the producers. Kafka can also take care of huge volumes of data: Kafka can take in terabytes of data which are continuously being generated by the producers, and it can seamlessly send them to the consumers at the back end. Kafka also provides fault tolerance: let's say there is a failure of one node, then all the data which is present on that particular node would have replicas stored on some other systems. So if there is one broker which fails, then the data which is present on that broker would also be present on two or three more brokers; this is how Kafka provides fault tolerance. Kafka also provides reliability, because it is distributed, partitioned, replicated, and fault tolerant. Next, Kafka provides durability because it uses distributed commit logs, that is, messages persist on disk as fast as possible; the producer sends a message and this message is immediately stored in the form of a distributed log on the disk.

The performance of Apache Kafka is also very good. When I say performance, what I basically mean is high throughput. And what do I mean by throughput? Throughput is basically the amount of information which is passed in a particular amount of time. Apache Kafka enables a huge amount of information to be transferred in just one second of time: the Kafka cluster can take in terabytes of messages from the producers in a single second and then transfer these messages to the consumers in a single second. That is what is known as throughput. Here you have something known as the producer throughput and the consumer throughput: producer throughput is the amount of information which the producer generates in a particular amount of time, and consumer throughput is the amount of data the consumers can consume in a particular amount of time. This is how Kafka ensures very high performance. Kafka also provides zero downtime: Kafka is very fast and guarantees zero downtime and zero data loss.
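If you want to get a feel for throughput on your own setup, Kafka ships with a producer performance test script. This is only a rough sketch, assuming the Kafka bin scripts are on your PATH, a broker on localhost:9092, and an existing topic named example1 (the topic name is just an assumption for illustration):

kafka-producer-perf-test.sh --topic example1 \
  --num-records 100000 --record-size 100 \
  --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092
# --throughput -1 means "send as fast as possible"; the tool reports
# records/sec and MB/sec, which is the producer throughput discussed above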
Another feature of Kafka is that it is extensible: there are many ways by which applications can plug in and make use of Kafka, so it can be used on any platform and for multiple purposes.

So now that we've looked at some of the features of Kafka, let's go through the components of Kafka. Let's start off by understanding what exactly a Kafka broker is. Kafka brokers are the servers that manage and mediate the conversation between two different systems; a broker is basically the server which makes sure that all of the messages coming from the producer are properly stored in the Kafka cluster, and it also ensures that these messages are properly consumed by the consumer. So this is the intermediary between the producer and the consumer, or between the front end and the back end, and brokers are also responsible for the delivery of messages to the right party.

Next we'll understand what exactly a message is. Messages are simple byte arrays, and any object can be stored in any format by the developers; the format of these messages could be string, JSON, Avro, and so on. Simply put, messages are just simple byte arrays which are sent from the producer, and the broker then sends them to the consumer which requests them.

Now we'll understand what exactly a topic is. In Apache Kafka all the messages are maintained in what we call topics. Consider it like this: let's say there's this huge organization and there is data related to different things, so there could be data related to sales, data related to accounts, data related to technology, and also data related to analytics. All of these could be different topics. Let's say I just pick up sales: sales could be one particular topic, and all of the messages which are related to sales would come under one category. So a topic basically means that there is a particular category which messages can be grouped into, and all of these messages are stored, published, and organized in Kafka topics.

Next we'll understand what exactly a cluster is. In Kafka, more than one broker, that is a set of servers, is collectively known as a cluster. When you have just a single broker, it is just a one-broker Kafka architecture, and when you have more than one broker, that is known as a Kafka cluster: it is basically a group of computers, each having one instance of a Kafka broker. So if you have three computers, then each of these three computers or servers would have one instance of a Kafka broker.

Next up we'll understand what exactly producers are. Producers are the processes that publish data or messages to one or more topics. These would come at the front end; they are basically the entities which generate all of the data, and they are the source of the data stream in Kafka. Then we'll understand what consumers are. Consumers are the processes that read and process the data from topics by subscribing to one or more topics in the Kafka cluster. We already know that consumers are used to consume data from the Kafka cluster, and a consumer can consume one or more topics: let's say there's one consumer group which wants to consume only messages related to the sales topic, and there could be another consumer group which wants to consume data with respect to the analytics topic as well as the tech topic. This is how consumers work. Next we'll understand what partitions are.
Every broker holds a few partitions, and each partition can either be a leader or a replica for a topic. Now what basically happens is, when topics are sent from the producer to the consumer, these topics are divided into partitions; you don't really send the entire topic as a whole from the producer to the consumer. The topic is divided into a set of partitions, and these partitions are distributed and stored across the cluster. All the writes and reads to a topic go through the leader, which is responsible for updating the replicas with new data, and if unfortunately the leader fails, a replica takes over as the new leader. This is why Kafka is known as fault tolerant.

Now let's go ahead and understand the architecture of a Kafka cluster. We've got producers over here, and these producers send messages in the form of topics, which are received by the consumers. Ok guys, a quick info: if you want to do an end-to-end certification on Kafka, Intellipaat provides a complete certification training on Kafka and those details are available in the description. Now let us continue with this session. As I've already told you, these topics are divided into partitions. Over here we have a single-broker cluster, and the producers send a topic to this broker; this is just one topic, keep this in mind. We've got topic 1, and this topic 1 is divided into three partitions: partition 0, partition 1, and partition 2, and all three partitions are stored in a single broker because this is a single-broker cluster. This topic would then be consumed by the consumers which are present over here. So this is the basic architecture of a Kafka cluster, and the same thing is happening in the next diagram: we've got data, or the topic, which is sent to the broker, and this single topic is divided into three partitions. When the producer sends this topic, the messages are stored with what is known as an offset. Consider that there are three messages being sent by this producer, and all three messages correspond to topic 1; each message is tagged with an offset value, so message one is tagged with offset 0, message two is tagged with offset 1, and message three is tagged with offset 2. This is how it happens: all of the messages are tagged with an offset value and are stored in the cluster, and whenever the consumer requests this topic, it gets it from the broker.

Now let's understand the workflow of a Kafka producer. The producers send records to topics; these records are nothing but the messages which are sent to the broker. The producers select which partition to send each message to per topic: partitions can be picked more or less at random, producers can implement priority systems based on sending records to certain partitions depending on the priority of the record, or they can send records to a partition based on the record's key. Producers don't wait for acknowledgements from a broker and send messages as fast as the broker can handle; this is the asynchronous communication we were talking about. The producers keep on sending these messages or records to the broker, and they will not wait for an acknowledgement back from the broker; as and when the broker can handle these messages, the producers keep sending them.
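As a rough illustration of key-based partition selection, the console producer can be told to parse a key from each input line. This is only a sketch, assuming the bin scripts are on the PATH, a broker on localhost:9092, and a topic named example1 (the topic name is just an example):

kafka-console-producer.sh --broker-list localhost:9092 --topic example1 \
  --property "parse.key=true" --property "key.separator=:"
# each input line now looks like  key:value , for example  sales:order-101
# records that share the same key are hashed to the same partition,
# while records sent without a key fall back to the default assignment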
Now let's understand the Kafka broker. A Kafka cluster typically consists of multiple brokers to maintain the load balance; as I've already told you, if there are multiple messages being produced by the producers, then handling all of them with a single broker would be a difficult task, and that is why we have more than one broker, which is known as a Kafka cluster. A broker, on receiving messages from the producer, assigns offsets to them and commits the messages to storage on the disk. As you see over here, these are basically the offset numbers: this broker has received these two messages from the producer, it has assigned these offset values to them, and it has stored them on the disk. The broker then serves the consumers by responding to fetch requests. A single broker instance can handle thousands of reads and writes per second and terabytes of messages; that is really huge, isn't it? So if we just take this one particular broker, this one broker itself can handle thousands of reads and writes per second and terabytes of messages. Backups of topic partitions are present in multiple brokers, and if a broker goes down, one of the brokers containing the backup partitions would be elected as the leader of the respective partitions.

Now let's understand Kafka topics and partitions. Messages in Kafka are categorized into topics, which we already know. Again, let's take an example: let's say there is a school and there are different tables over there, so there would be one students table, one teachers table, and one table related to departments. We can consider these three different tables to be three different topics, and there would be data pertaining to each of them; the messages or the data related to these three different topics are categorized individually into these topics. These topics are broken down into a number of partitions, and the messages are written to them in an append-only fashion. What do I mean by append-only fashion? It basically means that I've got a producer which sends the first message or record, which is given offset 0; after that it sends the next message, which is assigned offset 1, and then the next message, which is assigned offset 2. You cannot go back and rewrite an earlier message or skip a particular message and go forward; writes happen in an append-only fashion, message by message. So this is how producers write messages into the broker.

Reading messages is different: reading can either be done in order from beginning to end, or we can skip certain messages, or go back, or rewind to any point in the partition by providing an offset value. Let's say consumer A wants the message at offset 0 from partition 1; it doesn't have to read the other messages, it can read only that one. Similarly, if consumer B wants to read only the message with offset value 1 from partition 2, it can do that and ignore the other messages. This is how consumers work, and this offset value is basically the sequential ID provided to the messages.
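As a hedged sketch of this kind of targeted read using the console consumer (assuming the scripts are on the PATH, a broker on localhost:9092, and a topic named example1 with at least three partitions, since partitions are numbered from 0; the names are illustrative only):

kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic example1 \
  --partition 1 --offset 0 --max-messages 1
# reads exactly one message: the record at offset 0 of partition 1
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic example1 \
  --partition 2 --offset 1 --max-messages 1
# reads only the record at offset 1 of partition 2, skipping the earlier one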
Another thing to keep in mind is that partitions provide redundancy and scalability. Partitions can be hosted on different servers, which means a single topic can be scaled horizontally across multiple servers, thus enhancing performance.

All right, now let's take this example to understand how a topic is divided into different partitions. This figure shows that we have a topic which is divided into four partitions, with writes being appended to the end of each partition. So this is the topic, it has one particular name, and it has been divided into four partitions: partition 0, partition 1, partition 2, and partition 3. These are all of the messages present in partition 0, and the writing takes place at the end, so we basically append the next message over here; the same thing happens in partition 1, partition 2, and partition 3. Now, records are assigned to a partition either by the record key, if it is present, or by round-robin if the key is missing; this is the basic default behaviour.

Now let's understand replication. If we take the same example where we have a topic configured with four partitions, partitions 0, 1, 2, and 3, and we set the replication factor of the topic to 3, then Kafka will create three identical replicas of each partition: partition 1 would have three replicas, partition 2 would have three replicas, partition 0 would have three replicas, and partition 3 would also have three replicas. All of these replicas would be placed on the available brokers in the cluster. Let's take this one partition: since the replication factor is set to 3, there would be three replicas of this partition, and the ID of each replica is the same as the ID of the broker that hosts it. So over here, since this is broker 2, the ID of the replica will also be 2, and similarly, since this is broker 3, the ID of that replica will be 3. After all this is done, for each partition Kafka will elect one broker as the leader; out of these five brokers, this broker has been elected as the leader, and if this broker fails due to some unfortunate incident, then one of these two brokers, either broker 2 or broker 3, would be elected as the leader, because they hold a replica of partition 1.

Now finally let us understand the Kafka consumer. A Kafka consumer can subscribe to one or more topics and read messages in the order they were produced; it is not necessary that every consumer reads only one particular topic, so it's totally fine if one consumer reads more than one topic. The consumer keeps track of all of the messages it has already consumed by keeping track of the offset of the messages. Let's say consumer A reads the message which has offset value 0; to remember that it has already read the message at offset 0, all it does is keep track of the offset value, and since it knows that it has already read offset 0, it will go to the next offset and read the message which is present at offset 1. Consumers can also work as part of a consumer group, that is, one or more consumers that work together to consume a topic. Over here we've got three consumers present in a consumer group: consumer A, consumer B, and consumer C.
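As a minimal sketch of a consumer group using the console consumer (assuming the scripts are on the PATH, a broker on localhost:9092, and a topic named example1; the group name sales-group is made up purely for illustration):

# run this same command in two or three separate terminals
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic example1 --group sales-group
# all consumers started with the same --group value form one consumer group,
# and Kafka assigns each partition of example1 to only one member of that group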
All three of these consumers are consuming only this one particular topic. Messages with the same key go to the same partition and hence to the same consumer, and the consumer group basically assures that each partition is consumed by only one member: over here, partition 1 is being consumed only by consumer A, partition 2 only by consumer B, and partition 3 only by consumer C. Let's look at this working mechanism. Ok guys, a quick info: if you want to do an end-to-end certification on Kafka, Intellipaat provides a complete certification training on Kafka and those details are available in the description. Now let us continue with this session. Over here, again we've got three consumers in a single group consuming one particular topic. We see that consumer 0 is working on only one partition, which is partition 0; similarly, consumer 2 is working on a single partition, which is partition 3; but consumer 1 is working on two partitions simultaneously, partition 1 and partition 2. This is also possible when consuming messages with a consumer group; the only thing which matters is that all of the offset values are kept in check. Over here, consumer 1 reads offset values 5 and 7, consumer 0 reads offset value 6, and consumer 2 reads offset value 10, so as long as all of the consumers work in tandem and maintain the synchronization, there is absolutely no problem.

Now let's start by understanding what exactly Apache ZooKeeper is and how it helps when it comes to working with Kafka. Let's start off by looking at the definition: ZooKeeper is an open-source Apache project that provides centralized infrastructure and services that enable synchronization across an Apache Hadoop cluster. This is a fairly complicated definition, so what do we mean by it? Apache ZooKeeper comes in whenever we are working with any sort of distributed application. A distributed application, for example Kafka, obviously runs on multiple systems, so you have multiple nodes working together in parallel. When multiple nodes are working together in parallel, again taking Kafka, you've got a multi-node cluster setup: multiple brokers, multiple producers, and multiple consumers, and all of these brokers, consumers, and producers have to work in parallel and in sync. There could be a lot of cases where this does not happen: either a broker might fail, or the messages sent by the producer might not have been received by the brokers, or the consumers might not be able to process the information sent by the brokers. To make sure all of this works properly and in sync, we need Apache ZooKeeper. ZooKeeper makes sure that whenever you are working with any sort of distributed application, all of the parts work in tandem, and it provides a lot of services. One of the services which Apache ZooKeeper provides is the naming service. What do we mean by a naming service? It basically means that it is sort of like a DNS, but just for the nodes which are present in this cluster setup; with the help of Apache ZooKeeper we can identify which brokers are currently present in our cluster and which producer is currently sending what messages to which broker, so all of this naming information is available. ZooKeeper is also used to elect the leader and the followers; it is very important, when it comes to a cluster setup, to have one leader and the rest as workers so that there is a proper workflow maintained.
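Tying this back to the replication and leader election described a little earlier, the topics tool can show which broker is the leader for each partition. A rough sketch, using the older ZooKeeper-based flags this tutorial uses (newer Kafka versions use --bootstrap-server instead of --zookeeper), with an illustrative topic name; creating a topic with replication factor 3 needs at least three brokers running:

kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 3 --partitions 4 --topic replicated-topic
kafka-topics.sh --describe --zookeeper localhost:2181 --topic replicated-topic
# the describe output lists, for every partition, the Leader broker id,
# the Replicas (broker ids holding copies), and the Isr (in-sync replicas)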
Now, what happens if the leader itself fails? If the leader fails, then Apache ZooKeeper maintains a list of all of the followers which have the same topics present in them, and ZooKeeper will go ahead and elect one of these followers as the new leader. This is where Apache ZooKeeper comes in, and not only this, it also makes sure that all the topics which are present in the partitions have the relevant offset numbers, and it makes sure that these messages are properly sent to the consumers. So these are all the services which are provided by Apache ZooKeeper. Just a bit of information about Apache ZooKeeper: the service was originally developed at Yahoo, it facilitates synchronization in the process by maintaining status on the ZooKeeper servers, which store information in local log files, and the Apache ZooKeeper servers are capable of supporting a large Hadoop cluster, which we have already seen.

All right, so this is the ZooKeeper architecture. We've got all of these servers, which are the server applications, and these are all of the client applications; all of the server applications work in parallel and all of the client applications also work in parallel. Let's say a client sends a request to this server application over here; now if this client does not get an acknowledgement from this server, it will not wait for a long time, and it will send the request to the next server. So this is how parallelization works: if one server doesn't respond to the client, that request is immediately sent to the next server, and the client waits for the acknowledgement from that server for a brief amount of time; if even the second server does not respond to the client in that specified amount of time, then the request is sent to the third server, which will respond to this particular client. So this is how Apache ZooKeeper works.

Now let's see how ZooKeeper and Kafka work in tandem. The Kafka brokers coordinate with each other using ZooKeeper. Over here we've got just one broker, but let's say it is a multi-node, or multi-broker, cluster setup: if you have multiple brokers, then all of those brokers need to work with each other in tandem, and that is possible by using ZooKeeper. That is how the Kafka brokers work, and the producers and consumers are notified by the ZooKeeper service about the presence of a new broker in the system, or about the failure of a broker in the system. Let's say as of now we have just got one broker in this Kafka cluster, and suddenly we decide to scale up the process and add two more brokers; when we add two more brokers, then these two producers and these three consumers have to know about the presence of the new brokers. This is where ZooKeeper comes in: ZooKeeper notifies the two producers and the three consumers that two more brokers have been added to this Kafka cluster, and whatever messages they send will be processed in parallel by all three brokers. Now, if the leader node fails, then on the basis of the currently live nodes Apache ZooKeeper will elect the new leader; I've already told you about this, so let's say one of these partitions is selected as the leader and it fails, then one of the other partitions which are currently alive would be made the leader. ZooKeeper and Kafka also keep a set of in-sync replicas, and this is how ZooKeeper maintains synchronization between producers, consumers, and brokers.
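As a small, hedged illustration of the naming information ZooKeeper keeps for Kafka (assuming ZooKeeper is running on localhost:2181 and the Kafka scripts are on the PATH):

zookeeper-shell.sh localhost:2181
# at the prompt, type:
#   ls /brokers/ids
# this lists the ids of the brokers currently registered in ZooKeeper,
# and  get /brokers/ids/0  shows the host and port details for broker 0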
Now let's look at the Kafka workflow. The producers start off by sending messages to a topic at regular intervals; let's say there is one particular topic and the topic is related to football, so the producer will publish or send messages to that particular topic at regular intervals. The brokers then store the messages in the partitions configured for that particular topic. Let's say this football topic has three partitions; then these messages would be stored across these three partitions. If a producer sends two messages and there exist two partitions, Kafka will store one message in the first partition and the second message in the second partition. So again, taking our football topic, let's say it is divided into three partitions and we are sending around six messages: the first message would be stored in the first partition, the second message in the second partition, the third message in the third partition, then again the fourth message in the first partition, the fifth message in the second partition, and the sixth message in the third partition. This is how it works.

Once the producer has sent all of these messages and they are stored in the broker system, a consumer subscribes to a specific topic. Over here we've got consumers, and these consumers subscribe to our football topic; when the consumer subscribes to a topic, Kafka provides the current offset of the topic to the consumer, and the offset is saved in the ZooKeeper ensemble. Now let's say we've got a consumer which reads the first two messages present in the partitions; since it has read two messages, the offset value becomes two, and it is stored in ZooKeeper. Going ahead, the consumer has to read the third message, which would be present in the third partition, and the consumer knows this by looking at the offset of the currently consumed message. For new messages, the consumer will request Kafka at regular intervals, and as soon as a message is received from the producer it is forwarded to the consumers. On receiving the message, the consumer will process it, and once the message is processed, an acknowledgement is sent from the consumer to the broker; once the broker cluster receives the acknowledgement, the offset is changed to a new value and updated in ZooKeeper. The consumers are also able to read the next message correctly even during server outages, because the offsets are maintained in ZooKeeper. Let's say we've got this producer which continuously keeps on sending messages, there are, let's say, 100 messages in total, and the server breaks down when around 57 messages have been processed; there is no need to worry at all, because the offset number is stored in ZooKeeper, and the offset value of 57 would be there. Once the server comes back up, the consumer can start consuming messages from offset number 58 with the help of ZooKeeper, and this flow repeats until the consumer stops the request. So this is the entire flow of the messaging system.
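To make this offset bookkeeping concrete, Kafka ships with a consumer-groups tool that shows the committed offset and the lag for each partition. A hedged sketch, assuming a broker on localhost:9092 and a consumer group named sales-group (the group name is just an example; very old Kafka versions kept these offsets in ZooKeeper, while newer ones keep them in an internal topic):

kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group sales-group
# for every partition the group consumes, this prints the current committed
# offset, the log end offset, and the lag (how far the consumer is behind)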
Now let's look at some of the top companies which are using Apache Kafka: we've got Samsung Electronics, Harman International Industries, and a few other large corporations, and they all use Apache Kafka for distributed processing and distributed synchronization. Now, Intellipaat already provides the Kafka setup for you guys, so you don't have to worry about the setup at all; you can just go through the support documents and you'll be able to figure it out, and if you have any doubts you can reach out to our support, which is available 24/7.

Now, this brings us to the end of the module, so let's just go through a quick quiz to recap all of that. We've got our first question over here: each Kafka partition has one server that acts as the what? The answer is the leader, so each Kafka partition has one server that acts as the leader. Then we have our next question: Kafka provides only a what over messages within a partition? Kafka provides only a total order over messages within a partition. Question number three: Kafka maintains feeds of messages in categories called what? The answer is topics; all of the feeds of messages are stored in categories called topics.

All right, so let's start by understanding the Kafka cluster. Whenever we are dealing with a small amount of data, or we are working in a local development environment, then it's fine if we are using just a single broker. But let's say we have a huge amount of real-time data, around one terabyte of real-time data coming in every single second; to process this one terabyte of real-time data every single second, one broker wouldn't be enough, so we'd have to scale this load across multiple servers. This is where we would need a multi-broker setup: instead of using one broker, we'll have multiple brokers, and all of this data would be scaled across multiple servers so that the load on any one single broker is reduced. Another advantage of a multi-broker setup is that a topic is not stored as just a single copy, it has its replicas as well, and when a single topic has multiple replicas, that also gives more fault tolerance. Also, as we have seen, a Kafka cluster is effective for applications that involve large-scale message processing.

So now let's go ahead and look at some Kafka command-line tools. The Kafka cluster can be run with either of the following setups: we can have a single-broker cluster or a multi-broker cluster. In a single-broker cluster we just have a single broker which serves all of the requests, and in a multi-broker cluster the load is divided across all of the brokers. These are some of the commonly used commands. We'll start off with zookeeper-server-start.sh: whenever we are working with Kafka or real-time processing we need ZooKeeper, so to start ZooKeeper we have to run this command, and as we see over here, it starts ZooKeeper using the properties configured under config/zookeeper.properties. In simple terms, this command is used to start the ZooKeeper service. Then we also need to start the Kafka service, and this is the command for that: kafka-server-start.sh, which starts the Kafka server using the properties configured under config/server.properties.
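Roughly, assuming you run these from the Kafka installation directory with the bin scripts on your PATH (paths can differ slightly between Kafka distributions), starting the two services looks like this:

zookeeper-server-start.sh config/zookeeper.properties
# starts ZooKeeper using the settings in config/zookeeper.properties
kafka-server-start.sh config/server.properties
# starts a single Kafka broker using the settings in config/server.properties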
Next, when it comes to topics, we'll use the kafka-topics.sh command, which is used to create topics, list topics, delete topics, and modify topics. Then we've also got the producer and consumer tools. As we all know, the producer is used to send messages to the Kafka cluster, and this is the command: we'll type in kafka-console-producer.sh to send messages to the Kafka cluster. We also know that the consumer is used to consume messages, and this is the command to create a consumer: kafka-console-consumer.sh, which, as stated over here, is a command-line client to consume messages from the Kafka cluster.

Now let's look at the different types of Kafka clusters available. First we have the single-node single-broker cluster. In a single-node single-broker cluster we just have a single node; what do I mean by a single node? A single node basically means a single system, and in that single system we have a single Kafka broker, with producers sending messages to this single node and the messages again being consumed by all of these consumers. So we just have a single system, and in that single system we have a single broker, and that broker is responsible for maintaining the balance between the producers and the consumers. Next we have the single-node multi-broker cluster. This means that inside a single system we have multiple brokers: as we see over here, we have a single system and inside it we've got three brokers, broker 1, broker 2, and broker 3. The producers send all of their messages to this cluster, this is a multi-broker cluster, and the consumers consume the messages from this multi-broker system. Then we have the multi-node multi-broker cluster. Here we have more than one node, or in simple terms more than one system, and inside each system we've got multiple brokers. As we see over here, this is system 1 and system 2, or node 1 and node 2; inside node 1 we've got broker 1 and broker 2, and similarly inside node 2 we've got broker 1 and broker 2. The producers send their messages to node 1 as well as node 2, and similarly this consumer consumes messages from this node while these two consumers consume messages from the other node. So this is how a multi-node multi-broker cluster works.

Now it's finally time to go ahead and configure our single-node single-broker cluster. These are some prerequisites we should have so that we can configure our system: we need to have Java, Kafka, and ZooKeeper pre-installed so that we can go ahead and set up a cluster. Now, to set up a single broker, we have to start off by opening the terminal and launching the services of both ZooKeeper and Kafka. As we have already seen, to launch the ZooKeeper service we'll type in zookeeper-server-start.sh followed by kafka/config/zookeeper.properties; this is basically the path where the file is present, so inside the kafka folder there is the config directory, and inside this config directory there is the zookeeper.properties file. This helps us launch the ZooKeeper service. Similarly, to launch Kafka, we'll type in kafka-server-start.sh along with the server.properties file, which is inside the Kafka config directory; this helps us launch the Kafka service.
So we are starting the ZooKeeper service and we're also starting the Kafka service. Now, to see whether both of the services are running or not, we'll just type in jps, and this gives us these results: the Kafka entry over here means that the Kafka daemon is running, and the QuorumPeerMain entry means that even the ZooKeeper daemon is running, so both the Kafka daemon and the QuorumPeerMain daemon are running.

So now that we have started the Kafka service, it's time to create a topic, send messages to it from the producer, and consume them with the consumer. This is how we can create a topic: as we had seen earlier, I'll type in kafka-topics.sh, then --create, after that we point it at ZooKeeper on the localhost port number 2181, the replication factor which we are setting is 1, we also set the number of partitions to 1, and then we give the name of the topic, which here is example1. So basically we are creating a topic where the replication factor is 1 and the number of partitions is also 1. If we want to have a glance at all of the topics which are present, we just need to type in kafka-topics.sh --list with the ZooKeeper localhost address, and this gives us all of the topics which are present. Now that we have created a topic, it's time to launch our producer, and this is the command for that: we'll type in kafka-console-producer.sh, we'll give --broker-list with the port number where the producer sends the messages, which is 9092, and then the topic which the producer is sending to, topic example1. So we have launched the producer; now we have to create a consumer to consume or process all of these messages, and we'll use this command over here: we'll type in kafka-console-consumer.sh, we'll give the bootstrap server and set the localhost port to 9092, because both the producer and the consumer are talking to the same broker, and the consumer subscribes to the topic example1; the producer is sending to topic example1, the consumer has also subscribed to topic example1, and it is reading the topic from the beginning. When it reads the topic from the beginning, this is the result which it gets: hello, this is my first example.

Right, now let's go ahead and actually perform this. This is PuTTY, so I'll just go ahead and log in; this is the ID for the training, and then I'll go ahead and give in the password, so let me just type in the password over here. Right, so I've successfully logged in through PuTTY. Now it's time to run the Kafka server, so I'll type in kafka-server-start.sh and then kafka/config/ and after that I need to type in server.properties; let me get the spelling correct, this has to be server.properties, and let me hit enter. Let's just wait till the Kafka server starts. Right, so we see that we have successfully started the Kafka server. Now what I'll do is duplicate this session, and again let me log in over here and type in the password as well.
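Putting the commands just described together, a rough end-to-end sketch for the single-broker case (assuming the bin scripts are on the PATH, ZooKeeper on localhost:2181, the broker on localhost:9092, and the topic name example1 used in this walkthrough; newer Kafka versions replace the --zookeeper flag with --bootstrap-server):

kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic example1
kafka-topics.sh --list --zookeeper localhost:2181
# lists all existing topics
kafka-console-producer.sh --broker-list localhost:9092 --topic example1
# type messages here, one per line, e.g. "hello this is my first example"
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic example1 --from-beginning
# prints every message in example1, starting from the earliest offset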
Now that I've started the server, it's time to go ahead and create the topic first. This is the command to create a topic: we'll just type in kafka-topics.sh --create --zookeeper, we'll set the localhost port to 2181, I'm setting the replication factor for this topic to 1, I'm setting the number of partitions also to 1, and I'll set the topic name to example1. I'll hit enter, and let's just wait till this topic is created. Right, so we see that we have this message: created topic example1. Now that we have created the topic, let's have a glance at all the topics which are present, and this is the command for that: kafka-topics.sh --list --zookeeper, which gives us the list of all the topics which are currently present; we've got example1, example2, a Flume Kafka topic, the consumer offsets topic, and my-topic.

So now that we have created the topic, it's time to start the producer to send messages to this topic. Again I will duplicate the session, let me log in again and type in the password over here. Right, so this is the command to start the producer: kafka-console-producer.sh with the broker list, and then the topic this producer is sending to, which is example1. So the producer has started; let me go ahead and type in some messages, so I'll type in: hello, how are you, I am good. These are the messages which will be sent from the producer to the consumer. Now I'll create another session and start the consumer in it; let me type in the username and the password again. This is the command to start the consumer: we've got kafka-console-consumer.sh, and this consumer will be listening to the topic example1 from the beginning. Let me just wait till all of the messages come from the producer side to the consumer side. Right, so this is what we have: hello, how are you, I am good. Now let me actually open up the producer again and add some more messages over here, so I'll type in: Sparta, plus 300. Now let's see if we have these messages in the consumer as well; right, we have these messages on the consumer side too. Again, let me add some messages from the producer, so I'll type in: this is my first Kafka project, I love Kafka, do you like Kafka. Now let me go to the consumer side; we see that we have all of these messages on the consumer side as well. So this is how we can set up a single-broker setup.

Now let's see how we can configure a single-node multi-broker cluster. Again, to set up a multi-broker system we have to start off by loading the ZooKeeper service as well as the Kafka service, so we'll type in zookeeper-server-start.sh and load up the zookeeper.properties file, and similarly we'll type in kafka-server-start.sh and load up the server.properties broker file. Once we start the Kafka service as well as the ZooKeeper service, we have to go ahead and create multiple brokers. Till now we've just got one broker instance, which is defined in config/server.properties; this file which we had loaded up earlier is basically the only broker we have till now. To create multiple broker instances, what we'll do is copy this existing server.properties file into two new files and rename them server-1.properties and server-2.properties.
After we do that, we'll go inside the kafka/config directory and create these files. So we'll go inside kafka/config and create a copy of the original broker file: we'll type in cp server.properties server-1.properties, so we are just creating a copy of the original server.properties file and naming it server-1.properties, and similarly we'll create another copy of the file and name it server-2.properties.

Now that we have created our two new broker files, we have to go ahead and make a few changes in them. We'll go inside the server-1.properties file and make these changes. In our earlier server.properties file the broker ID was 0, and we have to make sure that each broker instance has a unique broker ID, so we'll give this new file the broker ID of 1, and this broker will be listening on port 9093. The original broker was listening on port 9092, and it needs to be kept in mind that only one broker can listen on a given port; that is why, since this is our new broker, it will be listening on a new port, which is 9093. We'll also make a change to log.dirs over here: initially it pointed to the kafka logs directory, and we'll change this to kafka-logs-1. Similarly, we'll go inside the server-2.properties file and make these changes: we'll set the broker ID to 2, because this is our third broker, and this broker will be listening on port 9094. So the first broker was listening on port 9092, the second broker on port 9093, and this broker which we've just created will be listening on port 9094, and again we'll change the log directory, this time to kafka-logs-2.

Once we've set this up, we have to go ahead and start the brokers which we've created. The command is the same: we'll type in kafka-server-start.sh and load up server-1.properties, and similarly we'll type in kafka-server-start.sh and load up server-2.properties. This is how we can start the two brokers which we have just created. Now that we have created the brokers, we'll go ahead and create a Kafka topic. To create a topic, again it's the same: we'll type kafka-topics.sh --create, then give the name of the topic, which is example2, then we'll type in --zookeeper and set the port of the localhost, which is 2181. For this topic we are setting the number of partitions to 3, and we are setting the number of partitions to 3 because we have three brokers; earlier, when we had set up just a single-broker system, since we had just one broker the number of partitions was 1, but over here, since we have three brokers, we set the number of partitions to 3, and similarly we'll set the replication factor to 2. Now, to check which broker is handling the newly created topic, you can just use the describe command; when we type in --describe, we get an idea of which broker is the leader for which partition.
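To summarize the multi-broker configuration just walked through, here is a rough sketch; it assumes the stock config/server.properties as the starting point, the bin scripts on the PATH, and it keeps this tutorial's ports and names (on newer Kafka versions the port is set through a listeners=PLAINTEXT://:9093 style line rather than a port= line, and kafka-topics.sh takes --bootstrap-server instead of --zookeeper):

cd kafka/config
cp server.properties server-1.properties
cp server.properties server-2.properties
# edit the copies so every broker is unique:
#   server-1.properties -> broker.id=1, port 9093, log directory ending in -1
#   server-2.properties -> broker.id=2, port 9094, log directory ending in -2
cd ..
# each broker start below is run in its own terminal session, as in the demo
kafka-server-start.sh config/server-1.properties
kafka-server-start.sh config/server-2.properties
# with all three brokers up, create a topic spread across them
kafka-topics.sh --create --zookeeper localhost:2181 \
  --partitions 3 --replication-factor 2 --topic example2
kafka-topics.sh --describe --zookeeper localhost:2181 --topic example2
# describe lists the leader broker, replicas, and in-sync replicas per partition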
Now that we have created the Kafka topic, we have to do the same procedure and start the producer. So I will create my producer over here with kafka-console-producer.sh, and this time I actually have a broker list: the producer will be sending messages to three brokers, which are listening on the port numbers 9093, 9094, and 9092, and the producer will be sending messages to the topic example2. These are the messages which are being sent over here: do you understand Kafka now, hope you like this session. Then, going ahead, we'll also start the consumer: here I'll type in kafka-console-consumer.sh, I'll set the bootstrap server localhost port to 9093, the topic is example2, because the producer is sending messages to example2, and this will read everything from the beginning.

All right, so now let's go ahead and perform this demo. I've got my virtual machine running over here, and I have to do the same thing: start off by loading the Kafka server. I'll type in kafka-server-start.sh, let me type in the correct spelling over here, and after this it would be kafka/config/ followed by the server.properties name. Let me just wait till the first broker starts. All right, so we have successfully started the first broker, which is listening on port 9092. Now we have to go ahead and start the other two brokers, so I'll duplicate this session and log in again: I'll type in the ID, which would be training, and then I'll give in the password, so let me key in the password. Now, as we already know, we have this server.properties file, and I have to make a copy of it; to make a copy I'll type in cp, the name of the file is server.properties, and I will make a copy with the name server-1.properties. Similarly, I'll make another copy with the name server-2.properties, so I'll change this to server-2. Right, so I have created server-1.properties and server-2.properties.

Now let me go inside server-1.properties and make the relevant changes. I'll type in vim and open this file up, so this would be server-1.properties; let me edit this. Let me go up and show you what exactly we are supposed to change over here. As we see, we've got the broker ID equal to 0, because we've just copied the original file, so I will change this to 1; similarly I will go down, and wherever we have the broker ID equal to 0 we'll set it to 1, and we'll make all of those relevant changes over here. Once we change the broker IDs to 1, let me head down and show you the next change to be done. A little further down you see this log.dirs entry, and over here I have to change this to kafka-logs-1. Similarly, let me go to the bottom of this page and add a new line over here: this time the port number would be 9093, because the original broker was listening on 9092, and since we are creating a new broker, this one will be listening on 9093. Now let me just press escape and type in :wq, and this saves the changes which I have made to this new file. Now similarly, let me also make these changes in the server-2.properties file: let me go into insert mode and change all of these broker IDs, this time setting the value to 2.
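If you would rather not make these edits by hand in vim, roughly equivalent shell one-liners can apply the same three changes; this is only a sketch, assuming the stock server.properties layout where broker.id and log.dirs each appear once, and on newer Kafka versions you would add a listeners=PLAINTEXT://:9094 line instead of a port= line:

cd kafka/config
# same three edits as the vim walkthrough, applied to server-2.properties
sed -i 's/^broker\.id=.*/broker.id=2/' server-2.properties
# append "-2" to whatever log directory the original file already uses
sed -i 's|^\(log\.dirs=.*\)|\1-2|' server-2.properties
# older Kafka versions take a port= line; newer ones use listeners instead
echo "port=9094" >> server-2.properties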
So this becomes 2 here: I'll just delete the 0 that is present and change the value to 2 wherever it appears. Similarly, let me head down to the server settings: here we have log.dirs, and I'll change it to kafka_logs-2, and then I'll head down to the bottom of the page and set the port number to 9094. Let me hit Escape and save this file with :wq and hit Enter. So we've made the necessary changes in server-1.properties and server-2.properties. Now let me again open a duplicate session and log in, and I'll go ahead and start these new brokers I've created: kafka-server-start.sh kafka/config/server-1.properties. Let me hit Enter and just wait until the new broker I've set up, server 1, loads up; you can see that the new broker is starting. Now again let me open a duplicate session, log in, type the password, and go ahead and start the second broker: kafka-server-start.sh, and after this I'll type kafka/config/ and the name of the file, server-2.properties. Again I'll just wait until this loads up, and I have successfully started both of these new brokers. Now that I have started these two brokers, let me go ahead and also create my new topic; you have to keep in mind that for every single thing we're doing here, you have to start a duplicate session. Now I'll type in the command to create a topic: this is the kafka-topics.sh command, and I'm creating a topic with the name example2; this topic will have three partitions and its replication factor is 2. So we have successfully created the topic example2. Now I'll also go ahead and start the producer. This is the command to start the producer, kafka-console-producer.sh; the broker list is localhost 9093, 9094, and 9092, so these are the three brokers listening on those three ports, and the producer is sending to the topic example2. Now the producer has started, so let me just send it some messages: I'll type "I love Paris", "I also love India", and "I love Germany" as well. These are the messages we are sending from the producer. Now I'll again duplicate this session and start the consumer. This is the command to start the consumer, kafka-console-consumer.sh; it connects through port 9093, it has subscribed to the topic example2, and it will be processing, or rather reading, everything from the beginning. So we have this result here: the producer has sent these messages and the consumer has consumed them, "I also love India", "I love Paris", "I love Germany" as well. Now let me add something else: let me go back to the producer and add a few new lines; all I'll do is add some gibberish, four lines of it, and when I go to the consumer we have those four lines of gibberish there as well. So this is how we can set up a multi-broker system. Now we'll look at some basic topic operations, and let's first see how we can modify a topic. To modify a topic we just have to use the alter topic command. The full command is kafka-topics.sh with --zookeeper pointing at the port it listens on, 2181, and after that I type --alter --topic and give the name of the topic I want to alter, which is example1. Initially, when we created example1, it was a single-broker setup, so the number of partitions was 1; now that we have a multi-broker setup, I can actually set the number of partitions to 2, and this is the result we get. Keep in mind that if partitions are increased for a topic that has a key, the partitioning logic, or the ordering of the messages, will be affected. All right, adding partitions succeeded. Now we'll see how to delete a topic. To delete a topic, the command is simply the delete topic option followed by the name of the topic, which is flume-topic-1, and this is the result we get: flume-topic-1 is marked for deletion, along with a note that this has no impact if delete.topic.enable is not set to true. And when we go ahead and check the list of all of the available topics, we see that the flume topic has been deleted.
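For reference, the alter, delete, and list operations described here look roughly like this on a ZooKeeper-based Kafka; the topic names are the ones used in the demo:

  kafka-topics.sh --zookeeper localhost:2181 --alter --topic example1 --partitions 2
  kafka-topics.sh --zookeeper localhost:2181 --delete --topic flume-topic-1   # takes effect only if delete.topic.enable=true
  kafka-topics.sh --zookeeper localhost:2181 --list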
Now let's go ahead and perform this demo. As we saw in the slides, our first topic, example1, initially had just one partition, and now I want two partitions in it, so this is how we can make the change: alter the topic example1 and set the number of partitions to 2. All right, adding partitions succeeded. Now I'll go ahead and delete a topic: I actually want to delete the topic example1, so I'll type in delete topic example1 and hit Enter, and you see that example1 is marked for deletion. Now let me go ahead and check the list to see whether this topic has been deleted or not, so let me just type the list command, kafka-topics.sh with the ZooKeeper address on port 2181, and we see that we don't have example1 in our list of topics, which means we have successfully deleted example1. Right guys, so this brings us to the end of this part of the session, so let us go through a quick quiz. This is our first question: Kafka is run as a cluster comprised of one or more servers, each of which is called what? What do you think the answer is? The answer is broker, isn't it? Right, so you have got multiple brokers, or a single broker, in the cluster. The second question: point out the wrong statement. The first statement is "the Kafka cluster does not retain all published messages"; the second statement is "a single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients"; the third statement is "Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization"; and the fourth statement is "messages persisted on disk are replicated within the cluster to prevent data loss". The wrong statement among these is "the Kafka cluster does not retain all of the published messages". All right. Okay guys, a quick info: if you want to do an end-to-end certification on Kafka, Intellipaat provides a complete certification training on Kafka, and those details are available in the description. Now let us continue with this session. We will set up a multi-node Kafka cluster, then we will see some important administration commands, and after that we will see some Kafka tools that can help us with different operations on Kafka setups. These operations can be a graceful shutdown, balancing leadership, and rebalancing, and the rebalancing tool is going to be used
when you expand your cluster or when you decommission a broker; you can also move topic partitions from one broker to another broker using that rebalancing tool, and you can increase the replication factor. Let us start with the multi-node Kafka cluster implementation. The next few slides are exactly the same as what we followed in our previous sessions, that is, downloading the Kafka tarball and the ZooKeeper tarball, so I am going to go over these two or three slides very fast; this is the link you can see on the web user interface from where you can download ZooKeeper. Now, the important part of the setup: to complete the multi-node cluster setup, we will first set up a single node, and after that we will simply copy the ZooKeeper and Kafka directories to the rest of the servers and start the services so that they can join the cluster. So on the first node we will simply untar the two tarballs, the Kafka tarball and the ZooKeeper tarball, and rename the directories according to our convenience. Then we will set up the environment variables, KAFKA_HOME and ZOOKEEPER_HOME, in the .bashrc file. We will create the kafka-logs directory where Kafka will store its data, and ZOOKEEPER_HOME/zkdata where ZooKeeper will store its data. We will complete these steps on the first node, and after that we will make some configuration changes on the same node. Let us prepare the first node first: this is my first machine, and I have two tarballs here, one for Kafka and one for ZooKeeper. I am simply going to untar them, and now I am going to rename the newly created directories. Now we have our two directories, one for Kafka and one for ZooKeeper, so we set KAFKA_HOME and ZOOKEEPER_HOME in the .bashrc file; I have done this already, this is the Kafka home and this is the ZooKeeper home. The next step is to create the data directories for Kafka and ZooKeeper, so I am going to make a directory called kafka-logs, which is the directory where Kafka will store its data, and similarly I am going to create one directory for ZooKeeper, zkdata, which is the directory where ZooKeeper will store its data. Now we have completed the first step; next we will make the configuration changes for ZooKeeper and Kafka, and after that we will simply copy these Kafka and ZooKeeper directories to the rest of the servers. Now let us see what the slide says: we have completed this step, and next we will make the ZooKeeper configuration changes. For this we need to go to the conf directory present under ZOOKEEPER_HOME and edit zoo.cfg. The two important pieces are the data directory, which we will update, and the last three lines, where we mention that we have ZooKeeper servers running on hadoop1.abc.com, the second running on hadoop2.abc.com, and the third ZooKeeper server running on hadoop3.abc.com. Let us complete this first. I am going to the configuration directory of ZooKeeper; here you can see we have zoo_sample.cfg, and we simply rename it to zoo.cfg. Open this file, and first of all update the data directory: remember, we just created the zkdata directory under the ZooKeeper home, and this is the data directory where ZooKeeper will store its data. Secondly, we are going to mention the names of all of the ZooKeeper servers: the first ZooKeeper server is going to run at hadoop1.abc.com, and the ports used will be 2888 and 3888.
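A minimal zoo.cfg for this three-node ensemble might look like the sketch below; the hostnames follow the demo, the dataDir path is illustrative, and the tickTime, initLimit, and syncLimit values are simply the defaults shipped in zoo_sample.cfg:

  # ZOOKEEPER_HOME/conf/zoo.cfg (renamed from zoo_sample.cfg)
  tickTime=2000
  initLimit=10
  syncLimit=5
  clientPort=2181
  dataDir=/home/training/zookeeper/zkdata
  server.1=hadoop1.abc.com:2888:3888
  server.2=hadoop2.abc.com:2888:3888
  server.3=hadoop3.abc.com:2888:3888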
The other two ZooKeeper servers are going to run at hadoop2.abc.com and hadoop3.abc.com respectively. That is the first part of the configuration changes for ZooKeeper; what is the second part? Secondly, we need to create the myid file and put into it the same integers used in the server.N entries in zoo.cfg, which means we will create a file named myid in the data directory and put the integer 1 for the first ZooKeeper server, 2 for the second ZooKeeper server, and 3 for the third ZooKeeper server. We are on the first ZooKeeper server, so we simply go to the data directory, zkdata, create a file called myid, and since this is our first ZooKeeper server, we put 1 in it. This simply tells ZooKeeper that this is the first ZooKeeper server, and on the rest of the servers it will recognize the second and the third ZooKeeper servers. If you are using only one ZooKeeper server, you do not need this myid file at all. Now we are done with the ZooKeeper configuration changes; let us make the Kafka configuration changes as well. For that we need to go to the config directory of Kafka, where we can see a server.properties file, and there we need to set the zookeeper.connect property, which will include all of the ZooKeeper server entries, so you can see all three ZooKeeper servers. I think I skipped one important property here, which is broker.id: we need to put a unique integer in this field for every Kafka server. We are on the first Kafka machine, so let us make the changes for Kafka as well: cd kafka/config, and here you can see the server.properties file, so let us open it and make the changes. You can see this is the broker ID of the first server; you can start with 0 as well, and then on the other two machines where we will run the Kafka server we can simply put 1 and 2 there, or start with 1 here. The main point is that the broker IDs should form a sequence, like 1, 2, 3 or 0, 1, 2, so I am just putting 1 here, mentioning that this is my first Kafka server and its broker ID is 1. This is the port that is going to be used, 9092. After that I will make the change for log.dirs, meaning the data directory that we created for Kafka; this was the path we created for Kafka to store its data. And the last property is zookeeper.connect, where we need to mention all of our ZooKeeper servers, so all three servers are mentioned here. So we are done with the Kafka configuration changes; let us move to the slide now. The next step is to start the services. Whenever you set up your Kafka cluster, the important process is ZooKeeper, because it handles coordination among all of your Kafka services, and if ZooKeeper is not running, the rest of the Kafka brokers and services will not run and will simply shut down. So first of all we need to start the ZooKeeper service on all the machines, and by all the machines I mean the machines that you have decided will form the ZooKeeper quorum; in our case we have three ZooKeeper servers, so we will need to start all of them first. But before starting these services, let us copy the Kafka and ZooKeeper setup that we have done on our first node: we will simply scp the Kafka and ZooKeeper directories to the second and the third node. So now I am going to scp the changes we have done on this first node to the second and the third one. First, let me scp the Kafka directory to the second node, which is hadoop2.abc.com.
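Putting the node-specific pieces together, the per-node settings end up looking roughly like this; the paths, hostnames, and values are placeholders taken from the demo:

  # on node 1 (use 2 and 3 respectively on the other nodes)
  echo 1 > $ZOOKEEPER_HOME/zkdata/myid
  # KAFKA_HOME/config/server.properties, key lines on node 1
  #   broker.id=1
  #   port=9092   (newer releases use listeners=PLAINTEXT://:9092 instead)
  #   log.dirs=/home/training/kafka/kafka-logs
  #   zookeeper.connect=hadoop1.abc.com:2181,hadoop2.abc.com:2181,hadoop3.abc.com:2181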
Similarly, I'll scp the ZooKeeper directory to the second node. We also made changes to the .bashrc file to set the environment variables, so let us copy that too, so that we do not need to manually change the .bashrc on each node. Okay, so from this first node we have copied the Kafka directory, the ZooKeeper directory, and the .bashrc file to the second node. Let us copy the same to the third node, which is hadoop3.abc.com: I am copying the Kafka directory first, then the ZooKeeper directory, and lastly the .bashrc file, so that we do not need to make the environment variable changes manually. Okay, we have copied the required directories to the rest of the servers. Now, the important thing is that the myid file we created for ZooKeeper should have a unique value inside it, and the broker.id of each Kafka server should be unique. On this first server the value in the myid file is 1 and the value of broker.id for the server is 1, so on the rest of the machines, for example on the second machine, the value in myid should be 2 and the value of broker.id should be 2. Let us make these changes on the second machine: I am on the second machine, you can see the directory here, and I am going to change myid and make it 2; similarly I am making it 3 on the third ZooKeeper server. Now it is time to change the server.properties file, where we need to mention a unique broker.id. The broker.id for the first server is 1, so on this second server I put broker.id 2; there is no need to make any other change, so simply save this file. Do the same on the third node: open server.properties, go to broker.id, make it unique, and set it to 3. So we have three brokers, the first, the second, and the third. Okay, we are all set to go; let us see what the slide says now. It is now time to start the services, and as I said, ZooKeeper is the main service and it should be started first, otherwise the rest of the Kafka brokers will not work because they will simply shut down. Let us start the ZooKeeper servers on the three machines. The start script is present at ZOOKEEPER_HOME/bin, so we run zkServer.sh start. You can then simply run jps, and you can see we have QuorumPeerMain running, which is the ZooKeeper service we have just started. Let us start the same on the second and the third machine; here again we can see the ZooKeeper service, and we do the same for the third node. Okay, so all of our ZooKeeper servers are running, and now we can simply start the Kafka brokers. Let us start this on the first node first: I am simply copying this command, I am on the first node, and I just use kafka-server-start.sh and provide the configuration file, server.properties. From the jps command we can see we got one new service, which is the Kafka broker. We run the same command to start the broker on the second and the third node. This is the second node, and I think we've got some error, so let us see what the error is. Ah, we did copy the .bashrc file from the first node, where we had mentioned KAFKA_HOME and ZOOKEEPER_HOME, but we never ran the source command, which is required to actually set the new variables. Once we have done this by running source .bashrc, we can start the server again, and it should be a success; you can see the second Kafka broker is also up. Let us do the same for the third machine: first source the .bashrc, then copy the command and start the broker. Okay, we got our third Kafka broker up as well.
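The distribution and startup steps, sketched under the assumption of working ssh between the nodes and the same directory layout on every machine; the user name training and the home-directory paths are placeholders:

  # from node 1
  scp -r ~/kafka ~/zookeeper ~/.bashrc training@hadoop2.abc.com:~/
  scp -r ~/kafka ~/zookeeper ~/.bashrc training@hadoop3.abc.com:~/
  # on every node
  source ~/.bashrc
  zkServer.sh start                                             # jps should now show QuorumPeerMain
  kafka-server-start.sh $KAFKA_HOME/config/server.properties    # jps should now show Kafka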
So we are done with the multi-node cluster setup of Kafka; let us move to the slides now. The next thing is to verify the cluster by creating a topic. First of all, let us create one topic: you simply use the kafka-topics.sh script; this is the command with the ZooKeeper details, the replication factor, which we are keeping at 1, the partitions we want, which is only one partition, and finally the topic name. We got the output "created topic test". We can simply verify this by running the list command, and we can see that we have a topic and its name is test. We are done up to here, and further you can simply run kafka-console-producer.sh to produce some messages and kafka-console-consumer.sh to consume those messages; we already did this while we were setting up the single-node Kafka cluster, so I am simply skipping this step, and this is the output you would see on your screen when you run the Kafka producer and consumer. Okay, as we are done with the multi-node setup of Kafka, let us move to some important administration commands that you are going to use in general. The first one is to create a topic, which we just did: we run kafka-topics.sh --create and mention the replication factor, the partitions that we want, and finally the topic name. Then we check whether our newly created topic is there or not using the --list command. The next command is describe: suppose I want to see how many partitions my topic has, what the replication factor is, and what configurations are already set for my topic; for that you can simply run the describe command. Let us run it: I run describe with the topic name, since I want to check the topic test. Okay, we got the output, and we can check that for the topic test the partition count is 1 and the replication factor is 1. Where is this topic present, meaning on which server is it, the first node, the second node, or the third node? We can see this from the leader field: the leader here shows that this partition is present on the second Kafka broker. For replicas, we have only one copy, so the replica is the same, and the ISR, that is the in-sync replicas, is also on the same node because we do not have any extra replicas. Let us see the next command. As we have only one partition, let us change the partition count to 3; you can use the alter command here, --alter, and simply mention the partitions that we want. Let us do this: the command is --alter --partitions 3, since I need three partitions now. Okay, adding partitions succeeded. I can simply run the describe command again to see the details, and this time we should have a partition count of 3. Okay, you can see the partition count is 3, with partitions 0, 1, and 2. As we still have a replication factor of only 1, there is just one leader for each partition, and the leader column tells us which broker each of the three partitions is present on: one is on the first broker and the other two are on the second broker. It is similar for the replicas: if we had multiple replicas, they would be mentioned here in comma-separated form, for example "1,2" if you have the replicas on the first and the second broker. Okay, if you want to delete a topic, you can simply use --delete. But what if you want to change the configuration? For example, I want to set the maximum message bytes, and you can see I am setting this here; earlier we did not have this configuration at all, so now let us set the new configuration.
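For reference, the administration commands walked through in this part look roughly as follows; test is the topic from the demo, and the ZooKeeper address is assumed to be the first node:

  kafka-topics.sh --create --zookeeper hadoop1.abc.com:2181 --replication-factor 1 --partitions 1 --topic test
  kafka-topics.sh --list --zookeeper hadoop1.abc.com:2181
  kafka-topics.sh --describe --zookeeper hadoop1.abc.com:2181 --topic test
  kafka-topics.sh --alter --zookeeper hadoop1.abc.com:2181 --topic test --partitions 3
  kafka-topics.sh --delete --zookeeper hadoop1.abc.com:2181 --topic test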
I am simply copying the command: I use the same command as before, that is --alter, and provide the new parameter, --config, with the configuration that I need to set. You can see the output, "updated config for topic test". Now run the describe command again, and you will see that where earlier we had no configs, this time the newly added configuration shows up; you can see that max.message.bytes is now set to the value we just added. Now, if you want to remove a configuration, you can simply use --delete-config, so let us delete the newly added configuration. Okay, "updated config for topic test" again, and when we run the describe command once more we should no longer see max.message.bytes here; you can see it is gone. So these are some of the important commands that you might need to run on topics. Okay, we are done up to here, so let us move to the next part, where we are going to cover some important tools and topics in Kafka. First of all, let us see what a graceful shutdown is. When you have an up-and-running Kafka cluster, you can get server crashes, you can get failures, or you can simply bring servers down intentionally for maintenance purposes. So what happens when a broker goes down, or when we bring it down? Two things happen. The first is to sync all of its logs to disk, which is done automatically by Kafka. The second is to move all of the leader partitions from that down node to the rest of the nodes; this does not mean we are moving the partitions manually, but rather we simply transfer the leadership. For example, if we have three replicas and, say, one broker goes down while it holds one leader replica, then the leadership is transferred to the rest of the replica partitions. To make this happen every time, we need to set the property controlled.shutdown.enable=true; if this property is set, the leadership election will be done automatically by Kafka. One important point to note here is that if you have a replication factor of only 1, then controlled.shutdown.enable=true has no real meaning, because Kafka needs some replica copies so that it can transfer the leadership to them. Let us move ahead now to balancing leadership. What happens, as we know, is that as soon as a node, a broker, goes down, the leadership of all of its partitions is transferred to its replicas. Now consider that the node comes back up: since the leadership has already been transferred, the broker that just came up will simply work as a follower, which means that reads and writes will not go to that node, because the partitions present on that broker are only followers, with no leaders. So what do we get? We get an imbalance. To maintain, or to handle, this kind of situation we need to run the kafka-preferred-replica-election.sh script. As an example, you can see on the screen that if the list of replicas for a partition is 1, 5, 9, then node 1 is preferred for the leadership over nodes 5 or 9, because it is earlier in the replica list. You can tell the Kafka cluster to try to restore the leadership to the restored replicas by running the command below: in this scenario, if node 1 goes down and then simply comes back up, you can run this script afterwards to restore the leadership back to node 1.
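Sketching the commands behind these two operations; the max.message.bytes value is only an example, and on newer Kafka versions the per-topic overrides are done through kafka-configs.sh instead:

  # add and then remove a per-topic configuration override
  kafka-topics.sh --zookeeper hadoop1.abc.com:2181 --alter --topic test --config max.message.bytes=128000
  kafka-topics.sh --zookeeper hadoop1.abc.com:2181 --alter --topic test --delete-config max.message.bytes
  # ask the controller to move leadership back to the preferred (first-listed) replicas
  kafka-preferred-replica-election.sh --zookeeper hadoop1.abc.com:2181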
And as nobody really likes to do this manual work, we can simply automate it by setting the property auto.leader.rebalance.enable=true. Now, extending the cluster: in Kafka it is very easy to add a node, that is, to expand your cluster. To do that you simply bring up the node, copy the Kafka configuration directory from any of the existing nodes to that new node, put a unique new broker ID in its server.properties file, and after that simply start the broker service, and you are done. But what about rebalancing? The new broker will not get the existing data automatically by itself; new incoming data will simply be distributed among all the servers, but until then, consider that some of the topics, some of the brokers, are heavily loaded, and we have this new broker sitting idle. In that case we will need to run the rebalance manually, and for that we can use the partition reassignment tool. The first thing to note is that it is not automatic; we need to run it manually using a script. This tool has three modes: the first one is generate, the second one is execute, and the third one is verify. Generate takes some inputs, such as the topic names and where you want to move them, that is, the broker list to which we want to move those topics; after that it generates a proposed reassignment, showing you what you are asking for. If we are satisfied with that, we can simply copy it into a JSON file and then run the next mode, execute. This is the important mode, which will actually move your topics from one broker to another. And the third mode is verify: after the completion of the execute mode, we can simply run the verify mode and it will tell us whether our reassignment was successful or not. Let us take an example. We will create two topics, foo1 and foo2, with a replication factor of 1 and one partition each; after that we will see the locations of these two topics, and then we will try to move them from those locations to some other broker. Let us first create them: I am on the first node, so I am going to create foo1, and similarly I am going to create foo2. Now let us describe these two newly created topics so that we come to know where they actually exist. You can see that foo1 has one partition, partition 0, and it is present on the third broker; now let us see where the second topic, foo2, is present: okay, you can see it is present on the second broker. Now, first of all we need to create a JSON file, because this tool works on JSON files only; in that JSON file we simply mention that we want to take action on these topics. Let us copy the JSON format from here; we will simply name the file topics-to-move.json, and I just paste the contents into it: you can see we are taking action on foo1 and foo2, so just save it. Now we will run our first mode, the generate mode, which will simply show that you are going to take action on these topics and these are the new brokers where the topics will be moved to. So topics-to-move.json is the file we created where we mentioned that we want to move foo1 and foo2; foo1 is present on the third broker, so I am going to move it to the first and the second, and foo2 is present on the second broker, so let us see what happens when we run this.
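The input file and the generate step look roughly like this; the broker IDs follow the demo, where the target brokers are 1 and 2:

  # topics-to-move.json
  {"topics": [{"topic": "foo1"}, {"topic": "foo2"}], "version": 1}

  kafka-reassign-partitions.sh --zookeeper hadoop1.abc.com:2181 --topics-to-move-json-file topics-to-move.json --broker-list "1,2" --generate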
Okay, I think something is not right with the command, so let me check; let me copy the complete command, and since the command does not look correct, let me just check on the internet what the exact command is. So this is the exact command, and now we can see what we were missing. I am pasting it into the first node, just updating the broker list values and also updating the ZooKeeper server address. Okay, now I am pressing Enter, and it should generate the proposed JSON. You can see the current partition replica assignment: foo1 has partition 0, which is present on the third machine, and foo2 has partition 0, which is present on the second machine; and the proposed partition reassignment configuration is that we want to move foo1 partition 0 and foo2 partition 0 to the first node, which means we are going to move both of these partitions there. Now it is time to run the execute mode. Before we do that, we need to create a file where we will store the proposed result, and we will use that file to run the execute mode, so I am going to create the file expand-cluster-reassignment.json. What am I going to put in there? The proposed result, which is exactly this; simply save it and now run the execute mode. Okay, so you can see we got "successfully started reassignment of partitions". We will now run the third mode, the verify mode, which will just confirm whether our reassignment was successful or not; this is the command we use to verify, and you can see that the reassignment for foo1 partition 0 completed successfully, and for foo2 partition 0 it completed successfully as well. Let us move ahead with custom reassignment. Custom reassignment means that earlier we were moving an entire topic from one broker to another, but consider that I want to move only some partitions: say a topic has three partitions and I want to move only the first partition to some new server. In that case we do just what we did earlier: we simply run the three modes, the generate mode, which shows the proposed result, then the execute mode, and finally the verify mode. On this screen you can see that if I want to move partition 0 of topic foo1 to some brokers, and partition 1 of topic foo2 to some other brokers, we simply mention in the custom-reassignment.json file that for this topic I want to move partition 0 and I want it to go to these two nodes, brokers 5 and 6, and similarly for foo2 I want to move partition 1 and I want it to go to brokers 2 and 3. This is called custom reassignment; nothing has changed, we have just updated our JSON file, and on it we then simply run the execute mode and the verify mode to check whether we got success or not. Now, how do we decommission a broker? This functionality is not there in Kafka, meaning there is no tool to decommission a broker, so your admin will have to manually move the topics from the broker that you want to decommission to some other broker, and after that you can simply shut that broker down and it will be decommissioned. In future releases of Kafka we may get a decommission tool so that we do not need to take these actions manually. Now, in my previous session I said that we cannot increase the replication factor; by that I mean that we cannot simply run the alter command to change the replication factor. If we want to change the replication factor, we need to use this same partition reassignment tool.
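The execute and verify steps, sketched with the file name used in the demo; the same two commands are reused for the custom reassignment and the replication-factor change discussed next, just with a different JSON file:

  kafka-reassign-partitions.sh --zookeeper hadoop1.abc.com:2181 --reassignment-json-file expand-cluster-reassignment.json --execute
  kafka-reassign-partitions.sh --zookeeper hadoop1.abc.com:2181 --reassignment-json-file expand-cluster-reassignment.json --verify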
Okay, so for that we will simply add the replicas field: you remember, earlier we were simply mentioning the topic name, then for the custom reassignment we were also mentioning the partition, and now you can see that we can also mention the replicas, which will increase the replication factor to the number of servers given in that field. Now let us try this out. Let us prepare the file, increase-replication-factor.json; we are going to do this for foo1. I open this file and I am just copying the JSON format into it so that I do not mess up the format of the file, because that is a very critical thing; I am changing the topic name to foo1 and saying that I want the replicas to be 1, 2, and 3, meaning I want this topic on all three brokers. Now we will run the execute command, so let me copy it; you can see we are using the increase-replication-factor.json file that we just created, and you can see we got "successfully started reassignment of partitions" for foo1 onto these three brokers. Let us check with the verify command whether our reassignment was successful or not. Okay, we got the result that foo1 partition 0 completed successfully. We can also verify this by running the describe command: earlier we had a replication factor of only 1, so let us see what it shows us this time. Okay, you can see that the replication factor of partition 0 has been changed to 3, these three replicas are present on the first, second, and third nodes, and the ISR entry shows that all the replicas are in sync. Okay guys, a quick info: if you want to do an end-to-end certification on Kafka, Intellipaat provides a complete certification training on Kafka, and those details are available in the description. Okay guys, we've come to the end of this session. I hope this session on Kafka was helpful and informative for you; if you have any queries regarding this session, please leave a comment below and we would be glad to help you out. Thank you.
Info
Channel: Intellipaat
Views: 58,304
Keywords: kafka tutorial, kafka tutorial for beginners, apache kafka tutorial for beginners, learn kafka, introduction to apache kafka, introduction to kafka, apache kafka tutorial, kafka architecture, kafka components, Kafka connect, kafka streams, kafka architecture tutorial, apache kafka, kafka, kafka architecture diagram, kafka architecture overview, kafka producer, kafka consumer, kafka basics, Intellipaat, kafka intellipaat, intellipaat kafka, yt:cc=on, what is kafka
Id: daRykH67_qs
Length: 111min 29sec (6689 seconds)
Published: Thu Feb 13 2020