What is Kafka | Tutorial | Beginner's Guide

Video Statistics and Information

Captions
So, what is Kafka? Apache Kafka is an open-source distributed event streaming platform; it can also act as a message queue or message broker system. In traditional HTTP networks, a client sends a message to a server: if the server is slow we get latency, and if the server dies the request fails and has to be retried. The client is coupled to the server. In Kafka, however, the client sends the message to a Kafka broker, and the server that is interested in the message fetches it from the broker. This decoupling creates a lot of architectural advantages. It adds scalability and high availability, because we can scale clients to produce more messages, add more brokers to deal with messages, and scale receivers to process messages. Messages can also be replicated across brokers, further bolstering resilience.

In Kafka, the clients are called producers. Messages go to brokers, which are Kafka instances, and servers that consume messages are called consumers. Messages are stored on the broker in what's called a topic. Topics can be divided into partitions, and each message goes into a partition. This allows scalability, as we can tell Kafka to store copies of a message on separate brokers in different partitions. When message 2 comes in, it's replicated and distributed across multiple brokers and partitions; the same goes for message 3. If a broker dies, messages are not lost. A consumer that is interested in messages 1, 2 and 3 can subscribe to the topic and will start receiving them in order, along with an offset that tracks where it has read up to. If the consumer fails or crashes, it can use that offset to retry and continue where it left off.

Welcome to another video. In this episode we'll be taking a look at Apache Kafka. We'll take a look at a Kafka instance by building up a Dockerfile, then at a Kafka topic and how Kafka stores topics on brokers. Then we'll look at a producer and how to produce messages into a Kafka topic, and finally we'll create a consumer that reads messages from this topic and processes them. We have a lot to cover in this episode, so without further ado, let's go.

All things start with a Dockerfile. We're going to install a Kafka broker and run it as a Docker container. This will allow us to create three Kafka brokers as separate containers. Then we'll ask Kafka to create a topic called "orders" and split it into three partitions, one partition on each broker, and this is where order messages will come in.

If we take a look at my GitHub repo, I have a messaging folder, and in the messaging folder I have a kafka folder with a readme. This is our introduction to Kafka, pointing to the official docs, how to build a Dockerfile, and all the steps I'm going to be showing you today, so be sure to check out the link down below to the source code so you can follow along. Looking at my readme, the first thing we're going to be doing is building a Dockerfile. I'm going to head over to the Kafka website, and on the right there is a link to download Kafka, with links to the latest version, the documentation, the release notes, as well as the zip file for the installation.

Heading back to my repo, in the messaging/kafka folder I have a Dockerfile. Kafka needs the Java runtime in order to run, so we're going to create a simple Dockerfile starting from the Java runtime and then installing Kafka. The Dockerfile is very straightforward: the first thing I do is start FROM an OpenJDK base image.
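For reference, here's a minimal sketch of the Dockerfile this walkthrough builds up. The exact base-image tag, version values and the start-kafka.sh script name are assumptions based on the narration, not copied from the repo:

```dockerfile
# Java runtime on Debian Buster (tag assumed)
FROM openjdk:11.0.10-jre-buster

# Versions used to construct the download URL (values assumed)
ENV KAFKA_VERSION=2.7.0
ENV SCALA_VERSION=2.13

# Install curl so we can download the Kafka binaries
RUN apt-get update && apt-get install -y curl

# Download a specific Kafka release into a temp folder and extract it to /kafka
RUN mkdir /tmp/kafka && \
    curl "https://archive.apache.org/dist/kafka/${KAFKA_VERSION}/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz" \
      -o /tmp/kafka/kafka.tgz && \
    mkdir /kafka && \
    tar -xvzf /tmp/kafka/kafka.tgz --strip-components=1 -C /kafka

# Shell script (assumed name) that starts Kafka and points it at a config file
COPY start-kafka.sh /usr/bin
RUN chmod +x /usr/bin/start-kafka.sh
CMD ["start-kafka.sh"]
```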
I'm going to run the Java runtime on Buster, which is a Debian-based container. Then I apt-get update and install curl, which I'll use to download the latest version of Kafka. Next I provide environment variables to set the Kafka and Scala versions; these are used to construct the download URL. In the next statement I create a temporary folder to download all the Kafka binaries into, and then I download them using the URL provided in the Kafka documentation: I call curl against archive.apache.org, and you can see I download a specific version of Kafka as a .tgz file into the temp directory and extract it using tar. I also have a simple file on the outside called start-kafka, a shell script that starts up the Kafka instance and points it to a configuration file. I copy that into /usr/bin, give it execution rights, and then create a command to run that file, which starts up a Kafka instance.

Now, the cool thing is that the Kafka installation we're downloading has a lot of files that let you start and stop Kafka, as well as produce and consume messages. It has a bunch of shell scripts we can use to explore Kafka, which is really useful.

To build the container, we change directory to the messaging/kafka folder and say docker build . -t to tag our image as kafka-2.7.0. I copy this, paste it into the terminal, and it goes ahead and builds our container image.

As I mentioned, the Kafka installation comes with a bunch of cool scripts: one that starts a Kafka instance and one that stops it, as well as some utility scripts that let us create a topic and describe a topic, a producer that creates a message and places it into a topic, and a consumer that reads messages from a topic. To take a look at these utilities, we can run our container by saying docker run --rm, call it kafka, run it in interactive mode, and use bash as the entry point. I copy-paste this to the terminal, and it starts up our Kafka container and gives us a terminal where we can run commands.

To explore the instance, I say ls -l and point to the folder where Kafka was installed, and you can see there are a bunch of cool scripts in here: the Kafka console applications to consume and produce messages, the Kafka server start and stop scripts, and a script that lets us deal with Kafka topics. There's also an important one, the ZooKeeper server start and stop script, and we'll come to ZooKeeper in a second.

Since we've installed Kafka to the /kafka folder, another important folder is the config folder. If we take a look at it by saying ls /kafka/config, we can see a bunch of base configs that Kafka provides: there's a server.properties, which is the basic configuration file for Kafka, and a zookeeper.properties, which is the config for ZooKeeper. So the Kafka installation ships with a basic configuration file for both Kafka and ZooKeeper, and we can explore those files with cat.
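The build-and-explore steps, sketched as shell commands. The image tag kafka-2.7.0 follows the narration, and the /kafka install path matches the Dockerfile sketch above; treat both as assumptions:

```bash
# Build the Kafka image from the messaging/kafka folder
cd messaging/kafka
docker build . -t kafka-2.7.0

# Run a throwaway container with a bash entry point to explore the installation
docker run --rm --name kafka -it kafka-2.7.0 bash

# Inside the container: list the bundled scripts and the base configs
ls -l /kafka/bin
ls -l /kafka/config
```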
Running cat /kafka/config/server.properties prints the base configuration file for Kafka.

To work with these configs, I think the best approach is to open another terminal. So I start a second terminal, change directory to the messaging/kafka folder, and use the docker cp command to copy the ZooKeeper and Kafka configuration files to my local machine, so we can make changes and then mount the configs back in when we start up Kafka instances. To grab these files, I say docker cp and copy server.properties out of the kafka container's config folder to my local machine, and then docker cp again to copy the zookeeper.properties file out as well. I paste that, and now I have both configuration files locally: one for Kafka and one for ZooKeeper.

Because I want to run three instances of Kafka, in my messaging/kafka folder I create a new folder called config, and inside it three folders: kafka-1, kafka-2 and kafka-3. I move the server.properties file into my kafka-1 folder. I also create a zookeeper-1 folder inside the config folder, so we now have four config folders, one each for kafka-1, kafka-2 and kafka-3 plus zookeeper-1, and I move the ZooKeeper config into the zookeeper-1 folder. Now we've split out the configs for each of our instances.

Let's take a look at the Kafka configuration file: in messaging/kafka/config/kafka-1 I have server.properties. The file is well documented. We can see broker.id, the unique ID of each broker, so I set this to 1 to match kafka-1. Then there's a section for socket server settings, where we can specify the number of network threads and the number of I/O threads (which may include disk I/O), plus some buffer settings for message buffers. Next up we have the log basics: the log directory where Kafka will store all the messages. It's important to note that when we run this as a Docker container, we may want to mount this folder into the container so that it is persisted across restarts; if you're running in Kubernetes, you might use a persistent volume to mount this folder in, so if the Kafka container restarts or crashes and comes back, it still has all its files. We can also specify the default number of partitions, and there's a setting for retention hours: the minimum age of a log file eligible for deletion, which is basically the default for how long we keep messages on disk. Then we have the ZooKeeper settings for where to connect to ZooKeeper. Currently that's set to localhost:2181; since we're going to run ZooKeeper in a different container, we change it to its hostname, zookeeper-1, on port 2181. We can also set a timeout for connecting to ZooKeeper.

The ZooKeeper config, if we take a look at it, is very simple: there's just a data directory for where to store data, a client port, a max client connections setting, and admin server and admin port settings, which I'm just going to leave at their defaults.
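A sketch of the handful of server.properties settings this walkthrough touches for kafka-1. The values shown are the stock 2.7.0 defaults except where the narration says we change them; exact names should match the file bundled with the installation:

```properties
# config/kafka-1/server.properties (excerpt)

# Unique per broker: set to 2 in kafka-2's copy and 3 in kafka-3's copy
broker.id=1

# Socket server settings: network threads and I/O threads (may include disk I/O)
num.network.threads=3
num.io.threads=8

# Where messages are stored on disk; mount this folder to persist it across restarts
log.dirs=/tmp/kafka-logs

# Default partition count for auto-created topics
num.partitions=1

# Minimum age of a log file before it is eligible for deletion
log.retention.hours=168

# Changed from localhost:2181, since ZooKeeper runs in its own container
zookeeper.connect=zookeeper-1:2181
zookeeper.connection.timeout.ms=18000
```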
Next up, we want a config for each of our instances, so I take the server.properties file and copy-paste it into the kafka-2 folder as well as the kafka-3 folder. Then I go to the kafka-2 config file and change the broker ID to 2, and to the kafka-3 file and change the broker ID to 3, because the broker ID has to be unique across the brokers. And that's it for the Kafka configuration.

Now, you saw in the configuration file that in order to run Kafka we need to point to a ZooKeeper instance. So what is ZooKeeper, and why do we need it? ZooKeeper is a centralized service for maintaining configuration information, naming, and providing distributed synchronization for Apache services like Kafka. ZooKeeper keeps track of the status of Kafka cluster nodes, and it also keeps track of Kafka topics and partitions. ZooKeeper helps make Kafka highly available, especially when running multiple Kafka instances as a cluster. In this demo we'll run a single ZooKeeper instance for demo purposes; if you'd like me to cover ZooKeeper in more detail in a future video, leave a comment down below.

We saw earlier that the Kafka installation comes with the ZooKeeper scripts and a configuration file, so let's take that and build a separate Dockerfile for ZooKeeper. In my kafka folder I create a new folder called zookeeper, and in there a new Dockerfile. This Dockerfile looks almost identical to the Kafka one: I start from the JRE Buster image for Java, provide the same environment variables, install curl, download the same version of Kafka and extract it, but this time I point to a start-zookeeper script and run that one as the command. If we take a look at the zookeeper folder, I have a copy of that script, which basically just runs the ZooKeeper server start script we saw earlier and points it to a ZooKeeper configuration file. Building the ZooKeeper container image is very simple: I change directory to the zookeeper folder inside the kafka folder and say docker build . -t to tag the ZooKeeper image. I copy-paste that to the terminal, and it produces a ZooKeeper container image that we can run for demo purposes.

So now that we have our Kafka container, our ZooKeeper container, and configuration files for each of the instances, let's create a Kafka network, run a ZooKeeper instance, and then run three Kafka brokers and connect them. First I change directory one folder up to make sure I'm in the messaging/kafka folder. Then I create a Kafka network, because I want to run all of these containers on the same network: docker network create kafka. With the network in place, I can start my ZooKeeper container first: I say docker run in the background with -d, plus --rm, and --name zookeeper-1; I run it on the kafka network using the --net flag; and here's the special magic: I mount in my working directory's config/zookeeper-1/zookeeper.properties, the ZooKeeper config file we created earlier, into the location inside the container where ZooKeeper expects the config, and I run the container image tag I've just built.
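The network and ZooKeeper startup, sketched as shell commands. The image tag zookeeper-2.7.0 and the in-container config path are assumptions based on the narration:

```bash
# Build the ZooKeeper image from the zookeeper subfolder
cd messaging/kafka/zookeeper
docker build . -t zookeeper-2.7.0
cd ..

# Create a shared network so all the containers can reach each other by name
docker network create kafka

# Start ZooKeeper in the background, mounting our config into the
# location where this image expects it (path assumed)
docker run -d --rm --name zookeeper-1 --net kafka \
  -v "${PWD}/config/zookeeper-1/zookeeper.properties:/kafka/config/zookeeper.properties" \
  zookeeper-2.7.0
```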
I copy-paste this command into the terminal and run it, and that starts up a ZooKeeper instance. I can then say docker ps, and we see the ZooKeeper container up and running.

You can see I have three separate docker run commands for running each of the Kafka instances. To run kafka-1, I say docker run with -d for background mode and --rm; the name is kafka-1; it runs on the kafka network; and I mount in kafka-1's server.properties, the kafka-1 configuration we created earlier, to the expected location; then I run the Kafka image I tagged earlier. I paste that to the terminal and it runs kafka-1, and docker ps shows it's up and running. I then run kafka-2 in exactly the same manner; the only differences are the name and the location of the config. I copy-paste that to the terminal to run kafka-2, do a docker ps to see the second instance is up, and then paste the third command, for kafka-3. If I do docker ps, we can see all three Kafka instances up and running. We also want to make sure there are no errors, so I say docker logs and look at the ZooKeeper logs first: just a bunch of info messages, no real critical errors or problems. Then I do the same for kafka-1, where we see a bunch of info messages, all good, and likewise for kafka-2 and kafka-3. So we're all good, ready to go: three Kafka instances linked to a ZooKeeper instance.

Next, let's go inside the ZooKeeper instance and play around with Kafka topics. We can also use the producer script to send messages to a topic and see how Kafka stores them, and then look at the consumer and how it consumes those topics. This lets you play around with the topic configuration as well as replication, and creates a nice sandbox on your local machine for test-driving Kafka's features.

Let's create a topic that allows us to store orders for an online ordering system. Under my readme I have a topic section which explains exactly what we're going to do next. I'm going to access the ZooKeeper instance (you can use any instance to do this): I say docker exec -it zookeeper-1 with a bash entry point to get into that container. Then I can create a topic using the kafka-topics script that's embedded in the installation: I call kafka-topics with the --create flag, point it at the ZooKeeper instance where the topic should be created, and pass in a replication factor; for this demo I just use 1, because I don't want to replicate it more than once, so that we can see how the message is stored. I also say I want three partitions, which lets us distribute messages across multiple partitions, and finally the topic name, orders. I copy-paste this to the terminal, and it creates my topic called orders. There's another cool feature of the kafka-topics script, the --describe flag, which describes the topic.
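Sketched as shell commands; the commands for kafka-2 and kafka-3 differ only in the name and config folder, and the in-container paths are assumptions carried over from the sketches above:

```bash
# Run broker 1 (repeat with kafka-2 and kafka-3 and their own config folders)
docker run -d --rm --name kafka-1 --net kafka \
  -v "${PWD}/config/kafka-1/server.properties:/kafka/config/server.properties" \
  kafka-2.7.0

# Check that everything came up cleanly
docker ps
docker logs zookeeper-1
docker logs kafka-1

# Hop into a container and create the orders topic: 3 partitions, no extra replicas
docker exec -it zookeeper-1 bash
/kafka/bin/kafka-topics.sh \
  --create \
  --zookeeper zookeeper-1:2181 \
  --replication-factor 1 \
  --partitions 3 \
  --topic orders

# Describe it to see which broker holds each partition
/kafka/bin/kafka-topics.sh --describe --zookeeper zookeeper-1:2181 --topic orders
```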
I'll go ahead and run that, and the output explains exactly how our topic is created and stored. We can see that we have three partitions: partition 0 is on replica 3, partition 1 is on replica 1, and partition 2 is on replica 2. This is because we specified that we want three partitions for the orders topic. To keep track of this during the demo, I copy this output and paste it into a text pad up here.

Let's say we have a microservice architecture for our ordering system, and one microservice produces orders. To simulate that, we'll split the terminal and create a producer using the producer script Kafka provides. First, though, I'm going to register the consumer that's interested in orders; this could be a processing system that looks at all the orders being created and processes them. For this I use the Kafka console consumer, which takes a bootstrap server list pointing at the three Kafka instances we have running; I point it at the three instances, say that I'm interested in the orders topic, and that I want all messages from the beginning. I say docker exec -it to open another terminal that execs into the ZooKeeper instance, copy-paste the command, and run it there. You can see it's now sitting paused: the consumer waits for messages to enter the topic.

To start producing messages, I split the terminal again and docker exec into the ZooKeeper instance once more, and I produce order messages by calling the Kafka console producer, passing in the broker list (the three Kafka brokers) and the topic where I want to create an order. This script reads standard input, so I echo an order with an ID of one; it's just a simple string I've made up, "new order: 1". In a real system this could be a structured message, like a JSON or XML document describing the content of your message. I pipe that into the producer: I copy the script, paste it into my producer terminal over here, and if we look on the left, we can see that the consumer has already picked up that order.

So now we have three brokers running, three partitions, and a message that's been produced by our producer and consumed by a consumer. It's worth exploring the Kafka instances to see how that message is being stored. Once we have a message in Kafka, we can explore it inside each partition. I exit the ZooKeeper instance and say docker exec -it kafka-1 to go into kafka-1. Then I run apt install -y tree to install a utility called tree, which lets me visualize the folder structure, and take a look at the folder where we told Kafka to store its logs, /tmp/kafka-logs. Running the tree command on that, we can see a bunch of offset directories, and here is orders-1. Let's check that against the partition layout we described earlier.
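The consumer, the producer and the first inspection step, sketched as shell commands. The host names, port 9092 and the /tmp/kafka-logs path follow the config walkthrough above; the message string matches the narration:

```bash
# Terminal 1: a consumer that reads the orders topic from the beginning
docker exec -it zookeeper-1 bash
/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server kafka-1:9092,kafka-2:9092,kafka-3:9092 \
  --topic orders \
  --from-beginning

# Terminal 2: pipe a simple string message into the console producer
docker exec -it zookeeper-1 bash
echo "new order: 1" | /kafka/bin/kafka-console-producer.sh \
  --broker-list kafka-1:9092,kafka-2:9092,kafka-3:9092 \
  --topic orders

# Terminal 3: look at how a broker stores the message on disk
docker exec -it kafka-1 bash
apt install -y tree
tree /tmp/kafka-logs
```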
There we saw that partition 1 is on replica 1, and indeed we have orders- followed by the partition number, and this is where the log is stored; this is where our message would live. Now, since we're running three partitions and we don't have replication, our one message is sitting on one of the instances, not on all three. We can confirm the message is not on the first replica: if I do ls -lh on that partition folder, I can see the log file where messages are stored, and it is zero bytes.

So let's look at the other Kafka instances. I exit this one and go into kafka-2. Once we're in kafka-2, we can do ls -l on the Kafka logs directory and see orders-2: replica 2 has partition 2, which is why we have orders-2 over here. If we run ls -lh on that directory, we can see our log file with 80 bytes, so our message is inside this container. We can then quickly head over to kafka-3 to make sure our message is not on that one: since we know replica 3 has partition 0, we run ls -lh on the orders-0 partition folder and see that its log file is also empty, zero bytes. So we now know that our message is sitting on instance 2, in partition 2, and we can further confirm this by going back into that instance and running cat on the log file in that partition folder: our message is printed out, order one, on instance 2 in partition 2.

So now we know the basics of how to run three Kafka instances linked to ZooKeeper, how to create a topic, how to split the topic into multiple partitions, and how to produce and consume messages. Hopefully this video helped you understand what Kafka is and its concepts, but more importantly how to get it running, and gave you a sandbox to explore Kafka further. In the next video we'll take a look at how to use Docker Compose to start up these Kafka instances automatically, improving our development sandbox, and then we'll look at how to build real-world applications on top of Kafka. If you liked the video, be sure to like and subscribe and hit the bell, and if you want to support the channel even further, hit the join button below and become a member, and also check the link down below to the community page. As always, thanks for watching, and until next time, peace.
Info
Channel: That DevOps Guy
Views: 6,516
Rating: 4.9796438 out of 5
Keywords: devops, infrastructure, code, kubernetes, k8s, cloud, training, course, cloudnative, development, deployment, containers, docker, rabbitmq, messagequeues, messagebroker, broker, queues, web, services, apache, kafka, streaming, stream, message
Id: heR3I3Wxgro
Length: 23min 34sec (1414 seconds)
Published: Sun Jun 06 2021