Apache Kafka Crash Course

Captions
Apache Kafka is a distributed stream processing platform developed at LinkedIn and written in Scala and Java. In this video I want to focus on the basics of Kafka: the components of Kafka and how it works, and then we'll spin up our own Kafka cluster on Docker and write some applications — some producers and some consumers — using Node.js and KafkaJS. If you're interested, stay tuned.

You have no idea how many comments I got asking me to make this video. Ever since I made the video about RabbitMQ and publish/subscribe, the comments have been flooded with Kafka requests. I knew at some point I wanted to make this video, and here it is. It took a lot of research — this is a very interesting technology — so if you're interested, let's jump into it. As I go through the table of contents you'll see time codes, so you can jump to the part of the video that interests you.

The first topic is Kafka components, and it's the longest one, really. I'm going to break down the components of Kafka and explain how it actually works, introducing one component at a time as I explain why each one is needed. I'm not just going to throw all the components at you at once — I started writing it that way and it got confusing. So I'll start with the basics — brokers, producers, consumers — and slowly build up to ZooKeeper and all that stuff. Then, obviously, we'll do an example, because you need to see this working: we'll spin up our own Kafka cluster, spin up a ZooKeeper (because you need that), write our own Node.js producer and consumer, create a topic, and do all that jazz. And finally we'll talk about the pros and cons of Kafka, because guess what — nothing is perfect — and then I'll summarize the whole thing.

Welcome — my name is Hussein, and on this channel we discuss all sorts of software engineering by example. If you want to become a better software engineer, consider subscribing, and hit that bell so you get notified every time I upload a new video. That's it — let's jump into Kafka, guys.

The first component of Kafka is what we call the Kafka server, or the Kafka broker. The broker is basically the first server that users interact with, and since it's a server, it listens on a TCP port to accept connections. That's how RabbitMQ works, that's how web servers work, that's how pretty much any networking application works: you have a server listening on a port. The default port for Kafka is 9092, and the server is called a broker. There's magic inside it, and we're going to get into it. Around it there are two pieces of abstraction: producers, which produce content — they publish it to the broker — and consumers, which consume content from the broker. That seems very simple and to the point. If you're interested in knowing more about publish/subscribe systems versus queues, I'll reference a video I made just about pub/sub — I'm only going to glance over the idea here because I go deep into it in that other video, so go check it out if you're interested. But here's the thing: the other abstraction in Kafka is essentially the connection.
A producer connects to the broker over a raw TCP connection, and it's bidirectional: the broker can send information to the producer and the producer can send information to the broker. The consumer is the same — it establishes its own TCP connection. There is a protocol on top of it; to be honest, I couldn't find the details of that protocol in my research, so I'm not sure what it is — maybe it's just custom binary over TCP. We know how TCP connections work; I'll reference my video about TCP here.

Once these TCP connections are established, here's the next concept: the topic. Topics are basically logical partitions of data — you write content to them. When the producer writes, it has to specify which topic to write to: "I want to write the message 'Hello' to topic A." And a consumer says, "I want to consume topic B," and the broker sends it that topic's messages. So those are the core pieces: topics, consumers, producers, and the broker. Sounds simple? Good — let's complicate it a little.

Let's zoom in on an actual topic. I have a topic called "users" in my Kafka broker, and I want to show how producing works — how the Kafka producer works. The producer, having established its connection, sends a request: "Hey broker, publish the string 'John' to the users topic." And the broker does it: it takes that string and appends it to the topic. The word "append" is very interesting here, and we'll come back to why appending is so critical in Kafka: you can always add stuff to Kafka, but you cannot delete stuff. It's always there — append-only. So it's just "shove it on the end." Sounds simple.

Let's publish something else to users; it goes onto the end. Each message is referred to by essentially the topic and its position, and access by position is very fast: go to position 0, that's John; go to position 1, that's Ed. Indexing like that is quick because everything is sequential. So here's a topic with a bunch of user data, and if I produce another message — "publish 'Leo' to the users topic" — it's appended to the end, and you can start seeing the topic getting larger and larger. We'll talk about what to do about that, because we as database engineers have solutions for things that grow really large.

What happens when I consume? A consumer says, "I want to consume topic users." If this is a brand-new consumer, then based on configuration it reads from position 0, so it gets John; the moment it gets John, it asks for the next one, which is Ed, and so on. The consumer is actually *pulling* the information — this is not a push model; we'll talk about that in detail later in the video. The consumer just keeps asking for more and more, unlike RabbitMQ, where the broker pushes information to the consumer. That's a very important distinction.
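To make the append-only idea concrete, here's a toy model in plain Node.js — my own sketch, not Kafka's actual storage engine — showing the two operations a topic supports and why positions make reads cheap:

```js
// A toy append-only topic: you can append to the end and read by position.
// Real Kafka persists this to disk and does much more; the shape is the same.
class Topic {
  constructor() {
    this.log = []; // append-only: we never delete or overwrite
  }
  append(message) {
    this.log.push(message);
    return this.log.length - 1; // the message's position (its offset)
  }
  read(position) {
    return this.log[position]; // sequential storage makes this a cheap index lookup
  }
}

const users = new Topic();
console.log(users.append("John")); // 0
console.log(users.append("Ed"));   // 1

// A consumer is just a cursor that keeps asking for the next position.
let cursor = 0;
let msg;
while ((msg = users.read(cursor)) !== undefined) {
  console.log(`position ${cursor}: ${msg}`);
  cursor++;
}
```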
OK — still simple, nothing complex yet. How about we dive deeper?

Topics grow large. What do we do with databases when they grow large — when a table grows to millions and millions of rows? We do sharding. We say: customers number 1 through 100,000 go to this table on this database, and customers 100,001 through 200,000 go to that one. And the database clients need to know that: if you're querying customer number 100, go to database A; if you're querying customer number 200,005, go to database B. That's sharding, and Kafka borrowed the concept — we want to distribute the data, because queries get slower and slower as the data grows. So we shard it, except Kafka calls the shards partitions. Same concept as sharding, essentially.

Here's what we do: this users topic is so big, so let's create two partitions. Users whose first names start with A through M go to partition 1, and users from N through Z go to partition 2. That sounds simple enough, and now we're working with manageable chunks of data that can each grow independently, which is cool. But this convenience has a cost, for sure — and we haven't talked about sharding on this channel; I should make a video about it — because the moment you introduce sharding, or partitioning, the producers and consumers suffer. Now they have to know what the heck a partition is: not just which topic to write to or read from, but which *partition*. And that kind of sucks, because it adds complexity to the consumer and the producer. We'll come back to that.

So what happens if a producer wants to publish the user "Nadia" to the users topic? It goes to partition 2, because we just said names between N and Z live there. The producer has to figure out which partition to publish to, which kind of sucks, but it's all for scalability — we're going to suffer. Life is suffering, as Jordan Peterson says; you can't escape it, you accept it. So "Nadia" goes to partition 2, and here's what happens the moment we write it: it gets a new position, and that position is returned to the producer — "by the way, the current position on partition 2 is 4" (0, 1, 2, 3, 4). A consumer then says, "I want to consume partition 2 from position 0," starts reading, and keeps updating its position until it reaches the latest data — at which point there's nothing more to read, because it hit the end of the partition. And you can see how fast this whole thing is, because you only ever work with indexes: positions and partitions.
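That "figure out which partition" step, by the way, is just a tiny routing function. A sketch (using 0-based partition numbers, which is what the producer code later in this video uses):

```js
// Route a user record to a partition by the first letter of the name:
// A–M -> partition 0, N–Z -> partition 1.
function choosePartition(firstName) {
  return firstName[0].toUpperCase() < "N" ? 0 : 1;
}

console.log(choosePartition("John"));  // 0
console.log(choosePartition("Nadia")); // 1
```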
You don't do things like "SELECT * FROM topics WHERE first_name = ..." — this is not a relational Postgres database. You don't use Kafka for ad hoc queries; you use it for fast writes and for distributing events as they happen. We'll get to the benefits.

Before we jump into the next Kafka abstraction, we need to talk about the difference between a queue and pub/sub. A queue is essentially this: a message is published once, and it is consumed by exactly one consumer. That makes sense — it's a queue. You publish to the end of the queue, some consumer pulls — pops — a message off and consumes it, and it's gone from the queue; nobody else can consume it. That has its advantages, and RabbitMQ is great at it. It's really good for job execution: "please execute this task." You want one of your workers to pull the task and execute it, and you don't want someone else executing the same task twice — that's bad. Some use cases must execute exactly once, and a queue is great for that.

The other concept is pub/sub: "I want to publish a message once, but I want it to be consumed by a lot of consumers." The same message is broadcast — multicast — to multiple consumers, and it isn't removed from the source; it just gets broadcast. There are a lot of use cases for this too, and we've talked about one: YouTube. You upload a video, and the moment you upload the raw video, you want multiple services to consume it. One service could be the compression service — take the raw video and compress it. Another takes the raw video and encodes it into multiple codecs and formats — 480p, 1080p, 4K — for mobile and streaming. Another is the copyright service: pull the raw video and check whether someone is infringing on Content ID — using other people's material, music you can't really use, or something offensive. That's an example where a pub/sub architecture is useful.

So queue and pub/sub each have their use cases. Kafka came into the picture and said: we want to do both. And that's bold, man — especially since it's essentially what RabbitMQ attempted, and I believe they messed it up. RabbitMQ started as a queue — it's in the name, message queue — but then people wanted pub/sub, and you can't naturally do pub/sub with a queue, so they invented the concept of exchanges (I might be wrong about the details) to kind of hack their way to the feature, which made the system really complex, weird, and awkward. Kafka answered this question from the get-go: the system was designed with both models in mind. And its answer is the consumer group — honestly one of the most confusing abstractions in Kafka; it took me a while to really understand and nail it down. So let me explain consumer groups now.
Consumer groups were invented to do two things: parallel processing, and dealing with partitions. Remember, consumers had to be aware of which partition to read from, and that's bad — the consumer group fixes that problem by removing partition-awareness from the consumer. The other benefit is that a group can consume data from multiple partitions in parallel; we'll get to that.

Let's say you have a consumer group called group 1, and I add a brand-new consumer that joins the group. If you're the only consumer in the group — tough luck, man — you are now responsible for *all* the partitions in the topic. You subscribed to topic users, which has two partitions; if it had seven, you'd be responsible for all seven. What does that mean? It means that as you consume, you'll get a message from partition 1, a message from partition 2, a message from partition 3 if it exists — you receive messages from all partitions. And that's OK: if you're a good consumer and can handle the load, you just receive messages and don't really care which partition they came from.

Here's where it gets really interesting: the moment you add another consumer to the group, the group *rebalances*. It says, "Consumer 1, you were really overloaded — let's take partition 2 away from you and give it to consumer 2." And here's the rule: within a group, each partition is consumed by one and only one consumer. One consumer can consume two, three, or four partitions, but a given partition had better be consumed by exactly one consumer — that's the rule, and the consumer group makes sure of it. So a third consumer cannot usefully join this two-partition group — "you don't have anything to do, man" — it sits idle. And once two consumers own one partition each, something interesting happens: since they're different processes (they don't have to be, but typically you'd run them that way), they can consume the two partitions in parallel, which is a really cool concept.

And the moment the group works like this, the system acts like a queue. You're asking how? If consumer 1 is responsible for partition 1, it only receives data from partition 1, and each time it consumes a message, the committed position for partition 1 is updated in the group: "partition 1, position 0 has been read — move on." It reads 1, then 2, 3, 4, 5, and just keeps going. (That's the default behavior — you can explicitly move the position back and re-read, but if you leave the group as it is, it acts like a queue.) The moment you read John, it's almost like it was popped off the queue, and you read the next message, and the next. And consumer 2 will never read John, because it's responsible for partition 2. So we just achieved queue behavior, which is amazing.
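Here's a quick sketch of that (my own, using KafkaJS — the library we use later in this video — and an assumed localhost broker): the exact same consumer code behaves as a queue worker or as a pub/sub subscriber depending only on the groupId you pick.

```js
const { Kafka } = require("kafkajs");

const kafka = new Kafka({ clientId: "demo", brokers: ["localhost:9092"] });

async function startConsumer(groupId) {
  const consumer = kafka.consumer({ groupId });
  await consumer.connect();
  await consumer.subscribe({ topic: "users", fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ partition, message }) => {
      console.log(`[${groupId}] partition ${partition}: ${message.value}`);
    },
  });
}

// Queue: two consumers share one group, so the group splits the partitions
// between them and every message is processed by exactly one of the two.
startConsumer("workers");
startConsumer("workers");

// Pub/sub: a consumer in a different group tracks its own positions,
// so it receives its own copy of every message.
startConsumer("auditors");
```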
So, consumer groups: if you want to act like a queue, put all your consumers in one group, and the system immediately becomes a queue. Each consumer in the group is responsible for its partitions, and those partitions are never seen by any other consumer in that group. The moment a consumer reads a message, it's gone; it reads the second message, it's gone; the third, gone. When I say "gone," I mean it's been committed as read — the message is still in the system, and you can still go back and read it — but out of the box the group just works like a queue, which is powerful stuff, guys. So that's how you do it: if you want a RabbitMQ-style setup, put all your consumers in one group, and tada.

If instead you want to act like a pub/sub system, where a message is broadcast to every consumer, then each consumer goes into its own unique group. That's OK, because a partition *can* be consumed by multiple consumers as long as they're in different groups — the committed position is group-dependent. That's how consumer groups work, and what do we get as a result? Parallel processing for free, which is amazing: multiple consumers in a running group can read multiple partitions in parallel and do so much cool stuff.

Almost at the end: the distributed system. That's another piece of the Kafka puzzle — it's a distributed system, but how? "Distributed" means: take a broker and copy it, in a leader-follower setup, where one leader takes all the write requests and a follower just replicates from the leader — a master/follower kind of configuration. So let's do that: spin up another Kafka broker listening on another port — that sounds easy — and as we copy everything over (the users topic: copy partition 1, copy partition 2, done), we mark things as leader and follower. But here's the thing: how do I know which broker is the leader and which is the follower? You need a system that tells you that, and we'll get to what that system is.

And here's another concept. Kafka says: no, we don't want a leader *broker* and a follower *broker* — we want leadership at a finer level, at the *partition* level. I want this broker to be the leader of partition 1, and that broker to be the leader of partition 2. And that's cool, because now if a whole broker goes down, you don't lose your entire data — you don't have one single master. Each leader is responsible for some partitions, which gives you a real distributed system. So we're distributed at the partition level — but where is this information stored, man? Where does "broker 9093 is the leader of partition 2 on topic users, and 9092 is the leader of partition 1 on the users topic" live? Meet ZooKeeper — the biggest disaster in Kafka, in my opinion. Again, this is a controversial topic: a lot of people like ZooKeeper (I personally haven't used it), but a lot of people
do not like this technology, because it has caused more pain than good. Essentially, what ZooKeeper does is herd the cats, as they say: it says "you are the leader, you are the follower — follow the leader." That's what it does.

Let's go through an example, dude: how do we produce in this configuration, now that we have ZooKeeper? When you create that bidirectional connection from a producer to the broker, before you submit anything you ask: who is actually the partition leader for the thing I'm about to write? And then you write to it. Say we're about to write the user "Zain" to the users topic, on partition 2 — we figured out the partition with our simple first-letter check (mildly complex, but sure, why not). Now, here's what I'm not sure about — and someone can correct me in the comments if you know: I'm not sure whether the *producer* is aware of who the partition leader is. Because you have to write to the leader — you cannot write to a follower; you can only *read* from a follower. If you're writing, you'd better write to the leader. But how do you know who that is? Either the producer queries it — "the leader of partition 2 is actually 9093" — establishes a connection to 9093, and writes there, because that's the partition leader; or (and I think it's the latter) you just shove the write at whatever broker you're connected to, say 9092 — "please write this message to partition 2 of users" — and the brokers gossip among each other. That's another piece alongside ZooKeeper: they use a gossip protocol to determine who leads what. "Who's the leader of partition 2?" "It's me, 9093 — send it over." Either way, they communicate among themselves, and "Zain" ends up on the correct partition, on the correct broker, on the correct topic. And once that happens — wait a second, we just wrote to partition 2 on 9093 — the follower needs to read that information and copy it. That's what replication does. Seems good.

How about consuming? "I want to consume topic users on partition 1." Again, with a consumer group you can avoid naming a partition if you want, because what essentially happens is the group assigns you a partition; you're responsible for it and you just read — you don't have to specify anything, and we'll show that in the example. And you can either read from the leader or, depending on what's available — it depends on ZooKeeper and whatever its algorithm is — hit a follower node and read the same information there, because it had better be the same. Sweet — let's jump to the example, guys.
Here's what we're going to do. The first thing is to spin up a ZooKeeper instance, because even though we're working with just one broker, unfortunately we need ZooKeeper. I have no idea why — it's just baked into the technology; the whole thing needs it just to start. So you need ZooKeeper to herd the cats, as we said. Then we'll spin up a Kafka cluster — a single Kafka broker — and both of them run on Docker; I don't want to pollute your machine with installations and all that stuff. And guys, I didn't use the Confluent quickstart docker-compose — to me it was very complicated; it needed something like eight gigabytes to install, and it comes with Kafka Connect and Streams and the command-line tools. I don't need any of that; I'm just showing you Kafka and ZooKeeper, so I'm spinning up just those two Docker containers. That's how I'm doing it in this video.

We'll learn how to create a topic — we'll create the users topic — and I'm going to use this beautiful Node.js library for Kafka called KafkaJS. I checked out another library, kafka-node, and it really wasn't great — it was so buggy and didn't have promises, so out of the box I had to work with something else, and thank God I found KafkaJS. Using it, we'll create a producer, write a message to the users topic, and then write a consumer that consumes from the users topic. Let's jump into it.

Alright guys, let's start by spinning up a ZooKeeper Docker container. First, make sure you have Docker — go install it for Mac or Windows, either works — and test it with `docker run hello-world`; if you see "Hello from Docker!", you are good to go, sir — you can start this tutorial.

Now let's spin up a ZooKeeper instance, and to do that we use `docker run`. The first thing is to give the container a name — always a good idea — with `--name`; we'll call it zookeeper. Second, since ZooKeeper runs on a port, we expose that port to my machine so we can communicate with it (there are better ways, but this is a tutorial, so we'll do it the hacky way, as they say). ZooKeeper's port is 2181, so we map port 2181 in the container to the same port on my machine — I don't have ZooKeeper running locally, so this works. And finally, we pull from the image, which is called zookeeper, and run. It downloads the image if you don't have it and starts running — and just like that, we have ZooKeeper running. (I know, guys, you can add `-d` and run it detached, but if you know me, I like keeping the terminal open so I can see what's going on.)
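Assembled, the command is simply this (assuming nothing else on your machine is using port 2181):

```
docker run --name zookeeper -p 2181:2181 zookeeper
```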
That terminal is now occupied with ZooKeeper, so let's spin up another one — a new tab with a profile, so we can tell it's the ZooKeeper tab, which is nice. Now I'm going to run Kafka, and this one you'll want to copy — I'll go through the command piece by piece, but I'll leave the actual command in the description below so you can just copy and paste it.

Here's what you do: `docker run`, same as before, and obviously we name the container — let's call it kafka. Not hard; so far so simple. Next we need to expose a port, because again Kafka runs on a port — and what's the port for Kafka? Remember, guys: 9092. So I map port 9092 in the container to 9092 on my machine. My machine's name is HusseinMac, so HusseinMac:9092 is how we'll communicate with the broker. I could map it to any host port, but I'll use the same one, because I don't have anything else running on 9092.

Now you start writing the list of environment variables the container needs to spin up. What does a Kafka broker need, guys? It needs the ZooKeeper instance, and specifying it is very simple: `-e`, which is for environment, and the variable is called KAFKA_ZOOKEEPER_CONNECT. You know what it is — we just spun up ZooKeeper on port 2181, so the value is essentially HusseinMac:2181. That's the first environment variable.

The second one: when you run a Kafka broker, you need to advertise the address of the broker to your clients — producers and consumers. How do the brokers do that? You have to tell them. And the reason you have to tell the broker its own address is that you can have multiple listeners in one Kafka cluster — that's where configuring this thing gets really complicated; it's a beast. In our case there's just one advertised listener, the one on 9092, and the variable is called KAFKA_ADVERTISED_LISTENERS. (I know how to pronounce "listeners," I'm just being silly — that's actually how I learned English: I over-pronounce all the letters because I don't know how they're spelled. You can laugh, I won't be mad.) Let me push in some spaces so we can read this.

Here's the thing: Kafka supports both SSL and plaintext communication. PLAINTEXT is the unencrypted version of things, and the other one is TLS or SSL — I forget which name it uses — because obviously sometimes you need secure communication and sometimes you don't. How does the system know which is which? This isn't HTTP versus HTTPS, so Kafka needs its own similar terminology: PLAINTEXT is Kafka's HTTP, and the SSL listener is its HTTPS. So the value is PLAINTEXT://HusseinMac:9092 — that's what gets sent back to producers and consumers so they can actually communicate with the broker. OK, sweet — almost done.
The final environment variable — and this one has really gotten me so many times — concerns replication. Remember when we talked about replicating the broker and spinning up a 9093? Kafka's internal offsets topic assumes a replication factor of three by default, so if I just run a single broker as-is, it expects three, gets really confused — "I only see one broker; where are the other two?" — and dies. So you essentially have to force it: "hey Kafka, I have only one replica." That's why you set KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR and say equals one. (Obviously, guys, you can skip through this and just copy and paste, but I like to explain what's going on with every single thing we put in the command.) And finally, guys, the image — which for some reason doesn't live by itself; it lives under confluentinc, the company that actually maintains the project: confluentinc/cp-kafka. And we're done — let's see if we nailed it.
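Here's the whole command assembled (my reconstruction from the walkthrough — HusseinMac is the hostname of his machine, so substitute your own hostname or localhost):

```
docker run --name kafka -p 9092:9092 \
  -e KAFKA_ZOOKEEPER_CONNECT=HusseinMac:2181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://HusseinMac:9092 \
  -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
  confluentinc/cp-kafka
```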
Alright — run it, and it starts up, checks that ZooKeeper is running, and everything is good. Just like that we have a Kafka cluster spun up, and we have ZooKeeper — we don't really care about ZooKeeper, it just sits in the background, but we really, really, really care about this guy, and it looks like it's up. How do I communicate with it? HusseinMac:9092 — that's the broker; that's how we deal with it. How about we write some code, guys? Let's jump into it.

Alright, I have Visual Studio Code here and I have Node.js installed — those are the only two pieces you need for this tutorial, since you already have Docker and we already spun up a broker (that rhymed). Let's start a project: I'll go to my JavaScript playground, create a folder called kafka, and open it — a brand-new project. Then let's initialize npm, because we're going to need it: `npm init -y`. Now we have a package.json for a project called kafka — pretty cool.

Let's create a new file and call it topic.js, because creating a topic is the first thing we'll do. To create a topic, we first need to require the library that lets us do the fancy stuff of communicating with the broker and Kafka, and that library is called KafkaJS. So: `const Kafka = require("kafkajs")` — or you can use the destructuring assignment, `const { Kafka } = require("kafkajs")`. I sometimes find destructuring confusing, which is why I like to show you both — they're essentially the same thing, so whatever floats your boat. (I made a video about the destructuring assignment; I'll reference it. It still gets me sometimes — it's pretty to look at, but in some cases it gets really confusing and, in my opinion, hard to read. I might just be a bad programmer, though.)

So now we have Kafka as an object. The next step is to create a function — call it, I don't know, run — and since we're going to deal with promises, it needs to be an async function, with the usual try/catch jazz: console.error in case something bad happens, and all the usual stuff. And here's what I want to do: we'll create an admin. This is something we didn't talk about — to create a topic, you have to create an admin connection. But before we write any more code, remember the first thing we do when we communicate with a broker: we establish a TCP connection, right? It's very simple here: we create a new Kafka object, and it takes a configuration object. We're not getting any IntelliSense, because KafkaJS isn't installed yet — so let's install it: `npm install kafkajs`. Not hard. Now you should start seeing IntelliSense, which is very important. There you go.

What we're interested in when we create a new Kafka connection: first, you have to tell Kafka your clientId — just a string; call it anything you want, like "myapp" — which uniquely identifies the client. The second piece of information — there are a lot of other options, but this is the one we care about — is brokers. Plural, not broker, and this is very important: since you're working with ZooKeeper, one broker can go down and another can come up, so you can give the client multiple brokers as an array, and the client will choose which one to connect to. In my case: HusseinMac:9092 — that's my broker right there, guys.

So now we have a connection. The second thing: since we're creating a topic, we need the admin interface, and this is how you get it — `kafka.admin()`, which gives you back an admin object, so `const admin = kafka.admin()`. Up to this point we haven't really connected, because we haven't explicitly told it to connect (you can throw in configuration to auto-connect, but let's be explicit all the time — it holds up better). And connect() is a promise, and what do we do with promises? We await them, because something can go wrong. So: console.log("Connecting..."), await the connect, and once we make it past that, console.log("Connected!"). How about that.

Now that I'm connected, the next step is to create the topic, and it's very simple: the method is literally called createTopics. The reason it says createTopics and not createTopic is efficiency — send all the topics you want to create at once and it takes care of them. And createTopics is a wire call — a network call — so even without looking I know it's going to return a promise, and I have to await it. Let's see what it takes.
It takes an options object — essentially a JSON object — with timeout, topics, validateOnly, waitForLeaders; I'll leave all of those at their defaults. What I really care about is topics, which is literally an array — and an array of how many topics? I just need one, really. You tell it the topic name — I'm creating a topic called users — and the next piece of information is how many partitions are in this topic, which you specify with numPartitions (I would never have remembered that without IntelliSense, really, guys). So: two partitions, because remember — A to M, and N to Z. We could make this a little fancier and take a parameter to create any topic we want, but we don't really care, because we're going to run this once and that's it. Once we're done: console.log("Created successfully!"), then we can disconnect — and await that too — and then finally, in a finally block I think, we do process.exit, because we're good.

The last step: I don't know if you can see it, but run is rendered in a slightly lighter color, which means "you never called me, son." So let's call run(). To recap what this does: create the Kafka object with those parameters, get the admin object, actually connect, create the topic, and once it's done, disconnect. Looks good — how about we run it, guys? We can run it from the debugger or just from the terminal: `node` — what did we call the file? topic.js, for God's sake. So: `node topic.js` — and: Connecting... Connected! Created successfully!
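For reference, the finished topic.js looks roughly like this (my reconstruction from the walkthrough above; again, swap HusseinMac for your own hostname):

```js
// topic.js — create the "users" topic with two partitions
const { Kafka } = require("kafkajs");

const kafka = new Kafka({
  clientId: "myapp",
  brokers: ["HusseinMac:9092"],
});

const admin = kafka.admin();

async function run() {
  try {
    console.log("Connecting...");
    await admin.connect();
    console.log("Connected!");
    await admin.createTopics({
      topics: [{ topic: "users", numPartitions: 2 }],
    });
    console.log("Created successfully!");
    await admin.disconnect();
  } catch (err) {
    console.error(`Something bad happened ${err}`);
  } finally {
    process.exit(0);
  }
}

run();
```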
We can take its word for it, but let's see if it actually works — we know we created something here, so how about we start producing? The second thing to write is a producer: let's create producer.js. And here's the thing about the producer: it's very, very similar, so I'm going to copy the whole code over and remove the topic-creation stuff, because we don't care about that here. The only real difference is that instead of asking for an admin, you ask for a producer — `kafka.producer()`; let's literally call the variable producer, for simplicity — and we connect with the producer and disconnect with the producer. All looks good.

But here's the cool part: the producer has a function called send, and when you call send, you send a record — an object — in which you specify which topic you want to send to. That's cool. What's the topic name, guys? It's always users, so I won't bother passing it as a parameter. The second property is called messages — and since you can actually send multiple messages, it's an array, which is nice. Each message is an object: it has a key, a partition, a value, headers — you can leave most of those empty; I'm really interested in the value. What's the value of the message? Here's the thing: I want to call it msg, and I'm going to ask the user for it — how about that, even better. So: `const msg = process.argv[2]`. Those are our arguments, by the way (sub-zero — Scorpion, Mortal Kombat): argv[0] is the Node.js binary, argv[1] is the file, which is producer.js, and argv[2] is the first actual argument. So if I run `node producer.js Test`, then argv[2] is "Test" — that's essentially why I did it that way — and that's what gets sent: whatever the user passes.

And here's the thing — I also want to specify the partition. How do I know which partition? Remember, guys: A to M is partition 0, and N to Z is partition 1. Telling them apart is actually very simple: `const partition = msg[0] < "N" ? 0 : 1` — take the first character of the message; if it's less than "N", that means partition 0; else it's partition 1. That's the ternary operator in JavaScript. Simple stuff — we have a producer, guys.

How about we test our producer? `node producer.js Test` — and it prints "created successfully"; let me rename that to "sent successfully." And one thing we forgot: we forgot to await the send — it's definitely a promise — so let's await it and capture `const result`, because it definitely returns a result, and log "Sent successfully" along with the result. Let's see what we get — I'll send it again, and JSON.stringify the result this time. And there you go: topicName users, the partition, an errorCode, the baseOffset, and all that stuff. It tells you the current offset — the thing we talked about — the error code if there is one, and which partition we wrote to. If I send "Ali", that's partition 0. If I send "Zain", that's partition 1, because Z. And so on — we can keep populating stuff, and obviously you can send duplicates; it doesn't care. These aren't really users; I'm just testing. So now I'm at position 3 — 0, 1, 2, 3 — on partition 1, and position 3 on partition 0: the offset is essentially unique per partition. Our producer seems to be working.
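Assembled, producer.js comes out roughly like this (my reconstruction — note the raw first-character comparison, which is why the lowercase "ali" lands on partition 1 in the demo that follows):

```js
// producer.js — publish one message, routed by its first letter
const { Kafka } = require("kafkajs");

const msg = process.argv[2]; // first user-supplied argument

const kafka = new Kafka({
  clientId: "myapp",
  brokers: ["HusseinMac:9092"],
});

const producer = kafka.producer();

async function run() {
  try {
    await producer.connect();
    // A–M -> partition 0, N–Z -> partition 1 (no case normalization,
    // so lowercase input falls through to partition 1)
    const partition = msg[0] < "N" ? 0 : 1;
    const result = await producer.send({
      topic: "users",
      messages: [{ value: msg, partition }],
    });
    console.log(`Sent successfully! ${JSON.stringify(result)}`);
    await producer.disconnect();
  } catch (err) {
    console.error(`Something bad happened ${err}`);
  } finally {
    process.exit(0);
  }
}

run();
```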
But I really need to consume, guys. How about we consume? Let's write a consumer — again, very similar — so I'll copy the code into consumer.js, paste, and adjust. When we consume, we consume a topic, and we know which topic. We don't need to specify a message, and you might think we need to specify a partition to consume from — but that changes based on which group you're in, so we don't specify anything; we just run the consumer and it gets told what to consume. How about that. So instead of producer: `kafka.consumer()`, and here I do want to specify one piece of information — the groupId. Any consumer we spin up will belong to the same group the whole time; I'll call the group "test". We get a consumer back, and then we connect — that seems OK.

I don't care about the sending parts — I'm not sending anything, so all of that goes. And we're still disconnecting — no: I don't want to disconnect; I want to keep the consumer running. Because what the consumer essentially does is long polling: it keeps polling the broker for messages, in a long-polling manner — we'll talk about that later. Sweet.

So let's do this: `consumer.subscribe` is how you subscribe to a topic. What topic do you want to subscribe to? users. And do you want to read from the beginning, even though you're a brand-new consumer? Yes sir, all the time: fromBeginning true. (You do have the option, as a consumer, to read only from the latest position forward, or to read everything — that's really up to you. Sometimes the topic is so big, and you really don't care about old messages, so you'd say false.)

Now that we're subscribed to the topic, the next step is to actually run the consumer — keep it running and let it poll the topic for results. We do that with a function called run, which takes a key called eachMessage: you pass it a function — say, an async function taking a result — and that function gets executed for each message you receive. What we'll do is print the result: you could print the result's partition, topic, and all that; we're interested in printing the message value and the partition — "Received message [value] on partition [partition]". Let's just do that. And one more thing: we need to remove that process.exit — we can't just kill the process; the consumer needs to stay running.

Let's run it: `node consumer.js` — come on, guys — "joining the group" — there you go: we received all the messages, essentially, and the process is still running. And you can see it: I received messages from partition 0 and messages from partition 1. Amazing stuff, guys.
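And consumer.js, assembled (my reconstruction — note there's no disconnect and no process.exit, so it keeps long-polling):

```js
// consumer.js — join group "test" and print every message received
const { Kafka } = require("kafkajs");

const kafka = new Kafka({
  clientId: "myapp",
  brokers: ["HusseinMac:9092"],
});

const consumer = kafka.consumer({ groupId: "test" });

async function run() {
  try {
    await consumer.connect();
    await consumer.subscribe({ topic: "users", fromBeginning: true });
    // No disconnect, no process.exit: the consumer stays up,
    // long-polling the broker for new messages.
    await consumer.run({
      eachMessage: async ({ partition, message }) => {
        console.log(`Received message ${message.value} on partition ${partition}`);
      },
    });
  } catch (err) {
    console.error(`Something bad happened ${err}`);
  }
}

run();
```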
Here's what we're going to do now: I'll open a brand-new terminal and a couple of tabs — this one will be my producer, so navigate to the kafka folder; and this one will be the consumer: same folder, then `node consumer.js`. It waits for messages, and it's in the group "test", so it starts joining the group — that's cool — and now it's joined. There are no other consumers right now, so it is responsible for both partitions. So here's what I'll do: I go to the producer tab and run `node producer.js Test` — I just published something, and immediately we get a result on the consumer: "received message, partition 1." Now how about something on partition 0: "ali" — and it says partition 1? That's because it's lowercase; I didn't take lowercase into consideration, so let's capitalize it. "Ali" — now it's received on partition 0. Another A-to-M name — also partition 0. So this one consumer is handling both partitions.

Now, how about I go to another tab, navigate to the kafka folder, and run `node consumer.js` there too. This guy joins the group, sir — and now that a second consumer has joined, the group rebalances. I have no idea exactly how ZooKeeper and the group get rebalanced, but now this consumer owns one partition and that consumer owns the other. Let's find out who is who: `node producer.js Adam` — Adam is partition 0 — and the first consumer receives it, so it owns partition 0. Publish "Adam1" — we still get it here; the second consumer did not get Adam, because it's responsible for partition 1, apparently. So let's hit partition 1: "Zain" — the first guy did not get it, but the second guy got it. And now you can write a for-loop and produce a ton of messages, and those guys will get them immediately, because they're subscribed, they're running, they're consuming. I'm going to make this code available for you, guys — you don't have to pause the video; all the code is in the description below.

Let's finalize this course, guys, by talking about the pros and cons. Almost done — and there's a lot here; this technology is amazing. Let's talk a little about the pros of Kafka, because it's an amazing technology with a lot of advantages.

The first one I want to talk about is the append-only commit log. This is at the heart of Kafka: all the partitions, all the topics, every message you write — it goes to a log, and that log is append-only, so writes always go at the end. And if we know anything about computer science and computers, it's that they thrive on appending, because it's the best case: you always know where the end is, you can seek to the end very fast, and you can append extremely fast. You're not seeking into the middle and inserting data between two blocks — unlike relational databases and B-trees — and you're not manipulating existing disk space and fragmenting things. It's always append-only; there are even special write-once SSDs built around that idea. A lot of databases do the same thing, by the way: they have an append-only commit log where every event — any insert, anything you do — goes first and gets committed; Cassandra works the same way. Then the engine materializes the logical tables and all the other stuff from that log for you.

So Kafka works like that, and that's why the performance is amazing: you're always appending, and when you read, you know exactly what you're going to read. Reads are very fast and effectively indexed, because we work with positions and partitions and append logs, and that's it. Seeking to a position is extremely fast — "go to sector 5, block 7 on the disk," go there, and immediately pull the data — easy, because the position is the input. The consumers do suffer a little, because they have to figure out which position they want to read from. And here's the disadvantage: you cannot do something like SELECT * FROM topic WHERE id equals something. I mean, maybe the new SQL layer for Kafka, whatever they call it,
(I think it's called KSQL) has this ability — but at the heart of it, Kafka is an append-only commit log, and that's why it's so fast: you can read fast and you can write fast, because it is designed for *events*. An event happened: a video has been uploaded; a video has been compressed. Something happened — and events do not change. If you do something in real life, you cannot undo it; it is done. That's essentially how Kafka works, and why it's so certain about its data. So: the performance is amazing because of that.

Distributed, with ZooKeeper: even with partitions, it's still an append-only commit log — you can distribute the partitions, have one leader partition that you always write to, and the other copies of the partition follow it and pull the latest messages. So it's very distributable, and because of the sharded design — these partitions — Kafka can spread things around very easily, and you can easily scale as a result.

Long polling — I want to talk about this a little. RabbitMQ (I made a video about RabbitMQ; I'll reference it here), when it tries to send information to a consumer, actually uses the push model — and it can, because there's a TCP connection. Even Kafka could do that, but it doesn't. The limitation of the push model is that consumers usually cannot consume as fast as producers produce: you can have a producer like Gary Vee pushing 700 videos a day, and we poor consumers can't consume any of it. If you keep pushing, pushing, pushing information, the consumers don't have the cycles to keep up. So Kafka flipped it: it uses a polling model, where the consumer asks for messages — but not a dumb polling model where the consumer hammers "do I have a message? do I have a message? do I have a message?" Here are the criteria for long polling: "give me a response — and if there aren't any messages, don't respond immediately; don't tell me there are no messages. Wait up to X amount of time, and if at least, say, seven messages — or at least 700 bytes of messages — are ready for me, push them over. Only respond when there's actually data available; don't send me empty responses." That's essentially what long polling is: you make a request and you wait on it. There are no empty misses — no wasted requests that come back with nothing — and those are genuinely harmful: they saturate the bandwidth and consume CPU cycles on both the server and the client. So with long polling, you make a request and wait — and that's what happened when we ran our consumer: it did the long polling behind the scenes. It's all KafkaJS in this example — which is an amazing library, by the way — doing it for us. I'm going to make a separate video just about long polling.
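In KafkaJS, those long-polling knobs are exposed as consumer options — a sketch (check the KafkaJS docs for exact defaults; the numbers here just echo the example above):

```js
const { Kafka } = require("kafkajs");

const kafka = new Kafka({ clientId: "myapp", brokers: ["HusseinMac:9092"] });

// The broker holds the fetch open until enough data is buffered
// or the wait time elapses — no empty responses in between.
const consumer = kafka.consumer({
  groupId: "test",
  minBytes: 700,         // don't respond until ~700 bytes are ready...
  maxWaitTimeInMs: 5000, // ...or 5 seconds have passed, whichever comes first
});
```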
Kafka also gives you an event-driven architecture: it's a pub/sub system and it's a queue, all at once. It's a queue because you can put all your consumers in one group, and within a group each message is received by one and only one consumer; no other consumer in the group will receive it, and that's exactly what a queue is. And it can be pub/sub, a broadcast: "hey, I uploaded this video, consumers, go consume it." The copyright service can consume it, the caching service can consume it and cache the content, the Content ID service can consume it, the codec service can consume it, and other services can consume it and do their own thing. The same message can be sent to multiple consumers, and you do that by creating consumers with different groups, because a message is delivered once per group; the message is a group-dependent thing.
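In KafkaJS terms, that difference is just the groupId. A minimal sketch, with service names that are my own illustration:

```js
const { Kafka } = require("kafkajs");

const kafka = new Kafka({ clientId: "groups-demo", brokers: ["localhost:9092"] });

// Queue semantics: the same groupId, so each message on the topic
// is delivered to exactly one of these two workers.
const worker1 = kafka.consumer({ groupId: "upload-workers" });
const worker2 = kafka.consumer({ groupId: "upload-workers" });

// Pub/sub semantics: different groupIds, so every service
// gets its own copy of every message.
const copyright = kafka.consumer({ groupId: "copyright-service" });
const caching = kafka.consumer({ groupId: "caching-service" });
const codec = kafka.consumer({ groupId: "codec-service" });
```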
And because it's event-driven, a lot of people use Kafka with microservices. Kafka works naturally with microservices because, instead of services talking to each other directly, everything becomes an event: hey, a video has been uploaded; hey, a video has been compressed; hey, a video has been copyright-checked; hey, a video is ready to be edited. All of these are events. You can store the events in Kafka, listen to those events as a consumer, and do something when they happen: "when this happened, do this." Otherwise you'd have to wire that logic into your application, which gets complicated.

Obviously, it also scales like there is no tomorrow: just spin up another broker and you're done, because ZooKeeper will pick it up and the cluster will absorb it. If you're falling behind with too few consumers in one group, just add more consumers and they'll consume the rest in parallel, because all of those consumers hit the list of partitions in parallel. One topic, multiple partitions, read in parallel. If there were no partitions, you couldn't do that; you'd have to read the log sequentially. Because the data is sharded into partitions, I can now consume it in parallel. The problem with this is that, as a producer, it kind of sucks: now I have to know about the partitions, which complicates the application a little bit (there's a small sketch of this after the cons).

Now the cons. The biggest hurdle is ZooKeeper, and I think the community is working on it: I read somewhere that they're working on removing ZooKeeper as a dependency from Kafka. I might be wrong, but I read it in a couple of places; I forgot where, so I'll try to find the article and reference it in the description below. I haven't worked with ZooKeeper much myself, to be honest, so I'm not an expert in this technology, but I've heard a lot of engineers complain about it, especially at scale, where it can behave really weirdly. With microservices, a lot of people use ZooKeeper as a service-discovery mechanism: "tell me who's who, where is the service I need to communicate with." And if it's down, it brings the entire system to its knees, and that's bad, because you're making the entire system rely on this one piece of technology. While I was making this video, ZooKeeper started acting really weird and I had to restart it several times, and that's with a single broker; I haven't even made the multi-broker video yet, and that setup is really complicated, guys. So ZooKeeper complicates things, and scaling and maintenance become really hard.

I can also add the producer side to this: the producer essentially needs knowledge of which partition to publish to, and that can lead to problems, because now you carry the complexity of knowing the partitions. We have the same problem today with sharded relational databases. The moment you start sharding, you have databases one, two, three, four, five, and when you want to read user seven, where the heck should you query? Which database should you ask? You have to keep track of the ranges, the partition key (they also call it the sharding key): "oh, these users actually live on this database, so I'd better connect to that server." Sharding is very beneficial, but it complicates the clients. That's why there's another piece of software called Vitess: Vitess abstracts the sharding away from you as a client and takes care of all of it; you just make a query and it does everything for you. I think it runs on MySQL, and I think YouTube actually uses it. I'll reference it in the description.

Finally, Kafka is complex to install, configure, and manage. It took me a whole day, guys, just to figure out how to spin up one broker and one ZooKeeper for this video. Just imagine how people do multiple brokers. I know there are scripts you can just run that take care of everything, but I really didn't want to do that for you guys. I didn't want to download some docker-compose file and say "hey, download this and run it." I wanted to understand what's going on, and this is on me, really: I just do not like to copy and paste code, and I don't like to show you things I don't understand, because I owe it to you guys; if I'm explaining something, I need to understand it. I'm not going to say "take this YAML, run it, and it will do everything for you." Maybe I'll do that once I understand all the pieces. That's why it takes me a while to make videos: I really need to understand every single piece, and a lot of technologies are like that. That's why I tried to isolate everything and make the setup as simple as possible. But believe me, I could not find those two lines that spin up a ZooKeeper and a Kafka broker anywhere, I swear to God; I had to write them myself and understand what's going on. The rest of the setups out there are very complicated, and I wanted to make this one as simple as possible.
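Here's the small sketch of that producer-side con I mentioned above: the two ways a KafkaJS producer can target a partition. The topic and values are my assumptions:

```js
const { Kafka } = require("kafkajs");

const kafka = new Kafka({ clientId: "partition-demo", brokers: ["localhost:9092"] });
const producer = kafka.producer();

async function main() {
  await producer.connect();

  // Option 1: the application hard-codes the partition. This is the con:
  // the producer now carries knowledge of the partition layout.
  await producer.send({
    topic: "users",
    messages: [{ partition: 1, value: "Zain" }],
  });

  // Option 2: provide a key and let Kafka hash it to a partition,
  // keeping the layout knowledge out of the application code.
  await producer.send({
    topic: "users",
    messages: [{ key: "zain", value: "Zain" }],
  });

  await producer.disconnect();
}

main().catch(console.error);
```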
Time to summarize, guys. Huge video, I know, I know. If you made it to the end, I really appreciate it, and I hope you enjoyed it. Essentially, we talked about the Kafka components: the brokers, the partitions, the producers and the consumers, and ZooKeeper. We did an example where we spun up a Kafka cluster and a ZooKeeper and all that jazz, wrote a producer and a consumer, showed how the load balancing is done, and showed how rebalancing within a group is done. And we talked about the pros and cons of Kafka. All the resources will be available in the description below. I hope you enjoyed this video, and I'm going to see you in the next one. You guys stay awesome.
Info
Channel: Hussein Nasser
Views: 132,866
Rating: 4.9481621 out of 5
Keywords: Apache Kafka, Apache Kafka course, kafka components, kafka pros and cons
Id: R873BlNVUB4
Length: 78min 6sec (4686 seconds)
Published: Wed Nov 27 2019