Understanding Kafka Performance Metrics

Captions
Hello and welcome to another video tutorial by Izzy Academy. In this video we are going to cover how to understand the metrics that are generated by our performance tests: the metrics you will see when you run the producer performance test client, and the metrics you will see when you run the consumer performance test client.

Here is today's objective. I will start by showing you a sample output from the producer, then a sample output from the consumer, and then we will walk through some of the metrics for each of these clients to understand what they mean. After that, I will point you to the pages in the Apache Kafka documentation where you can do a deep dive on each individual metric, so that as you tweak the different knobs you can see how your adjustments and configuration changes affect the overall performance of the environment, on either the producer side or the consumer side.

Let's look at an example, starting with the producer side. Here we are going to generate 10,000 records, each approximately 1,000 bytes in size, and run the test to see what happens. When we do this, the client attempts to transmit the records, and immediately after it finishes it prints a large block of output. Let's scroll to the beginning and see what is going on. We can see that we were able to send 10,000 records, so let me copy this output and inspect what happened at the producer for this particular run.

The very first line shows the total number of records that were sent, in this case 10,000. The average throughput was about 925 records per second. It also shows the latency: on average we had about 4,500 milliseconds (4.5 seconds) of latency, and the maximum was about 9,937 milliseconds. Then we have the percentile numbers. Some of you may not be familiar with percentiles, so let me take a few moments to explain what they mean. The 50th percentile shows that 50 percent of the records fell under a particular latency; in this case about 50 percent of the records were below 4,348 milliseconds of latency observed at the producer. Then we move to the 95th percentile, about 8,838 milliseconds, which means that 95 percent of the instances in the population fell under that number, and the 99th percentile is about 9,791 milliseconds. In general, a percentile such as the 50th tells you that 50 percent of the total sample set fell under that particular number. In this case the latency is quite high, and we would like to keep it very low rather than observe this kind of latency. The output reports the 50th, 95th, 99th, and 99.9th percentiles, and each one tells you what percentage of the population fell under that number. It is good for us to keep the maximum very low, keep the average very low, and make sure that most of the records fall under a very low number.
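For reference, here is a minimal sketch of the kind of producer run described above, assuming a broker at localhost:9092 and a pre-created topic named perf-test (both names are assumptions for illustration, not from the video):

```bash
# Producer performance test: 10,000 records of roughly 1,000 bytes each,
# matching the parameters discussed above. Broker address and topic name
# are placeholder assumptions.
kafka-producer-perf-test.sh \
  --topic perf-test \
  --num-records 10000 \
  --record-size 1000 \
  --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092

# The summary line printed at the end has this general shape
# (numbers are illustrative, echoing the run discussed above):
# 10000 records sent, 925.0 records/sec (0.88 MB/sec), 4500.00 ms avg latency,
# 9937.00 ms max latency, 4348 ms 50th, 8838 ms 95th, 9791 ms 99th, ... ms 99.9th.
```

Setting --throughput to -1 disables throttling so the producer sends as fast as it can; a positive value caps the rate in records per second.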
So that is what that first block means. Then we have some other lines showing the commit ID, the start time, and the version. The commit ID relates to version control, the Git hash for that particular client build; the start time is the Unix time in milliseconds at which the test started; and the version number of the client is, in this case, 3.1.0. After that we have the full set of metrics, about 135 of them, with names like batch-size, batch-split-rate, buffer-available-bytes, and so on. You can find all of these on the Kafka documentation page: head over to kafka.apache.org, go to the documentation, and look under Monitoring in section 6.8. There you will see the metrics that are available for the producer, the consumer, Connect, and Streams, with outgoing-byte-rate, request-rate, request-total, and all the other names defined, so you can look up what each of them means. For example, request-total is the total number of requests sent from this particular instance, which the documentation calls a node, so in our case the requests sent by this producer or consumer instance.

If you want to focus on just the producer, there are metrics like buffer-total-bytes. If you look at this output you will see buffer-total-bytes, but what does it mean? It is the total amount of buffer memory that is available for the client to use, whether it is currently in use or not, while buffer-available-bytes is the portion that is free and not yet allocated. All of these metrics are explained in that section, so it is important that you take a look; the link will be in the description, and as soon as you get the output you can explore what each of the roughly 135 metrics means.

Let's look at a couple of them here: record-send-rate and record-send-total. The total number of records transmitted from the producer to the broker was 10,000; if we copy the metric name and head over to the documentation page, we can see that record-send-rate is the average number of records sent per second, and record-send-total is the total number of records sent. There are many more data points like these, and going through all of them here would be very time consuming, but I encourage you to review them, because as we run the performance tests it is really important to understand what each of these numbers means; otherwise we just look at the output without seeing how our configuration adjustments impact the overall flow of the test.

That is what I have to cover on the producer side. All the metrics are documented: head over to the documentation, go to section 6.8 under Monitoring, look at the producer monitoring metrics, and you will see the definitions for each of them.
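As a hedged sketch of how you might capture and filter that metrics dump (same assumed broker and topic as before), the perf tool can print its client metrics at the end of the run:

```bash
# Re-run the producer test and dump the client metrics when it finishes.
# --print-metrics asks the tool to emit the full metrics table; the topic
# and broker names remain placeholder assumptions.
kafka-producer-perf-test.sh \
  --topic perf-test \
  --num-records 10000 \
  --record-size 1000 \
  --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092 \
  --print-metrics > producer-perf.out

# Pull out the handful of metrics discussed in this video.
grep -E "record-send-rate|record-send-total|buffer-total-bytes|buffer-available-bytes" \
  producer-perf.out
```

The metric names in that dump are the same ones defined under Monitoring (section 6.8) of the Kafka documentation, so each line can be looked up directly.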
The next thing I want to cover is running the consumer. The consumer picks up all the records that were transmitted to the broker. Running it here (a sketch of a comparable command appears at the end of this section), I see different data points coming out, such as the start time, the end time, the data consumed, the number of messages per second, and so on.

Similarly, if we head over to section 6.8 under consumer monitoring, we will see these metrics defined as well. We see commit-rate, which is the number of commit calls made per second. Let me look at another one: bytes-consumed-rate is the average number of bytes consumed per second, and bytes-consumed-total is the total number of bytes consumed. The fetch latency is described here, along with the fetch size, the fetch rate, and the total number of records fetched. Then we have the records-lag and records-lead metrics. The lag is measured against the high watermark, the position in the Kafka partition log that marks how far the current replica has been written. If that number is very high, it tells us the consumer is falling far behind and I may have to do something about it; if it is very close to zero, it means we are pretty much caught up with that replica and we may need to generate more records from the producer side. So a very high number is not good, and a very low number means we might end up waiting, because the consumers are very fast at catching up with the replica followers. Strictly speaking, records-lag is the distance behind the high watermark, while records-lead is the distance ahead of the log start offset; the meaning is not obvious from the names, but the documentation spells out both definitions. So we have the bytes consumed, the fetch size, and all these other metrics; if you review that section you will see everything that is available. Latency is going to be very important for us to pay attention to, so the lags, the lead, the consumed rate, the average latency, and the minimum and maximum fetch times are all worth watching.

I also wanted to show you the example output here, where we can see all the different columns and understand what they mean: the start time and end time of the test, how much data was consumed in megabytes, the megabytes per second, the data consumed in number of messages, the messages per second, the rebalance time, the fetch time, how many megabytes were fetched, and how many megabytes per second were fetched. That is very important for us to understand, and it gives us metrics we can use to analyze how our configuration adjustments are impacting the overall flow of data from the producer to the broker and on to the consumer.

That is what we have covered so far: we reviewed the producer metrics and the consumer metrics. It is going to be very important for you to head out to this page and bookmark section 6.8, because it will really help you, as you run these tests, to see what is happening at the producer and at the consumer. In other videos we might take a look at Connect and Streams, but for this one the main focus is analyzing what happens as data flows from the producer to the broker to the consumer, and this page gives you all the information you will need.
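To close the loop, here is a comparable consumer-side sketch (again with an assumed broker at localhost:9092, the hypothetical perf-test topic, and a made-up group name):

```bash
# Consumer performance test: read back the 10,000 records produced earlier.
# Broker, topic, and consumer-group names are placeholder assumptions.
kafka-consumer-perf-test.sh \
  --bootstrap-server localhost:9092 \
  --topic perf-test \
  --messages 10000 \
  --group perf-test-group \
  --print-metrics

# The summary is one row printed under these column headers:
# start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg,
# nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
```

The rebalance.time.ms and fetch.time.ms columns separate group-coordination overhead from the time spent actually fetching data, which helps when deciding whether slow consumption is a fetch-tuning problem or a rebalancing problem.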
That was everything for this particular video. If you like this kind of content, I strongly encourage you to subscribe; if you subscribe and click the notification icon, you will be the first to find out as soon as new content is available. You can also support Izzy Academy by visiting our Patreon page, and you can follow my Twitter handle to get updates as they become available. I also have a website where you can check out some of my content and my courses, and if you are interested in learning more about how to use open source data platforms like Kafka, Flink, and MongoDB to process data, you can take a look at this course and give me feedback on what you think about it. Thank you for your time, and I will see you in the next video.
Info
Channel: Data Engineering with Izzy
Views: 977
Keywords: understanding metrics, kafka metrics, apache kafka, performance benchmarks, producers, consumers, streams, brokers, latency, speed, performance, linkedin, uber, throughput, availability, durability
Id: DCSsyYIAf3s
Length: 14min 44sec (884 seconds)
Published: Sun Feb 20 2022