Better way to consume Kafka topics with Confluent Parallel Consumer

Captions
Hello and welcome to the channel. Let's talk about Kafka consumers today. We all know the standard kafka-clients library that contains the producer and the consumer, and you are probably all using it. But what if I told you there is a library worth considering as a replacement for the plain Kafka consumer? Today I'll show you the biggest problem that a lot of applications run into with a plain consumer, where you end up building a lot of code around it just to unblock your processing, and how the Parallel Consumer library, which comes from Confluent, a company known for its Kafka expertise, can really help in your application.

To start, let's go over some Kafka basics. We have topics, and inside topics we have partitions. Say we have three partitions, P1, P2 and P3, all in the same topic T1. Inside each partition are the messages we produced to the topic: three messages in partition one, three in partition two and three in partition three.

First of all, why do we need partitions at all? Partitions are how you scale a topic horizontally. Each instance in a consumer group gets some of the topic's partitions, and the guarantee is that a single partition can be assigned to only one consumer instance at a time. If the broker decides a rebalance is needed, the consumer group enters the rebalance process and a partition is revoked from one instance and assigned to a different one. If we get more events in the system and want to scale our processing, the answer in Kafka is to increase the partition count of the topic: with more partitions we can spin up more instances, and the maximum concurrency we can achieve is exactly as many consumer instances in the group as there are partitions in the topic.

That's all good, but it's not very flexible. First, partitions don't come for free: more partitions means more resources on the broker, because each one adds files and state to manage, and I remember noticing some time ago that even a completely empty partition takes a significant amount of disk space due to metadata and so on. Second, you can fairly easily scale a topic up by increasing the number of partitions, but you cannot scale it down at all. To do that you have to create a new topic with fewer partitions and replicate the data into it, which is annoying and slow, so forget about repartitioning on the fly to adjust the topic to the load; it's not designed to work that way. If you know your application's traffic is going to grow, you have to scale your partitions ahead of time.

But that's not the only problem. A typical consumer just loops through the messages it gets from the broker, and when one of those messages fails we usually want to retry it, so we cannot simply continue; the consumer gets stuck on that one record, roughly like the sketch below.
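To make the problem concrete, here is a minimal sketch of the naive in-place retry pattern described above, using the plain kafka-clients consumer. The broker address, group id and topic name are placeholders I made up for illustration, not values from the video.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class BlockingConsumerSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-processor");        // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Retry the record in place until it succeeds. Every later record in the
                    // same partition waits behind it, so one bad event removes 1/N of the
                    // topic's processing capacity.
                    boolean processed = false;
                    while (!processed) {
                        try {
                            process(record);
                            processed = true;
                        } catch (Exception e) {
                            // naive in-place retry; real code would back off, dead-letter,
                            // or skip after some number of attempts
                        }
                    }
                }
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // business logic goes here
    }
}
```

Nothing here is wrong per se; the point is simply that a single failing record halts its whole partition until someone intervenes.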
The consumer enters a retry loop, and I can demonstrate it on the diagram. Say the first message in partition one was processed successfully, but something goes wrong with the next message, so the logic in our consumer keeps retrying that event. We never reach the messages behind it; they just wait until the failing message is processed. What does that mean? If I have a topic with three partitions, a single failing message in one partition cuts the processing capacity of the application by roughly a third: partition one is fully blocked, and until we manually skip that event somehow, or the error resolves itself, we cannot make progress on that partition.

This is why you might want something more sophisticated than the simple consumer, and here is where Confluent's Parallel Consumer library comes in. It still uses the plain consumer under the hood to get messages from the broker, but inside the library there is a thread pool, and the messages from a partition are split into queues. The most interesting pattern we can use here is key ordering: each key becomes a separate queue. Say some messages carry key1, some key2 and some key3. If we configure the parallel consumer with the key ordering guarantee, we get separate streams of events per key, and only the queue that contains the failing message is blocked. We won't see the next key2 message until the failed key2 message is processed, but all the key1 messages, and messages for other keys like key3, keep being processed. This is pretty cool, and switching it on is mostly a matter of configuration, as sketched below.
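Based on the library's README, turning on key ordering looks roughly like this; builder method names can shift between releases, so treat it as a sketch rather than the exact code from the video, and the concurrency value is an arbitrary example.

```java
import org.apache.kafka.clients.consumer.Consumer;

import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder;
import io.confluent.parallelconsumer.ParallelStreamProcessor;

public class ParallelConsumerSetup {

    // Wraps an existing plain consumer in a key-ordered parallel consumer.
    // The plain consumer still does the fetching from the broker; the library
    // takes over dispatching records to its worker pool and managing offsets.
    static ParallelStreamProcessor<String, String> build(Consumer<String, String> plainConsumer) {
        ParallelConsumerOptions<String, String> options = ParallelConsumerOptions.<String, String>builder()
                .ordering(ProcessingOrder.KEY) // per-key queues: a failing record blocks only its own key
                .maxConcurrency(16)            // roughly the size of the internal worker pool (example value)
                .consumer(plainConsumer)
                .build();
        return ParallelStreamProcessor.createEosStreamProcessor(options);
    }
}
```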
Of course, we now have a different problem: we cannot commit processed messages if there is a gap in the processing. Picture a graph of consumer lag, the number of records the consumer is behind real time, over time. Under normal conditions the lag stays flat for both partition one and partition two. Then the failed message shows up in partition one: partition two keeps going smoothly, while partition one's lag starts growing. The picture is exactly the same for the parallel consumer and the normal consumer; the difference is that with the parallel consumer we did keep processing events from that partition, we just cannot report to Kafka that all of them can be committed, because there is still a gap where one message sits in the retry loop. The parallel consumer can only commit a contiguous sequence of successfully processed offsets. So it becomes mostly a reporting problem: you will typically have an alert threshold on lag, and when the lag crosses it, an on-call engineer gets paged. With a normal consumer that alert means one partition is completely blocked and every event that happens to be in it is unprocessed. In our case the situation is much better: we still need to figure out how to handle the lag alert, but only a small number of messages is actually affected. If we key the topic by customer ID and there is a problem or bug specific to one event, only one customer is impacted; everything else keeps working just fine. To help with that, the parallel consumer library has a Micrometer integration, so you can expose metrics about the library's internals to Prometheus, Datadog or whatever monitoring tool you use, and watch how many events are sitting in the reprocessing queue along with other useful metrics, such as how long it takes to process a single event.

I will share the link to the library, and I really recommend going through the docs; the documentation on GitHub is very good. There are demos comparing the performance of the different ordering guarantees. You can even use fully unordered processing, where all messages are processed in parallel regardless of key, which gives the highest throughput, but most applications need some ordering guarantee, which is why we use a specific key: messages stay ordered within that key, and we only block a subset of events. There is also a nice video overview that goes into much more depth than I have here, but I hope I've given you some motivation to check it out.

Now let's quickly look at an example. I created a simple application. We're using Testcontainers with the Kafka test container to start Kafka in Docker, plus the library I've been talking about; you don't need any other dependencies, everything else comes in transitively. In the main function I start the Kafka test container, create a plain consumer and a plain producer, and use a single topic with a single partition for now. I have two keys: key1 represents the customer with ID 1 and key2 the customer with ID 2, and if we send "fail" as a message value, the handler throws an exception; you can see the check where, if the message equals "fail", we throw a RuntimeException. With a normal consumer and a single partition, you process the first event, then the second, and then on the "fail" event the consumer enters an infinite retry loop; we'll see that in a second. If I change the ordering here to PARTITION, that mode gives exactly the same ordering guarantees as a normal Kafka consumer: we read all events in order within a partition, and if something fails we block and retry that event.

So, quickly, this is how you define the options for the parallel consumer. There are more settings you can provide, but the most important one is the ordering. maxConcurrency is roughly the size of the thread pool used to process events. You also need to pass in a plain consumer, because the plain consumer is still the mechanism that pulls messages from the broker and feeds them into the internal thread pool. From the options we create the stream processor, then subscribe to a topic, and inside poll you provide a lambda that receives a context; inside that function you write your business logic for processing the event. In my case I just print out some information, including metadata about failed attempts, and the handler does nothing for any message except "fail", for which it throws a RuntimeException; the wiring looks roughly like the sketch below.
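As a sketch of that wiring (again based on the library's README rather than the video's actual source), subscribing and supplying the per-record handler might look like this. The topic name is a placeholder, and in recent versions the poll lambda receives a PollContext whose accessor names may differ between releases.

```java
import java.util.List;

import io.confluent.parallelconsumer.ParallelStreamProcessor;

public class DemoHandler {

    static void run(ParallelStreamProcessor<String, String> processor) {
        processor.subscribe(List.of("payments")); // placeholder topic name

        processor.poll(context -> {
            // PollContext accessor names are from memory; check the current Javadoc.
            var record = context.getSingleConsumerRecord();
            System.out.printf("processing key=%s value=%s%n", record.key(), record.value());
            // The context also exposes retry metadata such as the number of failed
            // attempts for the record; see the library's docs for the exact accessors.

            if ("fail".equals(record.value())) {
                // Throwing marks the record as failed; the library retries it later,
                // and under KEY ordering only this record's key queue is blocked.
                throw new RuntimeException("simulated processing failure");
            }
        });
    }
}
```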
Let's run this and see how it looks. With the PARTITION guarantee we process the first two events and then fail on the "fail" event: you can see we processed event one, then event two, then hit the failure, and now we are in a retry loop with the failed-attempts count going up. Let's stop it and compare with KEY ordering. Looking at the messages we'll send: the first one is the same, the "fail" message will fail, but since the next messages carry key2 we will now see them, message two and message three, while the later key1 message will not appear, because it is blocked behind the key1 message stuck in the retry loop. Rerunning it: we're producing data, we see message one, then messages two and three, which are the last ones we expect to receive, and the failed event is still in its loop. Compared with the normal consumer, only that one key is blocked; if a new event arrived with, say, key3, it would be processed just fine.

So that's how it works in a nutshell. We are using this library in production, and I think it has been a successful outcome: before that we had some custom ways of doing roughly the same thing, and I believe it's better to rely on a library that a lot of people have put effort into than to try to recreate something similar yourself. I hope you enjoyed the video. Please like, subscribe and comment, and there is a way to support the channel on Buy Me a Coffee if you want; the link will be in the description or in the pinned comment. Thanks a lot, have a nice day, see you in the next video. Bye-bye.
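As a closing reference for the demo walked through above, the producer side might look roughly like this. The topic name, keys and message values are my reconstruction of the narration (key1 and key2 standing in for two customers), not the exact code from the video, and the demo itself points the clients at a Testcontainers broker rather than localhost.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DemoProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder; the demo uses a Testcontainers broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String topic = "payments"; // placeholder topic name
            producer.send(new ProducerRecord<>(topic, "key1", "message one"));   // processed
            producer.send(new ProducerRecord<>(topic, "key1", "fail"));          // enters the retry loop
            producer.send(new ProducerRecord<>(topic, "key2", "message two"));   // still processed under KEY ordering
            producer.send(new ProducerRecord<>(topic, "key2", "message three")); // still processed under KEY ordering
            producer.send(new ProducerRecord<>(topic, "key1", "message four"));  // blocked behind the failing key1 record
            producer.flush();
        }
    }
}
```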
Info
Channel: Andrey Fadeev
Views: 407
Keywords: #clojure #bun #htmx #software
Id: mVYe_r0SBV8
Length: 17min 6sec (1026 seconds)
Published: Thu Apr 25 2024