You want to use Kafka? Or do you really need a Queue?

Captions
Do you want to use Kafka, or do you really need a message broker and queues? While they can seem similar, they have different purposes. I'm going to explain the differences so you don't end up trying to brute-force some concepts and patterns on Kafka when you're better off just using a message broker.

Hey everybody, it's Derek Comartin from CodeOpinion.com. I post videos on software architecture and design. Soon I'll have a video coming out on Kafka and how it's not an event store, so don't miss these videos; make sure to subscribe. This video is brought to you by EventStoreDB, the stream database built from the ground up for event sourcing, CQRS, and event-driven microservices. For more on EventStoreDB, check out the link in the description.

So you want to use Kafka. Okay, great, but what does it actually do? Kafka is a log; more specifically, it's a partitioned log. I'll get into the partitioning part later in this video. How it works is I have a producer here, and you could have many different producers sending events to our log. Basically, they're appending new events to our log. So we could have many different producers, and I'm just continuously appending events here. Now, as time goes on, there's a retention period for these events in this log. It may be forever, where messages are never removed, but there can be a retention period such that messages are removed from the log once they reach that age threshold.

So with event-driven architecture, we can have many different producers appending events to our log. We have our three events here, and our producer doesn't really know whether there are any consumers at all. There could be zero; there could be many. In my example here I have two different consumers. The first consumer can consume that first message from the log. Our second consumer can do the exact same thing; they could do that concurrently. From there, maybe the second consumer finished that first
one; it can now get the second event from that log. Now, the thing is that these events are still in the log whether you've consumed them or not. That means the first consumer has processed the first message, but there's really nothing stopping it from pulling that exact same message off again.

This is a really important distinction to make: just because you process an event from the log does not remove it from the log. That means you could add a brand new consumer, start at the very beginning of the event log, and process events. Or an existing consumer, if it's at a certain position within the event log, doesn't have to stay there. It can go back because the events are still there (assuming the retention period hasn't removed them), and it could reprocess events that it has already processed. Again, the retention period is what determines when events are potentially removed, not the fact that you're consuming them.

This distinction is important because there are two types of messages. I've been mentioning events, and events are a type of message, as are commands. The way commands work is a little bit different. We can have many different producers of a command that send a message to a message broker. So we have a producer sending some type of command to our message broker, to a queue, and we have a single consumer. There will only be one: not zero, not two, one consumer for that command, and it's basically the authority to process that command. That consumer will pick up that message from the queue and process it; maybe it's doing some state change, etc. But there will only ever be one consumer for a command.

That means that events and commands have different purposes. With commands, the intent is to invoke behavior. That could be making some state change, some mutation; there's some side effect there. With events, you're stating that something already happened; you're letting some other part of your system know
something happened. Ownership is a clear distinction here: commands are owned by the consumer, because there's only one consumer. Events are owned by the publisher; whoever is publishing the event is the one that owns that definition, that schema. For consumers: with a command there's only one consumer; for events there are many different consumers, or none. Who sends a command? There could be many different senders. With events, it's just that single logical boundary that's going to be the publisher, the one that owns it. And then naming: how you name these things is less important for this video, but generally commands take the form of a verb and a noun, and events are in the past tense.

Now, say you want to use commands as a way to invoke behavior within some particular boundary, but you're using something like an event log. How do you make an event a command? You get something like this, which I'm not advocating, and which I don't think makes any sense based on what I just said. It's called a "command event", described as: "events often fall into one of two categories: messages and commands." Based on what I just said, this does not make any sense at all. Messages are events or commands. But you can see how you go down this road of inventing a "command event" when all you really have is an event log and everything's an event. How do you perform commands? You can't. You need a message broker.

This is where a message broker differs from Kafka and our log. When a producer sends a message to our queue, to our message broker, and a consumer picks up that message and processes it, the consumer ultimately tells the broker, acknowledging that it has successfully processed the message. Once that's occurred, that message is removed from our queue, from our broker. Now, this applies when we just have a queue. It applies as well
if we're doing pub/sub and we have topics. If you have a consumer group that processes a message, an event, from a topic (ultimately a queue), it's still removed once you acknowledge that you've processed it. That means that if you add a new consumer later, there's no way to get the existing events that were already published, like we can with an event log.

Another difference is how you scale. They both apply the competing consumers pattern, but a little bit differently. With a broker, this typically allows you to process more messages concurrently. You have first-in, first-out guarantees with a queue, or with most that you're using, which means you're going to pull messages out of the queue in the same order that they were sent to the broker. But that does not necessarily mean that you're going to finish processing them in the same order, and that's because of competing consumers. That's the trade-off: you are able to process messages concurrently.

As an example, we have two consumers within a consumer group, and they're pulling messages off the exact same queue. We have our producer, or it could be many different producers, and there are two messages in our queue. What can happen is that the first consumer gets the first message and the second consumer gets the second message. So we're pulling them off in order, but we could be processing them concurrently. This can have great benefits in terms of adding more throughput, but there is a cost in ordering if you're expecting to process messages in order.

At the beginning I mentioned that Kafka is a partitioned log. This has its benefits when we're talking about ordering, but it also has some downsides. Kafka applies the competing consumers pattern too, but for an individual partition there
can only be one consumer. In my example I have a consumer group of two different consumers, but as I'm outlining at the top and bottom of what looks like my log there in the middle, there can only be one consumer per partition within that group. Say I produce a message and put it on partition zero, and then I produce another message and put it on partition one. If the consumer at the top is responsible for partition zero, then it's going to consume that message, and the bottom consumer is going to consume the message from partition one. So while you can expand this out, it makes you decide where you're publishing messages and which partitions you're publishing them to. It's not just as easy as adding more consumers. If you have a lot of messages going to a particular partition, you're only ever going to have one consumer processing that partition, one message at a time.

So do you want a partitioned event log, or do you really want a message broker and queues? Hopefully this video illustrated some of the differences; there are many more. There's a lot of confusion around trying to use something like Kafka when you really want a message broker, specifically if you're doing something like orchestration, which I've shown in other videos, where you're consuming events and then sending commands to other logical boundaries to execute some long-running business process or workflow. The same goes for competing consumers: understand the differences between how they work with something like Kafka and partitions and how they work with a message broker and queues, because you can get into a lot of trouble if you're trying to do it with a partitioned event log like Kafka, where you have a single consumer per partition.

That's not to say that I'm against Kafka. I'm not. I
think that in the right situation it makes a lot of sense. The biggest draw to me is that you have events that are persisted, potentially forever, or depending on that window of time that you allow them to be there, the retention period. This could be really good for data distribution, analytics, these types of things, where you can add new consumers, start from the beginning of the log, and process that data. But when we're talking about business events, business concepts that you want to occur as part of a workflow, we're generally talking about two things: commands, and events that are derived from the things that have happened. In those places, I'd say more often than not you're going to want to look at a message broker, not a partitioned event log like Kafka.

Now, they're not mutually exclusive. If you have the need, you could be using a message broker and Kafka. So be it, if you fit into that mold where you need these two different things. But again, my biggest point with this video is: don't go using Kafka and shoving in something like workflow and "command events", whatever that is, when you really just want a message broker.

Thanks to all the developer-level members on YouTube and Patreon. They get access to a private Discord server where you can communicate with other like-minded developers about software architecture and design. If you're interested in joining, check out the links in the description. If you enjoyed this video, please give it a thumbs up. If you have any thoughts or questions, make sure to leave a comment, and please subscribe for more videos on software architecture and design. Thank you.
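
The core distinction in the transcript (consuming from a log does not remove the event, while acknowledging a message on a queue does) can be sketched in a few lines of Python. This is a toy in-memory model, not Kafka's or any broker's real API: the class names `EventLog`, `LogConsumer`, and `Queue` and the event names are made up for illustration, and retention is left out entirely.

```python
from collections import deque


class EventLog:
    """Toy append-only log: reading never removes an event."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the newly appended event

    def read(self, offset):
        # Return the event at this offset, or None if the reader is caught up.
        return self._events[offset] if offset < len(self._events) else None


class LogConsumer:
    """Each consumer tracks its own position and is free to rewind it."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self):
        event = self.log.read(self.offset)
        if event is not None:
            self.offset += 1
        return event

    def seek(self, offset):
        self.offset = offset


class Queue:
    """Toy broker queue: a message is gone once it has been acknowledged."""

    def __init__(self):
        self._pending = deque()
        self._in_flight = {}
        self._next_id = 0

    def send(self, message):
        self._pending.append(message)

    def receive(self):
        if not self._pending:
            return None
        delivery_id = self._next_id
        self._next_id += 1
        self._in_flight[delivery_id] = self._pending.popleft()
        return delivery_id, self._in_flight[delivery_id]

    def ack(self, delivery_id):
        # Acknowledging removes the message from the broker for good.
        del self._in_flight[delivery_id]


log = EventLog()
for name in ["OrderPlaced", "OrderPaid", "OrderShipped"]:
    log.append(name)

first = LogConsumer(log)
assert first.poll() == "OrderPlaced"
first.seek(0)                          # go back: the event is still there
assert first.poll() == "OrderPlaced"   # the same event, processed again

late = LogConsumer(log)                # a brand new consumer starts at offset 0
assert [late.poll() for _ in range(3)] == ["OrderPlaced", "OrderPaid", "OrderShipped"]

queue = Queue()
queue.send("ShipOrder")
delivery_id, command = queue.receive()
queue.ack(delivery_id)
assert queue.receive() is None         # nothing left for a consumer added later
```

The point of the sketch is where the position lives: in the log model each consumer owns its offset, so replay is just a `seek`; in the queue model the broker owns delivery state, so an acknowledged command cannot be read again.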
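
The partitioning point (one consumer per partition within a consumer group, so adding consumers beyond the partition count adds no throughput) can be sketched the same way. This is a simplified model under stated assumptions: `zlib.crc32` stands in for a real partitioner's hash, the round-robin `assign_partitions` helper is a hypothetical stand-in for a real group rebalance, and the key `order-42` is invented.

```python
import zlib


def partition_for(key, num_partitions):
    # Stable hash, so the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions, consumers):
    """Give each partition exactly one owner within the consumer group."""
    assignment = {name: [] for name in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


class PartitionedLog:
    """Toy partitioned log: one ordered list of events per partition."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, event):
        index = partition_for(key, len(self.partitions))
        self.partitions[index].append(event)
        return index


log = PartitionedLog(num_partitions=2)
for event in ["Placed", "Paid", "Shipped"]:
    log.append("order-42", event)

# Events with the same key land on one partition and keep their order there.
index = partition_for("order-42", 2)
assert log.partitions[index] == ["Placed", "Paid", "Shipped"]

print(assign_partitions(2, ["consumer-a", "consumer-b"]))
# {'consumer-a': [0], 'consumer-b': [1]}

print(assign_partitions(2, ["consumer-a", "consumer-b", "consumer-c"]))
# {'consumer-a': [0], 'consumer-b': [1], 'consumer-c': []}
```

With two partitions, the third consumer in the group gets nothing to do, and a partition that receives most of the traffic is still drained by a single consumer. That is the contrast with a broker queue, where any number of competing consumers can pull from the same queue.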
Info
Channel: CodeOpinion
Views: 18,087
Keywords: kafka queue, software architecture, software design, event sourcing, software architect, microservices, message queues, kafka, event bus, event driven architecture, azure service bus, rabbitmq, message queue, message queuing, messaging patterns, microservice architecture, kafka queue tutorial, kafka queue vs topic, kafka queue and topic, kafka queue partitions, kafka queue vs stream, kafka queue vs sqs, kafka queue limit, kafka queue naming conventions, kafka queue full
Id: dpl4xKkPxHY
Length: 11min 42sec (702 seconds)
Published: Wed Aug 17 2022