How to handle message retries & failures in event-driven systems? How to retry when using Kafka?

Video Statistics and Information

Captions
In this video I'm going to show you how to handle retries and failures in an event-driven system when you use a messaging platform like Apache Kafka, but this approach can be applied to any message broker or messaging platform out there, and it actually goes beyond event-driven systems.

In an event-driven system you have an event processor consuming (subscribing) from a queue or topic and processing messages as they come in. In this example, the event processor processes message one; if it succeeds, it publishes the outcome and acknowledges, or commits, the message, depending on your terminology. It does the same for message two: consume, process, publish the outcome.

Now, what happens if the event processor starts consuming the third message and suddenly encounters an issue? You could just keep retrying that message. With something like Apache Kafka you can't move on to the next message until you've committed the consumer's offset, so you're going to keep processing that same message over and over unless you do something with it. But if you simply commit the consumer's offset, you lose that message.

So what could you do in this case? You could publish the message again to the same topic with some additional header information, like the retry count and the last time it was processed, and then move on to the next message. In this example a message four has already come in, so the retried message becomes message five. After publishing message five (the retry), the processor acknowledges message three, processes message four, and then comes to message five, where it may encounter the same issue again. You could do the same thing, republishing it with updated retry headers, and have logic inside the system that says: after three retries, push it somewhere else, ignore it, or log it to the database and move on. But this isn't a very good technique, because you could have other event processors and services consuming that event topic as well, and what you'd be doing is polluting it.

A better alternative is to publish the original message, message three, to a retry topic, and have another event processor, which could live within the same service, consuming that retry topic and processing it; if it succeeds, it publishes the result to the same outcome topic. However, you might be thinking: what happens if the retry event processor also encounters an issue? The answer is actually quite straightforward: you publish to another retry topic. Yes, stay with me.

So let's look at the full retry-topic solution. If the first retry event processor fails, it publishes the message to a second retry topic; a second retry event processor consumes that topic, and if it can't process the message, publishes it to a third retry topic, where another dedicated retry event processor consumes it. If that succeeds, it again publishes to the outcome topic. But if the message has failed too many times, then depending on your requirements you may want to push it to a dead letter queue, where messages that can no longer be processed go.

You would normally have a dead letter processor consuming that queue and forwarding each message to wherever it needs to be. That could mean storing it in a database table, and at the same time raising a notification to a support team, an operations team, or a development team to go and investigate. Or it might just publish a failure result to the outcome topic, because it can't process the message and needs to send some result downstream, letting a downstream service handle it.

How I've seen the dead letter queue used before: the dead letter processor consumes the dead letter queue, stores the message to a database, raises a notification, logs an event into a logging or monitoring system, and perhaps sends an email that triggers a support team to go and investigate. You may then have a monitoring dashboard built on top of your solution, giving operations the ability to investigate what went wrong and, if required, retry that message again for processing. In that case the message is published to the first retry topic, not the original events topic, because you're going to retry it again.

And that is a very simple example of how to retry and handle failed messages in an event-driven solution. Now you might be thinking: so we need multiple topics for retries, and multiple event processors to consume those topics as well? That's a lot to do; it's not straightforward. Well, no one said event-driven architecture was easy. Event-driven architecture is complex; it's not straightforward, it's not easy, and anyone who says differently is wrong or doesn't know enough about event-driven architecture.
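The first idea above, republishing a failed message to the same topic with a retry-count header and giving up after three attempts, can be sketched in plain Python. This is a minimal simulation with in-memory stand-ins for the producer and the give-up path; real code would use a Kafka client, and the header key and retry limit here are illustrative assumptions, not part of any Kafka API.

```python
MAX_RETRIES = 3  # give up after three attempts, as in the example above

def handle_failure(message, publish, give_up):
    """Republish a failed message to the same topic, tracking attempts in a
    'retry-count' header; after MAX_RETRIES, hand it off instead of requeueing.
    `publish` and `give_up` stand in for a Kafka producer and a fallback
    (e.g. logging the message to a database and moving on)."""
    retries = message.get("headers", {}).get("retry-count", 0)
    if retries + 1 >= MAX_RETRIES:
        give_up(message)  # stop polluting the topic with this message
    else:
        publish({
            "value": message["value"],
            "headers": {"retry-count": retries + 1},  # republished copy
        })
```

As the transcript notes, the drawback is that every consumer of the events topic sees these duplicate retry messages, which is why the dedicated retry topics below are preferable.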
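The chained retry-topic solution boils down to a routing rule: each failure moves the message one hop down a fixed chain of retry topics until it lands on the dead letter queue. A sketch of that rule, with made-up topic names:

```python
# Hypothetical topic names; use whatever naming convention your system has.
RETRY_CHAIN = ["events", "events.retry.1", "events.retry.2", "events.retry.3"]
DEAD_LETTER_TOPIC = "events.dlq"

def next_topic_on_failure(current_topic):
    """Return the topic a failed message should be republished to: the next
    retry topic in the chain, or the dead letter queue once the chain is
    exhausted."""
    idx = RETRY_CHAIN.index(current_topic)
    if idx + 1 < len(RETRY_CHAIN):
        return RETRY_CHAIN[idx + 1]
    return DEAD_LETTER_TOPIC  # all retry processors have failed
```

Each retry topic has its own dedicated event processor; on success any of them publishes to the shared outcome topic, and on failure it republishes to `next_topic_on_failure(...)` of the topic it consumed from.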
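Finally, the dead letter processor and the operator-driven retry can be sketched the same way. `store` and `notify` are stand-ins for a database write and a monitoring/email hook; the key detail from the transcript is that an operator retry goes back to the first retry topic, not the original events topic.

```python
def process_dead_letter(message, store, notify):
    """Handle a message from the dead letter queue: persist it and alert the
    support/operations team to investigate, as described above."""
    store(message)
    notify(f"message {message['value']} failed permanently; please investigate")

def operator_retry(message, publish_to_first_retry):
    """Re-inject a message from the monitoring dashboard. It is published to
    the *first retry topic*, not the original events topic, so that only the
    retry processors (not every consumer of the events topic) see it again."""
    publish_to_first_retry(message)
```

A monitoring dashboard would list what `store` persisted and call `operator_retry` when someone decides the message is worth another attempt.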
Info
Channel: Daniel Tammadge
Views: 1,638
Rating: 4.8367348 out of 5
Keywords: Event driven architecture, kafka, apache kafka, event driven architecture, microservices, messaging queue, fault-tolerant, event-driven architecture, web development, Asynchronous event drive, Fault-tolerant distributed systems, Videos for developers, Software architecture, software architecture and design, software architecture patterns, software design, eda
Id: GTHaVuThj_0
Length: 6min 28sec (388 seconds)
Published: Tue Feb 23 2021