What is Kafka?

Captions
Users of modern-day cloud applications expect a real-time experience. How is this achieved? My name is Whitney Lee, and I'm a cloud developer here at IBM. Apache Kafka is an open-source, distributed streaming platform that allows for the development of real-time, event-driven applications. Specifically, it lets developers build applications that continuously produce and consume streams of data records.

Now, Kafka is distributed: it runs as a cluster that can span multiple servers or even multiple data centers. The records it handles are replicated and partitioned in a way that allows a high volume of users to use the application simultaneously without any perceptible lag in performance. So, first, Apache Kafka is super fast. It also maintains a very high level of accuracy with the data records, it preserves the records in the order of their occurrence, and, finally, because the records are replicated, Apache Kafka is also resilient and fault-tolerant. Together, these characteristics add up to an extremely powerful platform.

Let's talk about some use cases. Or, actually, before we do, let's talk about how applications were built before event streaming was on the scene. If developers wanted to make a retail application, for example, they might build a checkout, and when a checkout happens, they want it to trigger a shipment: a user checks out, and then the order gets shipped. They need to write an integration for that to happen, considering the shape of the data, the way the data is transported, and the format of the data. Still, it's only one integration, so it's not a huge deal. But as the application grows, maybe we want to add an automated email receipt when a checkout happens, or an update to the inventory. As front-end and back-end services get added and the application grows, more and more integrations need to be built, and it can get very messy. Not only that, but the teams in charge of each service are now reliant upon each other before they can make any changes, and development is slow.

So, one great use case for Apache Kafka is decoupling system dependencies. With Apache Kafka, all of those hard integrations go away. Instead, the checkout streams events: every time a checkout happens, an event gets streamed, and the checkout is not concerned with who is listening to that stream; it is simply broadcasting those events. The other services (email, shipment, inventory) subscribe to that stream, choose to listen to it, get the information they need, and it triggers them to act accordingly. This is how Kafka can decouple your system dependencies, and it is also a good illustration of how Kafka can be used for messaging. Even if this application were built from the ground up as a cloud-native application, it could still be built in this way, using messaging to move the checkout experience along.

Another use case for Apache Kafka is location tracking. An example of this might be a ride-share service. A driver using the application would turn on their app, and maybe every second a new event would get emitted with their current location. This can be used by the application on a small scale, say, to let an individual user know how close their particular ride is, or on a large scale, to calculate surge pricing or to show a user a map before they choose which ride they want.
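To make the decoupled checkout example concrete, here is a minimal sketch using the official Apache Kafka Java clients. The topic name "checkout-events", the broker address, the record payload, and the class and method names are illustrative assumptions, not details from the video:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class CheckoutExample {

    // The checkout service: broadcasts an event per checkout, unaware of who listens.
    static void produceCheckout() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key = order id, value = a small JSON payload (both illustrative).
            producer.send(new ProducerRecord<>("checkout-events",
                    "order-1001", "{\"item\":\"book\",\"qty\":1}"));
        }
    }

    // The email service: one of several independent subscribers to the stream.
    static void consumeForEmailReceipts() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "email-service");  // each service subscribes under its own group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("checkout-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("Emailing receipt for %s: %s%n",
                            record.key(), record.value());
                }
            }
        }
    }
}
```

The shipment and inventory services would subscribe the same way, each with its own group.id, so every service independently receives every checkout event: exactly the decoupling described above.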
Another use case for Apache Kafka is data gathering. This can be used in a simple way, just to collect analytics to optimize your website, or in a more complex way, with a music streaming service, for example, where every song a user listens to becomes a record in a stream. Your application could use that stream to give real-time recommendations to that user, or it could take the data records from all the users, aggregate them, and come up with a list of an artist's top songs. This list is in no way exhaustive, but these are some very interesting use cases that show how powerful Kafka is and the kinds of things you can do with it.

Now let's give an overview of how Kafka works. Kafka is built on four core APIs. The first one is the producer API. The producer API allows your application to produce, to make, these streams of data: it creates the records and produces them to topics. A "topic" is an ordered list of events. The topic can persist to disk, where it can be saved for just a matter of minutes if it is going to be consumed immediately, or for hours, days, or even forever, as long as you have enough physical storage space to persist the topics.

Then we have the consumer API. The consumer API subscribes to one or more topics and listens to and ingests that data. It can subscribe to topics in real time, or it can consume the old data records that are saved to the topic. Now, producers can produce directly to consumers, and that works for a simple Kafka application where the data doesn't change, but to transform that data, what we need is the streams API.

The streams API is very powerful. It leverages the producer and the consumer APIs: it consumes from a topic or topics, then analyzes, aggregates, or otherwise transforms the data in real time, and then produces the resulting streams to a topic, either the same topics or new ones. This is really at the core of what makes Kafka so amazing, and it is what powers the more complex use cases like location tracking and data gathering (a minimal streams sketch appears at the end of this transcript).

Finally, we have the connector API. The connector API enables developers to write connectors, which are reusable producers and consumers. In a Kafka cluster, many developers might need to integrate the same type of data source, like a MongoDB, for example. Not every single developer should have to write that integration; the connector API allows that integration to be written once, the code is there, and then all a developer needs to do is configure it to get that data source into their cluster.

So, modern-day cloud application users expect a real-time experience, and Kafka is the technology behind that. Thank you! If you have questions, please drop us a line below. If you want to see more videos like this in the future, please like and subscribe. And don't forget: you can grow your skills and earn a badge with IBM Cloud Labs, which are free, browser-based, interactive Kubernetes labs.
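As promised above, here is a minimal Kafka Streams sketch of the music-streaming aggregation idea: it consumes listening events, keeps a running play count per song, and produces the counts to a new topic. The topic names, the record layout (key = user id, value = song title), and the application id are illustrative assumptions:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class SongPlayCounter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "song-play-counter"); // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Consume the raw listening events: key = user id, value = song title (assumed layout).
        KStream<String, String> plays = builder.stream("song-plays");

        // Transform: re-key each event by song and keep a running count per song.
        KTable<String, Long> playCounts = plays
                .groupBy((user, song) -> song)
                .count();

        // Produce the continuously updated counts back out to a new topic.
        playCounts.toStream()
                .to("song-play-counts", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

A downstream recommendation service could then consume "song-play-counts" like any other topic, which is what the video means when it says the streams API leverages both the producer and the consumer APIs.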
Info
Channel: IBM Technology
Views: 120,117
Rating: 4.9329844 out of 5
Keywords: Kafka, IBM Cloud, Cloud, Cloud Native, Event Streaming, apache kafka, producer api, API, consumer API, message broker, real time data, streaming data
Id: aj9CDZm0Glc
Length: 9min 17sec (557 seconds)
Published: Fri Sep 18 2020