Azure Event Hubs

Captions
Hello world, and welcome to this edition of Tech on Fire with Blaze. I'm Blake Stewart, architect at Wintellect, and today we're going to be looking at Azure Event Hubs and how you can use it for streaming data on Azure.

Hi guys, today we'll be talking about Azure Event Hubs and how it relates to the other types of messaging platforms on Azure that we've already covered. We've looked at Event Grid, which is useful for getting data out of Azure when something happens on Azure. Say I create a blob and I want to respond to that: I can use Event Grid to respond to that event. It's going to emit a message, and then I can wire that up using any number of different services on Azure. We've also looked at Service Bus and how you can use it as a general-purpose tool for building out messaging inside of applications.

Event Hubs is a more specialized version of Service Bus: it's designed for sending messages, but it's oriented toward streaming data rather than just sending discrete messages the way Service Bus does. With Service Bus, you have a publisher and a subscriber. The publisher creates a message and puts it on the service bus, and the message stays on the service bus until either the message expires or the subscribing client connects, either to a topic or to a queue, pulls that message, and responds to it. With Event Hubs, that's not the case. The producer produces a message, and the message is put onto the event hub. What the event hub wants to do is forward that message on to the consumer right away, without persisting it on the event hub for any period of time. If there is no consumer, the message is more or less dropped; it is not persisted in a way that allows a client to connect at a future date, pull that message, and respond to it. That's the biggest difference between Service Bus in the general context and what Event Hubs is trying to do.

Now, I did say the message is dropped and not persisted on the event hub. There is a way in Event Hubs to persist messages so that if a client disconnects and needs to reconnect at a future date and get older telemetry that came from producers, it can: you can persist messages to something like a data lake or blob storage. So it's not going to throw data away in the sense that you're going to lose it, unless you just want to. But the use case for Event Hubs is strictly that streaming type of workload, so it assumes you have a live producer and a live consumer, and it basically brokers the connection between the two by collating messages and forwarding them on to a consumer. The sketch below makes the difference in receive semantics concrete.
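To illustrate that contrast, here is a minimal sketch in Node.js, assuming the `@azure/service-bus` and `@azure/event-hubs` SDKs. The connection strings, queue name, and hub name (`SB_CONN`, `myqueue`, `EH_CONN`, `my-event-hub`) are placeholders, not values from the video.

```js
const { ServiceBusClient } = require("@azure/service-bus");
const { EventHubConsumerClient } = require("@azure/event-hubs");

async function main() {
  // Service Bus: messages wait on the queue until a receiver pulls
  // and settles them (or they expire).
  const sbClient = new ServiceBusClient(process.env.SB_CONN);
  const receiver = sbClient.createReceiver("myqueue");
  const messages = await receiver.receiveMessages(10);
  for (const msg of messages) {
    console.log("queued message:", msg.body);
    await receiver.completeMessage(msg); // explicitly settle the message
  }

  // Event Hubs: you subscribe to a live stream; events flow to the
  // handler as they arrive rather than sitting in a queue waiting for you.
  const ehClient = new EventHubConsumerClient("$Default", process.env.EH_CONN, "my-event-hub");
  ehClient.subscribe({
    processEvents: async (events) => {
      for (const e of events) console.log("streamed event:", e.body);
    },
    processError: async (err) => console.error(err),
  });
}

main().catch(console.error);
```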
So let's look at this in a more abstract sense, then we'll go over to the Azure portal to show you what it looks like there, and then I'll show you some code samples of how it's implemented. Event Hubs has the same basic architecture that Service Bus has, but instead of calling it a publisher, we call it a producer. A producer could be anything: an IoT device, a back-end process, whatever. It has some kind of event that it needs to let something else know about; that's what makes it a producer. It creates a message and puts that message onto the event hub. There could potentially be thousands upon thousands of these producers, all creating messages that get sent to an event hub, so Event Hubs is designed to be highly scalable. It can ingest potentially millions of messages per second, all generated by these producers, so it has ways of partitioning those out and different schemes for handling that scale. The basic point is that you have thousands upon thousands of messages coming into the event hub from all these producers, and on the other side of that you have a consumer.

Remember, this is oriented around streaming, but Event Hubs can be configured to persist data to something like a blob storage account or a data lake. As I mentioned, it assumes there's going to be a client there to consume the messages; it doesn't persist them the way Service Bus would. It either forwards a message or does nothing with it. So if you don't want to lose data, you can configure Event Hubs to write all of it to a storage account or a data lake, and then you can go back at a future date and read those messages. If there is a consumer, it will receive the messages as they come off the event hub, and it can also go back and read anything that has been persisted to that blob storage or data lake. So this has the potential to grow massively and also to archive lots and lots of the data coming into the event hub. You'll find this basic architecture a lot in IoT workloads, because it's a very popular way of capturing events while also persisting them so that someone can come back and look at them later. It uses a file format called Avro, which we'll see when we look at this in the Azure portal. I want you to have your head around this architecture before we go look, because these are the basic working components. It's also important to understand that the streaming orientation basically means you have producer to event hub to consumer as a linear flow without interruption. If there is an interruption, it breaks the flow; Event Hubs doesn't try to keep the message around for its full lifetime. Rather, it drops it, or it writes it to a permanent storage solution such as blob storage or a data lake.

With that in mind, let's go over to the Azure portal so you can get an idea of what this looks like there, and then we'll look at some code samples. I'm here in the Azure portal, and I have a resource group with two resources created: my Event Hubs namespace and my storage account. The Event Hubs namespace is the parent resource for event hubs, and the storage account is where I'm storing the messages whenever they come through off an event hub. If I come down into the Event Hubs namespace, I do want to look at one thing, mostly the scale. Most of this is pretty typical of what you'd have inside an Azure resource: shared access policies, networking for firewalling and private endpoints, and the other settings that you might have on common Azure resources. The one I do want to call out is the scale, because Event Hubs uses something called a throughput unit, which is basically a function of bandwidth: the number of bits per second that you want to allow
on your event hub, or the number of messages per second; say 40,000 messages per second or 400 megabits per second, depending on the threshold the throughput unit is going to have on it. This is one of those things that impacts your billing whenever you tune it, so you don't want to over-allocate it and pay for more than you need, but at the same time you don't want to under-provision it and hurt your performance. That's why they have this Auto-inflate option right here. You can set a maximum threshold, which allows you to have up to a certain number of throughput units, and it will auto-scale the actual throughput units depending on the demand coming in through your event hub. This is a nice utility. If you know what your metrics are going to look like and they're pretty predictable, you probably don't need Auto-inflate, but if the load is going to be variable and you don't know what it will look like, turn this on and it will help keep your billing in line with your actual consumption.

Other than that, the only other thing I want to look at under this particular pane is, of course, Event Hubs. This is where I can create and manage the event hubs that live in my Event Hubs namespace; my scale is a function of all the event hubs inside the namespace. If I want to create an event hub, I can simply click on Event Hub right here and create a new one, say blaze-eh-2 or something like that. I do want to look at these two sliders, and then at this Capture option, because this is where I configure where messages are persisted. Partition count has to do with downstream concurrency. You can have a lot of data being pumped into your event hub from your producers, but downstream you can only have so many consumers; the more partitions you have, the more possible consumers you have. That's what this particular slider is for. Message retention is for retaining messages on the event hub. I said earlier that Event Hubs is not intended for long-term message retention. It's not like Service Bus, which keeps a message until it expires or something picks it up; Event Hubs is not really for that, so don't tune this expecting the message to be there the next time your client connects. This is really for playback. If you're doing development work and you want to be able to replay messages through an event hub, you can turn on retention for up to seven days and replay messages through the hub. That's good for debugging, or for building models, or other dev and debug work, but it's not something you should count on for the kind of thing you would want Service Bus for, which is to have a message sit on the bus until something picks it up. Don't think that's what this slider is doing when you turn it up. A sketch of that replay use follows below.
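As a rough illustration of the playback scenario, here's a minimal sketch, assuming the `@azure/event-hubs` SDK: a consumer that starts reading from the earliest retained event instead of only new ones. The connection string and hub name are placeholders.

```js
const { EventHubConsumerClient, earliestEventPosition } = require("@azure/event-hubs");

// Replay everything still inside the retention window, then keep
// receiving live events as they arrive.
const client = new EventHubConsumerClient("$Default", process.env.EH_CONN, "my-event-hub");

client.subscribe(
  {
    processEvents: async (events) => {
      for (const e of events) console.log("replayed:", e.sequenceNumber, e.body);
    },
    processError: async (err) => console.error(err),
  },
  { startPosition: earliestEventPosition } // start at the oldest retained event
);
```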
If you do want to persist messages in a way that lets you access them at a later date, that's what the Capture feature is for, and this is where you tune it. This first setting is the time window, the interval in minutes before Capture actually goes and writes data to the blob storage account or the data lake, and this second one is the size threshold: how many megabytes of data accumulate before it writes the file. If you want a tighter time box, set one minute; if you want smaller files, tune that down to 10 megabytes, or whatever it might be. These settings shape the file structure, really how many files you end up with inside your blob storage account or your namespace. This is where you choose a storage account or a data lake, depending on which one you're using, and then this is where you go pick the specific one. I've already done that for my other event hub, so I'm not going to reconfigure this one. Once you have that configured, you'll have something that looks like this. This is what the event hub looks like, and you can go back and reconfigure Capture on it if you so choose. You can see that mine is shaped to one-minute, 10-megabyte files, and that's because I'm not pumping a lot of data through it. In the actual container, it creates a structure that looks like this for the data path: namespace, event hub, partition ID, year, month, day, hour, minute, and then the second is the file name. So that's what I'll end up with for the file names in my given event hub and its associated storage.

Once it's set up, it's just a matter of attaching your producers and your consumers to it. You can also do event hub processing through Stream Analytics. I'm not going to get into that today, that's a future video for another time, but that's where you connect Stream Analytics to an event hub and do real-time processing on it using Azure Stream Analytics, which is the streaming service for querying and massaging data in real time on Azure; you can then attach alerts and other things on top of that. Here, though, we're just looking at Event Hubs from the context of developing producers and consumers and seeing how they behave once the event hub is set up. The next thing you'll need is your shared access policy. You can go over here, create one, and get your connection strings. I already have mine, and that's what we're going to be using in our code.

So let's go look at the code, and we'll use it as an example to show you how this behaves. I have up in front of me the clients that I'm going to be using for my producer and my consumer. This is my producer, and it's pretty straightforward. It's the same basic code that I use for Service Bus, just adapted for Event Hubs; I changed very little. The logic is basically the same; I swapped the SDKs, changed some of the method calls, gave it a new connection string, and everything works fine. It's basically just sending up to 10,000 messages, spread out by about 100 milliseconds. A sketch of a producer like this follows below.
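The on-screen code isn't shown in a copyable form, but a minimal sketch of a producer like the one described, assuming the `@azure/event-hubs` SDK, looks roughly like this. The connection string, hub name, and message shape are placeholders rather than the video's exact code.

```js
const { EventHubProducerClient } = require("@azure/event-hubs");

async function main() {
  const producer = new EventHubProducerClient(process.env.EH_CONN, "my-event-hub");

  // Send up to 10,000 messages, spaced out by roughly 100 ms each,
  // as described in the video.
  for (let i = 0; i < 10000; i++) {
    await producer.sendBatch([{ body: { messageId: i, sentAt: new Date().toISOString() } }]);
    await new Promise((resolve) => setTimeout(resolve, 100));
  }

  await producer.close();
}

main().catch(console.error);
```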
The consumer code is different from what you might write for something like Service Bus, because it uses this idea of a blob storage checkpoint. A checkpoint is basically a way of bookmarking where you last looked at the messages coming through the system. There's a specialized library for it that uses blob storage to record, okay, I last checked in an hour or two hours ago; it bookmarks that, so the next time I connect, I know where I left off and can pick up just the messages that have arrived since. Other than that piece, this code is basically just listening for messages. So again, a pretty simple piece of code, but it does have that additional nuance of the storage it uses for checkpointing, because, again, event hubs are not really intended for persistence. When I said earlier that they don't persist messages, that's maybe not entirely true: they do persist messages, but it's not something that is tunable or something you can say with any confidence will still be there the next time your client connects. That's why you cannot, and shouldn't, depend on it. This is designed for streaming. You might get the messages the next time your client connects, but the intent is not to assume they will be there; rather, you expect that whenever a message is published, the consumer will be there to receive it, and if not, that it goes and gets it from persistent storage, if that's how you've configured the event hub to behave. A sketch of a checkpointing consumer follows below.
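Again as a sketch rather than the video's exact code: a consumer with a blob storage checkpoint, assuming the `@azure/event-hubs`, `@azure/storage-blob`, and `@azure/eventhubs-checkpointstore-blob` packages. The connection strings, hub name, and container name are placeholders.

```js
const { EventHubConsumerClient } = require("@azure/event-hubs");
const { ContainerClient } = require("@azure/storage-blob");
const { BlobCheckpointStore } = require("@azure/eventhubs-checkpointstore-blob");

async function main() {
  // The checkpoint store "bookmarks" how far this consumer group has
  // read in each partition, using blobs in a storage container.
  const containerClient = new ContainerClient(process.env.STORAGE_CONN, "checkpoints");
  const checkpointStore = new BlobCheckpointStore(containerClient);

  const consumer = new EventHubConsumerClient(
    "$Default",
    process.env.EH_CONN,
    "my-event-hub",
    checkpointStore
  );

  consumer.subscribe({
    processEvents: async (events, context) => {
      for (const e of events) console.log(`partition ${context.partitionId}:`, e.body);
      if (events.length > 0) {
        // Record the last event we processed so a restart resumes from here.
        await context.updateCheckpoint(events[events.length - 1]);
      }
    },
    processError: async (err) => console.error(err),
  });
}

main().catch(console.error);
```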
So let's go ahead and start this producer and consumer. I'm going to start the consumer first, because I really want it to be there once the messages start coming through: node index.js. It just sits there and waits for messages. Now let's start the producer, also with node index.js, and it starts sending messages to my consumer, which is right here, and they come through basically as I send them. Let's start up another instance of my producer; I can start these up to my heart's content, so let's create a few more just to see how this works. We can have lots and lots of producers and a single consumer that handles all of the messages those producers are writing. My consumer is going nuts right about now. Again, this is nowhere near the capacity of what an event hub can handle, but you can see that the consumer is receiving messages at a much higher rate than any single one of my producers can produce them. If I kill these off, you start to see the drop-off in the number of messages arriving at this consumer. I still have one producer going right here, and if I kill it, the consumer stops receiving messages; if I start it back up, messages resume. So again: multiple producers funneling into a single point, few consumers and lots of producers coming in through the event hub. You get this funnel effect where everything flows into a consumer from a bunch of different producers based on events. And this is, again, for streaming data, so you don't want to assume that a message will be persisted.

With that in mind, let's go look at the persisted data in the storage account, because that is where the messages are written as a long-term storage solution, in case I need to go back and look at them, say to make sure they were processed if something about my consumer went down. Here's the storage account. The data is in a container called data (that's just what I named it), in this particular folder. If I drill down into it, you can see the Event Hubs namespace, then the event hub, then the partition, year, month, day, and here's the hour; the most recent is this hour right here. If I go back and look for some of the messages I saw earlier, this file right here is one of the data files that came through. Capture writes one of these on each interval regardless of whether any data arrived, so I'm just going to pick one and download it. Once downloaded, it's in my Downloads folder, and the resolution on it, again, depends on how you configured Capture.

I have IntelliJ set up on my machine because IntelliJ has an Avro viewer available for it. You can download IntelliJ IDEA from the internet, and once you have it, you can get a plug-in for it, an Avro and Parquet file viewer, which gives you the ability to look at these Avro files. Avro comes more out of the Java world than anywhere else, since it came out of the Apache space. You just download the plug-in, drag and drop the zip file onto IntelliJ, and it installs this tool down here. Parquet, by the way, is used in data lake workloads: it's a format designed for data stores that are tabular in their orientation but aren't databases, so it's useful for that kind of thing, and this same viewer handles Avro files. There are other tools out there, but this is probably the best one for the job. Once you have the file downloaded, you basically just drag it over here and look at it. This is the schema, and if there's data in the file, it's shown as tabular data. And that's exactly what we're seeing: here's the historical data, the message ID, and you can see the data that came through, the sequence number, the system properties, et cetera. It's just the tabular data that fell in that time box for the resolution I set: for a given minute, this is what it produced, this is the schema it read, and this is the data. With Stream Analytics, you can run SQL-like queries against this data to filter it, do transforms on it, and that kind of thing; that's another video for another day. But this is how you can work with the Avro files if you want to see what's going on inside them; otherwise, the data is simply persisted in the structure we saw in the storage account. A sketch of reading one of these files programmatically follows below.
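If you'd rather inspect a capture file from code instead of IntelliJ, here's a minimal sketch, assuming the `avsc` npm package (not something shown in the video). Event Hubs capture records carry the payload in a `Body` field of raw bytes, so it's decoded to a string here; the file name is a placeholder.

```js
const avro = require("avsc");

// Stream records out of a downloaded Event Hubs capture file.
avro
  .createFileDecoder("downloads/05.avro") // hypothetical file name
  .on("metadata", (type) => console.log("schema:", type.name))
  .on("data", (record) => {
    // Body is a byte buffer; the payload in this demo was JSON text.
    const body = record.Body ? record.Body.toString("utf8") : "";
    console.log(record.SequenceNumber, record.EnqueuedTimeUtc, body);
  });
```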
Hopefully this video has given you some context on when you might use an event hub compared to something like Service Bus. I've tried to make the distinctions between these two services clear, because they are very similar in their approach: Service Bus is more general-purpose, and Event Hubs is oriented around streaming. Hopefully you've seen that in the way the service is set up and in the various options you can tune for the kinds of things you would use it for. We're going to keep building on Event Hubs and Service Bus and start linking some of this together so you can get an idea of an end-to-end solution on Azure for hot-path data. We've looked at batch-oriented workloads with the platform-as-a-service offering through Azure Data Factory, and of course there's also Databricks, which you can use for both hot-path and batch-oriented workloads; here we're just looking at the platform-as-a-service offerings. We'll link some of these together so you can get a better idea of what this looks like in a real-world solution, where data comes in through something like an event hub and you pipe it into a solution that processes it and presents it through some kind of API. So look forward to that video, and hopefully this gives you an idea of the messaging suite that is available on Azure.

If you like this content, please consider visiting us online at www.wintellect.com, where you can find out about the services Wintellect offers, including training and consulting. Please also consider subscribing to this channel by clicking the Subscribe button, click the bell icon to get notifications when new content becomes available, and comment down below. You can follow me on Twitter at the1mule, and follow Wintellect on Twitter at WintellectNOW or at Wintellect; we are constantly posting about Azure-related technologies and software development. You can also reach us by email at consulting@wintellect.com. Until next time, thank you.
Info
Channel: WintellectNOW
Views: 168
Id: zm1XUTAa9sc
Length: 23min 38sec (1418 seconds)
Published: Wed Dec 01 2021