Azure Event Hub Tutorial | Big data message streaming service

Captions
Hey guys, this is Adam. When I started my own journey with Azure, I quickly became overwhelmed by the sheer number of messaging services available to me, but I found what works best for me is learning them one after another, and for that reason today we're going to get an introduction to the big data messaging service in Azure called Azure Event Hubs. Stay tuned.

I want to start with the key characteristics of Event Hubs. First of all, this is a big data event streaming service; streaming is a process where you continuously send data to your service. Second, it's scalable, and it scales both in the size of the data it can process, which goes up to terabytes, and in velocity, which is millions of events per second. Third, it's reliable, so there's no data loss, because it is designed to be resilient to failures. And lastly, Event Hubs supports multiple protocols and SDKs: if you're using the standard HTTPS or AMQP protocols you're good to go, and if you're using any of the most common languages on the market there's an SDK available for you, so you don't have to learn the raw Event Hubs API.

When it comes to supported scenarios for Event Hubs, think of pretty much any scenario where you analyze a stream of data: anomaly detection, live dashboarding, transaction processing, or just archiving data. Event Hubs is good for all of that.

So let's talk about the basics, because when working with Event Hubs you're going to get introduced to quite a few new terms you need to learn. First of all, we have event producers. Those are the applications and services that will be sending event data to the event hub over AMQP, which stands for Advanced Message Queuing Protocol, over HTTPS, or via Apache Kafka. Those services are the sources of your events.

Those events are sent to an event hub, and each event hub is partitioned. You can have from 1 to 32 partitions. I marked the 1 with a star because everywhere in the documentation you're going to find that the minimum is 2, although if you go to the portal you will be able to choose a minimum of one partition and it's going to work just fine. You choose the number of partitions when creating the event hub, and it is not changeable after creation, so it's good to decide what kind of partitioning and how many partitions you need beforehand; otherwise you're going to need to recreate the event hub later on.

Second of all, when you start sending messages, they will be load balanced across those partitions. There is no guarantee that the partitions will be utilized equally, so you should expect each partition to grow at a different rate. Within the scope of a single partition, events are ordered: if you look at a partition, all events in it are ordered from the oldest to the newest, just like a queue. But you should not expect that order is maintained across partitions; in fact, it's not. So if you need to process multiple events in order, you need to introduce a partition key, which lets you specify a key for the events you upload, making sure they land in the same partition; and since a partition is ordered, they will be processed in order as well.
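To make that ordering guarantee concrete, here's a minimal sketch of sending events with a partition key using the .NET SDK. The connection string, hub name, and the key "device-42" are placeholder values for illustration, not from the video:

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;

class PartitionKeyDemo
{
    static async Task Main()
    {
        // Placeholders -- substitute your own namespace connection string and hub name.
        await using var producer = new EventHubProducerClient(
            "<event-hub-connection-string>", "<event-hub-name>");

        // Events sharing a partition key always land in the same partition,
        // so their relative order is preserved end to end.
        var options = new SendEventOptions { PartitionKey = "device-42" };

        var events = new[]
        {
            new EventData(Encoding.UTF8.GetBytes("reading 1")),
            new EventData(Encoding.UTF8.GetBytes("reading 2")),
            new EventData(Encoding.UTF8.GetBytes("reading 3")),
        };

        await producer.SendAsync(events, options);
        Console.WriteLine("Sent 3 ordered events with partition key 'device-42'.");
    }
}
```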
You also have something called a namespace. If you have your event hub, that's a single event hub representing a unique stream of data; let's call it event hub A. If you then need to process a second unique stream containing different data, you will create event hub B, and the logical container for multiple event hubs is called a namespace. This is pretty much what you're going to create in the Azure portal: it is your scoping container that carries shared properties like shared throughput, shared cost, et cetera.

At this point we can actually go to the Azure portal and create ourselves an Event Hubs namespace. So I'm in the portal, where I'm going to press 'Create a resource' and find Event Hubs in the marketplace. You're going to find 'Event Hubs', but although it says Event Hubs, what you're creating right now is in fact an Event Hubs namespace; when you hit create, you're actually going to see 'Create Namespace' at the top. I'm going to choose my azure-event-hub-tutorial resource group, create a namespace and give it a name; I'm going to call it amdemo. I'm going to choose North Europe as the location, and then the pricing tier. Here you have two options, Basic and Standard: Basic gives you one consumer group and Standard gives you 20, which determines how many unique applications can read the entire stream of data. We're going to tackle that in a second, but for now I'm going to choose the Standard pricing tier, which allows me to use 20 consumer groups.

Next you have throughput units. A throughput unit is simply a unit of performance for your event hub, so how many messages you can actually process. You can choose from 1 to 20, and this basically dictates the performance of your event hub. If you need more than 20, you can open a support request to increase this limit to 40, or you can create something called Event Hubs Dedicated clusters if you need even more performance. For our demo purposes I'm just going to leave it at 1. You can hit review and create, and that's pretty much it: hit create, and in just a couple of minutes your service will be ready to use.

After a minute or so our service has been deployed. We can go to the resource, and here is our Event Hubs namespace; it even says so at the top, which confirms this is our logical container for event hubs. Here you have a couple of shared properties: shared access signatures, your means of authorization and authentication for the namespace, and scaling options, where you can change those throughput units. One interesting option here is auto-inflate. This is basically just another name for autoscaling: it lets your throughput units rise automatically from 1 up to a maximum you set, depending on how many messages you are processing. This is pretty much the most cost-effective option, but if you need guaranteed, dedicated performance, then you're probably going to set the throughput units explicitly. And of course you have some additional options like geo-recovery, networking and so on.

What we need to do right now is create the entities, which are the event hubs themselves. As you saw in the diagram, we need to create one event hub, which will be the unique stream of data we'll use in this demo. I'm going to go to Event Hubs and create a new event hub. You need to provide a name for your event hub, so I'm going to call it mydemo. Next you need to choose the partition count; as I said, the documentation says 2 is the minimum, but you can clearly see that it's 1 here, and it goes up to 32. For this demo I'm just going to choose one partition, because I don't have heavy processing that would use more.
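Since the partition count is locked in at creation, it can be handy to double-check it from code later. A minimal sketch using the producer client's GetEventHubPropertiesAsync, with placeholder connection values:

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs.Producer;

class InspectHub
{
    static async Task Main()
    {
        // Placeholders -- use your own namespace connection string and hub name.
        await using var producer = new EventHubProducerClient(
            "<event-hub-connection-string>", "<event-hub-name>");

        // Returns the hub's name, creation time, and partition ids.
        var properties = await producer.GetEventHubPropertiesAsync();

        Console.WriteLine($"Hub '{properties.Name}' created {properties.CreatedOn}");
        Console.WriteLine($"Partitions: {string.Join(", ", properties.PartitionIds)}");
    }
}
```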
Back in the portal, you also have message retention: how many days your messages will be kept in the event hub for processing. Because, as I said, this is a stream of data and multiple applications might want to read it, you need to configure message retention to tell your event hub for how many days it should keep the data. So if you have one day configured and you start sending data, after 24 hours the old data will be removed periodically; it's just a sliding window. You can go up to 7 days on the Standard tier; for the Basic tier it's just one day. Let's leave it at one, and let's leave Capture off; I'm going to talk about that feature in just a minute. I'm going to hit create, and my event hub is ready to go. As you see, you can create more here if you need, but that's pretty much it.

When it comes to the demo, I want to show you how you can now grab this event hub and start sending data to it. For this purpose I will create a .NET application, so let me go to Visual Studio Code, where I'm going to initialize a new application written in .NET Core. I will mostly be copy-pasting the code, so you don't have to know .NET per se to understand the demo, because you'll be able to choose your own language when you start developing against Event Hubs.

For the first part I need to initialize a project, so I'm going to create a new folder called 01-send-events, open it in a terminal, and initialize a new project. First I create a console project using dotnet new console, and then I add the Azure.Messaging.EventHubs package to the project; it includes the Azure.Messaging.EventHubs.Producer namespace, and I need the producer classes because I'm sending, producing, the events. And of course it's always good to run dotnet restore to make sure all the packages are downloaded. Once you do that, you'll have a project initialized from the standard template for a .NET console application. As you see, there is not much here, so let's start adding things.

First of all we need to add usings; the using statements allow us to use classes from the packages, like the Azure.Messaging.EventHubs producer classes, et cetera. I can enable the C# extension, because it's prompting me, and now we're good to go. The first thing we need to do is create the connection string and event hub name properties, so I'm going to paste them here; as you see, I created two private properties, the connection string and the event hub name. Then I'm just going to copy-paste my static Main from the template, because I don't want to focus on .NET SDK specifics; I want to show you how powerful the SDKs are and how easy it is to use them to upload events to an event hub.

I'll now go through the code briefly to explain what is happening; let's hide the console and see more of the code. Since we created the two properties already, the first thing you do is create an EventHubProducerClient, which is your client for Event Hubs. From there I'm creating a batch, because I'm going to pack multiple events into a single batch and send them as a single message to the event hub. In this case, what I do is use the TryAdd method and pass in an EventData instance, encoding my data using UTF-8. This way I will send three events to the event hub. After that, it's just the SendAsync method to send the events to the event hub, and I'll print something on the screen.
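The video copy-pastes this code rather than typing it out, so here's a sketch of what the sender looks like, assembled from the steps just described; the connection string and hub name are placeholders filled in below:

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;

class Program
{
    // Placeholders -- filled in shortly from the portal's shared access policies.
    private const string connectionString = "<event-hub-connection-string>";
    private const string eventHubName = "<event-hub-name>";

    static async Task Main()
    {
        // The client for talking to our event hub.
        await using var producerClient = new EventHubProducerClient(connectionString, eventHubName);

        // Pack multiple events into a single batch and send them as one message.
        using EventDataBatch eventBatch = await producerClient.CreateBatchAsync();

        for (int i = 1; i <= 3; i++)
        {
            // TryAdd returns false if the event doesn't fit into the batch.
            if (!eventBatch.TryAdd(new EventData(Encoding.UTF8.GetBytes($"Event {i}"))))
            {
                throw new Exception($"Event {i} is too large for the batch and cannot be sent.");
            }
        }

        await producerClient.SendAsync(eventBatch);
        Console.WriteLine("A batch of 3 events has been published.");
    }
}
```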
Now that our application is ready, we need to grab the connection string and the event hub name. We can go to the Azure portal to grab the event hub name; this is mydemo, so I put it here as a property. Then we need a connection string, and there are multiple ways to get one. You can go to Shared access policies and either use a default policy created on the entire Event Hubs namespace, or, better, create a specific shared access policy for the event hub you're working with. If you click Add here, you have options, as you see: Manage, to manage the entire event hub; Send, to send events; and Listen, to read the events. I'm going to use Send and call it mysender, and once the policy is created I can click on it and grab the connection string from here: just copy it to the clipboard and paste it into the application.

Notice that at the end it says EntityPath, pointing to the specific event hub that I created, because I grabbed the connection string from the event hub itself. The second place where you can get it: if you go back to your Event Hubs namespace, you also have Shared access policies there on the namespace level. Those are shared access policies spanning multiple event hubs, so if you want one connection string that can upload data to multiple event hubs, create the shared access policy here; if you want it scoped to one specific hub, do it on the event hub itself. Notice one thing: there's a RootManageSharedAccessKey here, which is the administrative key for the entire namespace.
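For reference, an Event Hubs connection string has this general shape; the values below are illustrative placeholders, and the EntityPath segment appears only when the string is scoped to a specific hub, as with the mysender policy above:

```
Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=mysender;SharedAccessKey=<key>;EntityPath=mydemo
```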
All right, let's go back to the code. Since we pasted the connection string, we can save the file and open the console to build and run our application. To build an application in .NET, just type dotnet build and let it build. I see I got an error, but it's a pretty quick fix: I forgot to remove a using for storage accounts, which we're not using in this demo, so just remove it, save, and build again. Once it's built, you can type dotnet run to run the application, and if everything works correctly you will see the message 'A batch of 3 events has been published'. That means we published three events as a single batch to the event hub, very quickly, using the .NET SDK.

Once this is done, you can go back to the portal and start reviewing your metrics. You have two options: you can go to the overview of your namespace and review the namespace-wide metrics, so the requests, messages, and throughput across the entire namespace, and if you switch to Messages you're going to find the three messages we sent a minute ago. You can also review event hub specific metrics by going to the Event Hubs blade and selecting your event hub; there you get all the metrics filtered down to that specific event hub.

Let me also show you how easy it is to send events to Event Hubs from Azure services, for example Logic Apps. Let's go and type Logic Apps, open the panel, and hit Add; you can of course do the same by clicking 'Create a resource' and searching the marketplace for Logic Apps. Select the azure-event-hub-tutorial resource group, call it am-demo-send-events, choose the same region, North Europe, hit review and create, and hit create. Once it's created you can go to the resource, and a designer will open immediately. I'll use the 'When a HTTP request is received' trigger to get an empty starting point for my Logic App; I don't need any inputs, so I'm just going to close it and add a new step. In this step I'm going to use the Event Hubs connector; as you see, I already have it on my recent list, but if you don't, just type 'event hub' to find the connector, and then use the action called 'Send event'. It will prompt you to create a connection, so I'm going to give it a name and select the event hub namespace that it found in my Azure subscription. It then asks me to provide a policy; remember, the policy is the shared access policy used for authorization, and since I have only one right now, it offers me the administrative one for the entire namespace. For demo purposes I'm going to use it, but in a production setting remember to manage those shared access policies separately, creating one for each application that will be using the hub. Hit create, and once this is done, a connection to the event hub is saved in our Azure resource group. Here, choose mydemo as the event hub name and add a parameter, in this case Content. The content will be a small message that we send to the event hub, like 'Hello world', and maybe we randomize it with an expression, typing rand(1, 10000). So this will send 'Hello world' plus a random number to our event hub. I can just save it, run it, and see the results. The run was successful: we were able to send our event to the event hub, and the connector does the encoding automatically for us, so we don't really have to do anything. Using Logic Apps to send events to Event Hubs is fairly easy, although since Event Hubs is designed for big scale, I would not use Logic Apps for that, because it might get quite expensive; use maybe Azure Functions instead. I just wanted to show you how easily you can integrate.

Now we can switch back to the presentation to talk about the next topic. One thing to note: when we created the Event Hubs namespace, we actually registered an FQDN of the form <name-of-your-namespace>.servicebus.windows.net. This is not a typo: Event Hubs comes from the same service family as Service Bus, and this is the URL you'll be referring to as you pass connection strings around. With that covered, we can go to the next section, which is receiving messages.

There are a couple of things to learn when it comes to event consumers, the services that will be reading the information from the event hubs. First of all, there's the consumer group. We briefly saw, when creating the namespace, that we can have from 1 to 20 of them depending on the pricing tier. Think of a consumer group as a unique view of the event hub data. What does that really mean? It means that each consumer group has its own separate view of the entire event hub data and can read all of it independently; usually each consumer group corresponds to a separate application. Consumers, in turn, are the processes within a consumer group that read events off the event hub using the AMQP protocol. Ideally you're going to have as many consumers as partitions, to allow for very good scaling, but it is still possible to have more consumers than partitions, in which case some consumers will read from the same partition. This is possible, although not recommended, because in that case you need to handle the duplicate data processing yourself.

Another thing we need to learn here is the offset. An offset is the position of an event within a partition, so each event in a partition has an offset, and by saving offsets across multiple partitions each consumer knows where it currently is in processing the data. The process of saving the offset is called checkpointing, and it's done on the client side: each client must save the offset it is currently processing if it wants to be able to resume later.
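To make offsets tangible, here's a small sketch that resumes reading one partition from a previously saved offset; the connection values, partition id "0", and the offset 12345 are all hypothetical placeholders:

```csharp
using System;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs.Consumer;

class ReadFromOffset
{
    static async Task Main()
    {
        // Placeholders -- a Listen-capable connection string and the hub name.
        await using var consumer = new EventHubConsumerClient(
            EventHubConsumerClient.DefaultConsumerGroupName,   // "$Default"
            "<event-hub-connection-string>",
            "<event-hub-name>");

        // Resume partition "0" right after a hypothetical saved offset,
        // instead of re-reading the whole partition from the beginning.
        var startingPosition = EventPosition.FromOffset(12345, isInclusive: false);

        using var cancellation = new CancellationTokenSource(TimeSpan.FromSeconds(10));
        try
        {
            await foreach (PartitionEvent partitionEvent in consumer.ReadEventsFromPartitionAsync(
                "0", startingPosition, cancellation.Token))
            {
                string body = Encoding.UTF8.GetString(partitionEvent.Data.Body.ToArray());
                Console.WriteLine($"Offset {partitionEvent.Data.Offset}: {body}");
            }
        }
        catch (TaskCanceledException)
        {
            // Expected when the 10-second window elapses.
        }
    }
}
```

In practice you rarely track offsets by hand like this; the EventProcessorClient used in the next demo manages them for you.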
This checkpointing mechanism gives you more reliability when using Event Hubs, because if the consumer dies, if the process dies, you can resume from the very last offset you finished processing. This is quite important for scalability: if your partitions hold millions, or even hundreds of millions, of events and your process dies, you don't want to read all of those messages again. So every now and then you checkpoint where you finished and save your progress; it's just saving the state of your current processing and committing the changes. But remember, this is done on the client side. If you're using the SDK, you have that feature out of the box: you just provide a connection string to a blob storage account and the SDK will manage checkpointing for you; you just need to call a method. And as I said, since a consumer group is just a view on the data, a logical application, you can have multiple consumer groups, each with its own view of the data, saving its own offsets through its own checkpointing. One thing to note here: remember that an event hub retains data only for a limited period, from 1 to 7 days, so each consumer needs to process the data within that time window.

With that said, we can move to the second demo, about receiving data from Event Hubs. Let's go back to the portal, but in the end we're going to end up in Visual Studio Code again; I'm going to use the .NET SDK to create a receive-events application. Let me quickly create a new folder for the new application (okay, no typos there) and open it in a terminal to be sure we are in the right place. Now I initialize the project: dotnet new console to create a new console application, then dotnet add package for Azure.Messaging.EventHubs and Azure.Messaging.EventHubs.Processor, so that we have all the packages needed for receiving data, and lastly dotnet restore to be sure all the packages are in place. Now close the old program and open Program.cs for our second application.

In here, first of all, I add the using statements, and notice that among them there's Azure.Storage.Blobs. In this case that's actually required because, as I said, there's the checkpointing process: each consumer needs to save a checkpoint while processing data, and to save that checkpoint we're going to use the SDK feature that stores it in Azure Blob Storage. So, in order to use checkpointing, remember to add the blob storage package with dotnet add package Azure.Storage.Blobs. Once you do that you could use blob storage through its SDK yourself, but you won't have to, because everything is done behind the scenes for you by the Event Hubs SDK.

For this demo you will need four properties: the connection string and event hub name, like previously, plus the blob storage connection string and the container name where the checkpoints will be saved. Again I'm just copy-pasting the code for my static Main, so that I can explain what the SDK does, plus two additional methods, one to process events and one to handle errors. I'm going to save it and close the terminal for a second so we get a bit more screen real estate.

Before we fill everything in, let me show you what is happening here. First of all, you create a consumer for a consumer group; notice that I used the default consumer group name, which in this case is $Default. If you go to the portal, select the azure-event-hub-tutorial resource group, open the Event Hubs namespace, go to Event Hubs and open mydemo: this is the place where you can configure consumer groups for your event hub, in the Consumer groups blade on the left-hand side, where you'll find the default consumer group.
Every event hub always has a default consumer group, and by this line the application basically says 'I'm a default client for this event hub'. That's fine; you can create more consumer groups if you want, or just leave it at this. Second of all, you create a blob container client. This client is passed to the EventProcessorClient as a parameter, and the processor uses it to save the checkpointing information in blob storage. Then you add two event handlers, one for processing the events (the method below) and one for handling errors during processing, and simply await processor.StartProcessingAsync. In this demo I stop it after a delay, so it will run the processor for about 10 seconds, receive all the messages, and then stop processing.

One important thing to note about what is happening here: the process-event handler grabs the event from the event hub, prints 'message received' along with the content of the event by decoding the event data, and then, importantly, calls UpdateCheckpointAsync. That call makes sure the blob storage is updated with the current offset where we finished processing. As you see, in this demo I update the checkpoint for every single message I process, but in a real-world scenario checkpoints are usually updated at time-based intervals, maybe every 5 minutes; it depends on the use case, so you have to decide how you're going to update checkpoints yourself. In case of errors, we just print some diagnostic information to the screen.

If this all looks right, we can open a console and type dotnet build, but before we do that, remember we didn't fill out those settings. First of all, let's grab the connection string; we can reuse the connection string and event hub name from our first example, so let's copy and paste them here. Now we just need the blob connection string and the blob container name. Let's go back to the portal, to our resource group, and open the storage account; I created the storage account beforehand. Now I'm going to create a container. I already have a container called demos that I could reuse, but instead I'm going to call a new one checkpoints: copy that name, create a container called checkpoints, and paste 'checkpoints' as the blob container name in the code. The last missing piece is the blob storage connection string: go to Access keys, grab the connection string from there, copy it to the clipboard, and paste it in. Remember not to leave connection strings like this in a production application, but for demo purposes it's fine.

Now we can run dotnet build and dotnet run. As you see, we got an error, and the reason is that we reused the connection string whose policy, the one we created and called mysender, only has claims that allow it to send messages, not listen. The quick fix is to create a new shared access policy that allows us to listen, to read the events, as the name in the error below suggests. To fix it, go back to the portal, to your event hub mydemo, open Shared access policies, and remember we had mysender with only Send claims; so create one called myreceiver with Listen, which will allow it to read messages off the event hub. Once you do that, just press myreceiver and copy the connection string; you can now use this connection string instead, which will allow the application to read the messages off the event hub.
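As with the sender, the receiver code is copy-pasted in the video, so here's a sketch assembled from the steps just described; the four configuration values are placeholders you fill in from the portal:

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Consumer;
using Azure.Messaging.EventHubs.Processor;
using Azure.Storage.Blobs;

class Program
{
    // Placeholders -- the same hub as before, plus blob storage for checkpoints.
    private const string connectionString = "<event-hub-listen-connection-string>";
    private const string eventHubName = "<event-hub-name>";
    private const string blobStorageConnectionString = "<blob-storage-connection-string>";
    private const string blobContainerName = "checkpoints";

    static async Task Main()
    {
        string consumerGroup = EventHubConsumerClient.DefaultConsumerGroupName; // "$Default"

        // The blob container client the processor uses to persist checkpoints.
        var storageClient = new BlobContainerClient(blobStorageConnectionString, blobContainerName);

        var processor = new EventProcessorClient(storageClient, consumerGroup, connectionString, eventHubName);

        processor.ProcessEventAsync += ProcessEventHandler;
        processor.ProcessErrorAsync += ProcessErrorHandler;

        await processor.StartProcessingAsync();
        await Task.Delay(TimeSpan.FromSeconds(10));   // receive for ~10 seconds, then stop
        await processor.StopProcessingAsync();
    }

    static async Task ProcessEventHandler(ProcessEventArgs eventArgs)
    {
        Console.WriteLine($"Message received: {Encoding.UTF8.GetString(eventArgs.Data.Body.ToArray())}");

        // Persist the current offset to blob storage. Done per message here for
        // simplicity; real applications usually checkpoint on a time interval.
        await eventArgs.UpdateCheckpointAsync(eventArgs.CancellationToken);
    }

    static Task ProcessErrorHandler(ProcessErrorEventArgs eventArgs)
    {
        Console.WriteLine($"Error on partition '{eventArgs.PartitionId}': {eventArgs.Exception.Message}");
        return Task.CompletedTask;
    }
}
```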
So let's try this again: rebuild the application and run it, and the result is here. As you see, we got two events from the Logic App, 'Hello world' with two random numbers, and three events from our .NET application. The reason we have two events from the Logic App is that I ran it once more in the background to ensure everything was working properly, but I didn't record that part. So this is perfect: we were able to receive all the events asynchronously using the SDK, fairly easily. You can always go back to the portal and review what is happening in the metrics section if you want, but now we can go back to the presentation and talk about the last few features of Event Hubs.

First of all, what additional features do you get? There's something called event capture. If you cannot process your events within the retention period, or you want long-term retention of your events, Event Hubs gives you a feature called capture. In the portal, capture can be found on the event hub itself, so remember to navigate to your event hub; capture is under the features in the left-hand blades. Remember, we were also able to enable capture when creating the event hub in the first place. When you switch it on, as you see, it asks you for a time window, how often it should create the dumps, and a size window, so that if the accumulated data grows beyond some limit it will start saving it to blob storage. This basically grabs the data flowing through your event hub and saves it permanently to Azure Blob Storage. For demo purposes I'm going to select a one-minute time window, something very small; the two windows work on an either/or basis, whichever fills first. You can also select 'Do not emit empty files when no events occur': if you don't send any events within the time window selected above, capture would normally create an empty file, and this option prevents that. The capture provider is an Azure Storage account or Azure Data Lake Storage Gen1; select a container from your subscription. In this case I'm selecting my event hub demo storage account and the demos container. Notice that you can also choose the pattern used for naming; in blob storage there are no real folders, but the path will look like one, and you can change that pattern here. Then simply save the changes. Once you do this, every single event that goes through the event hub will be saved to Azure Blob Storage for long-term retention, or maybe for other purposes such as batch processing with any other tool, like Data Factory or maybe Databricks; there are plenty of tools that can do that.

After a minute has passed, you can go back to the resource group, open the storage account, go to the containers and open the demos container, and if the time window has already elapsed you can see amdemo, which is the name of our namespace, then mydemo, the name of the event hub, and, drilling down by date, the dumps in Avro format. Avro is an Apache data format, very effective for storing data, and it's good to know that Azure Data Factory supports it, so it's fairly easy to integrate this output with other tools and systems out of the box.
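If you'd rather find those capture dumps from code than by clicking through containers, a small sketch along these lines would list them; the demos container and the amdemo/mydemo prefix mirror the names used in the video, and the connection string is a placeholder:

```csharp
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

class ListCaptureFiles
{
    static async Task Main()
    {
        // Placeholder -- the storage account configured as the capture provider.
        var container = new BlobContainerClient("<blob-storage-connection-string>", "demos");

        // Capture writes Avro blobs under <namespace>/<event-hub>/<partition>/<date>/...
        await foreach (var blob in container.GetBlobsAsync(prefix: "amdemo/mydemo/"))
        {
            Console.WriteLine($"{blob.Name} ({blob.Properties.ContentLength} bytes)");
        }
    }
}
```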
We can now go back to the presentation. What else do you get? You also get the autoscaling feature, auto-inflate, which we talked about briefly: you can autoscale your throughput units to periodically handle more events while still keeping the solution as cost-effective as possible. And lastly, in case of a geo-disaster, you can actually recover into another region, so you can have highly redundant, highly available applications running on Event Hubs in Azure.

Now that we're done with our demos, you can see that Event Hubs is fairly easy to integrate with: it's easy to send and receive messages, and also easy to integrate with other Azure services. It's up to you to decide whether your application fits a big data streaming scenario or not, but for today that's it. Hit that thumbs up, leave a comment, and subscribe if you want to see more, and definitely see you next time. [Music]
Info
Channel: Adam Marczak - Azure for Everyone
Views: 166,140
Keywords: Azure, Event, Events, Messaging, Integration, Event Hub, Hub, Service Bus, Big data, bigdata, stream, apache kafka, amqp
Id: Dc3P27BsK3E
Length: 32min 10sec (1930 seconds)
Published: Tue May 12 2020