Azure Stream Analytics Tutorial | Processing stream data with SQL

Video Statistics and Information

Reddit Comments

I'm always looking forward to some practical use of Azure services. Thank you for your time in explaining and sharing this interesting function.

— u/Straight-Spring, Jun 02 2020
Captions
Hey guys, this is Adam. If you are working with BI and data, you are probably familiar with the SQL query language. Today I'm going to show you how to use that query language to process and analyze streams of data. This is a quick introduction to Azure Stream Analytics, so stay tuned.

As always, let's start with some key characteristics of Stream Analytics. First of all, you can analyze your stream of data using the familiar SQL query language, which is very good because SQL is one of the leading languages for data analysis among data engineers, so it's easy to find people with the skill set to start using Stream Analytics. With many integrations and out-of-the-box connectors to other services, it's easy to start using it and implement it in already existing infrastructures. And in case the SQL language is not enough for you, you can extend it by building your own custom functions using JavaScript, .NET, or machine learning services.

Now we can talk about how it actually works. The first thing you will need to create when starting with Stream Analytics is a source. For the source you define the type of the service you're integrating, fill in some common properties, and set an alias that you will use in your queries later on. In our example this could be an Event Hub as the input type, with the alias event-hub-input. Once the input source is defined, you need to define an output. Again you provide a type and an alias; in our case this could be a Synapse database with the alias synapse-output.

Once both the input and the output are defined, you can start writing your SQL job query. The query is the core of how Stream Analytics will process your data: it defines how to pick the data, what kind of transformation to apply, and where to output it. Just like in standard SQL, you grab your data from the source with a FROM statement, providing the alias of your input source. Second, you perform the transformation using the familiar SELECT statement; if you want everything, you can just pass the star. Lastly, you output the data somewhere with an INTO statement, again using the alias of your output. Once the query is defined, the only thing left is to start the job. Once the job is started, the data will start flowing in and will automatically be copied into the output of your Stream Analytics job. In this case, using SELECT * means we're copying the entire input stream into our output. This example shows very nicely how easy it is to grab data from an Event Hub and stream it directly into Synapse Analytics.

And just like with normal SQL, you can apply filtering. Imagine the input data on our Event Hub follows a schema with id, importance, and user columns. You can create a WHERE statement and use a column inside of the SQL query. In this case we're creating a condition so that only the data with importance greater than 1 is sent to our output; as soon as data starts coming in, it gets filtered on the fly based on that condition.

For certain analyses you will want to perform window-based operations. To do that you can use a GROUP BY statement, just like in normal SQL, and provide a tumbling window function, which defines the width of your window. In this case: every two seconds, calculate the maximum importance of the incoming data. Remember, since this is a GROUP BY, you need to provide an aggregate function, just like in normal SQL. When the data starts arriving, it will be grouped every two seconds and the result calculated. Note that it is always a fixed interval of two seconds, because that is how we defined our tumbling window.

With that said, we can move to the inputs and outputs, the services supported by Stream Analytics. For inputs, there are three supported services: Blob storage for raw data, IoT Hub, and Event Hub.
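The three query shapes described above can be sketched in the Stream Analytics query language like this (the event-hub-input and synapse-output aliases are the ones assumed in this example):

```sql
-- 1. Pass-through: copy the entire input stream into the output
SELECT *
INTO [synapse-output]
FROM [event-hub-input]

-- 2. Filtering: only forward events whose importance is greater than 1
SELECT *
INTO [synapse-output]
FROM [event-hub-input]
WHERE importance > 1

-- 3. Windowing: every two seconds, emit the maximum importance seen
SELECT MAX(importance) AS maxImportance
INTO [synapse-output]
FROM [event-hub-input]
GROUP BY TumblingWindow(second, 2)
```

A job runs exactly one of these at a time; the tumbling window in the third query emits one row per fixed two-second interval, which is why the aggregate (MAX) is required.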
Stream Analytics also allows you to use reference inputs, in this case through Blob storage or SQL Database, letting you pull in reference data. An example of that kind of data could be some geographical data or some dictionary data from your application, which you combine with your stream data to perform additional analysis.

There are a lot of services supported as outputs: you can use Blob storage, Data Lake, Event Hubs, Service Bus, and even Power BI. Streaming data to Power BI is one of the core use cases for Azure Stream Analytics, because creating live dashboards is really easy using this service: just apply the query over your stream and you will see live dashboards in Power BI in just a couple of minutes. This is part of the demo that I prepared for you today. It should also be noted that Stream Analytics allows you to use multiple inputs and multiple outputs within a single query, so you can combine data or write to multiple outputs using a single job and a single query. It's up to you to design how you want to combine the data, how you want to analyze it, and which outputs you want to support for your application.

Today our end-to-end demo scenario will start with an IoT device. Please note that you don't have to have an IoT device to perform this yourself, because we're going to use a browser-based Raspberry Pi simulator. Let's go to a browser, open a new tab, and navigate to azure-samples.github.io/raspberry-pi-web-simulator. The simulator is fairly easy to use: on the right-hand side you will find the code for the simulator, and the only thing you need to do is replace the connection string and hit Run to simulate the device.

Now that we have the simulator open, we can go back. The next thing we need to create is an IoT hub. This is where we're going to be sending telemetry from our IoT device, and we're going to use the IoT hub as an input for the Stream Analytics job.
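As an illustration of the reference-data pattern mentioned above, a stream input can be joined to a reference input on a shared key. This is a sketch only; the aliases and column names (reference-input, deviceName) are hypothetical, not from the demo:

```sql
-- Enrich each streaming event with a friendly name looked up in a
-- reference data set (e.g. a device dictionary loaded from Blob storage).
SELECT
    s.deviceId,
    s.temperature,
    r.deviceName        -- column coming from the reference data
INTO [synapse-output]
FROM [event-hub-input] s
JOIN [reference-input] r
    ON s.deviceId = r.deviceId
```

Unlike a stream-to-stream join, a join against reference data needs no time bounds, because the reference set is treated as slowly changing lookup data rather than a stream.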
Let's go back to our browser and to the Azure portal. Select the menu on the left-hand side, select Create a resource, type "IoT Hub" and press Enter. You will find a template from Microsoft; click on it and hit Create. The only thing you need to do is create a new resource group, in this case azure-stream-analytics-intro. I also need to select a region, for me that will be North Europe, and the name for my IoT hub will be am-demo-iot-hub. Review and create, and Create.

When the IoT hub is created, select Go to resource. I'm not going to spend too much time on the IoT hub, because it's not the topic of our demo today, so just find the IoT devices blade on the left-hand side and select it. In here you add a new device by selecting New on the top and providing a device ID; in my case this will be my-iot-device. No other configuration is required here, so select Save. When the device is created, select Refresh on the top to find your newly created IoT device. When you do that, select it; the only thing you need here is to copy the primary connection string. Select Copy, go back to your IoT simulator, and replace the connection string with the one you just copied. Once you do, select Run to verify everything is working correctly. As you see, messages are being sent to your IoT hub from the Raspberry Pi simulator.

Now that our demo setup is complete, we can go back to the portal and start creating the services to analyze the IoT stream we just created. Our IoT hub is already set up and we have a stream of events coming from the device, so now we need to create a Stream Analytics job. Again navigate back to the portal, select the menu on the left-hand side, Create a resource, type in "stream", and you should find Stream Analytics job. Select Create, give it a name such as process-iot-data, select the subscription and the resource group azure-stream-analytics-intro, and select the location, in my case North Europe. In here the most important choice is the number of streaming units.
As you see, by default it is set to three, which means three servers will be processing my data at all times. This directly impacts the billing, so for demo purposes always select one, because we just need one server to process our data. And remember, when you see the pricing for Stream Analytics, it is a simple calculation: the number of streaming units you have multiplied by the price of a streaming unit. Now select Create. Creating a Stream Analytics job takes literally a couple of seconds, because the job itself doesn't do anything yet. We can use the Go to resource button to navigate quickly to our service.

By default, after creation, the Stream Analytics job is not running. You have Start and Stop buttons on the top if you want to control that, but remember: it's not running yet, therefore you're not paying anything. On the left-hand side the most important section is the job topology. In this section you have the inputs blade, the functions blade, the query blade, and the outputs blade. Now that we have created our job we can start defining inputs and outputs, but we are still missing an output service. In this case we're going to use Azure Blob storage and stream the data into it. To create it, navigate back to the portal, go to Create a resource from the left-hand menu, and select Storage account on the bottom. Choose the resource group, in this case azure-stream-analytics-intro, type a name (for me that's going to be amdemostorageforiot), ensure that the proper location is set, in our case North Europe, and leave everything else as default, so just hit Review and create, and Create. When the storage account is created, go to the storage, select Containers, and create an "output" container where we're going to be writing our stream data. Select Create; there's nothing else we need to do right now, so let's go back to our presentation.

Since we have our input and our output services, the next thing we need to do is define them as the input and the output in the Stream Analytics job and create a query.
Now navigate back to the portal and open the Stream Analytics job: select the resource group and the Stream Analytics job you created previously, and open the inputs blade. In this blade select Add stream input and choose the type of the input, which in our case is IoT Hub. Here you need to provide an alias again; as I said, this will be iot-input. The panel allows you to choose from your Azure subscription, and as you see it has already detected the am-demo-iot-hub I created. It lets me select the messaging endpoint and gives me control over all the other default properties, like the consumer group, partition key, and format of the files, but for now we're just going to leave everything as default. Once this is done, select Save. With these couple of clicks you are able to add a new input to your Stream Analytics job. After saving, it will add the input and also test the connectivity for you, to ensure that everything was set up properly; as you see, we get a successful connection test.

Next you need to add an output. Navigate to the outputs blade and select Add. It is very apparent that you have many more options for the outputs. In this example we're going to use Azure Blob storage to output our stream data, so select it and provide an alias; that will be blob-output. We can select from our Azure subscription; the "output" container that I created a moment ago was also detected. We can use a path pattern if we want, which is very cool, because you can just type {date} and it will use the date for the folder structure, which is very useful since there are going to be a lot of files in streaming scenarios. You can also change the format from JSON to Avro or CSV. For now leave everything else as default and hit Save; the new output will be added and a connectivity test will be performed. The connection test succeeded, which means we can go to the query section.

When you open the query blade you will see a small editor that allows you to type, run, and test your queries.
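With the iot-input and blob-output aliases defined above, the default pass-through query shown in the editor looks roughly like this (a sketch using the aliases assumed in this demo):

```sql
-- Copy every event arriving from the IoT hub straight into blob storage
SELECT *
INTO [blob-output]
FROM [iot-input]
```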
In this editor you can do almost everything you want when it comes to development; everything besides creating custom functions, but when it comes to writing queries you can do it all here. By default there is a standard SQL statement using input and output aliases, which can be found on the left-hand side with a small icon indicating that an input is currently used within the query. We're using our iot-input as the input and grabbing all the data into our blob-output. Once this is done you can simply save the query and test it.

Before we do that, notice that on the bottom, whenever you select one of the inputs, you can review sample data for that data set. My recommendation here is to always hit the Refresh button to get the most up-to-date sample of your incoming data; in just a few seconds live data will be pulled from your IoT hub and you will be able to see the results. Now go to the Test results tab: in here, when you save and test the query, you can run it and see the results of your query. In this example, because we use SELECT *, we're grabbing all the data and streaming it into our blob.

Now the only thing left is to go to the overview blade and hit Start on the job. After one or two minutes the job will be started, the status will be Running, and you can start reviewing your results. First, always verify that your simulator is still running and sending the messages; in our case, as we see, it's still running. That means we can go back to Azure, select our resource group, navigate to the storage account and its containers, and select the output container. In here you can see folders for the year, month, and current day; open them and you will find a CSV file. As you see, our CSV already has some data in it, and if you keep refreshing you will notice the size grow every couple of seconds, because our stream is currently running, streaming data into this blob.
If you open that CSV and hit the Edit tab, you will see our streaming data: we have 113 rows, but if we refresh again we'll have 115, because this is a streaming scenario and data is coming continuously into this CSV.

Going back to our example, what else can we do here? I will show you how to use the same job to output data into Power BI, in which case we need to add a new output for Power BI, and then define a new query which will calculate an average over a time window and stream that data to Power BI. Let's go back to the portal, to our resource group, and open the Stream Analytics job. Before doing that, my advice is to always open Power BI in a new tab, to ensure that you have the license, that you have access to the portal, and that you already have an open session. When you do that, go back to your tab and open the outputs blade to add a new output. But please note that you cannot add or change outputs while the job is currently running, so what you need to do is go back to the overview and stop the job. The job will be stopping for about a minute or two. Once the job has stopped successfully, go back to the outputs blade; now you can add a new output.

Select Power BI. The first thing you need to do is authorize, and this is why I said you should be logged in in a separate tab: if you are, you will not get any unexpected issues here. As always, the first thing we provide is an alias; this will be powerbi-output for me. We can leave My workspace as the default one. We need to provide a data set name, which is where our data will be stored; in my case it's iot-data. And a table name: in this case I'm going to name the table temperature. This is the name of the table that will be created within Power BI and that you can use for reporting. Hit Save, and Azure will do the same as before, add the connection and test the connectivity for you, so you don't have to worry about the details.
Since our output is defined, we can go back to the query blade and prepare a new query. It is important to note here that you are allowed to use multiple queries, and those queries can use multiple inputs and multiple outputs. On the left-hand side you can already find your powerbi-output, but you will notice that it doesn't have an icon, because it is not currently used by the job. Now simply create a new query: type SELECT (don't use a snippet; press Escape) and take an average of temperature as a new column called temperature, then type FROM; in this case we are using our iot-input again. We also need to output that data somewhere by using INTO; in this case it will be our powerbi-output. Next we need to add a GROUP BY, because AVG is an aggregate used with a window-based function, so at the end add a GROUP BY statement grouping by a tumbling window: use the TumblingWindow function with its parameters, in this case a unit of seconds and a duration of two. So: every two seconds, calculate the average temperature from iot-input. The very last thing I like to do is keep not only the temperature but also the time the event was generated, so I add System.Timestamp, aliased as WindowEnd, which gives me a column indicating the end time of each window.

You can test the query if you want by selecting it and pressing "Test selected query" on the top; in just a few seconds you can review the results of the new query. We were able to calculate the average temperature and pass the date along. Now save the query, go to the overview, and start the job; you can start from when the job was last stopped, or start analyzing from this point onward. Let's do that and hit Start. After about two minutes our job is running, and at this point we can review the results. To do that, go to Power BI and navigate to My workspace.
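Put together, the windowed query described above looks roughly like this (a sketch; iot-input and powerbi-output are the aliases used in this demo):

```sql
-- Every two seconds, stream the average temperature, plus the window's
-- end time, from the IoT hub into the Power BI data set.
SELECT
    AVG(temperature) AS temperature,
    System.Timestamp AS WindowEnd
INTO [powerbi-output]
FROM [iot-input]
GROUP BY TumblingWindow(second, 2)
```

System.Timestamp on a windowed result resolves to the end of each window, which is what makes WindowEnd usable as the time axis of a Power BI line chart later on.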
In the Datasets tab you should find your iot-data streaming data set. What you want to do now is create a new dashboard: hit Create on the right-hand side, select Dashboard, give it a name (I'm going to call it demo), and hit Create. Inside of this dashboard, press Add a tile on the top and add a streaming data set. Here you need to select your iot-data set and choose a Card; for the field I'm going to use the temperature field to display live data. As you see, it's already changing live, because we are currently streaming data here. If we want, we can add another tile: again select the streaming data set, and this time change from Card to Line chart. In the line chart use the WindowEnd column that I added as the axis and our temperature column as the values, hit Next, and Apply. Once this is done you can already see live dashboarding performed from our IoT device, using Stream Analytics to analyze and output average data for this IoT device over a period of 2 seconds. It's pretty easy to achieve this with basic knowledge of SQL and a few clicks in the Azure portal.

We can now go back to our presentation and talk about the additional features you get with Stream Analytics. For instance, you get a lot of additional built-in functions: you have functions for anomaly detection, window functions, geospatial functions, and so on. Most of the basic things in SQL work similarly, including joins, so you can join multiple streams of data together and get even more complex results. Additionally, there is local development support from within Visual Studio and Visual Studio Code, which also allows you to create user-defined functions, which is another topic: user-defined functions allow you to extend the SQL language with custom functions written in JavaScript, .NET, or Machine Learning Studio / Machine Learning service. And lastly, there is also support for user-defined aggregates, but only with JavaScript.

Using Azure Stream Analytics is fairly easy: you just need a couple of clicks and you're integrating with multiple other services, using a familiar query language to process and analyze streams of data.
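As an illustration of the built-in anomaly-detection functions mentioned above, a spike-and-dip detector can be applied over a sliding history of events. This is a sketch only, not part of the video's demo; the aliases reuse this demo's setup, and the 95% confidence and 120-event/120-second history are assumed values:

```sql
-- Score each temperature reading against the previous two minutes of
-- data and flag spikes and dips with 95% confidence.
WITH AnomalyScores AS (
    SELECT
        System.Timestamp AS eventTime,
        temperature,
        AnomalyDetection_SpikeAndDip(temperature, 95, 120, 'spikesanddips')
            OVER (LIMIT DURATION(second, 120)) AS scores
    FROM [iot-input]
)
SELECT
    eventTime,
    temperature,
    CAST(GetRecordPropertyValue(scores, 'IsAnomaly') AS bigint) AS isAnomaly
INTO [blob-output]
FROM AnomalyScores
```

The function returns a record; GetRecordPropertyValue is how you pull individual fields such as the anomaly flag or the score out of it.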
Now it's up to you to decide whether this is a service for your application. For today, that's it. Hit that thumbs up, leave a comment, and subscribe if you want to see more. See you next time! [Music]
Info
Channel: Adam Marczak - Azure for Everyone
Views: 28,029
Keywords: Azure, Stream, Stream Analytics, Analytics, SQL, Data, Power BI, Live Dashboarding
Id: NbGmyjgY0pU
Length: 24min 17sec (1457 seconds)
Published: Tue Jun 02 2020