Azure Synapse Analytics & Power BI for BANANAS SCALE!

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
yo what's up I'm Patrick I'm Josh and in this video we're gonna talk about the two big reasons from a power bi perspective that you should think about using as your synapse analytics stay tuned if you're filing this for the first time be sure to hit that subscribe button to stay up to date from videos from both Adam and myself stick around to the very end because I'm gonna talk about a banana scenario where we integrated power bi and synapse and got some amazing results so I want to introduce our very special guest today he's here to talk about as your synapse analytics it's Josh Kaplan a p.m. on the edge of synapse analytics team welcome Josh thank you for having me here really excited to talk to you about senesce for the first time we're gonna service out there let you build a data analytics solution going all the way from raw data to data visualizations while staying within the same tool that includes all your data warehousing capabilities they're big data capabilities like spark your orchestration of data movement from beginning to end and all your security and monitoring all in one place and that can easily work with power bi that then visualize and analyze that data alright so in this video we're going to talk about two big reasons Josh the first reason is scale and performance can you tell us a little bit about that success is designed to work at cloud scale that goes oh wait wait a minute wait a minute wait a minute drive instead of all this talking you think you can do a demo for us so instead of always talking let's do what let's head over to Josh's laptop yeah let's actually take a look so what you're seeing here is synapse to do this is that one tool at one place where you can do everything from raw data then data visualization right over the toilet takes me to my snaps workspace now Sam's workspace is very lightweight easy to create there's no cost to it just go into the azure portal and you'll see synapse workspaces available you just create it now once I'm in the workspace I need to get some data to work with now I have data that that's in an azure data light already so I don't have to go out and ingest and pull that data into my workspace I can actually go to that data like and connect to it directly and just start using that data so I'll come in the manager Josh yes Josh how much how much data is this this is little baby data or is this a big data this is the entire data set for the New York City Taxi Cab I know everyone's seen this in the past this is literally the entire data set not just yellow cab boobers and green cab this is billions and billions of rows so this is where that scale and performance just really hits home right where I started this is the date I need to work with that I need scaling before mice to work with it so I can literally come in here and say link and I can say late to data like Ken to Graham I did it is and just play my connection information and when this is done I'm just connected I have a move data to having computation but now I can go back over to my data hub here and I can see that we've now linked to that storage account I can go immediately from here Brett and browse that storage down now notice how do you include any passwords or anything it's all kind of single sign-on all pass through very easy to work together in one place but how these folders here called data I have my taxi data I haven't broken out by the different types of taxa date it's like a yellow cab and data here in the data links in files those files have to be partitioned by different years and different months so pick a pick a year can I'll pick them on I can see those files there I happen to be a parkade format this is a over 500 megabyte file has over 100 million rows in it and this is just one of those partition files without doing anything else I can literally start running sequel queries on immediately we have any provision any compute or anything like that we had just created a workspace we linked it to the storage account and now immediately writing full T sequel queries directly on top of that Parkay file that came back in about nine seconds there this is about five six hundred megabytes but 100 million rows parkade compresses very well I can I can I can query it just as if it was a a real database and this worked so are you telling me so you're telling me I don't need to do any ETL right I can just drop those files in and then starts query Annette data right there in the source and not do anything yeah what you just saw is with know each other literally just talking to the files directly over the data like I can even do more than just how select top 100 now I go over to my develop hub you're actually pretty broke some of these queries so I and since this is my workspace my saved queries are showing up here for me so I can reuse them I can actually come in here and do you know an aggregate in the group by and one thing I notice about this query the last one we started with one particular file here I'm actually wildcard in my path here so looking at every single year every single month every single day and we're doing a group by aggregate on here this is not small date at all this is one point six billion rows of data and what exactly done here is taking that query and distribute it across multiple different machines multiple different of CPUs are now working directly over the data like itself to go a bill to return that data so right there came right back again a full aggregate simple query here now across 1.6 billion rows are data with no ETL so so Josh why Tim just tell me why and one actually one question right that's it's fast there's a lot of data I'm guessing we could scale this out and make it even faster why is this book why is this so important for someone like me who just lives and breathes power bi well figure out all that data that gets just landed into the interlakes today and what you have to do to make use of it all that movement around all that cleanse and clean all the different engineers we have to get involved to go work with that data now you you can power bi he knows how to talk sequel now any tool that knows I won't even go and talk to this data directly on the lake with without all that time without all that movement now we might be looking at this career and say alright it's sequel but it still doesn't look as easy as me querying a table review what I'm thinking if I think if I think about the average user I look at this open row set and I go never using this I'm not gonna do this I'm not gonna touch it I'm assuming you can make this easier for me the the typical little Barbie I use it right yeah now you could paste this in and just start running the query against it but typically go in power query you'd see the list of views and then you'd be able just work with it graphically so we can hide all this complexity behind well views without actually having to create and move the real database and move the data yeah could kind of create a virtual database here come in over here again this is all just normal T sequel statements so I'll say create database entry first I will connect also create database and this will go create that virtual database if I come back to data tab here we needed NYC yellow and now my list of databases that are in my workspace I now see my C yellow looks like a normal super Davis so it's gonna be empty at the moment I'm going to create one view in here called yellow and it's basically it's a select star from that entire from that path that we used in the last query and this will essentially Stuber all that schema to go use the new database we created this will go then discover all the schema from all those different parking files or hundreds of files in there and now they're going expand views by refreshing expand views you'll see yellow down here the view we just created and you'll see schema actual schema and this is what you would see if you connect with any sequel client including power of Y you log it you go and you'd see these views or this view and there with its columns dead types and everything and you now clearly these views directly so in power of your eye you can literally just leverage this I can show you that so one thing this is I'm gonna be honest with you right I'm a power bi use it I'd do anything fast I needed to happen right just like that it's kind of slow so I'm just saying step back for one second and look at the back we have one point million rows are India and we're hearing that a lien any good a movement we're creating all 1.6 billion rows without any gate a movement and it's coming back in yeah 37 who was that 34 seconds right there 34 sick for all those things I would consider that pretty fast I can understand that if you're doing a direct query here in your interactive mode you know then 34 seconds could be a long time to wait per click now there's lots of techniques around speeding up their queries there's lots of techniques around you know best practices for directory and I know you guys have a bunch of videos on those I would give the same guidance on using synapse for direct query as I would any other data source and that as follows follow those best practices you need a solution that works really well all right Josh so we touched on we talked about scale we touched him on the performance but I'm guessing we're gonna head over to power bi now and you're gonna talk about more more performance and how tightly integrated as your synapse analytics is with power bi it's so tightly integrated that you can actually get start with power bi directly here in synapse to do so just as I lead to a storage account before to get my data I can actually link to a power bi workspace you go back to manage here per second and go back to link services it'll say new service and actually you never leave the power of the I work place before it comes up as the first thing here did you say connect the power bi well name it automatic it finds all my workspaces don't pass through authentication and you notice that's a theme throughout the product you don't have to keep being prompted to login and you know we enter credentials all you do is select the workspace is what say create again not copying anything not moving anything we were just linking the the workspaces together go back over to our develop tab here and that would see power bi is one of the artifacts in the list I can expand this I can see all the datasets I have already in that workspace I can expand to see all the reports that I have in that workspace as well too things like let's say we just created a new virtual database nighttime here called my you see yellow let's say I want to give you papi a dataset for that I can click new power bi dataset if you don't have power bi desktop already you can't go to uninstall from here but once you have desktop to start and it will show you all the databases you have in your workspace and you just pick the one you want to use here the I instead continue and this is what I give you what's called carbohydrate a source file and when you open up that power bi data source file is it already gonna be connected to that dataset it's going to yes it's gonna set you up connected to that data set if I'm just an analyst or I'm just a practitioner I'm just wanting someone that uses power bi I don't need to go through all these steps to connect as your synapse right it's just connecting to a sequel server at that point right any other single database yeah so it's it's nothing that I need to learn new it's nothing that that you know somebody's got to teach me I'm already connecting the sequel I've been connecting the sequel ever since Part B hours release and so now just synapse is just another sequel connection I don't need to learn all these steps so do anything like that it's just boom it just works yeah actually so I'm showing you the query Oh middle leg functionality but Samson is also the evolution of actually computer warehouse as well so you have that literally same functionality in there so you do move the data into sequel sequel storage then it is not only literally feel like the same thing it is literally the same thing let's connect see what I get to take advantage I get to take advantage of all the performance and scale that comes with you know the sequel data warehouse that we've been using for years now right and more and more and more truly an evolution so all I do is simply log in here connect and this will go like you said connect to a normal sequel to base so that same power query navigator comes up here we can see that that view we trade and we can click on this I can go ahead and say hey you can do further transformation of how great I want or just go ahead and load whole thing and I'll get that option that you would get with checking daddy's ticket beta base around do you want to do import or you want a direct query so always a question comes up which do I use here and when and Patrick pointed out that query across 36 sorry across 1.6 billion rows directly in the data life of no ETL movement 36 second now if you're gonna go direct query you may not want every single click to take 30 seconds what keep in mind that was no get it movement no optimizations we literally just created a view directly on top of the raw data sitting there in the data link there are four things you can do there's one guys but also all be all the capabilities of optimizing and tuning your data that book RBI and what what one really sequel data warehouse now as you I just have an analytics have you can use and leverage to make things faster and faster to your needs so if slice it's just database 101 right the same things you would do with your indexing and materialized views all those things right you still follow the same path all those things are still at your disposal as an when if you need them you can move all this stuff into single storage you can use things like materialized views to automatically optimize performance as queries come in - data gets updated the materialized views automate with the changes to basically aggregations but in the data warehouse you know through aggregations outside the datawarehouse here in our bi the same way you could do with any power bi directory reduce that so once you have those datasets created you can publish it to your synapse workspace you said continue up here and you can click on any of these you now click on the data set to actually go ahead and start building a new report here in in synapse to do and I'll took the one I've already done and you'll see it'll open up here in synapse to do in edit mode and I can go ahead and just start changing the report here do I was doing this in desktop we're in the power bi directly and then this is power bi so there's no copy of this report suppose I make a change and save it I go right back to my power bi workspace and see those changes immediately and any user who is consuming these reports and see those changes as well well that's nice man that's nice so it really is tightly integrated and those reports are still in power bi right so I still can access those reports from the PAR bi service if I wanted to right they are the report from the power bi service so we're literally opening up using the same power bi have embedded functions available to customers as well we're literally embedding those reports from your workplace right here in snaps - do you make the changes to them you say that you're literally changing the same report that is in the workspace how that works good all right think about this you're working with billions of rows of data and we start putting this project together and it was like layers right he's like an onion right we start peeling back these three layers the first the first layer was the aggregation so when you hit that aggregation was really really fast if you miss the ACK it will go back to synapse the synapse analytics the Azure synapse analytics where we have these materialized views then you would hit that quite as fast but still pretty fast and then if you didn't hit that it would go down to the detail the detail table where it could be slow but it still returned your results pretty quick that's not bananas I don't know what is same questions comments are you using Azure synapse today you know what to do let's continue the conversation we're in the comments below your first time visiting a guy in the cube channel hit that subscribe button you like my video big thumbs up as always from Josh Adam and myself thanks for watching see you in the next video
Info
Channel: Guy in a Cube
Views: 30,503
Rating: 4.9183674 out of 5
Keywords: azure synapse, azure synapse analytics, azure analytics, azure data lake storage, big data, big data analytics, data lake, data visualization, data warehouse, power bi, power bi big data, power bi big data performance, power bi desktop, power bi performance, sql data warehouse, sql server
Id: vOSzw40TohM
Channel Id: undefined
Length: 16min 56sec (1016 seconds)
Published: Wed Jun 03 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.