Azure Synapse Analytics - Power BI Integration & Performance

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello and thanks for joining me again looking into synapse analytics in Azure and looking at all the different components and how they work together so today is an interesting one in that we're looking at power bi and I do have a slight apology to make could what we started I started talking about this picture so I was like here's all the different interesting bits of synapse analytics and one or two people rightly so said where's power bi why are you talking about / bi here and my response at the time was well it's not really part of synapse analytics it's more something that just can query its synapse is a source that power bi can use and I was wrong so heads up it's actually a lot more integrated than I first thought so we can do is hook up datasets and build reports and see those directly within the synapse workspace which is pretty cool so I'll take you through how that works what that looks like I'll compare and contrast it to data bricks over some similar data and we'll see how it all hangs together so let's take a look if you liked the video don't forget to Like and subscribe and feel free to leave some comments about other things you'd like us to talk about and we'll go from there okay right back over here to get rid of let's and let's go and talk about sign ups okay so this is all sign ups to do things set up and there's a few bits and pieces in here so firstly I did have a notebook just to do this so I've got two bits of parking sitting in my leg classic NYC taxi data I've got one which is a pure taxi data and I made a date I mention now if you were watching other videos last week from the spark series as living a dynamic partition pruning so I've got this date - held in two different ways one a spark a one a delta and we'll come to that later for now we're looking at the Parque Dana so in signups created that I've prayed to the database called NYC I've registered these two tables and around that script so under data we've got this so we've got a spark database we've got our two tables registered and we can see the column let's take two under there so that's why I'm using parkade currently is because Delta's coming but it's not in there yet so if I register a table in spark and it's app like a file then does of the full metadata capturing so I can see what's going on there and the reason I've done this because I want that invisible to me on the sequel site so I can go into the Bela I can create a new sequel script and I can say I want to use and I can see NYC in my list so that means it's replicated over so despite the fact that NYC is a spark database sequel on demand automatically create a replicated copy of any of the hive databases what we've got inside spot so I can go in here underneath select star from let's you deviate on date just for simplicity I need to be in my NYC database and that's not control enter that is five and it's case sensitive okay there we go so it's gone off finding that date table I've got some data I mean real oil simple we know that works we've seen that before so the question is how do we then get this into power bi so I've shown you before but you can just use the secret on-demand endpoint and hook that up to power bi news as its data source that's an interesting but that's not the cool thing so if we go over to manage in signups we've got linked services so I can Freight as if I was connecting a new data source to pull data in I can connect to power bi here and you know give you that hint nice and big at the top you've power behind this this connects to power bi so I've got that option I can also just do it down here and connect the power bi either way goes off and this is connecting to a workspace so this isn't just saying connect my entire power bi account this wants a specific workspace so let's go and create one quickly so got pal you up here this is just me logged in as me I'm gonna go crate workspace let's pull business so that's all we need to create that workspace that's now exists in power bi so I can go in here get back just to refresh that workspace list and I should have signups there we go okay so I now have a workspace give us a link servers name second call this service power bi sure and that's what I can do now when I was first trying to do this I was back in my other description and that's in a different tenant to my normal power bi account and that meant I didn't get any workspaces so this is looking in the same tenant so it takes Who I am logged in as currently browsing the LEAs your portal and it tries to say in the tenant in the directory I'm logged into with the user and currently logged in as what can I see in terms of power beyond so if you have power bi set up in a different tenant or is your you're not going to see anything here and this is not going to work so I don't know how easy that is to fix if you have that problem but that's kind of a known problem as you're and power bi don't play nice if you've got them in different tenants so you care about that for now it's the same thing it's all in the same place so I'm going to align that create that ok so that's now created me a a linked service and then if I go up to the develop tab I've got this I can now see power bi as one of the things I have so I've got sequel scripts notebooks and now power bi of things in my develop area so I can go and have a look in there I can see datasets and reports now there's nothing in there currently so you know we need to dislike somebody to build some datasets and tell power bi how to talk to all of this sign up stuff now the tongue thing is there this is really view of your data sets so you can't create new data sets directly within signups it's a little bit disappointing for me because what I would like I don't need the full deck modeling experience I've get that that's a separate thing I like it just to automatically register the data sets that it has it not the dead sets that are here why do I need to do a load of stuff to do it but either way I can click new power bi data set then says you're gonna need power guy desktop as a hint click start then it gives me a choice of two of my sequel databases so says what are you trying to expose to power bi with shortcut it will get you some other way there do you want to do it and I've got a sequel pool and I've got a ad hoc queries which is a database I created through sequel pool on demand I don't have the replicator 1 I don't have my NYC database so once you go back over into the sequel side so my actual sequel scripts can see that a fault spark workspace they can see the NYC replica tables and ad-hoc and secure warehouse secretary it's only letting me see super pools and super on-demand it's not letting me see the sequel the spark replicated tables which is okay I get it but it just means that have some manual step so what need to do so if we decided to create that so I talk queries cut next all this is doing is making a power bi dataset file so I can click that download it open it inside power bi desktop and start going but that's not really good for me because what I need is that replicated higher table I want to be able to quit a create a query that NYC so when I do it manually and then we'll come back to reports so we need some data before we can do anything so I stood the way we've done before we've got my sequel endpoint dive into power bi I want to do an edge or sequel database because that's essentially what the super on demand protocols using so support a device it's gonna ask me for server I can say connect to my NYC database they're gonna force it to go to that replicated database use direct query and they can tell them to do it's gonna try and connect make sure it can never get so I can see I've got my date table I got my taxi table some logos in and again we've seen this before this isn't the super interesting part so let's get that in when it's done it I'll making a relationship and then we like to see what we can do okay so in there uh no pick-up day goes to condom day that's fine needs to query the data again directquery it's a bit chatty it's going back and forwards checking the data making sure this is right okay so it's got my two columns click ok create a relationship and now I can actually go and build some reports so I could say I want my cut in a month and I want my fair amount Ringling together and then there we can go I've now got some kind of a ball I see it as columns and I'll take a moment to think about it so that's all local currently so this is my local client machine going off querying that sequel or on-demand endpoint of bringing data back to me can take a moment to do that that's okay but I need to publish this I need to get this the data model that really simple just to table data model and send it off to my power bi workspace before signup can see it I guess there we go so I've got a super quick query it lets gather Sam I want to publish it so this is gonna be my signups come on save that it'll ask me what workplace want to send a tooth what you need to send it to that new signups one select and then that'll deploy it system now taking that data model definitions taking the report definition can deploy them as two separate objects into power bi so when this is done I should be able to get power bi and see those objects so understand that two can go and say what's in here don't have anything currently there we go so I've now got those two objects I've just created locally I deployed a power bi again nothing special there that's the same way power bi always works you might go back over to sign it we have a look what's in here I can go in here refresh I can now see a report and I might click on my datasets I can actually see up there so power bi datasets is now aware that I have this sign-ups dataset so I can't change it I can't do anything with it but I do have this that what insane create a new report and a bit click that this is where it's actually doing things inside signups so I have the report designer built in inside the sign up studio I can go and I can do that same kind of thing I can say I want to know all my different months exactly the same way I want like fair amount you know so I can build that same report and that just works directly it is the same kind of web portal browser that we're using to build power bi reports baked in inside signups and now I don't have dashboards I don't have some of the funkier stuff I don't have the full dataset configuration it's a very cut down version it's kind of like the blue nice entry level surface layer of power behind but it's available within the signups workspace so I can go I need someone to create those data sources for me and then I can go in and work with it now you'll see that it has called it something quite funky so I've got a new report 861 now if I want to save that I've got this save as button so I can click that what I want to call that I want this call that a essay in a hit save actually goes and saves it at this point it saved it so it's kind of annoying that it has save as copy well it's not same as copy it so just actually saying it just to show you that example if I go and create a new report and again a new run can be generated one so if I go back to my power bi workspace and I might just have refresh that so in a second we'll see there we go so my a sa report has come through the other report won't come through so that report isn't saved it's not committed that's not going to sink through to my power bi workspace until I do that file save as essentially yes I want to actually make that something when worried when the reports are already exist we've got file save and we've got the build safe button over here and that just works that just says I can make my changes and actually apply it sync that back to the power bi workspace in this but a bit more nor so I don't know why we've got this tape as as a separate copy thing it just saves it that's how you can be it so it's not at all related to the publish buttons which is again a little bit weird so I can be creating new reports if I hit publish I get single scripts I don't get PowerBait my objects so kind of the integration it's it's in there but it doesn't it's not fully part of the whole earn signups experience okay all right so I've got a report that's bringing back some data that's going off sequel under man that's charging me each time I go and change it so every time I tweaked it and Tommy filter it and time we refresh it it's going back for Ewing that I'm bringing it back again and what do a quick comparison just say this is de bricassart so I have a spark cluster currently turned on I can go and query the data we can do that similar kind of thing and I wanted to show that quick comparison if this is how we do it in de tricks vs. is how we do it in power bi signups so I need to get the connection details from my spark cluster so over here I've got back in power bi desktop so I need to again create a data source which means I need to do it locally deploy it so I can say I want to get some data and then this is where I need aggressions not fantastic data but if I need to select spot and say I'm gonna connect to a spark server not I draw H the insight as to be elope will spark server gonna ask me how I want to connect that we're gonna be an HTTP connection and I need to get some details I need to tell our difference out to connect so if I go down here I can go into my JDBC options and then I can see some details about how to actually connect so I've got my server hostname and I've got this thing called a sequel path I need both of those exactly builds up for you over here it's gonna take all of that bring over here and say this is how you connect whoops gotta look at that con okay so it's got my workspace location it's got my head for net got my port I then don't need any of that stuff I need up until the sequel yes that HTTP path and then the customer ID I need to replace with zero no it's super funky I need it's like a dark art but I need to know how I build out this this URL right but then that's direct query and click ok and then that should go connect through now I've actually connected here before so it's remembered my credentials normally it lasts Punic username and password and your username has to be token and then I need to go and generate a data Brooks token so in that I've got my using anyone you've got to use the settings generate a new token you'd say there's gonna be a lifetime one and this is gonna be a it's gonna be an example gonna generate token and there we go that's that that's my password so my username is just the words token that is my password and then if I put that in that's gonna go through and work and delete that so you guys pong back in and connect as me and there we go so now I've got a spot set up and I got I've got my two different versions so I've got the one which is my park a connection I've got my what would you Delta so I bring it in on the Delta side so we can do the same thing so I'll have to essentially - can parents data models yes I want to combine data sources and then we can do a little comparison so well how does this actually change between the two um okay so that's gone in we should have a data model so I've now got Dayton taxi it's on my seeking demand side like del Delta Lake date and Delta Lake NYC or online data British side and I can do that same thing I can say a pick-up day joins two calendar day go through take the moment again this is querying my sparks over now and saying give me back some sample day to go and show it to me look at that second that goes does the same stuff so not evident days in my date table there we go okay so create that can validate our relationship and then we connect to do a comparison we can say well less this is running on a spark cluster with three nodes versus on a sequel on the mantel sir we don't scale I mean get an idea of how speed works between the two so I can go back when you do the same thing so I can say I would like a so - my month and I want my fair amount and then I want to see that just as a column trimmer and I can do it next to each other and again so that's a similar kind of thing of what we're doing so if you're saying do the same thing one working sequel on demand one working using Delta Lake youth bio fired data bricks and actually let's just create a quick slicer so I want a slicer that's going to be my my normal calendar month and then i separate slicer that's going to be my delta like Andaman and then I'm gonna save this and redeploy it so then we'll get this whole stuff reptilian sign-ups I'm gonna get kind of an idea about how that's working okay so that's fine that's happy save publisher still going to my sign-ups workspace in a.replace alone stuff not going to go and push it out think about it there we go okay quite so finally back in signups we should be able to see that now so that the same dataset has those things in it we've got our new reports so the report I entered was my strike yes anyone the signups one so should have both okay there we go okay so it's saying it can't see it so again that's sort of that thing when if you've deployed something to power bi hey usually you have to go and refresh how that datasets working so I've got my signup dataset that's what you to go into my connections [Music] I suddenly shifted my settings so this might actually have an error on my dental credentials so see it's actually working yeah so that's not actually how it connects this bark so it should be that token and I really need to generate that token over here Emily says settings this is a power bi to connect generate my token up and then go in and paste it in let's see if that works it seems happy and I want to save that right now and then let's see if we can go back to that report [Music] okay so that's going to go through again trying to direct Greek to refresh it so it got a little C little wheels going through refreshing my stuff and what's going to be interesting is if I try and filter them about the same time we can kind of see about how that refresh feeds working between the two okay so we've had one refresh seek one demand one's still thinking about it that may also need some updated things now okay cool okay so that has worked a little bit of refresh but if we just go and try and say what happens if we filter them on similar time so we've got grabbed one of my 2009 one so 2019 Oh to try and get down both sides so it's pretty no.2 in 2090 ot so slices only relate to the charts above them they don't they're not cross filtering and we should get a rough idea about the speed at last so actually that was pretty good in terms of getting them to go off and refresh so a spot posted with three nodes for working in about the same speed as the sequel on demand now I'm actually a little surprised by that so they're about right so the interesting thing is over on the data brick side if I just check out my cluster we can say what's the last thing that ran we can go and grab that job so I can see it was a sequel query that came in I can say what was my related sequel query and I got this whole thing again of this is doing a dynamic query so this is actually sub saying it's filtering down my date table dynamically doing a partition filter on my Delta table and I don't know how that is working on the on the other side I don't know how effective that is when it's working on the SPARC site so the interesting thing with the sequel on-demand is invite just to quickly do another refresh in a different corner if I just do another refresh then actually we should see we quickly get down to the monitor tab in signups into sequel requests we can see that there's a sequel request running so we can go and see there is something happening we can see the sequel query that has been passed from power bi into sequel on-demand now this is only available while the queries running and there's a note saying history's coming soon and we'll be able to actually see what's doing but I don't know current as to is that hitting the partition in sequel under mad so it could be one it's brute force it's scaling it out it's something back all of the 84 million Rose filtering it in memory but because sequel on-demand scales itself it's just throwing more grunt at it and so it's doing it and it's making it return in about the same speed as the more efficient data bricks query that's doing partition pruning or is it actually realizing that I'm joining based on something that is going to hit the partition key and so it's effectively partition pruning so it's reading less data and I don't know when I can't tell so that's going to be the interesting thing because if it's going the brute force route then it's reading more day to each time which means it's going to cost me more so every time I query this it's bringing back everything and it's costing me money what if I can get it so it's actually it's only bringing back the data that it cares about because sequel demand is charged based on throughput that is going to be cheaper now I don't currently have the Diagnostics to see that all I see is there is a query it ran and I'm trimming up my hit refresh it's gone because I can only see concur about currently executing queries so there's a few requests to get used to there's a little bit of I can't lift the lid and see what's going on under the hood I can't optimize it and make it fancier but at least what I can do is I can work with reports I can build new reports I can save reports I can deploy them to my hobby I workspace I can again I can kind of create a new data sense but it's essentially I can create a template that I download to then redeploy which is a little weird but still I'm a nasty but it is a lot more integrated than I thought it was again what I would like to see is anything for that got here listed as a data source any of my links herbs my storage accounts data sense that I recommend like that I've created and any databases that I have registered in the meta store could have if I'm linking to a workspace that workspace I'd like to be able to just go get data where from one of my many linked so naps data sources please but that's not currently how it works but either way it's a hell of a lot more plumbed in than I thought it was and it is kind of now it is actually true a first-party part of signups yeah as the developer begged into the studio I stopped how much we'll see people going oh actually I'm gonna use that rather than this other web-based browser that has way more functionality and I just need to open a different web page how much are people gonna use that how much people gonna see the the ease of having inside a single place compared to the loss of some of the functionality that you get inside the main power bi comm service I don't know and the other thing the age-old problem with power bi is and I think all these things in there but when I hit publish and when I do they're kind of the deploy things inside sign-ups that's not to do with power bi so power bi once again is kind of outside of the DevOps story now there's a lot of really really cool stuff they're put into power bi comm to do with dare ups having dev test deploy environments and actually pushing things through that story that's awesome that's not here in science so it's the is this kind of just having this real lightweight version it's giving it like a niche motive users for whom power beyond Thomas like too much there's too much there they just want to be able to be given that a handful of data sources and traits and reports in the same place that they're doing orchestration and Spock will see I'm interested to see how what people think of it and how people have actually so views this so far and how people think they're gonna use it compared to the power behind our concerts either way it is all it is plumbed in a lot more than I thought it was so good stuff alright if you liked the video don't forget to Like and subscribe and we'll put some more videos that you might be interested in on the side otherwise we will catch you next time and keep those suggestions about more things first look into coming alright okay
Info
Channel: Advancing Analytics
Views: 7,249
Rating: 5 out of 5
Keywords: power bi, data engineering, spark, databricks, azure synapse, synapse analytics
Id: ldN6D2lhNyA
Channel Id: undefined
Length: 24min 56sec (1496 seconds)
Published: Tue Jul 07 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.