Power BI DirectQuery and SQL Databases: A Conversation with Bob Ward

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
there's 8383 rows 83 rows so 83 different queries yeah so you can have one or more query per visual and directly Wow and I'm here with Bob Ward who is a master at sequel server and a legend and we've got a couple tips for you about using direct query with power bi from a sequel perspective stay tuned [Music] if you're finding us for the first time be sure to hit that subscribe button to stay up to date with all the videos from both Patrick and myself and I am joined today with mr. Bob Ward Bob how you doing I'm doing great from you awesome hopefully you're hanging in there with the situations in our world today if you're watching this video in the future were locked down in the whole coronavirus stuff so this will be great looking at this like two to three years from now just a little history I've known Bob Ward since I think 2007 or 2008 there was in the support world bob was one of the technical leaders from a sequel support perspective at the time and he was he was my mentor kind of going through sequel support and taught me a lot of things and he is just his knowledge and wealth of experience on sequel servers amazing so Bob I don't know if you want to kind of where are you today in the world of sequel well first of all Adam thanks for having me on today a quick funny story for your listeners about me meeting Adam because when I first met him I did not know what I was in for because all I know is that some goof geek had a big Star Wars Lego thing on top of his desk and I remember telling my colleagues like who is this guy with a Star Wars Lego set and turns out Adam is not just an incredible talent for power bi and for technology but has been a close friend mine for many years I'm in the sequel engineering team now Adam I as Adam said we both met in the support organization but I work in sequel engineering I do work out of Texas so even before coronavirus hit I was working from home and working out in Texas and I'm an architect in the sequel team I spent a lot of time you know translating the message of what we're doing in the sequel server space whether it's sequel server 2019 Linux containers even now spending some time working in the azure Sutan space and then it's an architect giving great feedback to our team about what our customers are seeing and how we can improve our products going forward so it's a really fun role to be into this is twenty six and a half years now Adam for me to be in Microsoft all-in sequel server and quite frankly I'm still very passionate and love what I do love it love it yeah I know and when we were working together back in Las Colinas Texas I just hooked up a wealth of knowledge about sequel hanging around you guys and so it's been really fun knowing you and hanging out with you and just just live in life with you said no no we see each other well when we were traveling right you travel somewhere in the middle of Europe or something it seemed to have Vermont you know so but not fun to be here remote yeah yeah it was rare for us to see each other when we were in the u.s. we'd have to go oversized to see each other so yep amazing all right so so Bob we're gonna go through some things from a power bi perspective and I want to get your take with your sequel knowledge in terms of just what are your thoughts around these given areas to help people when they're trying to work with data and power bi to know helps equal out a little bit I want to get your initial reaction to this so I'm gonna show you should be seeing I've got a report up there it's a direct query report this is a beautiful report we Patrick and I worked very hard at this report it looks amazing this is what every executive would want to look at you know and it keeps going and there's lots of there's lots of information here whatnot so from the road from a direct query perspective what would you expect to see if I were to run a sequel profiler trace how many queries do you think are sitting there oh I can only imagine I think I think to produce all the aggregations and the details sub a gradations I would be surprised if there's at least just the top of the screen you know half a dozen queries at least run all right so I ran a trace and here's the there's 83 83 rows 83 rows so 83 different queries yeah so you can have one or more query per visual indirectly Wow Wow it's depending on you know - one thing one thing already Adam that I would say as a reaction at the sequel server person is that's way too chatty and what I mean by that is is that you know we'll always encourage developers and people using sequel the sequel language to you know queries in a store procedure or something so you're just sending one request to the server from your client and then you can read multiple results that's back that's fine but sending a single query is almost like fetching a row at a time or something crazy that writes yeah that's already inefficient in my opinion doing that yep so yeah so that that's kind of the basis of what we're gonna talk about that's the insight that I want is like coming from a sequel side of it like where like your advice on some of those things so so the flap topic matter right so if you're just running that as a power bi you know desktop app and sequel is running on your laptop and you were to run that you know who cares right but once you start putting that across a network somehow especially in the cloud that becomes just incredibly efficient I'm sure the first thought I had was if we're if we're going with direct query so direct query for those that don't know is when we are not importing that data into power bi we're leaving the data in the data source so this could be sequel server it could be another relational source like Oracle or some other database but in this case we'll use sequel whether that's an azure on pram or whatever and so when doing that typically some of my initial thoughts is we don't want to run this against an operational database like we just something that's actually in production and running because of like when I when I showed you earlier it was you said it was chatty and so what are your thoughts around that yeah there's no question there's great benefit even without direct query right to customers taking read-only workloads or running those perhaps against a warehouse or in the case of operational system a lot of people choose to use a replica so for example a mate imagine you have an operational data store sequel server we have high transaction volumes coming in and you set up an always-on available secondary replicas for repurposes so that would be an example of something that feels like an operational store where you don't have to put it into a warehouse format then a directory would be perfect for it but that's exactly one of the reasons why people use secondary replicas for that purpose it could be replication sequel server replication could have that setup Co subscriber so offload large reed workloads especially once it might be fairly unpredictable I have a feeling and directquery that might be the case you know it's probably your best angle for operational stores yeah so you know that's just some thoughts top like that I don't think oh yeah and that and that's easy enough to do from as your sequel database or even on Prem perspective I mean the capabilities that exist yeah that's what's so cool about Azure sequel as a brand is all that read replica type functionality that's available no matter what you slice and dice whether you sequel server Lennie's containers as your sequel database managed instance you have that replica type availability no matter which deployment page and so in that case it would just be I got a you have like a read-only connection string versus the primary connection string at that point right yeah as a developer or even say like with direct query you would be able to go in your connection tree and say look my intent is to do read only and then sequel would automatically direct you to wherever that read replica was so you would have to know the actual physical host where the replica exists nice and then for folks watching this one of the features that we have inside of power bi because we don't you don't have direct exposure to the connection string and so the connection string option for application intent is not readily available for you to do when you go to sample server under advanced there's a checkbox there it says use failover connection or something like that nature and that will add the application intent read-only to the connection string so just be aware of that be aware that it's not for someone that's maybe sequel savvy and not power bi savvy it's not that obvious so yes it does so and it was funny one of the the things I did early on was to prove because someone was challenging me that though that's not what it does and one of the things that Bob got me into when I was in support and I actually it's one of the things I really really miss for my support days is I went got a memory dump and I went and looked I said messes the connection string that's in memory by the time that we're doing it there's application intent so bam it is one of the things I've had from a discussion point with customers and whatnot is when they think about direct query and we talked about performance optimizations even some of the databases I go into and look at even though these are these are large enterprise organizations you know people have varying skills at sequel or even power bi when we start looking not only at the power bi data model but also when we start looking at sequel itself there are comments that I hear from other folks saying oh man this is just database 101 like you should be doing this what what would you consider as database 101 that you should be doing from a general performance perspective well one thing that I think we were just talking about kind of early on was the fact be efficient in submitting your sequel queries to the engine use things like stored procedures so if you've got a bunch of different queries to submit for a direct query that might populate a power bi dashboard you know trying to set up single specific queries for everything is not going to make a lot of sense doing this on your laptop's no big deal when you have to transmit this across some sort of network latency sensitive scenario like say the cloud you're gonna want to try to you know consolidate your queries as much as possible even even simple things like you taking advantage of the fact that sequels of relational language and using aggregations group by sums max as those kind of type things is something I've seen even miss from a database 101 perspective so just taking efficient use of the sequel server language and avoiding these chatty type behaviors where you're going after one row at a time for example things of that nature right that's one of them the second thing is it's just pure index and statistics I mean you know this is no different than looking up I mean gosh nobody does this anymore fourth an encyclopedia or a catalogue right look at that if he does it's the same concept right you've got a huge table you're going after if you can't index it it's gonna be it's gonna require sequel server of course to scan this massive amounts of data you know certainly one of those number one performance problems we see even just for sequel apps as people just not going to the process of putting any indexes on a table or using statistics that the query optimizer you know can take advantage of there's all sorts of tools to help you there though even as your sequel database we do automatic indexing in those scenarios where we think enactment index can be possible and then I think about maybe something like is that that makes sense for reporting and that would be range queries so indexes are great Sun Dex is only good for like single row fetching like I want to go to a specific ID key value I have or something but what about scenarios especially warehouse queries where you have to go huge ranges of data like date time ranges this is where columnstore index has become just extremely invaluable columnstore indexes don't require any application changes like in case of direct query atom you wouldn't be changing your queries or reports this isn't me put on the back end and it's perfect for those warehouse scenarios especially for large scans of range data probably one of the most underused features I've seen of sequel servers columnstore indexes and I think folks from a power bi perspective are gonna be if you're familiar with power bi in the way that its engine handles data you're gonna be familiar with some of the I would say the high-level concepts of columnstore because the verdict engine inside of power bi our analysis services uses that same type of technology we use the for two factor can actually use this within the sequel engine and yeah that's what we kind of got so back on the the database 101 in terms of the chattiness and you said limit the queries one of the things at least for power bi folks to be aware of is that each visual from a direct query perspective submits one or more queries and you don't necessarily have control over those queries so that query the the visual the way this works is the power bi visual will issue a DAC statement to the underlying AAS engine and then that a/s engine translates that to C sequel and says go get me the data because the data doesn't reside in analysis services so it's gotta go get that data for that specific visual and so in your case of you want to think about how many visuals you have on that report and limit you know the more you have the more chatty it's going to be so your way of limiting the chattiness is reduce the number of visuals on there there there are some things you can do in terms of submitting your own t sequel so if you have a more structured T sequel that's maybe more efficient than the auto-generated T sequel from an AAS perspective we all know auto-generated T sequel is the best a sequel but in case you may have your own then you know that's something you can you can supply as well so be aware of that another thing I'd maybe throw in on the just from an just understanding the data warehousing and AAS side of things is maybe avoid some of those wide tables so maybe smaller tables in terms of what we're fetching from the sequel side I don't know how that plays into how it goes and gets the data in terms of efficiency I know on the AAS side having some of those not so wide tables can help in terms of compression and things of that nature from the column store side of it it's a factor for sequel server as well one of the things about column store that's nice though is that you know we only go after the columns you specify so even if you have really really wide tables for columns per index base tables we don't have to go over the entire set for a huge wide sequel table except well you know affect how large the table is which larger tables gonna affect how efficiently go in and get certain rows and there so yeah you don't we you know all those kind of type discussions you just said about the width of a table can apply the sequel server as well and one other comment about the the nature of the visualizations you mentioned is my guess is these queries probably come in as ad-hoc queries into sequel server yes and one kind of faux pas for a sequel server set up is a large number of ad-hoc queries that don't get what's called parametrized so if you don't have control over that to say do a query to grammar ties these queries then there are options in sequel to help sequel server do that that'll reduce it the time it takes to compile these things it's called force parameterization nice that could actually make things a little more efficient if a large number these things are ad hoc coming into the engine the other thing I'll throw in there just as a warning and back me up if I'm off base here but if you happen to be going against your production transactional server and not like a read replica or some reporting database doing that type of option would that add additional overhead and potentially interfere with the production load well I would never recommend anybody just turn that on blindly so you know that kind of option doesn't even you know you as a direct for a user wouldn't even have authority to do that they'll be working with an administrator over a database or a sequel server to turn an option like that on and yes I would not do that unless you knew you had a really heavy ad-hoc workload it's well tested and perhaps it may not make sense to go against a operational store if that's what you have to do right yeah cool all right so going on so from so columnstore is an awesome feature yes and I know you're very passionate about that another thing that we do from a direct query side of things and we've had this from certain customer engagements that I've worked on is if we are hitting those base tables and or and or maybe we're hitting a view but that maybe we've got a bunch of joins on there because their databases is an operational database that's not in a star schema and so we got to have all these lines so one of the tips that we usually give to people in those situations is possibly like materializing that view and reducing the cost so that'll that'll just make that call go faster or something like that decks view are actually like generating a table off of the view data I see like creating your own table so we have this concept called index view which is very much like what you're talking about where you don't have to go to set the table but I've seen people do what you're talking about where they almost like create a denormalized version of that results like a cached version of their results right and then they refresh that no question yeah if you find yourself joining so many different tables because the database design is so normalized there's that great balance between using some sort of denormalized version at the table which could be like you said materialized view versus having to join so many tables together I found so many people struggle in those heavily normalized cases of trying to debug a problem with a huge set of joins against so many different tables do you have the right index is right is the right Jordan strategy being used those kind of like things I see it all the time yeah and so I also know one of the one of the key things I saw coming with a sure synapse analytics I don't know how much you're involved with that is that feature of those materialized views and baked into synapse right is portable so when they had about sequel server that's agnostic to you building the queries is the is in improvements in the engine so it's equals 2019 which is also an azure we added this capability in the query processor called intelligent query processing this could really benefit your direct query user the idea being is that you go to the latest compatibility of sequel server and we just add in all this functionality in the query processor that's based on problems you and I saw on support whether it's memory grant problems because of hashes and sorts and so forth or even turning on something called batch mode for row store which almost acts like a little column store in the next type query as part of the query so then one thing to keep in mind if you're using direct query like what sequel server version should I pick it could be that you should go to something like 19 or out of your database where you can take advantage some of that functionality and then all they need to do is just make sure they're on that latest compatibility mode or there are other switches or feature things that they've got there yeah they just yeah that's it's actually it's interesting it's really not a feature it's more just an enhancement to the query processor built into the engine you just turn on the compat level to light up you know you want to do that and there's even functionality in sequel server like query store for you to compare what was your query performance for queries coming in from direct query before and after that change so if you're worried about hey I'm on the latest compat level for you know risk aversion and I don't want to remove the latest compat or I'm worried about performance you could actually score to compare like what was your performance before and after nice the last thing I was gonna mention and then I'll throw it back to you Bob but just for completeness I do want to show for the folks watching this video just some of the things you can do in power bi to help with that chattiness so let me go ahead and switch over to my computer real quick if you're inside of power bi desktop if you go to file and then come down to options and settings and then options there is there are some settings down here if we come under a current file you'll see this query reduction option and then you'll see a couple of items here so one is to reduce the number of queries by disabling cross highlighting and filtering which is on by default for all visuals so the idea there is if I select a slice of a pie chart it's going to automatically tell all the other visuals to go get updated data based on that filter it would be off by default if you check the box and then you can independently say to turn on which visuals you want to cross highlight the other two options here for slicers and filters by default the way slicers work is if I make a selection on the slicer it's gonna again signal all of those visuals to go update so instead this puts an apply button so you can make your changes and then hit apply and say now go and do that so if you have like multiple select items or something of that nature this will allow you to control some of that chattiness back and forth and just make it a better experience from a consumer point of view and then the other thing to think about this goes along with some of the materialising views some of the things we talked about on the sequel side Patrick's got a whole series out there on the aggregations feature inside a power bi and so this is that mix of having both a higher grain summary table that's in the a s model and then when you want to go down to those detail levels then the engine automatically knows to kick over to direct query to go get those detail rows so that voids you from having to go to sequel for you know hopefully like 80 to 90% of your slicing and dicing but when you do need to go down to those detail levels that's when we go hit sequel and we know that we're gonna hit a higher lag on that than we would if we hit the aggregation table in memory wise so and all the link up above for that playlist that you can go check out if you're interested in aggregations by the way aggregations is not a premium feature so you can take advantage of it today all right so Bob any other thoughts that you have in terms of sequel usage and coming from a reporting standpoint anyways it comes to mind I think about power beyond just a user I'm not I'm not Adam sax that are Patrick right I mean I I just use fire about which I love that I love technology but I think about all the different data sources the power bi allows you to go after not just sequel server right so if if you want to kind of manage that more in the sequel server level there's a new functionality that they've expanded from sequel 16 now in sequel 18 called poly base and the idea being is that you would create external tables in sequel server that look like sequel tables but in what we're doing is we're redirecting queries from those tables to other data sources and we have built-in provider functionality for Oracle Teradata MongoDB other sequels her sequel database and what's really cool is bring your own ODBC driver so if I think about all the different data sources I can use in power bi you can set up sequel server to be a hub to go after all sorts of data sources like that but what you're doing is you're controlling access to those data sources through sequel through this external table concept where you set up the security rules you know think of it like a view where the data doesn't exist in sequel exists somewhere else but you can even join across these just like sequel tables use data classifications so if I thought about a an average power bi or neck even expert power bi user today looking at sequel server poly base in sequel 19 could be something that could be unbent advantageous to the to the average user that's amazing I know that we do run in with some customers where they want I have direct query and or they're trying to do import and mixing those data sources and sometimes some of the data sources they may be using may not be fully workable from a power bi perspective but check and see if they're available from a poly based perspective and maybe that's a that's a valid option for you and or to make it even crazy atom we even have Hadoop on us so if you you want to go out for Hadoop data that's not structured like sequel you can do it and if you really want to get crazy you can apply something called a big data cluster where we'll deploy Hadoop for you so yeah we're III got a wonder so I just add just a warning for everyone out there with all things test and you know your performance may vary when doing something like that so just be aware your performance already is gonna have an overhead from a direct query perspective I can only imagine if we throw Hadoop behind sequel as part of that that it would be blazing fast we put some scalability into it so you do need a testing center but we've put a lot of built in scale to the solution so that's good scanning petabytes of files we know how to filter and go after things in a very performant way awesome all right that may be a very well option for you to go use so that's that's exciting to hear about and and off links down in the description below that you can check out for all the things that we've talked about so that you can go read up more on those items we want to pass this off to you what are your thoughts about what we mentioned are you did you find something new about sequel that you didn't know about or do you have other tips and tricks that you want to share about using directquery go ahead and leave that down in the comments below and keep the conversation going if you like this video be sure to hit that big thumbs up button smash it if you so desire if you're new here be sure to hit that subscribe button to stay up to date with all the videos from both Patrick and myself and from both Patrick myself and Bob thank you so much for watching keep being awesome and we'll see you in the next video
Info
Channel: Guy in a Cube
Views: 27,360
Rating: 4.9616857 out of 5
Keywords: power bi, power bi dashboard, power bi desktop, power bi desktop directquery, power bi desktop tutorial, power bi direct query, power bi direct query performance, power bi for beginners, power bi reports, power bi training, power bi tutorial, power bi tutorial for beginners, azure synapse power bi, directquery in power bi, directquery in power bi desktop, azure sql database, azure synapse, sql server, sql server 2019
Id: lVXAkfiP1NE
Channel Id: undefined
Length: 25min 23sec (1523 seconds)
Published: Thu Apr 30 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.