Advanced Power BI: Solving the Hard Problems

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
[Music] [Music] you Audio here let's can hear me okay yep it's good okay alright so I hope I have what might be an interesting session for you guys we take a look at some advanced techniques from getting and working with data with power bi we have some exciting things coming with pragmatic works here soon that I think you guys will be really interested in we're to have some things like custom visualization classes we're going to be launching this week and we already have a streaming on-demand training class that we have and as Liz mentioned we had a three hour a power bi workshop that we did that was recorded and available for free for you on our website as well it's become pragmatic works calm you can find that today what I wanted to do is kind of piggyback off some of the things that we've done in the past where we've talked about how to do some basic things with power bi and talk today about some more advanced techniques for working with data and some more advanced techniques with just dealing with complex business scenarios and so what I'm going to do is I don't have a ton of slides intentionally because it's going to be primarily demos almost 95% demos today but I wanted to be able to show you some situations that you might run into and how to deal with them and solve them using power bi and these would be some common scenarios that using other tools might be very difficult to solve but power bi makes easy and so that's what I want to walk you through today if you're not familiar with Who I am my name is Devin I'm the training director here at pragmatic works so any kind of the formal training that we do usually goes to me in some way I am a Microsoft Data Platform and review that's just a fancy upgraded version of a sequel server and VP I guess and I've written several books not on power bi but mainly on a traditional bi you actually will find some power bi books but not a whole lot the reason being is because power bi changes so frequently I've kind of decided not to write a book not a more traditional book I'm looking at doing some digital writing here in the future but if anyone were to come out with a power bi book it would likely be out of date by the time I hit the shelves but I do have some more traditional bi and business intelligence books out there you could certainly look up I also won't run a local user group here in Jacksonville Florida that's where I'm from and we do all things from the power bi related to traditional bi to more data platform traditional sequel server DBA type topics and I also blog at a website called Devon Knight sequel com as I mentioned just a few moments ago I'm gonna have some big releases here this week liz is working with this with me on launching a new weekly blog series that will be on the pragmatic works blog as well as my own where we're gonna be releasing a basically a free class we're gonna have once a week a new video module for a custom visualizations with power bi course that we're going to be releasing so you should be able to see that coming up very soon probably tomorrow or Thursday all right so I like I said I don't have a ton of slides I'm gonna get straight into some demos so what I'd like to do to start us off is set up a scenario here and then talk you through how we can solve the scenario using power bi and the scenario that I have for you here first is going to be importing data from complex files and so let me show you an example of a file that I'm talking about that I'm identifying here's complex and this is probably a type of file that you've seen in the past let me close out my power point here for a moment there we go and the type of file that we're looking at here is one that has subtotals in it basically so what we're looking at just to give you some context the data we have a bunch of folks that work for us here's their different positions here's their age the starting date and their salary and then as part of the file there's a subtotal road that appears at that as almost like a header subtotal row on each section here so I'm seeing there's a subtotal row here for the city that they're from Edinburgh whether that's the subtotal row I can see a subtotal road for London I can see a subtotal row for New York a little bit further down and the problem is I don't want to import that that record for the city that they're from as its own row what I'd like to do instead is see an actual row that are I should say a column in here for city that I would add in here on the far right and what power bi fill that down and automatically give me the city for each of these folks that have different dev positions here for me so I can see Regional Director developer all these listed in here so hopefully that's clear on what I want to do I want to get rid of the subtotal row I don't care so much about the subtotal salary but what I would like to do instead is add in a city for a new column here so that way I can see what city each of these individuals are from alright so that's my scenario probably a very common scenario where you're doing with subtotal data inside your data sets so let's take a look at how we can solve this problem I'm gonna go ahead and close out of the file so it doesn't think I'm got it locked or holding it open for any reason and I'm gonna go ahead and open up power bi here okay alright so what I'm gonna do with this is very focus and a lot of what we're gonna do in this session is focused around dealing with the data import side of things so what we're gonna do is go ahead go ahead and start by pulling in some data using the get data section up in the top ribbon so you'll see up in the top ribbon underneath home section we can go to get data I know this is an advanced session so I will consider some of the stuff is knowledge that you likely already have for example give data that's where you're always going to get started and so I'm going to go ahead and select get data and I'm going to tell it that I want to pull in data from that excel file that we've selected just a few moments ago so I'll go ahead and select excel here alright so I'll select excel that will launch open a basically a search window here where I can go find that file and where it's located and I can see that I have my file with groupings example right here and I'll select file with groupings that'll launch open the navigator pane inside a power bi which allows me to really search through files or search through databases in this case it's showing me a excel file at this level here and then I can see all the spreadsheets or even Excel tables would appear below it in this case I only have one spreadsheet below it's called sheet 1 and I can see if I select sheet 1 a preview of it on the right hand side and that way I know pretty clearly what is inside this file ok so I can see it before I even select it I'm gonna go ahead and select sheet 1 here and rather than hitting the load on the bottom if you don't gone through one of our intro sessions before in the past if I liked the data as it is right now I would hit load and that would bring it into my data model but in this case I actually have some data manipulation I want to do to this first so before I hit load I'm gonna go ahead and select edit when I click on edit it's gonna launch open a query editor here and by the way just to give you some context of some of the things that's going around going on around me here you'll see it says the preview may be up to 24 days old basically that's telling you that that you have a preview of the data inside this window and it caches that preview meaning it's a subset of the data and if you want to refresh it you just simply hit the refresh button so that tells you that power bi does do some caching behind the scenes so that way whenever you come to the preview window it can load it a bit faster I'm gonna hit refresh to make sure I get the latest copy of that and the first thing I want to do also which I always recommend as a best practice is to go ahead and name the data sets so on the right hand side here you can see right now it's called sheet 1 let's go ahead and rename this I'll call this something like job salaries is fine alright alright so I've named this data set now really the tough part here is how do I work with the data that I have because you can see here that it looks like I have the city showing up as a subtotal but really I want it in here as a column in a city column in here so how we're going to work with this is we're gonna start by creating a new column and we're gonna create a new custom column where basically we take this value for the for the name where it has them what most of these are the name of the person it also has the name of the city here and I'm going to take this value out of the city and place it in its own column so to do that we're gonna write a little bit of an EM query there a small end expression to be able to copy out that value and bring it into its own column to write and create a new column based off of an end script you can either go to the advanced editor which is one way okay so maybe the slightly more complicated way to do it you can go to the advanced editor which I'm highlighting right here or if you want to do it a little bit easier you can come up to where it says add column there's an add column tab here up on the top and you can tell it that you want to add in a custom column now you'll notice there are a couple other different types of columns that you can do there are conditional columns this might be a new way you can solve this problem that I'm trying to do right now I'll show you that way here in a moment this is a new type of transform add column it's been placed in here you can also do duplicate column which is basically gonna take a column and duplicate it exactly as it appears or you'll see there's an add index column and an add index column essentially what that does is it's almost like an identity column if you've worked with sequel tables and you wanted to have an auto-generated value for a for a column that's what an identity column is if I selected to do an identity column from one that means it would start the first row would show up as one going to number two three four five all the way to however many number of rows I have if I do an identity column that starts with zero it's going to start with zero of course so let's start with zero one two three that's that's were thing and basically that index column is just a number that's auto-generated from one to however many rows that you have now the other thing here that we can do is a custom column this is where you can write a and M in expression that calculates out and determines what new count and what the new column is going to look like based off of an M expression that you write so we have a couple ways we can do this we can either do a conditional column which I'll show you here in a moment that's a new UI that's been designed for doing basically what we're trying to do or we can do a custom column so let me show you the hard way first and then I'll show you the new additions that have been added in recently here in a moment so I'm going to select add custom column and I'm gonna move this down a little bit so we can actually see the data in the background and I'll zoom in on this so you can see it as well so what we're going to do is we're to create a new column here called City so I'll name this city here and then we'll come down to where you see the formula section and we'll write out a little formula that actually brings back the city name right here that you see in row number one and we're only going to return back that value if let's say position is null so position is null then we know that must be one of the subtotal rows and we want to bring back the value of name so hopefully that makes sense logically here's what it looks like from a code perspective now if you knew tip the in query language it is case-sensitive so some of the things that you'll write in here you need to make sure that you do lowercase or uppercase wherever it's appropriate in this case if I want to do an if statement I would do a lowercase F okay and I would say if the position which is the columns you can see you listed here on the right hand side so I can say if the position I'll grab that just double-click on it if the position is no then return back the name column else no all right basically what I'm saying here is if the value of position what you see right here is null which it is when it comes to the subtotal rows then I want to return back the name column that's the value that I see right here if it's not meaning all these other rows that don't have a valid and don't have a null value that I want to show know where that where that occurs okay so let me show you what this looks like if I hit OK on this there you simple little if statement here notice the if statement actually reads very much like English so I can say if the position is null then bring back name else null so if I hit OK on this notice what happens it creates a brand new collar for me here on the right hand side and this brand new column does return back the City and nulls everywhere else okay and I'm gonna show you how we're gonna work with this here in just a moment but before I do that let me show you the other way that we could have done this as well instead of writing that if statement that I did a moment ago we could have also done a conditional column okay so a conditional column way of works is basically the same way as an if-then-else statement so if I select conditional column I can say if the position column equals and I can say null here then return back name so it does basically the same thing that we saw a few moments ago notice here you can add multiple rules so if I want to add another condition into this you could easily do that here as well I can also put an otherwise so this is like an else so if this is not true so if the value is not null then otherwise return back not okay if it is null then return back the name this is the same thing as what we did just a moment ago with the if then else statement it's just a nice little UI they wrapped around it and I'll call this column city too we should see the results have the exact same end result here so I'll hit OK and you can see they work exactly the same they've just created a nice little UI around that conditional logic that I did in the first example it's called a conditional column and it does like I said basically the same thing as an if-then-else statement I I don't really need both of these so I'm gonna get rid of the second one that I did and I can do that by going head and hitting the X on the applied step section to get rid of that second example that I did and so what I'll do with this now is I'll tell it that I want to basically the value for city and duplicate it all the way down where these nulls are until I have a value to replace it with so basically take in Bora her and duplicate it all the way down until I see London and duplicate London all the way down until I see New York and duplicate it all the way down and this is done through an example or through a feature called fill down there is a transform that's built into the query editor that allows you to do fill down which basically does exactly what I described to take a value duplicate it down until you have something to replace it with so if I wanted to do something like that I can come up to the transform tab or the transform ribbon up at the top and find that the option in here called fill you'll see if you look towards the middle of the screen here there's a bunch of transforms that you can select from including pivot on pivot rename detect data types and then the one that we want here is called move and if I hit the down arrow on move uhsorry not move ahead fill I can tell whether or not I want to do fill down or fill up fill down it means it's going to take the value here and duplicate it all the way down fill up means it's going to take the value on the bottom and duplicated going upwards in my case I want to take so that would be depending on where the subtotal Row is right if the subtotal Row is at the top which mine is then I want to do fill down if the subtotal row is at the bottom that I want to do fill up which you have an option for as well in my case I want to do fill down take the value of the city and duplicate it down until you have something to replace it with so I'll select fill down here and you can see immediately the value is duplicated down until it runs into the next potential value which is London and then it duplicates those values all the way down as well okay that's the fill down option really really neat transform built in to power bi to be able to solve these these sort of problems of dealing with subtotals all right and other problems for that matter there's a lot of reasons why you might use it all right so I have that now you've taken care of now I have a new column in here for City now if I want to actually get rid of that subtotal row I can now because for example I don't really need the subtotal salary because I can get that out of just the way I build my report if I wanted to so if I want to get rid of that subtotal road that I still have sitting here I can filter out the null position or age or start date and if I filter out the null position for example just by hitting the little filter optional amusing so you can make sure you see it if I hit the down arrow on position I can tell it that I'm gonna filter out all the null values just by unchecking the null value here you can also type it and once I uncheck null and hit OK you can see my subtotal rows are completely gone now that's not a really a bad thing at all because I can simply build out those so total rows again in my report in my reporting layer here in a few moments if I really wanted to keep that but in this case I basically got a file that was sent to me from a vendor that already had some sort of rows in it and I didn't want them so I got rid of them I now having the data sent that's looking pretty good I can see City is duplicated all the way down you'll notice even though I got rid of the row that had the subtotal in the name of the city in it it still retains the city in here for the following transforms later on so just because I got rid of a road that had Edinboro on it doesn't mean that it doesn't that I'm not able to keep it still in here because of the way that the query editor works it's very serial it moves in a in a way where it's not running things in parallel it's doing things all sequentially so it actually removes that row that had the city named after I already use it for the fill down example and after I already use it for the custom column example so it's okay to remove that row even though I'm using it here at an earlier point all right so I've got the data in a pretty good set here so I can go ahead and hit home and close and apply and I can use that data however I want to now I can build out a report if I wanted to I could even build out a report that basically duplicates what I had before if I did something like a matrix and did a matrix based off of the city okay so I can see all the cities let me make that text a little larger so you can actually see it okay there we go and then I can bring in anything else I wanted into that matrix so I can see something like the salary there's my subtotals and I saw earlier or if I wanted to start adding in things like the names and the positions I could get all that in here as well okay so very easy to do very easy to replicate what we had in the other report or I can start to build like more traditional dashboard type reporting items from this as well I can bring this into something like a tree map if I wanted to and now I'm able to visualize it in and maybe a way that makes more sense for a dashboards perspective okay so that's our first example our first example of more complex file you're trying to deal with importing that file in and then being able to apply some more advanced transforms it let's take a look at our next example so I have another slide here and they set the sample example up a little bit and this example what we're gonna be looking at is how you can basically loop over a set of webpages now you may have seen me show this example in some other context but I'm actually going to be doing a different scenario here what we're going to be looking at is a website that has in a plane crash data okay so don't forgive me for this somewhat more morbid data here it's really the business problem I want you to focus in on because if this website fit the business problem really well and so basically what we're going to be doing is we're going to be going to this website and what they have on this website is multiple years for each year that there are there are plane crashes that occur and all of the records for those plane crashes and rather than me having to open each one of those web pages I'd rather make it so that it the power bi loops through each of those web pages and brings me back all of the data at the same time so let me show you what I mean here I'm gonna open up a web site and you are welcome to go use this website later it's not mine but it's called plane crash infocomm again forgive me for the morbid data but it actually suits our business problem here that we're trying to solve very nicely and so what I want to do here is I want to take the data that they have on this website if I look underneath this databases section here there's a databases option I can see they have data from 1920 to 2016 here almost a hundred years worth of data and what I'd like to do is I want to get the data from all of those years at once I want to get all nice 96 years worth of data here and bring it in all at the same time if power bi can do that and here's how if I want to solve this problem I I need to do a couple things the first thing I want to do is go ahead and pick a year so it doesn't really matter which year I pick but I need to start by pulling a data from one year and if you've ever done any looping and other tools like SSIS have you worked with us Esaias you know how to do you might know how to do looping if you're done looping in any kind of source system or ETL system before in the past you always want to start by figuring out how to import the data from one file first and then worry about looping it later ok so in our case I'm going to pick the year let's say 1933 here and I can see in this it has an instance of every one of the records on this website so what I want to do is I want to get the data from this website that we're looking at right here and I want to be able to pull this information in and so what I'm going to do is I'm going to copy the URL that we have in the address bar at the very top and I'm going to go ahead and copy that out you can right-click and copy or I'll just do ctrl C and you'll notice inside of the URL itself it has kind of an entry point here for me where I could pass in the years that I want to be able to pull information back from so I want to be able to take the URL that I've now copied and be able to pass in new years into this and almost automate it so that it does it all on its own and when a new new year is added I want it to automatically add that information in as well all right so what I'll do is I'll copy the URL here I'm going to go back over to the power bi desktop application I'll even delete this just so there's no confusion here and so what I'd like to do is go ahead and import the data from this one year first and then we'll work on how we can import from multiple years next all right so to do this from one year I'm going to go up to the get data section here again inside the home ribbon and I'll select from web so I'll select the web option here I'll go ahead and paste in the URL so I'm just going to paste in the URL that we copied off of the web page a moment ago and hit OK all right so it's going to bring me back some information here it's going to bring me into the Navigator pane where I can see the website and the two objects that power bi was able to detect from that website one is called document which doesn't really have anything of value from a data perspective and then the second one here called table 0 actually has the data in it that we'd like to use you can see all the instances that are showing up on that website and I can take this information and go ahead and bring this into power bi by simply checking it off right here and then hitting edit to bring it into the query editor all right so I'll go ahead and check off table 0 and choose edit to bring this into the editor the power bi query editor all right so it's now brought this into the query editor here for me and now I can start to determine how do I want to manipulate this data what do I want to do with it and then my next step is going to be how do I loop over and bring every one of the years in here as opposed to just the one all right so a couple things that we can do to play around with this data if we want - you'll see there's this again I've predict me for the morbid data but it is interesting as far as being able to pull it in and pull it off a website here but I can see here for the last column that I have has multiple metrics in it it's just going to label to those fatalities but I'd like to actually break this into multiple metrics here if I could and so if I wanted to take the value that I have inside this last column and split them into multiple columns I could do that by right-clicking on the fatalities column and I can select that I want to do a split column based off of a delimiter okay so I can take this and say well I want to split this base on the forward slash that I have and then split it again based on the print open parentheses that I have and I can make multiple fields off of this by using the split column transform here all right so I'll select split column by delimiter and what I'd like to do is I want to split the column based off of a custom delimiter that we have in here because you'll notice in here there's not really a forward slash in here that I can use as a delimiter there's comma there's colon but if you have a custom delimiter that you want to do you can come down to where it says custom and I can tell it that the delimiter that I have here is a forward slash so I'll just select forward slash here and make sure that it's added as my custom delimiter you can also tell it how many occurrences of the delimiter are that you want to split off on in my case I'm it really doesn't matter what I choose here but I'm gonna choose the leftmost delimiter I'm not anticipating more than one forward slash appearing in here but just to be certain that I only capture one I'm going to say give me the leftmost delimiter if I had multiple slashes then I could say after each instance of that slash I want to create a new column basically in my case I'm going to say the leftmost a learner and hit OK ok so that now creates a split where I have two columns in here one for what I believe is going to be the error --air fatalities and then we'll go ahead and split this into another one and we'll split it into another one as well so let's do let's go ahead and split these into multiple columns before I rename them I'm gonna go split this next now the the the remainings of the column that we started off with again so if I want to split this one I'll right-click on it and go down to split column again by another millimeter but notice it's not going to be a forward slash this time it's going to be in open parentheses so I'll tell it why did my delimiter again and I want to split by a custom delimiter once again just like we did in the first case I'll split on a custom delimiter and I'm going to split this time on in open parenthesis so I'll select open parenthesis and I'll do it at the leftmost instance of that delimiter again just in case there's some other parenthesis that shows up in there I want to make sure I do it correctly all right so I'll select to split on an open parenthesis at the leftmost delimiter and hit OK and you can see now that I have three separate columns in here and we can go ahead and rename these now I know this one is Air fatalities I know this this one is how many people were on board the plane by the way this information is on the website that's how I know what they are so how many people were aboard the plane and then this last one is round fatalities okay again forgive me for the morbid data but it's it's a good interesting solution here all right now I still have an issue here if you look at the data you'll notice that power bi automatically converted the data types for me here it converted the data type here to a numeric value a number here a number here but this last one is still in here as a txt you can tell that by looking at the little indicator at the top left of the column header and the reason that is is because you can see there's still a character in here a closed parenthesis that is preventing it from automatically converting data types for me and I wouldn't be able to convert it as a data type manually either so what I need to do is I need to basically strip out the closed parenthesis on this last column to finish this out so if I wanted to do that I can right click on the ground fatalities and I can tell it that I want to remove a value and remove values here and basically I can tell it that I want to get rid of that closed parenthesis I don't need to do any more splitting here I've done enough splitting up the data I now want to go ahead and do a removing of a value so I'll say replace value and the value that I want to replace is a closed parenthesis and I don't really want to replace it with anything so I'll just leave that blank no need to replace it with something else just leave that blank and then go ahead and hit OK alright so that now gave me three separate columns you'll notice that this one still hasn't converted data types that could be that there's some character in here that's causing an issue that I don't see one here offhand but if I need to go ahead and convert the data type of that column to make sure it's in here as a whole number I can do that by changing the data type by simply hitting the little indicator here and making this show up as a whole number by the way if you don't see these little indicators next to the column names it has the data type that's because you might be running a slightly older version of the power bi desktop this the latest version that was released last actually earlier this month I believe gave you those little indicators that you're seeing here okay all right great so I now have all three of the metrics that I want this is looking good I am going to go ahead and rename this query so I'll call this it doesn't really matter I'll call this error events okay now my next step is to actually make this dynamic because right now we've got this data set set up but it there's nothing dynamic about it right it has one single year that is bringing back nothing more than that single year so if I want to make it dynamic and have it more parameterised then I need to go into the advanced editor to actually key in some parameters note there are some parameters that have been added in if you notice here underneath the manage section the last release of the power bi desktop also added in let's see where is it that's L right here manage parameters there's this new feature in here called parameters that is already built in it's like a UI design for dealing with parameters there's a little bit of a different context for this this is actually used for more we could use it in this context but what it's more designed for is creating templates out of your power bi solution so you can create a template that has in it that points to your data set actually you build visualizations on top of it and the context of using parameters is whenever someone goes and use that template I want them to be able to change the data sources very easily between production and dev or maybe this customer and customer a and customer B and I want them to be able to plug in which which data sources that they easily drop in and the parameters are a great way of doing that you can actually well if we have some time we'll talk about the context and those how those make more sense than what we're doing right now but for what we're doing now we're going to a little bit more of a manual process for creating these parameters or these these these ways of making the query dynamic to make the query dynamic what we're going to do is we're going to go into the advanced editor which you'll see right here so I'll select the advanced editor and then inside the advanced editor and again I recommend watching some of our intro sessions that we talked more about just how to get to this and what this is showing I believe I even show how to how to read this a little bit in one of our previous ones where I talked about M queries but what we're going to do in this one is I'm going to make this into a dynamic query so to do that what we're going to do is I'm going to come up to the beginning of the query here where it says let and what I'm going to do is give myself a little real estate above it and I'm going to add in a new variable basically okay so the way I'll do this is all name the variable you can call whatever you want you do it inside of parentheses and so I'll call this one year okay and then I'll do a close parentheses here and then do an equals greater than sign which is basically how you declare a variable here in the in query language okay and by the way stands for mashup it's a mashup language so I've declared a variable now I need to go tell it where do I want to use that variable there is probably a pretty clear area here that we want to use it you can see here in the URL itself that the 19:33 that's the hard-coded selection that we made earlier I want to replace that hard-coded year with the parameter that we just created up top here so to do that I'll go where it says 19:33 and I'm going to replace this 1933 by injecting a value into it now one thing I should point out to you you'll notice that there's double quotes around this URL here so notice there's double quotes that story that end here and begin here why am i bringing that up well I'm bringing that up because that tells you that the URL has to evaluate as a string so if I'm gonna inject a number in the middle of it I better make sure that it evaluates this text value at some point all right so here's how we can do this I'm gonna inject in the middle of this URL string two double quotes and I'm going to need to do it here for this 19:33 as well okay so there's two double quotes because we're injecting in the middle of a string and I'm gonna do two ampersand which is basically the way that you do concatenate in here and both Excel and power bi for that matter and I'm going to place in the middle of this our year variable so we had a year variable so I'll place that in here and I'll place that in here because the URL itself is expecting it to appear twice and then just to be certain that the year is brought in as a txt instead of a number I'm going to make sure that this gets converted to text and the way you can do that is there is a number dot to text function and in this case in significant need to have a capital T and capital T on to text and I'm going to make sure that it gets converted to text because the URL is txt and that means basically that my parameter needs to evaluate as text here as well so it looks more complicated than it really is but basically I added in what I have highlighted here twice I replace the hard-coded year with a number not to text of the year parameter that we've created and that will inject the year in the middle of this function that we're trying to do all right so we've now taken care of that we've placed and we've made this dynamic now if I hit done you'll notice that it converts this into a function and basically a function is something that's waiting for me to pass a value into it it's it's actually pretty similar to a sequel server function in some ways and that it's a value or it's a it's a a stored query that's waiting for you to feed values into it and so if I want to test this out and see if it's actually worth a if it's actually working I could hit invoke here and it would prompt me to type in a year and so I could type in a year let's say I typed in the year 2016 and hit OK I can see here that it's brought in just the occurrences the events that happened on in the year 2016 and so I can see exactly what the occurrence is where I can see all the details that we saw earlier also applied to it here as well now if you want to try that out again you can always hit X to delete that invoke step that we did just a moment ago so I can hit X over here on the applied step section and I can hit and book again and type in a different year if I wanted to so I could type in 2015 if I wanted to hit ok and you can see all the events that happened in 2016 a 15 all right now it would really our end goal our end goal is that we want to feed in all of the years that are on that website just a few years here and there I want to feed every single one of the years into this invoked step and so what I want to do to make that happen is make sure you hit X on the last milk step that I did so it should be waiting for you to pass values into it like you see here if it's not if you actually see data then it's not going to work you need to kill that last invoke step and have it waiting for you to pass values into it because what we're going to do is we're gonna feed in a list of values to it all right so to do that that's our next step our next step here is to pass a list of values into this function that we've created and the way I'm going to do that is by going back to the website okay so back on the website if you remember if I hit back this will take me to where I can see a list of all of the years that are on this website and so if I want to get that list of years and make it into like a table where I can pass these values from the table into my function I'm gonna copy the URL up at the top again okay so I'm just gonna copy this because I want to get this list of values that I see sitting right here so I'll copy the URL go back over to the power bi desktop and I'm going to create a new query that gives me the list of years that I just saw on the website so to do that you'll go up to home and you'll see from the home section here that I can create a new query where it says new source okay so I'll go to home and select new source and I'll pull in data from the web so select web and paste in my URL alright so this is actually not so different than what we did earlier the only difference is last time we pointed it to a year a specific year in this case we're pointing it to the part of the website that has the list of years because I want to get that list and then feed that list into the function that we created so I hit OK and this will give me a couple things in here you'll see there's a couple different data sets in here the the first one is actually the one we want this one actually has some other information in it that's not really relevant to what we care about so I'm going to select the first one in here that says accident database where I can see all of the data stored here and I'll check off that accident database file are really part of the website and hit OK and that will bring that into the query editor again ok where I can see all of the data that's come back from our data set the problem though that you see here is it looks like this data has now been brought in here as columns and really what I'd like to have is one column that has a list of all the years in it so first things first let's go ahead and get rid of this column 1 this column 1 is useless to us it just has this symbol in here we want to get rid of that you can right-click on that remove that column and then that leaves us with all the other columns that are in here ok and so what we want to do is we want to take this and basically unfit the results so that we can see all of the years in a single column and so what we can do is we can select all of the columns here like so and we can unpin it the results by right-clicking on any one of those columns I have selected I selected all the columns first by the way and then select unpin two columns so that it sticks all of those values into a single column now you'll notice in here that it has column two three four or five you can get rid of that we don't need that one so I'll right-click on that and remove that column as well and it leaves me there with just a list of years just like we want to see it so that we can feed it into that function we created in fact I'll go ahead and rename this I'm gonna rename this once a year it's that way we have a list of years ready to go all right now our next step is to take the values inside this year column and pass them into the function that we created and so to do that we're going to do something kind of similar them to what we did with the last example we have the file groupings and some unusual data in the file that we wanted to work with and that we're going to add in a new column but this time when we add on the new column we're gonna pass in the year values into our function so here's how you do that you're going to go up to the ribbon on the top and select add column and you're going to add in a custom column again so I'll select add custom column alright then we're going to call the function that we created in the previous step so remember we have all of the queries that we've created over here on the left hand side here's our job salary one that's the first example we did here's our air Events function that's the function we created a moment ago and you can tell it to function because the icon next to it and then here's the action of databases that's that excellent database that's the one that we're working on right now and so what I'd like to do is I want to call the function called air vents so you'll actually just simply type it there is no intelligence here unfortunately and I want to pass in each of the years that you see here into that air vents function and so to do that you'll do an open parentheses I'll go ahead and do my closed parentheses here as well and I'm going to basically pass in the year column into that function by doing it like this it's not so different than like a function you can see on T sequel or any other func query language for example so I'm going to call the air events function double click on the Year column that you'll see here on the right hand side and that'll add it in between my parentheses and so now what it's going to do is going to pass in every one of the years every one of them not just one every one of the years into that air events function and so I'll hit OK and you can see it'll take a few seconds actually then take very long at all but once it's done you'll see it appears here like a little nested result set and so it has the year and then the table that basically represents the value being passed in as a new row over here and so what you can do there's a couple things you can do with this is I can actually select on these and you can see if you select the cell you can see below the data that's being brought back and so this is I'm looking at 19 21 here if I click on 19 23 you can see the values being brought back here 19 20 you can select it by the way you'll notice here that 19 20 actually includes years prior to 1920 as well so just for your information that's ok because we're about to merge all of it together anyways but what we have in here is a data set that includes all of this data and now the really the easiest step here is next what we're going to do is we're going to tell it that we want to merge all that data together into a single data set rather than having to have it as nested views like this or nested tables we can have it all visible at the same time by coming up to this little custom column that you see here hit the ellipsis I shouldn't say ellipses it's actually an expand button hit the expand button and you can see here's all of the columns from our function that will be brought back so all our functions are all the columns we brought back here if I select this if I simply hit OK it'll bring back all those columns now I usually recommend also unchecking this option this option here says use original column name as prefix basically what that means is it's gonna call all the column names custom dots dates custom dot abhorred custom got brown fatalities so usually I recommend unchecking this unless you're worried about multiple columns having the same names you don't really need that a using original column name is prefix because our original column name is called custom and it's gonna prefix everything with custom dots date for example our custom dot location so I'm gonna uncheck that hit OK and you can see now it's brought back all of the data from every single one of our results and as I scroll down let me scroll down you'll see it actually changed look notice the first column here 1920 notice it changed in 1921 there's a change in 1923 and 24 and we can see all that data being brought back here pretty interesting stuff so I've got all this data in here I can rename this query let's call this error events complete hits close and apply underneath our home section here and now if I wanted to we could visualize this very easily so I could say I'd like to take this data once it gets all of it loaded in you'll see there were some errors in here now the errors by the way are really because of the the website itself the website has some funky characters occasionally where someone just had a typo so for example think back to that column I started splitting columns based on the delimiter some sometimes they actually fat-fingered it and did like a different delimiter by accident at one time so some of that data is not included if there's an error it just it you can actually go back in and deal with the error in some way in the query editor and so you can tell it that I want to replace those errors with something else so you have some flexibility there alright but I have this data now in here and so what I can do is I can build some kind of a visualization on this let's say for example that I want to build this out on a scatter chart now the one problem with doing this on a scatter chart right now is you'll notice that there's something going on with my data take a look at this with me if you will and you'll notice that many of the things that I thought were metrics aren't being brought across as metrics so I had the number of people that were aboard of playing the air fatalities and brown fatalities the reason I can tell you and I know that it's not interpreted those as metrics it's because it doesn't have this little Sigma symbol next to it like it does on year here I actually don't want it to be a metric it's not supposed to be a metric I'm never going to summarize years or average years what I will average or summarize is the number of air events that there were and the fatalities there were I might want to look at that information but I don't want to summarize years so there's a few things that we might want to fix here and so one of the ways we can do that is we can go back over to the query editor by the way to get back to the query editor you can click on the Edit queries here and I can select edit queries and you'll notice that if I go back to the air events complete that looks like it's a bit confused on the data types take a look at this if you look at this with me you'll notice that the data type indicator that we saw before has a question mark next to it instead of a numerical indicator like we saw in the past and again the reason why that's happening is because there might be some fat fingering of the delimiter that we had before and so in that case I get too got confused on the data types and so what I'm going to do is I'm going to make sure that it's clear on what the data types need to be for those three columns I'll select the three and then come up to the top here where it says data type and change it from an any data type which again that tells you that it's confused and that it just brought it back as in any data type and I'm gonna make it in here a whole number okay so it's converted it to a whole number you can see the indicator next to the column showing that it is a whole number here now alright so I'll hit close and apply once more okay and I'll bring those in it's gonna bring them in just like I did previously they were the same errors that it's all before it's gonna pop those errors out again and by the way if you want to see those errors you can go back to the query editor and just look straight at only the errors if you wanted to and so I can notice over here now in my field list that it is interpreting several of these as metrics now so it has the signal symbol here next to it indicates that it is an implicit measure and I can actually start building out admirations on top of it or it's actually going to aggregate by default so here's what I'm gonna do I'm gonna go ahead and check off aboard air fatalities ground fatalities I'm gonna look at all three of these metrics I'm gonna bring it into something like a let's do it as a scatter here or no let's do it as a line a line makes more sense for this and what I'd like to also bring in here is the year and I'm gonna bring the year in as the axes here all right so what we're looking at here is a data set that shows us each of the fate air fatalities by year and so what you'll notice here which is kind of interesting and in the data itself again this is this is air vents or air fatalities what we're seeing here is it looks like the peak where the most air fatalities were I'm sorry I'm actually you know here's the other thing I want to do let me bring in one other thing because this is this is what makes this really interesting is I'm going to bring in a count of events as well let's actually look at a count of events that are occurring and so I can see when I do that let's actually take out a few other things here let's take out the let's take these out for a moment all right so what we're looking at here is you'll notice that the black line here is indicated by the number of events okay so you kind of see and see where in some some events peaked and what you'll actually find is the most events if I had focused just in on the count you'll notice that the count of events is peaking in kind of the 1970s where they had most most crashes you also see things like the 1940s there was fewer fewer deaths but there were more event and that's because they were more single single passenger planes during war to where events like that would occur and there would be more accidents or crashes just in general so interesting to look at the data you'll notice here also if I bring back the other data elements that I hadn't here a moment ago not surprising is you guys might expect there was a pink peak in 2001 that's September 11th when there was quite a few fatalities both on ground and in air so the data is interesting it does doesn't allow you to look at trends over time where there's Peaks where things have occurred and you can see obvious events like 2011 - 2001 there excuse me September 11 2001 showing a major peak I said the key thing there though again I don't want to make you guys sad about the data said the key thing there was looking at how you can actually take multiple parts of a webpage and import multiple at a time alright so I think we have time for one more example and the last example that I want to do with you guys is creating a calendar table there's actually several methods of how you can do this how you can build out a calendar table so that way that if you don't already have a date table so that you you know you really want to analyze things by date you do have a day value but maybe you would like a day table so that way you can analyze things by year and by month and my quarter that sort of thing you can create your own date table and there's there's a couple methods for doing doing this one of the methods I actually have on my blog already and I'll share this with you all search this on the side and make sure I get this up here for you one of the methods is using power query and/or the query editor I should say and so what I'd like to show you is that method first and I'll show you one second method which allows you to use Dax inside of the data modeling section to do it as well so here's my blog that I have on creating a date dimension using the query editor section I give you a couple reasons why it's important to have a date table here I also have several references to other people that have created date tables like this Chris Webb Matt Mason Jeana as well there's there's some good ones out there if you'd like to take a look at how to create a day table and then I kind of have my own take on it as well here it's pretty well-documented you'll see I feel like it is anyways you'll see everywhere where there's the two forward slashes those are actually comments that tell you what each section of the code is doing and then I also haven't parameterised here as well so what I want to do for you guys to show you how to use this and then we'll kind of wrap up on this because I've run out of time and I want to have some time for questions here as well again if you search my blog it's Devon Knight sequel comm and if you just type in date dimension up at the top you'll find the script for this pretty easily alright so here's what we're gonna do I'm going to show you how to use this and again the purpose of it the idea of it and why you need a date table is because it makes it easier to analyze things by dates it also gives you the ability to add in time intelligence for your calculations that you might build as well laters if you want to analyze things like this year compared to last year you're gonna need something like a date table to be able to manage things like that to be able to build things like that so let's assume that I don't have a date table which in this case you can see here I actually do not have a date table in my fields list and so I want to add one I want to add in a date table and so to be able to create a date table you can take that script that I have in the blog and by the way that script I have in the blog is set up as a function so you can actually plug in the years that you want to be able to create the range of dates for and so what I'm going to do to be able to use that script is I'm gonna go up to the get data section here again and I'm gonna select the blank query because we're basically going to feed in a script that I've given you or that you might borrow from Chris whatever Matt Mason or Gina and so I'll hit the blank query option and so I can see here that the blank query option just gives me an empty query editor ready for me to plug in something is where it's waiting for me to plug something in and so what I'd like to do is rather than using the formula bar which is given here to you by default I'm gonna go to the advanced editor where I can plug in my query that I've already fully built out and so I'll select the advanced query and I'll get rid of all the stuff that's already in here and I'm gonna go ahead and plug in the query that I've written for you and if we have more time what I would normally do is actually go line by line through this with you and I might do that to some extent but I would actually write the whole thing out would you line by line if we had more time but what you're noticing in here is I have given it a couple variables that I passed in values here now one of the things that you'll notice that's interesting about how the query editor works within queries is you don't pass in dates like you would in the traditional sense so if you're used to doing dates like a new in sequel where you can have the month day year or day month year you don't do it that way here in the query editor with with M you actually use a date function here and then you tell it that you want to pass in a year a month and then the dates and so what I've told you here is that I wanna add a table that has a range of values from January 1st 2010 to December 31st 2016 and then what I'm doing basically here is I'm trying to get the number of days between that date range so what that's going to give me the number of days then doing something where it's taking that list as being able to convert that list to a list of dates and then there's a cool little bit further down here I start to add in things like add in a quarter column so I'm getting the number of quarters I added in a year here previously so I'm using some of the functions inside the M query language to be able to generate that and then adding in a week okay using the table that add that column function and then telling us that I want to add in a week number using the week number we come here function then I add in a month number add in the month name and then the week day of the week as well so things like Monday Tuesday Wednesday that sort of thing is added in here as well so this script is already available for you if I hit done it's going to create that for me automatically you can see here that it's got everything I really wanted in it it's got the date the year the quarter the month I started number of the week the month number here the year the month name and then the day of the week as well and so I can save this as my dates table or date dimension if you prefer and then if I hit close and load I now have that table that's referenced inside of my my data model once I hit close and apply and then I can use that over and over if I wanted to by the way just to show you real quickly if you steal that strip for me which I am completely fine when you're doing that that's why I put it on my blog inside that script I've made it a function so rather than having a hard-coded start date and end date what will happen when you use use this instead let me show you the difference between what I just plugged in here and what's on my blog when you plug this in from the blog you'll notice that when you hit done on it it's going to wait for you to invoke this function and when I hit invoke I'll be able to pass in a start date and an end date so maybe you want to have a different parameter range than me or a different date range to me you could put in here I want to start on one one let's say 2005 or maybe you want to go to even earlier than that you can and then I want to end on 12/31 2016 hit OK on that it'll generate the thing for me and it starts with that date range that I provided and again that's on my blog so go ahead and check that out copy that script use it make it your own alright so if I hit close and apply that now gives me I don't have two separate ones I just added in there one called date dam and one I forgot to rename it's called query one but it's got both of those now in there for me and I can use those however I want okay there's one other way that you can create date tables though so if you're kind of if you're new to the whole idea of doing data models and and really this idea of doing calculations there is some ways that you can create date tables from your data model as well so if you don't like using the script that I provided you can kind of create your own date's table as well using a calculated T able calculated tables are fairly new that came out roughly I think last November or maybe I mean it was last September I can't remember but calculated tables just work very similar to calculated columns or calculated measures and really very comparable to calculated columns and basically the idea is that you can create a table off of another table or you can create a net new table and that's what I'm going to show you how to do here next so if I wanted to create a date table that wasn't using the query editor like we did a moment ago I can do that by going to the data view is really one way to do it let's go in toward the date of view it's an easy way to see it and so if I go to the data of you here inside the power bi desktop and go up to where it says modeling you'll see there's three types of calculations you can do there's a calculated measure which is usually genuinely generally sees me on January generally used for doing things like measures you have calculated columns which is for things like maybe I want to concatenate two values together for example that will be a calculated column a use case and then new table is just creating a net new table based off of a Dax query that you write so Dax is the query language that you use in the data modeling side that is what you'll use for creating measures or columns or in this case tables now calculate a tables you can actually build and filter down based off of another table or in our case what we're going to do this time is actually build off of a function there's a calendar calendar function that you can use that actually builds helps you build a calendar table so I'm going to click the new table button here I'm going to call this my calendar demo okay that's that's what's going to be the name of the table here in a moment and then what I'll do is I'm going to tell now I want to use the calendar function okay you'll see there's also a calendar auto function which is pretty useful as well this returns a table with one column and the reason we'll see why I'm going to use the calendar table here in a moment because we can actually do some things that are a little bit different with the calendar function the mid fronts here below the calendar function is it'll return back a list of dates but I can give it a start and end date that's why I kind of like this one so I'm going to do a calendar function and I'm going to tell it that the start date is going to be 1 1 2015 just to give it a small date range here and then I'll do an end date of 12 31 2016 and I'll zoom in almost see you guys can see it this is what the function looks like it's a calendar function it's going to create a new column for me that has a list of dates between that range and then what I can do with that once I have that list of dates like you see here I can now build additional columns off of this column so I can now do new columns on top of this column and let's say for example that I want to return back to month I would do a month function on top of the date function on top of the date column and so now I've returned back to month and I can do that again and I can say well let's do another column and let's return back the Year this time and so you can bring back and build things on top of the functions that you've already are the in this case the column you've already created all right let me show you one more example and we'll wrap up and I'll see if we can answer at least a few questions here with the time we have remaining all right so I'm just gonna do basically the same thing return back the year and by the way you can also do things like month name there's a format function that allows you to return back the names of months opposed to just the month numbers but you can build it out quite a bit here if you like alright well is I think we only have a few moments left you have any questions that happen to be queued up already that I can write yeah I'm gonna my voice is going so I'm gonna let you sure sure look at them if you don't like sure no problem all right so let's see my first recession in this practice is creating especially if there's a question here that would be a good one okay here's one so this one can we read data from HTTP sites as well as HTTP yeah you can actually read data from secure sites as well you can even depending on what kind of security setup on the site if it has a login you can even prompt it to log in to a site as well let me show you what I mean by that briefly here if I wanted to pull in data from a site that required authentication there is if I go underneath to get data and select web you'll notice whenever I plugin depending on what type of site I provide it after I hit OK on this it might prompt you for authentication or there's actually a section where you can identify the authentication required as well so there's some stuff you can do there as well let's see what else sorry some of these are like multiple questions that are tied together so it's hard to connect the dots when they've been separated so here's question in the in the day table how do you handle unknown and future dates so Ted SSI questions so but he's probably referring to is oftentimes when you're doing and building out things like data warehouses you'll have an unknown record in there for dates to be able to handle those because you're gonna have in some cases things that you just don't know the data yet so maybe an employee got hired but they haven't been plugged into the system yet so that they haven't had all their information plugged in you don't know exactly what their hire data is yet all that information so if you're dealing with things like unknown dates you could certainly add in an additional record for that so if I was doing it in that and that scenario tab I would definitely do it the query editor method because in the query editor method then I can have I am additional rows added in using the query editor and the advanced editor with the Dax example I showed second you don't really have the flexibility to add an additional rows that easily since it really the query editor will be the best route there oh it's already 12 o'clock look at that I think that's probably good at least I mean what I'll do guys is if you have some of questions I was just wasn't able to to be able to get in yet I will answer that via my blog so kind of keep it I wrote a blog on the last session that we had I'll do one on this one as well where you'll be able to find the recording by the way the recording if you have questions on that is usually posted by the end of the week so you should expect that within the next few days and I think that's it I'll rattle is I'll go ahead wrap it now because I know you've got voice issues they're blending just for joining us if you have any questions feel free to plug them in the chat window and I'll answer them be in my blog and like I said the reporting will be available in the next couple days thank you guys for joining us Thanks [Music]
Info
Channel: Pragmatic Works
Views: 60,521
Rating: 4.95365 out of 5
Keywords: Power BI, Free, Training
Id: -rQhqKZM0OM
Channel Id: undefined
Length: 60min 26sec (3626 seconds)
Published: Tue Sep 11 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.