MSPTDA 03: Power Query Introduction: Importing & Transformation Data in Excel & Power BI Desktop

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
[MUSIC PLAYING] Yes, this is MSPTDA number three. And in this video, we get to use Power Query for the first time. Look at what this amazing tool can do. Now this is going to be an introduction to all the things a Power Query can do. It can extract and import the data, clean and transform the data. It is an ETL master tool, that data warehousing term, extract, transform, and load. No other tool does it better than Power Query. And if that's not amazing enough, Power Query is both in Excel and Power BI. And there's one other thing that Power Query does. For Excel people, it can replace crazy array formulas that we used to have to do with a simple few clicks of Power Query. Now we want to take a look at the ambitious set of topics we're going to try and cover in this video, because I want to give you a glimpse of everything that Power Query can do. We're going to start by cleaning and transforming data in Excel. Then we'll look at the Power Query user interface and the M code behind the scenes. We'll see how to load the data to an Excel sheet, the data model, and as a connection. Then we'll see something so amazing. We can edit, delete, and add steps at any time for any one of our Power Query solutions. Then we'll do ETL extract, transform, and load data from Access into the Excel Power Pivot Data Model. Then we'll do ETL in the data from text files and dump it into the Power BI desktop data model. Then we'll look at the Power BI desktop Power Query user interface, which is slightly different than in Excel and the M code. And we'll finish it off with an amazing example where we'll replace a crazy array formula with a simple Power Query solution. Our first four examples here, we'll get to see on this sheet CT. Now here's our situation. We have a data set in Excel, description and amount. And look at this. They've combined product date and region all in a single cell with the delimiter space forward slash space. So we need to break it apart into three columns. Then use the three columns and the amount to make this report. Now before Power Query, we could do this a few different ways. We could actually use data text to columns. But the problem is, we have a delimiter that's more than one character. So with text to column, we'd have to do it in multiple steps. We also could do it with formulas. But look at all these crazy formulas we'd have to do. That's where cleaning and transforming with Power Query comes in. Now this would be the first time we've done Power Query. We're using Office 365, the latest version of Excel. So our get and transform, that's our Power Query. Now in order to use Power Query on Excel data, we have to first convert it to an Excel table. As we talked about last video, we want any solution we make to be dynamic. And actually, we'll test it. We'll convert this data set. And then later, we'll add new data and see that our report updates. All right, so I'm going to go up to insert table or use the keyboard control T and enter. I want to name it, so I go up to design, over to properties, table name, or use the keyboard, alt J, T, A. We're going to call this something smart, like start sales, and enter. There is our name. Now we can bring it into Power Query. Actually, I have my ribbon tabs collapsed. You can either right click and point to collapse, or the keyboard is Control F1. That's a toggle for your ribbon tab. All right, data. Down to get and transform. And because our data is from an Excel sheet, and it's an Excel table, we use the from table range button, click, to bring it into the Power Query editor. Now let's take a little tour. Over here on the right are query settings. That's the name. This would be the name of our query if we didn't change it, but that's the actual name from the Excel table. We'll come back and change it in just a second. There are two steps. These are two separate steps in the transformation that were automatically created when we clicked that from table button. Down here, it says preview downloaded at a particular time. Over here there's some information about columns and number of rows. There's our two columns that are imported and waiting for us to transform them on the left. If I click this query, later, we'll have lots of different queries we're working with. And this is where we see the different queries. I'm going to collapse that. We have data ribbon tabs just like in Excel, the home ribbon tab, some common cleaning and transformation features, the transform ribbon tab, awesome transformation features, the add column. We'll get to see a bunch of different ways to add columns. And then there's view. You can view the code, whether or not you want to see the formula bar. I'm going to go back over to home. And as we see, the f of x, that's the icon over in Excel for formula, there's our formula bar. Now we'll come back and talk about the formula bar later. But the first step for us is always going to be name the query smartly. So I'm going to name this query something like transformed sales table and enter. Now here's our two steps. If we click on source, we could actually look up to the formula bar and see that the case sensitive M code that is behind the scenes in Power Query used the function Excel dot current workbook. Now we'll come back later and talk about functions in the M code. I'm going to come over and click on change type. Up in the formula bar, you can see table dot transform column types. Now both of those steps were created automatically. And much of what we can do in Power Query, we can click on features or right click and do particular features and the steps and the M code will be created automatically. Now later, we'll come back and see how to edit, delete, add new steps to the applied steps. That's a very powerful feature in Power Query. And we'll talk more about M code. For this example, we're just going to use the built in features. Now one last thing about change type. In Power Query, we have to define the data type. If we don't, our calculations in Power Query and then later in an Excel spreadsheet or the data model, those calculations might not work unless we define the correct data type. Now this data type is fine. This is one, two, three. That's the icon for a whole number. Over here ABC, that's the icon for text. Now watch this. You can click, and it shows you the icon and gives you the description for the different data types. We will deal with data types later. Right now, we have our first two steps. Let's get busy transforming. I'm going to click on the description column. And we need to split by delimiter. So we go up to home. There is the split by column. We could also go over to transform, split by column. And for many of the common features, we can right click the column header, and there's a bunch of cleaning and transforming features. Let's point to split by column. There's the same two options over in text to column. But over here, when we select by delimiter, there's much more that we can do. Now what is our delimiter? I'm going to click the dropdown, point to custom, and here's where it is so much better than text to columns. Space forward slash space, three characters define our delimiter. Now we could split at the leftmost, the rightmost, or in our case, at each occurrence of the delimiter. Now these options are also not available over in text to columns. And watch this. We can click advanced. And look at that. We have the options which we don't over in Excel to split by columns, which is what we want. We want to split into one, two, three columns. Or we could split by rows. And we'll have great examples for this row one later in the class. All right, we have what we want. I'm going to click OK. And look at that. One, two, three. Over here, we can see two new steps. Split by column delimiter is the name. Change type is the name. And look at that. It changed the type. ABC, which is text. That means date. There's text, and there is whole number. Now our last step is to rename the columns. Now watch this. I'm going to rename it one of three ways. You could right click, rename. I'm never going to do it that way again. You could double click, escape. You can also use the F2 key. F2 puts it in please give me a new name mode. Now F2 key works in Windows Explorer and Access also. So this column is going to be called product. Now watch this. I'm going to hit enter. And to move to the next column, I'm not going to use my mouse. I'm going to use the arrow keys. Then with the column selected, I'm going to use F2 to rename it. This is called date, enter, arrow key, F2, region, and enter. And just like that, I've renamed the three columns. Now it added just a single step, even though we renamed three columns. There we have it. We have the desired columns and, very importantly, the correct data types for each one of our columns. Five steps were automatically created. We have a good name for this query. So now we want to see how to load. We go over to home. Over here in Excel, Power Query is called close and load. Over in Power BI Desktop, it'll be called close and apply. We have more options of where we can load it over here in Excel. I want the second option, close and load two. This dialog box will be very important. It's called import data. Now in earlier versions, it was called load two. It had the same options though. And look at this. We can load it to a table, which is sometimes exactly what we want, especially in Excel when our data is coming externally. We could load it directly to a pivot table report, which is what we're going to do in the end here, a pivot chart. Only create a connection. And then finally, we can totally dump whatever we transform and clean into the data model. Now I'm going to start off by selecting only create a connection. When I click OK, you'll see over here on the right queries and connection pane. We have one query. That pane can be opened and closed in the queries and connections group by selecting queries and connection. So later, when you come back to a workbook and you need to edit queries, you click on this. Now it says connection only. And there are some times where we do not want to dump the data anywhere. Most of the time, connection only is when we're transforming data, and it becomes part of a different table where we load that final table to some location. Now this isn't the location we want to load to. So how do we edit that? Right click the query and load to. Now this is really important. Because later, when you load something to the data model, and you try to delete it over in Power Pivot, it doesn't work. You always have to come back to Power Query to either edit, which we'll do in a second, or change the load to location. I'm going to click load to. I want to now dump it in the data model and just notice that it ends up in the data model. Only create a connection, because that's the Power Query part of it. But now, we're going to load it to the data model. So I click OK. It says 365 rows loaded. Now it doesn't tell you where it's loaded, which I wish it did. But we put it into the data model. Now Power Pivot ribbon tab, we could click data model. I like over in data, since I'm here so often, there it is, manage data model. Now the keyboard to get to data, if I hit the alt key, notice A gets me to data. And when I hit A, then I see, oh, that's an easy keyboard to remember. So even though I may be over in the Power Pivot ribbon tab or somewhere else, when I want to see the data model and I want to use a keyboard, I use alt A, D, M. Now this is the Power Pivot for Excel window. Just like Power Query, it opens in a completely separate window. We'll come back and talk about Power Pivot later. But I just want to notice that, yes, in fact, the table got dumped in the data model. We could even go over to diagram view, which is like relationships view over in Power BI desktop. If we tried to delete this table, it is trying to be polite. And we delete this from the model, and it says, uh-uh, this is connected to a query, so we have to go over there to do it. Now I'm going to close Power Pivot window. Come over, right click load to, and I want to uncheck this. Now I'm going to click table, and I want to load it to a new worksheet. We're just looking at the different places we can load. Now when I uncheck that, it actually will delete it from the data model. When I click OK, it's being polite. It's telling us that this is in the data model and this might really wreck things if we do this. But when I click OK. Now notice over here, the message is the same. 365 rows loaded. It doesn't tell us that this is to a sheet or the data model. I see the new sheet. If I use my keyboard, alt, A, D, M, to open up the data model, I can sure enough see that it has been deleted. I'm going to close this. Now I want to delete this, because this is not where I want it. Oftentimes, if you're getting data externally and you need a proper data set, this is exactly what you want. But here, we don't want this. I'm going to right click delete the sheet. Click delete. Notice it transformed to connection only. Now I'm going to right click. And before we load it to pivot table cache, I want to go back and edit. Now we can point to edit. And when I do that, it opens up our Power Query window. I'm going to close it. You can also double click. This allows us to edit the query. We might want to do things like we forgot to add a data type, and we need to add one. We might be adding some other columns or many other reasons. Now what I'd like to do is delete all of the steps from change type after, and then go through a slightly different process where we see how to edit applied steps and some of the M code. Now watch this. We can right click a particular step, and say delete until end. This will delete all of the steps from this point forward. So when I click delete end, and then delete, those steps are removed. Now I want to show you something else that's very important. Let's say I just did a bunch of changes and edits, and they're totally wrong, no problem. Come up and click the red X. It'll be polite. It says, do you want to keep the changes? This will always come up. So you're free to come back here and play around. If you want to revert back to the original, click discard. Now when I double click to open this, I'm back to my starting point. Now that really was what I wanted, so I'm going to right click, delete until end. Delete. Now we deleted that change type. Because remember automatically, it changed the type here. But we had a another step later where we actually had to change the types. So instead of having two separate steps, we'll consolidate all of the change types after we split the columns. Now right click description, split column by delimiter. I'm going to select custom, space, forward slash, space, click OK. I have my one, two, three columns. It added two steps, split column by delimiter and change type. Now we still need to rename these columns. But guess what? Somehow in the code already, the columns were given the name description one, description two, and so on. If I go back to split column by delimiter, come up to the formula bar, click the down arrow to expose the formula bar, I see the table dot split column function. And as I'm looking through here, I can probably guess that this last argument is the one that named those columns. Now if you want more information about particular functions, you can simply go search Google. And sure enough, if we look through the arguments, that fourth argument, it looks like these two were left out, but this fourth argument allows us to name the columns. So I'm actually going to click on change type. Delete that. Actually, let's leave that step there. Let's come over here. We're going to highlight up in the formula bar. We're editing the Power Query function, which is editing the M code. And I'm going to call this one, and then the second column should be called date. And the third column should be called region. And so when I hit enter, I've edited this function up in the formula bar, and thus saving a step. If the Power Query solution for this transformation step was naming the column anyway, we might as well go up and edit it. Now if I click back on change type, there's an error. And it's fairly polite. Most of the time, you can figure out what's causing the error from this yellow message. Yes, when we change the column name, this step was using description one, description two, and so on. Now we could edit it here. But forget that. I'm going to click the red X. The moral of the story here is there are some times where you can edit a previous step and logically it has no effect on the remaining steps. In that case, it's fine. But in this situation, the column names were used in subsequent steps, so we had a problem. I'm going to click the red X. ABC text is fine. This is definitely a date column. Notice it's aligned to the left. Click the icon for data type. Decimal number, that's like an Excel number with 15 digits. Currency with four decimal. Whole number, percentage, date and time together, just date, just time, date time time zone. Duration, that's an amount of time. Text, Boolean value, true, false. We'll see what a binary is later in this video. Also in other videos later in the class, we'll use using locale. That's when we get dates from a different country. We're going to select date data type. And there's our data type. We can see it's aligned to the right. Now we want to come up, and I'm going to select currency for this, because this is a dollar amount. Now look at that. We have one, two, three steps. Now when we're going to reload this, I would like to click the dropdown and choose this second option to load it to a pivot table. But once we've loaded it, and we've come back here to edit it, we simply click close and load button. It loads to wherever it was previously loaded. Now we can right click load to, and I want to load this into the pivot table cache. Our data is sitting in an Excel spreadsheet. So just like we would open up pivot table, and it would take it and put it in the pivot table cache, that's what it's going to do. But it will be the transformed proper data set. Now I want this on a new sheet, so I click new sheet. Click OK. Now look at that. It's loaded, but it doesn't tell me where it's loaded to. There's our new pivot table. There's the new sheet. I'm going to double click and call this PT and enter. Now this is a normal pivot table cache. That's a location inside of Excel that stores the data for a pivot table. Because it's a standard pivot table, I'm going to use the grouping feature. When we get over to the data model, we do not want to use the grouping feature. It causes all sorts of trouble. But here, this is a standard pivot table. So I'm going to click and drag down to rows. Actually, watch this. I'm going drag the pivot table field list over here. Now months, I want, but I do not want date. I'm going to drag product down below rows. Region over to columns. Amount down to values. Expand, expand. This is a standard pivot table, not a data model pivot table. So we actually have to add manually number formatting every time we make a calculation. Right click and the number formatting option. I'm going to select something like currency, click OK. All right, so that's pretty amazing. We were allowed to clean the data and then make a pivot table report here in Excel. Now if we go back to CT, that's a data set. And for some reason, we get this crazy data. If we come over to the side, remember whatever solution we have, we want it to update when new data arrives. I'm going to click in a single cell control asterisk to highlight the whole table, control C, scroll over. Click somewhere in the description column. Remember this is the bad data source. Control down arrow. This is an Excel table. So when I click directly below and control V, instantly, that new data is incorporated into our table. Control home. We want to come back over to PT. Right click somewhere in the pivot table. Point to refresh, and look at that. Click. Click. Our report has updated. And over in our queries and connections, I can see Power Query did its thing. It cleaned all that data and loaded 481 rows into the pivot table cache. Now before we go and look at our second and third example, I want to do one last thing. I want to double click our transformation. We can see our three steps. And when we click on any particular step, we can look up into the formula bar. But let's go look at the complete M code in the home ribbon tab advanced editor over on view advanced editor. Click advanced editor, and there's our complete M code. We have three steps that result in our output. These transformations will start with let. And everything here is case sensitive. Little l-e-t, followed by each one of the steps in the transformation. In is followed by the end result. Now notice each transformations step has a name source. This one is split column by delimiter. Why is this one not in double quotes and pound sign? Because that has spaces. Further, if you look over at text, like when we entered product and date and region as the column names, double quote, double quote, that's reserved for text. So when we want to distinguish the name of a transformation step from actual text, they use pound. Now if there's no spaces, that makes it easy. It's just the word. So we have the name of the transformation step, equal sign, and we can see a function based language, Excel dot current workbook, table dot split column, table dot transform column types. Also very important, at the end of each transformation line, there's a comma. That comma means go on to the next transformation. Comma, go on to the next transformation. And then the last one does not have a comma, because the name of that step comes after in. That last step is the end result, the output, or our final transformation. One other thing to notice that each time we have a function, it's going to refer to the name of the previous step. Because this line of code will be acting on the previous step. When we get down to the final line, split column from delimiter, that's the previous line that this line of code is acting on. All right, I'm going to click cancel. Click the red X. So that is example number one, where we had an introduction to Power Query transforming Excel data into a pivot table. Now we want to go to example number two. I'm going to click on the sheet access. Example number two, we're going to take an import a snowflake data model from Access, and import it into the Excel Power Pivot data model. Now before we go up to data and use Power Query to import Access data, let's go look at the Access database. You, of course, can download all of these files from the links below the video. We have three tables over here in Access. If I double click F sales, date, product ID, sales, rep ID, those are foreign keys, units, and revenue discount. Down here, we can see there's about 37,000 rows. That F sales table is related to D products. There's the product ID. This is the primary key that's connected to the foreign key over in our F sales, product name, price description, and so on. Category is a foreign key. And if I close this, and open up D category, that's the primary key. So the categories for each product are stored in a separate table. If I go over to database, tools, relationships, there's the many to one relationship. Now notice over here in Access, they have the infinity symbol. Over in the Power Pivot data model and Power BI Desktop data model, they'll actually use an asterisk to one. But over here, F sales to product, and then product to category. Now when we import this, we want to pull the category into the D products table, so we have a proper star schema. That will be no problem with Power Query. Because guess what? When Power Query sees this database, it not only will pull in the tables, it'll pull in the relationships too. Let's close this. Now in Excel, we want to go up to our Power Query. Both of these sections here. But here it is, get data. This get data button will also be over in Power BI Desktop. We click the dropdown, and we have all sorts of options. From file, we can get it from an Excel workbook, a text file, XML, JSON. This from folder will be profound. We'll actually be able to simultaneously important multiple files. That actually will be our next example. We have all sorts of databases we can connect to from Azure, for example, an SQL database, from online services, other sources. There's the new from web, as opposed to our old legacy from web user interface, and a bunch of other options, including a blank query down here. There's the legacy wizards. You can actually have these show up in your get data dropdown by going to options in Excel. There's the old legacy web import. Then combine queries. Now what we want here is from database. And we're going to connect to a Microsoft Access database. So I click. I navigate to wherever that file is. There it is. That's the file you can download. Double click. This is the navigator part of Power Query. I want to select multiple tables, and then check each one of them. Now we want to come down here, and I do not want to click the load button. I want to click transform. Now transform data, that button used to say edit, but now it says transform data. When I click this button, it will bring the three tables into our Power Query editor. As we saw in the first example, queries is where we list all the different queries in this workbook. That is the first query from our first example. If I click on D category, I can see over here in properties, there's the name. There's the two steps. D product, there's the name. There's the two steps. We're going to click on F sales, and we'll start here. There's the name and the two steps. Now this is coming from an external data source, and I want to keep the exact same name, F sales. Now if we want a little bit more room, we can pull this task pane in a little bit. Maybe we can do that over here also. Now F sales, we look through each one of the columns at the name and the data type. Now we're going to look through each column and decide what we're going to do. This is a primary key, and it's in a fact table. We really don't need it for any of our calculations. Occasionally, you might need it for some calculations. But we don't, so we're going to remove it. Not only that, but because our goal is to take the data from these tables and dump it into the data model, since the data gets stored in a columnar database, that means each one of these columns will be stored as a unique list. And if there is a unique identifier as a column and we don't need it, why increase the size of our columnar database storing the raw data by keeping this column? So I'm going to right click remove. Date, you know what? These are actual dates, and it came in as date time. So I'm going to change the data type. Click the dropdown, and I'm going to change it to date. Product ID, sales rep ID, those are the foreign keys. We do not have a sales rep table in this example, so I'm getting rid of it. We do not need it for any of our reporting goals. Right click remove. We do need this one, because we'll connect it to the product table. Units as decimal, you know what? I'm going to change this to whole number, because units are whole numbers. For that matter, we can change this one also, because it's a product ID as a whole number. Revenue discount, that's perfectly all right to keep it as a decimal. Now look at this. This is something we're seeing for the first time. Because fact sales table is connected on the many side here to the product table, and the product table is on the one side, we're allowed to see the entire record on the one side over here in the sales table. This is because there's a relationship. So if I click off to the side, not on the value, but off to the side, I get a preview down here. This transaction right here sold product number four. And there's the record over on the one side in the product table. Now we actually do not need this column here. But for some transformations, that's really helpful. Because that means through the relationship here on Power Query, we can pull this data into this table. I'm going to right click remove this. Now notice over here on the steps, we have remove, remove, remove, and change, change. I actually could delete all those. Go through and remove all of them at once. And then change the data types all at once. That will be more efficient. So I'm going to go to remove column, right click, delete until end. Delete. Now I'm going to click on sales primary key. Holding control, I'm going to click on sales rep ID. Scrolling over, clicking on D products, holding control the whole time. Now I can simply right click remove columns. And there it is one step. Now we can go through and change units to whole number, product ID to whole number, date time to actual date. And there we go. We have our two steps. Now let's go over to D products. And guess what, here, we are going to need the primary key. I'm going to change this to a whole number. Product name is fine and data type. Same with retail price and standard cost, both are good names with the correct data type. ABC for description is fine. Data type text color is fine. Category key, now we'll come back to this in just a moment, but look at what we have here. Because product is in between sales and category. And the product in its relationship to sales, product is the one. Sales is the many. So we get to see over here in the product table, an actual table of values from the many side. So if we look down here, remember this is product number one. If I look down here in the preview, these are all of the sales for product one. If we needed to make a transformation, we could use all of those records over here. Now we don't need that. If I click on category off to the side, not on values, this is where we want to pull in general into the product table. So now I can use the expand button up here. Uncheck this if it's checked. Uncheck everything. And all we want to pull over here is category. Remember that record down there showed us the one side. As soon as I click OK, that means this new column here will have duplicates. When I click OK, we've pulled the category for each product into the product table. Now again, the columnar database will store this as a unique list, so it's no problem. And that means we have the column we need for analysis in the product table. And that will match the star schema. We don't want to use the snowflake. We're using the star schema. Now again, for a small data set, if you're able to navigate it easily, it really has no performance downside at all. Now we need to delete. Category key, hold control, F sales, right click remove columns. And there we have our steps. Now if I click on D category, I can click off to the side here on the table. This is the one side of the category. So when I'm looking at the D products table down here, I see a preview of all of the products that fit that category. Now here's the thing, this category table, when we were over in products, we pulled in category. But I want you to notice something. If we go back to navigation, and this is another great thing about Power Query, we can go back and look at the individual steps. Look at what happened. When we in imported D products from a relational database, there were relationships over in Access, it actually pulled in from the one side in the category table, the data we needed. We actually didn't even need to import the D category table. Now in step number one, we could have just not clicked that, but we wanted to take a look at that over here. Now guess what? We can simply right click delete. I'm deleting that query. I click delete, and that query is gone. We have our F sales table just the way we want. We have our D products table just the way we want, including the category. Now we need to load these. Now we already loaded this one. So now when I go up to close and load, click close and load to. I very carefully want to say, only create a connection. That's for Power Query. Now I come down here, and for the data model, I check, add this to the data model. Both of those tables will be added. When I click OK, I can see over here, it's loading, 11 rows and 36,942. Now either on the Power Pivot ribbon tab data, I can click data model button, or I can use the keyboard, alt, A, D, M. Here's the Power Pivot for Excel window. We can see that both tables were imported. Remember the columnar database is an in-memory database that stores the data that we can manipulate here in Power Pivot or over in an Excel sheet when we have a data model pivot table. Over in diagram view, there's are two tables. Now remember, we need star schema. There's the fact table right there. Many side, one side, click either one. Click and drag, and I've created a relationship. Now later in the class, we'll get to do lots of amazing things with this relationship. We can look up on the one side, the many side. We can drag conditions and criteria to our reports and a bunch of other things. But what have we done? We used Power Query to import from a relational access database just the data we wanted. We took that snowflake and converted it into a star schema. Control S, we're not going to build any reports. We just wanted to see the power of Power Query. Now I'm going to close this Power Pivot window. And now we get to see our third example. We're going to import multiple text files and append. Append means to stack up the different tables all into a single table. And we'll do that in Power BI desktop. Now you guys had to buy Office 365 to get Power Pivot and Power Query and a regular Excel. But Power BI Desktop, you just go search Google. It is a free download. Not only that, but it's one of the fastest free downloads and installations I've ever experienced. So this is what you'll get. There it is, Power BI Desktop. I'm going to click to open. We'll close this. Now later, just like Power Pivot, we'll have an introduction and talk about Power BI Desktop. Over here on the left, we'll build our reports there. This is where we'll see our tables. This is where we see our relationships. For this video, we're just talking about Power Query. And get this, that little external data, that's our Power Query. We can edit the queries. And there's our get data. Now there's a shorter list over here. But if you come down to more, here's all the categories, get data, just like we saw over in Excel. Now the one we want is file from folder. Now what is so amazing about this is that when we point Power Query to a folder either here in Power BI Desktop or over in Power Query in Excel, Power Query sees every single file in that folder. So we could have 10 or 20 Excel files, or in our case, many different text files that we need to simultaneously import. Now why this is so amazing, well, for a couple reasons. We might have monthly or yearly data each in a separate file. Also, for text files why it's so important is because all systems in the world that store data cannot communicate directly with each other. But all systems can export text files and then subsequently import those text files. So one of the main ways that systems have shared data with each other is through text files. Now before I click this, I'm going to stop here and go over and look at the text files. Now you get to download this as a zipped folder, and then unzip it. Inside this folder, there are two other folders. If I double click on start, there's our sales data, 2015, '16, and '17. Now I put some other files in here also, because another amazing thing about Power Query is we can filter. Even though Power Query will point towards that start folder, we may not want all the files. So we can filter and get just the text files. Now let me double click and open one of these text files. In Notepad, you could see that we have date, product ID, category merged together, units, revenue, cost of goods sold, and profit. This is a tab delimited file. If I were to right click open with Word, you could see the tabs. Now what's so amazing is we have 2015 to '17 data. We'll tell Power Query to look at this folder. Later, when we add more text files with new data, over in a report or Power Query or wherever we dump this data, we just refresh it and everything updates. So when we go back later, we'll add these files and see if everything updates. All right, I'm going to close. Back over here in Power BI Desktop, with folders selected, I click connect. It's as easy as browsing. I'm going to come down and open up MSPTDA03, and very carefully point to start. Now remember what are we doing? We're telling Power Query to import everything in one folder. I'm going to click OK. Click OK. I do not want to combine, because I have some filtering to do. I definitely don't want to load. I want to edit. Almost always, you want to click the edit or the transform data button. At least go look at it and make sure everything's in order. I'm going to click edit. Now this is similar to the way it looks over in Excel. The actual column is a binary column. And each row contains the file. I can see the text file. If I click the second to last one, I can see the document. So each one of these is a file that we want to extract and get things. Now later, we'll see there's problems with Excel files, because Excel files can have all sorts of objects. But text files, that's simply going to be one proper data set. Now it might not contain a proper data set, but we definitely exported it from a system where that's the case. Now it gives us some attributes of those files. The one we're interested in is extension. Now in some of the other examples in this class, we'll actually have to extract data from the name, like the year or something or the name of the city. But that's not what we have here. Each one of those text files has a proper data set. Now remember Power Query is case sensitive. So if we have something dot TXT capital and dot TXT, well, we would have to filter and say, hey, please, give me this one or this one. But instead of doing that, we're going to right click, just like we did over in Power Query in Excel, and there's a bunch of options here. We want to go down to transform. And look at that. There's lowercase. Now actually, transform has it over here too under format. But we're going to right click, transform, lower case. That way, any file extensions that we bring in will always be lowercase. Now it's a simple matter of filtering text filters. And I'm going to say, it must be equal to dot TXT. Now I probably could have just done the dropdown up there, but this will give me the same line of code. Click OK, and I have filtered out the non-text files. Now up in the binary column the double downward arrows mean combine files or append these files. And when we click combine, it's first going to be polite and ask us things like, what's the example file? The first one is fine for us. What's the file origin? That one's fine for us. What's the delimiter? We definitely have tab. And it's going to ask us, do you want to base this on the first 200 rows? Now we got this data from a database, so it spit out all the correct way, so the first 200 rows is fine. We can already see a preview over here. Now when I click OK, it's going to do a bunch of steps for us. I'm going to click OK. Look it over here. It created a bunch of queries. Now it actually took the sample file the first one, it built some code. We can see over here, source and promote headers. And then it built a custom function. Now later in the class, we'll see how to build our own custom functions. And finally, it used that custom function to combine. And we can see over here, invoke custom function, it did that for each one of those files. Now if you wanted at any one of these steps, you could actually go and look at the M code. Over to view. There's our advanced editor. And there is the M code, just like we saw over in Excel. I'm going to click cancel. Now for text files, it's safe to click that double downward pointing arrow. All of these steps work just fine. Our table will be one fully appended table. Later when we see how to do it with Excel, we'll see we're not going to use the double downward pointing arrows to combine. We'll do something different. Now we're ready to load. Notice we really didn't have to do many transformation steps at all here. When their text files, and the data is consistent, it is easy. Now home, close and apply, not close and load. We really can just click close and apply, because it will automatically go to the data model in Power BI Desktop. So I'm going to click close and apply. Loading to the data model. And now we can see we're in relationship view. We only have one table here. We come over to our table view. This is actually brand new, just in the latest, I think it's June 2018 update. We used to have these dropdowns for filtering and sorting over in Power Pivot, but we didn't have them over here in Power BI Desktop, but now we do. Now if we look down here, we can see we have 524,000 rows loaded from those text files. Now actually, look at this, we forgot to do a few things. We forgot to split this. And if we go over and look at the fields on the right, that is a terrible name. No problem. Over here in Power Query, we can go back and edit the query anytime. Edit queries. Edit queries. I'm going to come over here to name. Call it something like F sales. Come over to the product ID category column, right click. And just as we saw over in Excel Power Query, split columns by delimiter. We have a single forward slash. And it guessed right. I'm going to click OK. Just as we saw over in Excel, I'm going to get rid of this change type. And then come up to split column by delimiter, up in the formula bar, we're totally allowed to edit the M code. So I'm going to call this first column here product ID. The second column, we'll call category. And enter. Now it added the change type step automatically. And actually, both in Power Pivot and over here in Power BI Desktop, if you don't like that automatic setting, you can change it in options. Later in the class, we will change some options. All right, so we finished our edits. Now we come up to close and apply. And there, we have our two columns and our proper name. Now we're not going to build any reports in this video. We're just looking at Power Query here in Power BI Desktop. But guess what? 524 rows. Let's go drop the next two text files in and then refresh. Over here in Windows Explorer, I'm going to go to add later. Select, hold shift or control, click, control C, click the Back button, double click start, control V. Now this folder has five text files. Power Query's pointing to that start folder. So when we go back over and click refresh, it is totally running Power Query and getting the latest files. Down here, we can see 912,000 rows, almost a million rows from those text files. All right, so we saw over here in Power BI Desktop that we can use Power Query to get and transform and load data. Now before we go look at our last example, uh oh, we forgot to save. Guess what? I'm going to use the same keyboard as in Excel or Word or anywhere, F12 to open up save as. And I'm going to call it something like MSPTDA03. And notice the extension dot pbix. When I click save, now I have saved this. Now we'll go look at our last example over in Excel. Back over in our Excel workbook, I'm on the sheet no array formula. I got to show you something. I wrote a whole book about array formulas called Control, Shift, Enter Mastery in Excel Array Formulas. And there is one advantage to formulas that Power Query can't really beat, but let me just show you here. If our goal is to from this table always extract a unique list and have it sorted, this is how crazy it gets with array formulas. That's crazy enough. That's counting unique. But this formula, wow, that even makes my head spin. But that's the formula to extract text items resulting in assorted unique lists. Let's just see how easy this is with Power Query. Because what are we doing? We're replacing a complex array formula. Now this is already an Excel table, so I click from table. You know what? I do this so often, there is a keyboard, alt A, P, T. That just brings the table into Power Query. Now I don't need change type. The only column I'm interested in is sales rep. I'm going to right click and say, remove other columns. Now what I want? I want a unique list. Well, guess what? Up in remove rows, there is remove duplicates. By the way, that icon right there is the same over in Excel. I can also right click the column header. And remove duplicates is common, so there it is. Now I want it sorted. I'm going to use a regular sort feature. This is just amazing. There it is. How many clicks was that? A few clicks? I'm going to change the data type also as text. There are the steps. We can close and load, close and load to. I want to see as a table on the existing sheet. I definitely don't want it in C 13. Maybe right there. Uncollapse, click OK. And there it is. Now the moral of the story is that that is super simple. This is super hard. But watch this. Let's come down and add a new record. Tab, we'll add a date, abdi. Look at that. That's how come sometimes, for some solutions, we still need array formulas. It just instantly updates as soon as we change the data. It's easy enough to come over and right click refresh. Then just like that, we get our sorted unique list. Now look at this. That's a terrible name. With the query highlighted, hit the F2 key. And I'm going to call this something smart, like unique list sales rep. Hey, that was an epic video all about Power Query. We saw how an example for to replace a crazy array formula with a simple Power Query solution. Over in Power BI Desktop, we saw how to import from multiple text files almost a million rows of data, imported, transformed, and appended into one table. And then with Power Query, we connected to an axis relational database, imported, transformed, dumped the data in Power Pivot, built a relationship, and got our star schema data model. And we started it off taking bad data in Excel, transforming it in Power Query, and learning a lot about Power Query in the process, and creating a cached pivot table report. All right, if you liked that video, be sure to click that thumbs up, leave a comment, and sub, because there's always lots more videos to come from Excel is Fun, including video number four, our next video where we'll see how to take multiple Excel workbook files and consolidate them all into a single table using Power Query. All right, we'll see you next video. [MUSIC PLAYING]
Info
Channel: ExcelIsFun
Views: 132,637
Rating: 4.9767146 out of 5
Keywords: Excelisfun, Highline College, Data Analysis, Business Intelligence, BI 348, Mike Girvin, Microsoft Power Tools for Data Analysis 03, Power Query, Introduction to Power Query, Free Power Query Class, Learn Power Query Online, ETL with Power Query, Extract data, Import data, Transform and Clean Data, Transform Data in Power Query, ETL in Power Query, Import From Access Database, Import Text Files, Append all Files, Combine Button Power Query, M Code, Edit M Code
Id: BLl7DpL7JrI
Channel Id: undefined
Length: 57min 13sec (3433 seconds)
Published: Wed Jun 27 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.