MSPTDA 06: Power Query: Merge, Append, & UnPivot – 3 Important Transformations

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Welcome to MSPTDA video number 6. In this video we're still studying Power Query, and we want to see three important transformations. We're going to see how to merge, append, and UnPivot. Our first example, we will see how to merge two tables to replace either VLOOKUP in Excel or relationships in the data model. Here's our end result. We need this sales rep table with territory, not like the original sales rep table where there's territory ID and we're either supposed to do a VLOOKUP to the lookup table or have a relationship between the two. Not only that, but the source for each are two different files, text and Excel. So we're going to go from these two files into a single table. Our second example, we want to see how to append one, two, three tables, stack them up on top of each other, and get a single table with date, product, region, and revenue. In this example, we'll have some problems because there are different columns in each table, and the first table from the sales for 2017 doesn't even have the revenue calculated. No problem. Power Query can take all of these tables, transform them, and append them into a single table. Finally, we'll get to do UnPivot. Someone's keeping data in a non-proper dataset, students down the rows, class across the columns. So we need to UnPivot this and create a proper dataset. If we go over to the sheet merge, we could see our first example. Oh, wait a second, there's nothing here. We got to go look at our files first. Now I'm using Windows Explorer, and you can download these files. There is an Excel file. If we open it up, it's an Excel table. We can definitely import this. And there's our text file. In our text file is the tab delimited text file. So we'll import both of those files. Power cruising the data ribbon tab, get and transform. I'm going over to the get data button from file, and we'll do the Excel file first from workbook. Navigate to wherever that file is, double click to import. This will open up the navigator window. We only have one file to get, so we simply select it. Now we come down, and because we're in Power Query in Excel, it's this transformed data. I'm going to click transform. Step 1, I'm going to come over and name this sales representatives and enter. These steps are fine. This table looks like it already has the correct data types. Now we're going to close in load, close in load 2. We want only create a connection. That means Power Query will just be connected to the data in that Excel file. Click OK. There's query number 1. Now we go over to get data from file and text. We'll navigate to wherever that file is, double click. It's going to ask us to file origin, the delimiter, and it's going to ask a difficult judge on the first 200 rows. Yes to all those. It is tab delimited. That's delimiter, so I come down and click transform data. I'm going to come over and name this something like territory lookup table and enter. Now look at this. It did not interpret the text column headers as column headers, but no problem. We can come in the upper left hand corner, click the dropdown, use first row as headers. That's also up in the transform group in the home ribbon tab. And then we have promoted the first record to field names. The data types are OK. Our steps are OK. Close and load, close and load 2, only create a connection, click OK. Now we already have our two queries over here, so now we can come to get data. Down to combine queries, merge is when we replace VLOOKUP or relationships. Later in this class we'll see that it's actually doing a join, and we'll get to see the different types of joins. A pen, that's one on top of each other. We want merge. Click-- it'll ask us what the first table is, sales representatives. What is the second table, territory look up. Now we simply have to click territory ID in both. This is the foreign key. This is the primary key. If we were in Excel, VLOOKUP would look at that value, find a match, and bring it back. In the data model, we'd simply create a relationship. And if we wanted the value over here, we'd use related. Now we're going to do a left outer all from the first matching from the second. And get this. We definitely want that because we want every single sales rep name but only matching from the second because this company does not sell in all of the territories in this lookup table. So left outer will work. Click OK. Now we'll bring in a table, and if you click off to the side to the right of the word table, you can see down here it is bringing back the single record that matches. There's MEX, a foreign key. There is the primary key, and there's the territory name. I'm going to come over to the expand button, uncheck use original, uncheck all, and we only want territory. Click OK. Now I'm looking at the data types. The data types all look fine. I actually do not want this now that I have territory, so I'm going to right click remove. I want all the rest of the columns. Now look at this. I forgot step 1. Step 1 is always name the query. D sales rep, as if this was going to be a dimension table in a star schema data model, and enter. Now for this example, I'm going to load this to the sheet, close and load, close and load 2, table, existing sheet, A1. There it is. That's on the merge sheet. Click OK. And there we have successfully merged the two tables replacing VLOOKUP or relationships. Now I'm going to click the filter, and filters are awesome because they'll always give you a unique list of all the items. And I'm looking through-- these are all the areas that this company sells in. No USA. Well, we just got a new sales rep, and they're selling in the territory USA. So I'm going to save this, find my source data, and open it, go down to the bottom, control down arrow. Right here, I'm going to hit tab. This is an Excel table, so it adds a new record. Type the sales rep ID, the name, today's the hire date, and USA. Now I'm not going to hit tab. I'm going to hit enter. Save Alt F4 to close that workbook. Now I'm going to come over and somewhere in our query output, I'm going to right click refresh. And you could see there it is, 135 rows. If I click the dropdown, it'll be right in alphabetical order. This is just simply amazing about filters. There it is, United States. If I control down arrow, sure enough Sue Red Coolinator has been added. So that's our first example. We went from a text file and an Excel file, merging the two into a single table. Now let's go look at our second example. Actually this is probably going to be the most fun one. And the data is right here in Excel. If we go over to 2017, date, product, region, that's all good. Those columns match the columns in the other tables. But units, net cost, equivalent, and price, that's what we're given in this table. We have to multiply all those to get our revenue. If we go over and look at 2018, this table, if you look through it, is exactly what we want. It has the four columns, everything correct. 2019, uh oh, there's an extra column. Now what we're going to do first is I'm going to import all of these and then try and append them because this is a common mistake. We'll see what happens. So I'm clicking in a single cell. This is already an Excel table with a smart name. Now I go to data. From table arrange, that name is OK. Those steps are OK. I'm thinking though I'm going to change this to date. It'll ask me if I want to replace it because that automatic step there was already applied to data type. I'm going to say replace. The other data types look fine. Now we're going to leave this for the time being, but later we'll come back and fix it. Close and load, close and load 2, only create a connection, click OK. Now I go to 18. The keyboard to import an Excel table as alt, A, P, T. I'm going to change the data type to date, replace. The data types look fine. The name is OK. The steps are OK. Close and load, close and load 2, only create a connection, click OK. Now I go to 2017. A single cell is already selected, alt, A, P, T. Date, I'm going to change that and replace it with date replace. Everything over here is OK. Close and load, close and load 2, only a connection, click OK. Now we can go up to get data and let's try append. Combine, append, and this is so cool. I remember earlier versions they didn't have this, three or more tables. I'm going to click Sales 2019. Hold shift, click Sales 2017, add, click OK. Well, it got date, product, and region, but color, well, it looks OK here. But definitely units, net cost equivalent, and price, we can see what happens. These three columns were only in the 2017 table. So for all the records with 2019 and '18, they'll be nulls. Somewhere down below here, there's nulls because 2019 was the only table with color. If we click the drop down here, look at this load more. If your filter doesn't give you a complete list, click load more. And there it is. There's some nulls down there. Click cancel. Now I'm going to come over and name this something like three years data. Close and load, close and load 2. I'm going to create it only as a connection for the time being. It's a totally messed up append. Click OK. Now all we have to do is go edit these. So sales 2019, I'm going to double click. So for color, right click remove. Close and load will just close to the place it's already been loaded, so it will close to a connection. Now I come over to Sales 2017. Double click and our goal is to add an extra column that will multiply price times net cost equivalent. Now a net cost equivalent means how many pennies for every $1.00 of the price should we charge the customer. Down here there's no discount, so 1 times 19. Down here, the first record is a discount of 9% or 9 pennies, so we're going to multiply 91 times every dollar. So we multiply these. That will give us the net price times the units. So watch this. Instead of like we did in earlier videos where we added a custom column, I'm simply going to highlight units, hold shift, click on price. My goal is to multiply all three. And in the add columns, there's this great feature, from number. Click the standard and I'm going to click multiply, and it will add a column for us. Now a couple videos ago, we went and edited up in the formula bar. Here's our inserted multiplication step. We can see table dot add columns. And I don't want multiplication as our name. This should be revenue. So instead of adding an extra step here, if I look up in the formula bar, change type, that's the function acting on the previous step. That's the name of the column. Each means for each row in this table list dot product. That means it's going to multiply whatever's in this list. And look at that. We learned the syntax last time. Curly bracket can be the syntax for the positional index operator, but curly brackets are also the syntax for a list. So what did it do? It used the lookup operator, which is the square one, got the units for this row, net cost equivalent for this row, price for that row. The curly brackets make it a list, and then we can use the list dot product function, which means it multiplies. It even added a type. Now I'm going to come over to the formula bar, double click multiplication, and call this revenue and enter. And there we have our revenue. Now we do not need units all the way to price, so I click one, hold shift, click the last one, right click remove. Now for 2017, I have exactly what I want-- date, product, region, and revenue. Now before we close and load this query, there's two other important points. Remember we're trying to append these one on top of each other, and we have date, product, region, and revenue fields. Now remember when we had different fields, we ended up with extra columns with nulls. Well, that will happen also if you do not spell the field names the same, and you can run into trouble if you don't have the same data type. For example, if one is text, then all of the columns will end up not having the data type. And then for a revenue column like this, you wouldn't be able to add. So when appending, we actually have to have the same columns including all the tables having revenue. All the field names have to be spelled the same, and we need the same data types. Home, close and load. And now watch this when I double click three years data. That is so beautiful in Power Query. We have various queries over here, and they're connected. And I can go edit any of them. And when I come back here, it's like magic. It's all been cleaned up. Now, if we look at the formula bar, this is a cool function. All it did is it knows that there's these names-- that's the names of these queries, which happened to be tables-- table dot combine. That's all it did. And then it lists syntax. It listed each one of the tables that needed to be appended. Now I'm going to close and load. And I'm going to come over and change the load location, right click, load 2, table, collapse. And I'm going to try and come over to append, A1. There it is. Click uncollapse. Click OK. And there it loaded it, but it wasn't polite enough to jump over there. But sure enough, there it is. Actually if we come back to wherever we were, all I have to do is single click the query, and it jumps to that sheet. Our last example, UnPivot. Now our goal on this sheet is to take this non-proper dataset and convert it to a proper dataset. Now what someone did here is they took raw data that should be in a proper dataset and organized it into a cross tabulated dataset. That means these grades for these classes are associated with that student. Well, if this was a proper dataset, there would be three columns-- student, class, and grade. The next record would be student, class, and grade. Now the process of taking data structure like this and converting it to a proper dataset is called UnPivot because this sort of looks like a cross tabulated pivot table. Now it's almost like a cross tabulated pivot table except for that 1.7 grade for Ernestine in Business 218 is not a summarization. It's a bit of raw data, but it looks like a pivot table because at the column header is a condition or criteria and at the row header is a condition or criteria. So because this is actually raw data, we're able to take this in UnPivot it into a proper dataset. Another way to think about this is if we did create a pivot table, remember there's a unique list of items in the row area and a unique list of items in the column area. It is the column area that we're going to UnPivot. We literally tell Power Query, hey, take these columns in UnPivot. Power Query will know to take these conditions and put them in a new column and take all of the raw data numbers and put them into a third column. Let's see how to do this. I'm going to click in a single cell. This has already been converted to an Excel table, which is required when we take data from Excel and bring it into the Power Query editor. So with a single cell selected, I can go up to data from table range or we'll just use our keyboard, alt A, P, T. Step 1, I'm going to name this. So final proper dataset grade table and enter. I do not need change type, so I exit out, click the red x. Now when we're doing UnPivot, these are the columns I want to UnPivot. So I could click on the first column, scroll over, hold shift, click on the last column, and then right click any one of the columns. And down at the bottom, there's UnPivot columns, which would work when we select the actual columns we want to UnPivot. UnPivot other columns is another option and UnPivot only selected columns. Now the problem with doing it with UnPivot columns is we had to highlight a lot of columns instead of just one column and then click UnPivot the other columns. The second problem with this method is if we choose UnPivot columns, the code underneath the m code will always write it as an UnPivot other columns. So it's much faster, especially in this case where we have lots of columns here and one single column, to select the single column-- that's faster-- and then use UnPivot the other columns not selected. Now UnPivot is over in transform. There it is right there. You could click the dropdown, or we can simply right click UnPivot other columns. And there we go 1, 2, 3 columns. If we look over here in our applied steps, there it is, UnPivot other columns. If we look up to the formula bar, there's the table dot UnPivot other columns function. The functions always act on the previous step, then it says, hey, which column do we right click and want to UnPivot other columns-- there it is-- and then it named the two columns. Now I'm actually going to come over and x this out and watch this. I can select all the columns, right click down to UnPivot these columns. And when I do that, I get the same line of code. So x out right here right click because it's faster to just right click this one column and UnPivot other columns. Now want to rename all three columns. So I'm going to double click and call this student. If I want instead of double clicking, I can hit enter and use the arrow key to arrow to the next column and F2 to put it in edit mode. This will be class, enter arrow, F2, and this is grade and enter. I also want to change the data type. This will be text. That one's fine. This one will be decimal. There's our applied steps. There is the name. Now we want to close and load this proper dataset, click close and load then close and load 2. I definitely want this on a table on the existing sheet. I'm going to try and click in K4, click OK. And there it is. We have a proper dataset. We can do things with this proper dataset that we can't do with a cross tabulated data table. We can sort the grade column largest to smallest then sort the class column. And now we can see all of BI 348, the grades from highest to lowest, and then go on to the next class and there they are. So in this video, we saw three amazing transformations that Power Query can do. We definitely can go from a cross tabulated data table UnPivot into a proper dataset. We can definitely append three different tables, but before we append, we can fix them in Power Query and then append one on top of each other into a single proper dataset. And in our first example, we saw how to take two different files, one with a foreign key that needed to look up to a lookup table and a primary key, merge these two tables into a single table, and replace VLOOKUP or relationship. If you like that video, be sure to click that thumbs up, leave a comment, and sub because there's always lots more videos to come from Excel Is Fun, including video number 7, our next video, where we'll do lots of merges looking at all the different join types that Power Query can do. We'll see you next video.
Info
Channel: ExcelIsFun
Views: 61,943
Rating: 4.9804878 out of 5
Keywords: Excelisfun, Highline College, Data Analysis, Business Intelligence, BI 348, Mike Girvin, Microsoft Power Tools for Data Analysis 06, MSPTDA 06, Power Query, Merge, Append, Unpivot, 3 Important Transformations, Power Query Transformations, Power Query Basics, Learn Power Query, What is Power Query, Append Errors, Extra Columns in Append Query, Why are there extra columns in Power Query Append?, Merge to get Star Schema Data Model, Power Query Too Many Columns?
Id: 8F7v6YvnsiY
Channel Id: undefined
Length: 23min 39sec (1419 seconds)
Published: Sat Jul 14 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.