Excel Basics 25: Power Query (Get & Transform) to Clean & Transform Data into Proper Data Set

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Welcome to Excel Basics video number 25. Hey, in this video we've got to talk about the amazing and brand new feature called Power Query, also known as Get & Transform to convert bad data into a proper data set. Now, this is what it means by bad data. Someone decided the description column we would combine product, name, date, and region. Separated each by a space, forward slash, and space. Really, those are three separate columns. And the reason this is bad data is because, of course, we can't make a pivot table, or use all other data analysis features like sort to perform data analysis. So what we're going to learn is that Power Query allows us to clean data, transform data, and even import data. Now, we'll go from this bad data set into a proper data set, which of course then will allow us to create a pivot table. That will be example number one. Example number two, we're going to import multiple text files. And many businesses transfer data from one system to another using text files. And so every month-- April, May, June and July-- we get a new text file with the sales data. So what Power Query can do is simply look in a folder and it will take all of the text files and create a proper data set. Now, of course, proper data set-- as we have seen, what 20, 30, 40, 50 times in this class-- field names at the top, records in rows, and empty cells all the way around. Now, the reason that the proper data set is so important, of course, is because over on data analysis we've been defining data analysis and business intelligence as going from raw data into useful information. But guess what, sometimes there's this step before proper data set. Here we have bad data. The region and sales rep is combined. So we've got to go from bad data, clean it to become a proper data set, and then do whatever we're going to do from that proper data set to create useful information. So far in this class we've done all of these amazing data analysis topics. And our last topic for data analysis is Power Query, also known as Get & Transform. Over on this sheet, what Power Query does. Well, Power Query-- one, two, three-- clean, transform, import. And we'll see all three in our two examples. Now, we're got to talk about something important. Because if you do not have Office 365 for Excel 2016 the data ribbon tab looks like this. And the group we're going to be using is called Get & Transform. If you have Office 365 for Excel 2016 then that data ribbon tab looks like this. Now, the buttons we're going to need-- if you don't have Office 365-- and this is what the group looks like in our lab here at Highline-- it'll say Get & Transform group in the data ribbon tab. For our examples in this class, we're going to use the from table button, new queries, and show queries. Now, when you have Excel data in an Excel workbook, you use the from table button to bring it into Power create. When we're importing external data, we use the new query button. And when we want to see a list of our queries, we click show queries and it opens a task pane. If you have Office 365/ Excel 2016 there's the button to get data from Excel. There's the dropdown get data to import data for us. We'll be importing text data. And over in the queries and connection groups, that's the button-- queries and connections-- you click to see a list of your queries. Now. If you download the PDF notes below this video, I have detailed screenshots of every single step in this video. And when there is a difference between those two, I have screenshots of both. All right. Let's go look at our first amazing example. I'm going to click on the sheet EX one. Now, here's our data set. And if I scroll down, we can see what the problem is. This column really is three columns, and so we need to split it apart. Not only that, but once we get it into a proper data set, this is the actual pivot table report that we want to create. Now, that's a picture I saw a minute dragging it out of the way. Now, as we look at our data set over here, notice two videos ago we learned about Flash Fill. So I actually could come out here, give it an example or two, and then Flash Fill. Now, that is perfectly OK if it's a one time event. But if you have source data like this and this source data might change-- or more importantly you add new records to the bottom-- then you want to use Power Query. Because Power Query-- once we create our proper data set-- can be refreshed when the source data changes. Now, this data set here is how the data gets sent to us. So every single time we get sent new data, we have to split it, and then make our report. Now, if we scroll over to the site later after we create everything-- the Power Query and the pivot table report-- we're simply going to get these new records, include room at the bottom of our Excel table, and everything will update-- the table, Power Query, and our final report. So that's an example of when you'd want to use Power Query instead of Flash Fill. Another important bit of terminology we need to learn about is when you get multiple columns all in a single cell. Or in our next example we'll have all of the columns of data on a single line in a text file. The character-- or set of characters-- that separates one, two three columns is called a delimiter. So our delimiter is space, forward slash, space. There are all sorts of different delimiters. Sometimes people use tabs, sometimes a comma, sometimes all sorts of different characters to split the different columns. They're called delimiters. Now, another important thing is if we go up and look at data, anytime we use from table or range or get data or get text data it will open up a new window called a query window. Also, in this version if I wanted to see the list of queries, notice I click the button queries in connection. So the word query has a very important meaning. Now, this is the first time we've seen a query. And all a query means is I'm asking some question of the data. Now, our question-- or query-- is can you please take this column and split it into three columns. Namely, can you split it into product, date, and region. And the word query is universal in all areas where data is used. Next video will be our first video about the program Access. In Access we have a whole section of Access that is just queries where we ask questions of our data set. All right. So we are ready to build a query to ask the question, can you please create a proper data set from this bad data data set? Now, in order to get data from Excel and click this button, you have to first convert it to an Excel table. Now, the reason we have to convert it to an Excel table is because they want you-- if you're going to use any of these amazing query tools-- they want you to make sure that it is a dynamic table of data. That if the data changes or we add new records or take away new records, everything will update. All right. So step one we've got to convert it to an Excel table. We go Up to insert, click on table, or of course we use control T. My table has headers. Click OK. Now, we immediately want to go up to table tools, design properties, and name this table. We have the name everything smartly, especially since there's going to be lots of different names-- not only here in the start table-- but in the query and in the final clean table. So I'm going to call this start sales table and enter. Now, with a single cell in our Excel table we can go up to data. Get & Transform-- whichever group you have-- and select from table. Just like that it opens the query editor. There is our table. As the data sits in the Excel sheet, we're going to come over. And in the query setting we want to give this a good name. That start sales name is going to be the name of the table that sits in the sheet. We want the query to have a different name. I'm going to call this sales proper data set and enter. sales this query is named Sales proper data set. And once we transform and clean this data set, when we load it back to Excel it will be an Excel table which has this name also. Now, we have two columns, and our goal is to split. So I'm going to click on the description column up in the home ribbon tab here in the query editor. I go over to the transform group, and I'm going to select split column. Now, it asks by a delimiter. That's something like a tab, a comma or in our case space, forward slash, space. We can also split it by a set number of characters, like if we only wanted state and it was two characters. But I'm going to select by delimiter. Split column by delimiter dialogue box comes up. It thinks we want a space, but I'm going to click the dropdown and select custom. And I can type any set of characters that I want. Space, forward slash, space. We can choose where to split. I want to split at each occurrence of the delimiter. Now, when I click OK look at that-- one, two, three, and four. And remember this is totally dynamic. When I add new data next time, it will split that data just like it split this data. Now, we see that these are not very useful names. So we're simply going to rename them. Double click description one, and we're going to call this product, and enter. Now, I'm going to double click description two, and call this something smart date, and enter. Double click description three, and call this region, and enter. Double click amount, and I'm going to call this sales, and enter. Now, the other thing we want to notice is that there's little icons that tell us what type of data Power Query thinks this is. Now, ABC is the icon for text. The calendar is the icon for date. And one, two, three is the icon for a whole number. Now, you can click any one of these-- the icon-- and it allows you to change the data type. Now, the data type is really important, because this is a Power Query table. And when we dump it back into Excel, it will actually be Power Query output. Now, if we try to create a pivot table and want to group to date, if it was accidentally coded as a text data type, then grouping wouldn't work. Similarly, if the sales number was coded with a data type text, then we wouldn't be able to do some calculations like summing. So data types are very important. Just like we saw over in Excel with the default alignment. But here we explicitly say what kind of data type can go in this column. Now dat is the first time we've seen an explicit data type. When we get over to Access in the next couple of videos, we'll specifically have to define types there also. Now, the reason that data types are important is because it's a way of assuring that the data in this column is the correct type of data. All right. It interpreted all of these correctly, but I want to change this one. So I'm going to click on the one, two, three icon, and I'm going to select currency. The difference between number, currency, and decimal-- decimal, you can have up to 15 decimal places. Currency, you can have up to four. And a whole number you don't have any decimals. Now, we're going to come and I'm going to click on currency. Now, it's not going to show any number formatting here, but that is the data type that we want. Now, if we look over here it has saved all of the steps, including the first two steps which it did. You can actually click on any step and look exactly what it did. So you can click through looking at how you built this query. If for some reason you needed to come back and edit later-- which we're not going to do in this class. We'll do in the class after this, Business 218 Spreadsheet Construction Highline class. You could click on this, and it would actually open up that step again, and you can change it. Not only that, but if you make a mistake over here you can click the X. So if you really messed up and you needed to delete a few steps, you could just click, click, click and start over. Now, we have our steps. We have our name. Our data is set. Now, we need to load it back to Excel. I'm going to click in the home ribbon tab, close group. I'm going to click the dropdown, and we're going to choose close and load two. Now, when I select close the load two, it's going to open up a dialog box that looks different in the two versions of Excel 2016. It says import data if you have Office 365. Is load two if you have a version that's not Office 365. But in either one of the dialog boxes, it doesn't matter. You can choose to put it in a table, which is what we're going to do. And you can choose to put on an existing or new sheet. We want to put it on a new sheet. Now, I want to click OK. It's immediately going to load the clean data. Over here we can see queries and connection. If you have a non-Office 365 it will just say queries. But there's our query, and we can see that it loaded 365 rows. There is our table. If we click inside that table-- table tools design-- look at that. It has the same name for this Excel table as the query. Now, really they're the same thing. This is just a dynamic table built with Power Query. I immediately want to come down and double click. This is just the sheet name, but I don't want to leave it sheet one. I'm going to name it clean data, and enter. One last thing about the queries pane. If we come back up to data, if I click this queries and connections-- or in the other version it says show queries-- it actually hides that pane. So if you come back later and you need to find that page and you click on this button. And then if you want to edit it, you can double click to edit it or right click it. And then you can edit, or look through your steps. Also if you want to delete your query, right click delete or you can simply use your delete key. But we don't want to delete it. I want to click in a single cell and make a pivot table. Insert pivot table, or the keyboard alt NV. I want to put this on the existing sheet, and I want to put it in G1, click OK. Now, I have the queries and the pivot table fields open. I'm going to drag the date field down to rows. And in Excess 2016, it will automatically group. You can see down here it says months and date. I'm going to drag date off, and there we only have two months. Now, I want to drive product down below months. Come over to the row area, right click expand and collapse. Expand entire field. That is so beautiful. April and our product, May and our product. Now, I can drag sales down to values. Right click the values area. Number format I want currency. Two decimal symbol is OK. Click OK. Now, here is the amazing moment. We're going to come back over to EX one. We are hoping that all of those steps and all of that process is automatic. That means if we add new records, everything will update. I'm going to click in cell AA15, control asterisk on the number pad. Control C to copy. I'm going to scroll over. Click in somewhere in the first column of the data set. Control down arrow to jump down to the bottom. Click in cell A380, control V. Because this is an Excel table, it automatically incorporates it. If it did not-- then as we saw in our earlier video-- you can simply go up to design, table resize. Now I'm going to control up error. Now, notice that was the start table. Over here I see the query. And over on clean data, I see the pivot table. Now, you can right click update there, or I'm going to come to my Power Query created Excel table, and right click refresh. Actually I'm going to-- either one will work. I'm going to do it over here because this will be more dramatic. Right click, refresh. And you can see it's loading. And what it did is it re-split and added the proper names and types, but now for 481 rows. Our table is updated. I can control down arrow, and look at that. Control, home. Now, I want to come to the pivot table, right click, refresh. And just like that it totally updates. The new data was cleaned and transformed, dumped back into the table. And we simply refresh our pivot table, and our report is updated. Now, I forgot one of the columns. So now I'm going to go over and add region to columns. And there is our completed report. Power Query really is an amazing new tool that helps us clean the data. And the beauty of it is everything is refreshable. Now, we have a second example. I want to scroll over to example two. Now, I want to go to our website, our start file which we're working in. But you have to download this zipped folder with our text files. Now, I'm going to right click save link as, or however you want to download it. Now, I'm going to download it to the desktop just to make it easy to navigate to inside of Power Query. But by all means, you can download it to your system of folders, Excel folder. I'm going to click save. Now, on the desktop I'm going to unzip it. And your computer may unzip in a different way, but we can always unzip it right click. And then some option here will say extract all or extract here. I have seven sip. So I'm going to point to that, and say extract here. That will give me the single folder. Now, I want to double click this folder, and go look at what's inside. Now, here is the folder we just unzipped. There is a start folder, and then a text file with June and July data. We're going to not use those at first. Later we're going to copy those-- as if we just got them by email or something-- and then dump them in our start folder. I'm going to double click this start folder. And these are the two files that we're going to import first. Now, what we'll do is we'll point Power Query to this start folder. It will actually see the entire file path. And Power Query-- when we refresh-- will come back to this start folder and try and import anything that is here. Now, for our example we are only going to have monthly text sales data. For us, our system is clear, the start folder holds only text files. Now, lets double click one of these files and look and see what a text file is. Now, I'm going to double click. Now, it might open in Notepad or something else. I want to close this, and I want to right click open with. And either find word or choose default program and select Word. Because when we open it in Word, we'll be able to see exactly what the delimiter is. Now, when a file is saved as .txt it's almost always a tab delimited file. Now, the reason that this type of file is so t in data analysis-- and for the entire world-- is our data is saved mostly in databases. But there are so many different types of databases and systems to store data that they all agree that text files is how we send data from one system to another. So it's very common that your database wherever you're working will export the data that you need to analyze in a text format. Now, sometimes it comes with the tab. I have my non-printing characters on so we can see. Other times it comes as comma separated delimited data. Now, we can see that we have product, date, region, and sales. And then there's a bunch of records. So this is April data. I'm going to close this. And, again, our text file .txt is a common way that systems transfer data out of the current system. And then we can import it basically into any system. All right. You ready? We're going to go over to Excel and point Power Query towards that start folder. Here on example two I've selected A11. I'm going to go up to data. Get & Transform-- in this version it says get data. In the other version it says new queries. So I'm going to click the dropdown in whichever version, and point to from file. And then hover my cursor-- and this is so amazing-- from folder. I'm going click it. Folder dialog box we click browse. Now, we browse-- and you can browse anywhere on your computer. I made it simple. So I click the triangle, and I click on start. You go wherever you saved it on your computer. But when you point Power Query towards start it knows the whole file path, and it will try to import everything in that start folder. I'm going to click OK. There's the file path. That file path is memorized in the very first step of the Power Query editor. Click OK. Now, in the class after this will click edit, and learn some advanced techniques. But if you only have text files, we can use this amazing button right here, combine. We click the dropdown and say combine and edit. Now, when I click combine and edit it's going to take a while, because it's doing a bunch of steps. Actually before it does anything, it's polite. It says, hey, I'm going to use the first file. Here's the file origin. I'm going to base my data combining on what I see in the first 200 rows. But what we want to make sure is that the delimiter is correct. It even gives you a little preview here. Our delimiter is tab, so now we click OK. And now it builds a bunch of steps. It opens up the Query Editor. Now, over here it did a bunch of steps, including building a custom function that it uses to combine all the text files. Over here there is our name. There's all of the steps it's already done. We can see our columns over here. Now, I'm going to click in name, and we're going to call this something like monthly text sales proper data set, and enter. We don't need to view these automatically created steps, so I'm going to flick a collapse. There's an uncollapse and collapse. By the way, you get to see all of your queries that you have in this workbook here. There's the - we already did in example one. Here the one we're working on right now. I'm going to close this. There is the name of the query, there's our steps. Now, here's a column. And we don't need that. It tried to be polite, it thought we might need the file name. But we don't need this column. So I simply right click the t header-- or the field name-- and point to remove. Product is a good name. Text data type is fine. Date is a fine name. Date data type is good. Region ABC text type, that's fine. Sales, that's a fine name. I'm going to click on one, two, three and change it to currency. Now, I've added our last two steps. Again, this is quite beautiful. You can-- later if your folder path for that start folder changes-- you can come back and edit this. You can come over to the little gear and click on it and actually change the file path, browse and find the new start folder. Cancel. That's an example of how we would come back here later and select a step and edit. And, again, if you really mess up you can click. Click to delete those. In fact, let's just do that. I'm going to click on the last one. That was our last step. I'm going to X this one out. Notice it goes back to whatever it was before. And I'm going to change it to currency. Just illustrating how easy it is to edit this query. Now, we want to come up and close and load, close and load two. In fact, if we want to dump it straight to a sheet as an Excel table, then we can simply click this icon at the top. I'm going to click, and now it is loading to a new sheet. There's our table. There's the new query at the bottom. I can collapse all of these automatic steps. Collapse all of these automatic steps. I want to immediately come down here, double click, and call this clean text data, and enter. Now, we can see that we have 365 rows. If I click in the data set, control down. Now we have 366 rows, because we have field names at the top. Control, home. Now, let's go back to Windows Explorer. And I can see up here start. I'm going to click in the address bar on file downloads. Now, I'm going to click on 03, hold shift. Click on 04. Control C to copy, double click. Control V. Now, that start folder has four text files. When I go back and refresh-- I'm going to come over to the query pane, right click refresh-- instantly our new two months of data has been added to our table. Now, we could go ahead and create our same pivot table we did before. But that is absolutely amazing. Data, get & Transform-- either get data dropdown from table. Power Query to clean our data and give us a proper data set. All right. So in this video we saw how to import text files from a folder, and then dump new files in later. And over in Excel, everything updates. Example one, we had our bad data set. We converted it to a table. Used the button from table. Did our steps in Power Query. If we needed to look at them, of course, double click and it opens up the Query Editor. There's all of our steps. Then we loaded it back to clean data. Built a pivot table. Then later we added new bad data records to this data set. And sure enough, when we refreshed everything the table and our pivot table report updated. Right? That is our last video for Excel Basics in this Office 2016 class. That was video number 37. Next video-- video number 38-- we'll actually have just a few different videos, maybe three or four videos about basic Access. All right. If you liked that video, be sure to click that thumbs up. Leave a comment in sub, because there's always lots more videos to come from Excel is Fun. All right. We'll see you next video.
Info
Channel: ExcelIsFun
Views: 77,986
Rating: undefined out of 5
Keywords: Highline College, Busn 216, Mike Girvin, Michael Girvin, excelisfun, Excel Basics, Learn Excel, Excel, Excel 2016, Learn Excel Online, Data Analysis, Data Analysis in Excel, Excel Data Analysis, Power Query, Introduction to Power Query, Power Query Basics, Get & Transform, Get & Transform Data, Cleaning data in Excel, Clean and Transform data, Import Multiple Text Files into Excel, Split Data in Power Query, What is a Query?, What is a Delimiter?, What is a Data Type?
Id: ZBQ9Is61RLY
Channel Id: undefined
Length: 30min 54sec (1854 seconds)
Published: Sat Nov 25 2017
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.