Basic Excel Business Analytics #29: Power Query: Import Multiple Large CSV Files Into Data Model

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Welcome to Highline BI348, class video number 29. And if you want to download this file, BI348 Chapter02.5 Import01 TextFiles start file or the finish file, and don't forget, for these examples in this chapter, here's the links at the site. You got to download that zipped folder that has all the files we're importing. All right. Now, we want to start off by going and looking at our files we want to import. Now, last video, we imported these text files. Here we have text files, but they're comma-separated values. And if I open this, it looks like Excel, but it's really comma-separated values. And if I Control down a little, there's a lot of data. There's 250,000 rows in this one workbook. So one of the main things we're going to do in this video is see how easy it is to get Power Query to import these in just a couple clicks. But we're not going to import it into Excel, because if we do and I want to go take a look at the file size, here's an example of file size. If we imported this and put it into a sheet, it would be 25 megabytes. If we instead of importing it into Excel, use Power Query and do only a connection and dump it into the data model, we'll get something close to about two megabytes. All right. So we want to import these using Power Query. I'm going to go over to my Start file. And actually, this Start file here, we actually used this last video to import gray data and build a dashboard. But here on sales report, we're going to go up to Power Query. And we're going to do the same thing, From File, From Folder. We're going to browse to our folder, and when you download, you download Import-01 Tech CSV files, you've got to unzip it, and then inside there's a Start file. Now, what we're telling Power Query right now is to always look at this folder. So Power Query will try to import everything. We will filter out what we don't want. When I click OK, click OK, the Power Query editor comes up. I'm immediately going to come over and give it a smart name, Import CSV Sales Data, and Enter. Now, just as we did last video, I'm going to come to the extension column. When I click on the filter, it gives me a unique list. And this time, I don't want the TXT. So now, I'm telling Power Query to always look at that folder and always bring in extension .csv. Right now, there's only three, right? But later, there'll be more, and Power Query will know what to do with them. So I click OK. We filtered. Now I come over to Content. Right-click Remove Other Columns. Now, because these are text files, it's easy to expand. We see our double downward pointing arrows, we click, and instantly, it looks like we got just what we wanted. We look over here, just like we did last video, and there's a bunch of cool steps, including taking the actual text field names and promoting them as headers. Power Query did all of that, including changing any data types. If we click here, we can see under Data Type, it's a whole number. Over for sales, it's a decimal number. Right here, that's the important one to look for, date and text. Now, any time you import text files, remember, the first file had field names that were promoted to column headers or fields. But there's some other column headers down there that did not get promoted and they're still there. One of the tricks we can do is to pick the column with the fewest number of unique records, and it is City. Click the filter. Because it's a big data set, you got to click on Load More, so it will look through that column and try and find all of the items for a unique list. And sure enough, City, somewhere way down at the bottom, there's an entire record filled with field names. When I uncheck and filter City on this column, it will properly remove the field names from the records further down in our data set. So when I click OK, boom, and there's that last filter. We have our name, our steps. By the way, we didn't look at it last video. You can come over to View, and Advanced Editor, and there's all the code. M is the name of this code. I'm going to click Done, Home, Close and Load. That will load it to the sheet. I want power over where it goes, so I say Close and Load To. Especially now, I do not want to put it into sheet, and I do not want it in a table. I'm saying Only Create a Connection. And now I check, Add This to the Data Model. Now we'll go over just a moment and look at a PowerPoint slide or two about what the data model is, but it is a behind-the-scenes columnar database, which is super efficient at storing data and doing analytics. When I click Load, and there it is. Import CSV sales data. So it's like almost 500,000 rows of data. Now, there's nothing in this sheet. That's in that columnar database. Now, I want to briefly go look at PowerPoint. What is the data model in Excel 2013? It's a behind-the-scenes in-memory columnar database that allows you to import large data sets, millions of rows, that would not fit in an Excel sheet. The data model stores the imported data, not an Excel sheet or a table, but in an in-memory columnar database. The bottom line is this columnar database is efficient at storing large amounts of data. It can dramatically reduce the file size, as we talked about earlier. Now, once we import into the data model, we can access the data model through a pivot table dialogue box. Now, that's what we're going to do this week. Next week, we'll look at what most people do. Most people use the data model in conjunction with PowerPivot, and that allows you to build relationships between tables and use a whole new formula language called DAX formula language. Now, I want to click on the next slide. There's a bunch of synonyms, PowerPivot database, columnar database, data model, Excel data model, PowerPivot xVelocity engine. So if you're reading, you may see any one of these as a synonym for our data model or columnar database. Now, on this third slide, this is slide 17, this is in essence what the columnar database does when it stores the data. It actually takes a data set with field names and records and it takes each field, stores it in a separate column with only a unique list. Then it builds a type of map of how these unique lists are related to the original transactional data set. Now, let's go back over to our Excel workbook. Now, notice that our query here has about 500,000 records. It would fit on a sheet, but again, the file size would be so big. So here, this would fit, but we're using the data model simply to store it and have a dramatically smaller file size. Now, how do we access it? No problem. Insert Pivot Table, or our keyboard, Alt-N-V. And we want to use an external data source. And it's not that it's outside somewhere. It actually is connected to this Excel workbook. It's just external to the sheets. It's that in-memory columnar database that's really attached behind-the-scenes to this Excel workbook. I click Choose a Connection. I'm going to go over to Tables. I'm going to select Import CSV Sales Data. That's our method of getting at that data model and making a pivot table from it. When I click Open, and OK, just like that, I get the fields from that table. We want Store ID Down to Rows. Sales over to values. Right-click Number Formatting, something like currency, no decimals. Click OK. We need a slicer, so I'm going to go up to Analyze. Insert Slicer. I want City, click OK. That's not a good label. Click there and Design, Report Layout, Show in Tabular. Now I want to make a column chart for this categorical data, this store ID. Insert Column 2D. Let's see, that's chart junk. I could keep these here, Right-Click, I'm going to hide all field buttons. I'm going to click on the chart title with that solid line. I'm not going to link it to the cell, I'm just going to type. And click Enter. Now I can close these tasks panes, and all we're going to do is that for now. We can certainly filter, but let's take a look. And in fact, I'm going to anticipate. I'm going to come up to Slicer, say two columns, and then pull this over to the side. All right. So now, what do we want to do? We want to save this and go over to our files. There's the original CSVs. But when you download this, you download the Import-01, and there's the Start and New. So we're going to grab the new one, Seattle and Tacoma new CD data. Control-C to copy, and Start file, Control-V. Now, when we refresh over in Excel, remember, the query says look at this folder and get anything with .csv. So when I come over here, let's try this. The keyboard is Control-Alt-F5. And sure enough, look at that, we got Seattle and Tacoma. No data set here, just our pivot table. So when I filter on Seattle, there's the new data. Look at that. All right. I'm going to un-filter. So in this video, we saw how to import multiple CSV comma-separated value files using Power Query, create a basic report, and then sure enough, when we update and that folder has new files, everything updates. Now, in our next video, we'll see how to import not text files, but actual Excel files. All right, we'll see you next video.
Info
Channel: ExcelIsFun
Views: 13,950
Rating: 4.970149 out of 5
Keywords: Excel, Microsoft Excel, Highline College, Mike Girvin, excelisfun, Michael Girvin, Mike excelisfun Girvin, Slaying Excel Dragons, Excel Magic Tricks, Business Analytics, BI 348, Data Analysis, Power Query Import Multiple Large CSV Files, .csv files, Import data Into Data Model, Excel data Model to reduce files size, Columnar Database, PivotTable, Slicer, Refresh, Only Create Connection, Add this to the Data Model, Use an external data source, Choose Connection
Id: LSjrrDRF0q0
Channel Id: undefined
Length: 11min 20sec (680 seconds)
Published: Mon Oct 19 2015
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.