MSPTDA 14: Power Pivot Intro #2: Amazing Columnar Database Importing Millions of Rows Data to Excel

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
[MUSIC PLAYING] Welcome to MSPTDA video number 14. This is our Power Pivot intro number two. And guess what? We're going to use Power Query and Power Pivot. I thought this was a Power Pivot intro! Well, guess what? Power Query is a part of Power Pivot. Now, it's not really a part. But in the old days, when Power Pivot first came out, we used the Power Pivot user interface to go and import data. But we don't do that anymore. Now we use Power Query to go and import or extract the data, and then we dump it into the Power Pivot columnar database, and then build our DAX formulas and data model pivot tables. So in this video, we want to see a quick tip about how to import millions of rows of data into Excel. Now, just like last video, Power Pivot intro number one, this is a trick simply for Excel. And the reason it's so important is this-- we Excel people, for decades, have been getting text files that we need to import into Excel. Now, we'll go look at one of these text files in just a second. But I want you to notice something. There's 2018. Look at the file size-- 38 megabytes. If I were to import just this one file, just the sales from 2018, into an Excel file, the file size would be 30 megabytes. Now, look at this down here. This file is only 8.5 megabytes. And it has all of the text data imported and appended into a single table, then loaded to the data model, and we made our pivot table report. And look at the file size. That is simply amazing. That means the columnar database is very efficient at taking lots of data and compressing it, and then allowing us to make pivot table reports. If we go look at one of these files, all the files have ISO date, product, sales rep, units, discount, price, and sales. Now, in this video, we simply want to take text files and append them into a single table. So all we're going to use Power Pivot for is a place to load the data. Then, when we make our pivot table, we'll check, please use the table in the data model. Now, this is the name of the folder. It's a zip folder. You can download it and unzip it. Also, as we've talked about earlier in this class, when we did Power Query, Power Query is case sensitive. So we're going to have to deal with the fact that some of the file extensions are capitalized. Some of them are not. We also want to filter out any files that are not TXT. Last thing is, we have to know the delimiter. The delimiter separating the columns in the data is Tab. Now, we're over here in the Start file. I'm going to go to the sheet PT. Now, we go up to the Data Ribbon tab, and this is Power Query, Get and Transform. Now, remember, I have Office 365. So my icons may look different than yours. But now we want to go to Get and Transform, Get Data, From File. And because we have lots of files in the folder, we want to use From Folder. Now, I'm going to click on From Folder. Now I click Browse. I point Power Query to the folder that contains all the files I want to import. Click OK. OK. Now we have a preview of the files and various attributes for those files. We do not want to click Combine. I do not want to click Load. I want to click Edit. And when I click Edit, the Power Query editor opens up. This is the column that contains the files. There's the file name. Now, extension, we need to right-click, Transform, and Convert to Lower Case, because Power Query is case sensitive. We want to filter now to include text. But I only want one filter that says .txt, lower case. So I'm going to click the Filter dropdown. Notice this is just like Excel. We're using our Excel skills. Uncheck everything. Now, if you only have text files, and you still want to do this, then you have to go to Text Filters Equals, and type it out. But because we have more than one file type, I can use this method right here. And when I click OK, the same line of code is written. Now, if I use the Expand button to append all of these, this column will be included. So because I don't want that column, I'm going to right-click and say Remove Other Columns. Now, when we click this downward-pointing Combine Files button, and we've seen this before. I'm going to click and open queries. It will create a bunch of queries for us, including a function. So I'm going to click the double downward-pointing arrow. It first wants to know what the delimiter is. So we're going to say Tab. Click OK. And now it'll build, from the sample file, some M code. It will convert it to a function. And then, in the table we want, it invokes that function and does a number of other steps. Now, we've already learned all about the specifics of this process right here earlier in the class. Now we have some transformations that we want to do. And the first one is, these are ISO dates. And as we saw two videos ago, if we get ISO dates as numbers, we first need to convert it to text, and then to dates, a two-step process. Now, notice, one of the automatic steps was to change type. So when I click the Data Type icon there, I'm first going to convert it to text. And it's going to be polite. It already has a step, and it's asking me, do I want to replace it? Yes I do. So I'm going to click Replace. Now, that step contains that conversion to text. Now we click the Data Type icon, and convert to date. And very importantly, we do not want to click Replace, because we need that previous step there, converting it to text. Then I click Add a New Step. And sure enough, we have our proper dates. Now, one thing about importing a lot of data is if we have columns we don't need, we should delete them. If we're never going to use units sold, discount, and price-- which we're not, for this particular Excel workbook-- we should delete them, because it's extra space in our columnar database that we don't need to use up. So I'm going to click on UnitsSold, hold Shift, click on Price, right-click, Remove Columns. And now we have exactly the data set we want. We've got to come over here, and I should have done this first. We're going to name this. All text files, one table, and Enter. Now, as it turns out, we have over 3 million rows of data. So if I tried to close and load by clicking this button, it would try to load it to a sheet and say that it could not load. So we're going to click the dropdown, Close and Load To. I do not want table. Even if I come down here and click Add This to the Data Model, it'll try and do both. I have to click Only Create a Connection. Add This to the Data Model. And when I click OK, it will actually build a super efficient, in RAM memory, columnar database, which is part of the data model in Power Pivot. So I'm going to click OK. And now we can look in our Queries and Connection pane, and sure enough, 3.1 million rows loaded. If I hover my cursor, I can read Loaded to the Data Model. It even tells me the source. Now, this is the Power Query pane. We can see up in Queries and Connections, that button is highlighted, which means that pane is open. Now, before we go look at the data model, I'm going to click Control-S to save. And if I go look at the file in Windows Explorer, look at that. Seven megabytes. There's 3 million rows in that Excel file, in Power Pivot, in the columnar database. Now sometimes, when you're learning this for the first time, it's confusing. I thought this was Excel! What are you talking about, a columnar database? Well, if we go over to the sheet Where is Columnar Database, this is Excel. Of course, Excel has many tools-- formulas, Excel tables, standard pivot tables, Power Query, and much more. Power Pivot is just one feature. Inside of Power Pivot, there is a data model. And from that data model, we make data model pivot tables. But of course, as we talked about for the first time last video, the data model is made up of the columnar database, relationships, and DAX formulas. So we can look at a preview of the columnar database, but we actually can't see exactly how it's stored. Now, Data Ribbon tab, Manage Data Model, or, of course, Power Pivot, Manage Data Model. So I'm going to click, and over here in the Power Pivot for Excel preview of the data model, we can see our table. And sure enough, down there, it says that there's 3.1 million rows. Now, this is not the columnar database. This is just a preview. I'm going to close this window. Now, as I said just a moment ago, we actually can't go look at exactly how the columnar database stores data. As a small example to help us with the concept of how we take a table and compress it to such a small size, here's a single table. We have three columns. Now, when we import this table into Power Pivot data model, the actual columnar database, it takes the individual columns, and it only stores a unique list of each one of the columns. It also builds a sort of map that allows it to take the data that's stored, that has a really small size, and rebuild it when it needs to make a calculation. Now, the compression process to create the columnar database is called Vertipaq compression. And it does all sorts of amazing things, including proprietary methods and techniques that only Microsoft knows. So we know that it takes big data and it compresses it small. But just as a little glance, let's look over here. If this is our column, one of the methods it will use is, instead of storing all this text data, it will actually store a unique list with an integer value as the first column. And then, instead of keeping these, it will store the integer values in a column. This process right here is called dictionary encoding. And it can take a single column that takes up lots of space and reduce it down to a much smaller size. Also, very importantly, the internal engine, when making DAX calculations, has a much easier time working on integer values than it would on text values. Also, another glimpse of what might happen when it's compressing the data, here's that same column. But if there's lots of repeat values, and it's sorted in a certain way, it might take the dictionary encoding, and if it detects lots of repeated values, instead of storing the integer column in the dictionary, it will do this step first. But then it will store the dictionary, and just a list of the IDs for each one of the items, and then how many times each value is repeated. Now, that's probably more information than we need to know, especially at the beginning of this process. But columnar database inside the data model, totally amazing, because it takes a lot of data and compresses it down. Now, I want to go over to the sheet Pivot Table, because we do want to make a pivot table. Insert Pivot Table, or simply use the keyboard, Alt-N-V. Because we have something in the data model-- in fact, for us, a single table-- it assumes we want to use the data model on the existing sheet. Click OK. Here's our table. I'm going to click to expand. Product down to Rows. And just like that, hey, wait a second. It got a unique list, probably because underneath, it had that unique list already stored. Now I'm going to take SalesRep and drag it above Products. There's each one of the sales reps, and the products. Now, we talked about, for the first time last video, an implicit measure. Well, remember this is an Excel trick. All we did is take 7 million rows of text data, store it in the columnar database. Now we're going to make a simple pivot table, not actually authoring our own DAX formulas. So when I drag Sales down to the values, it will quickly calculate the totals for each. That's called an implicit measure. And next video, we'll get to compare and contrast implicit measures to explicit measures, which is just us creating the DAX formulas. Again, it's perfectly all right for our job here-- import a lot of data, make a pivot table-- to use implicit measures. Now I want to show you one last important thing, and it involves creating a new workbook. So I'm going to use the keyboard Control-N. That opens up a new workbook. I'm going to use the F12 key to open up the Save As dialog box. I'm going to call this file 014 import 1 text file. I'm going to save it. And now, what I want to do is I want to import one text file that's 38 megabytes, because Data, over to Get Data. For us earlier, we had lots of files. And that From Folder is amazing. But remember, in Excel, we get lots of text data. So from now on, we simply use From File, Text. And I do not want to dump this file into the actual worksheet 2020, because if I do, sure, this particular file will fit. But why not put it into the data model and have a dramatically smaller file size? I'm going to come down here and click Import. We need to make sure the delimiter is Tab. Edit. I'm going to do my same two transformations. Text, Replace, Date, Add New, Delete UnitsSold to Price, right-click Remove Columns. Close and Load, Close and Load To, Only Create a Connection, Add This to the Data Model. Click OK. Now, there's our 720,000 rows. We easily could have put this into an Excel sheet. But from now on, don't do it. Put it in the data model. Control-S. We're not even going to make a pivot table. We just want to go look at file size. And there it is. It's, like, 1.5 megabytes. That is the power of Power Query and Power Pivot data model columnar database. Now, over here in the PDF notes, we can now define a columnar database. It is a behind-the-scenes, in RAM memory, efficient, big data analytics database. Now, if we break this down, the behind-the-scenes simply means, when you open the Excel file, the columnar database is opened behind the scenes, and is stored in RAM memory. Efficient big data, that simply means the database encodes and compresses data and stores it in a structure that allows our DAX formulas to calculate quickly, and produces a small file size. Analytics database, that simply means the database is specifically designed to work with the DAX formula language to make calculations quickly on big data. There are also many synonyms for columnar database. You might hear columnar database, Vertipaq engine, SSAS tabular, storage engine, or XVelocity analytics engines. They all mean, we're storing a lot of data in a compressed form, and we're going to try and make calculations quickly on that database. Also, you might be wondering, why the name Power Pivot? The name came because Microsoft wanted to use the same amazing pivot table user interface to drag and drop fields, to make reports, but with more power. Now, the "Power" part of the name means we can make pivot tables from big data. We can make pivot tables from multiple tables. And we can use DAX formulas, which can process over big data efficiently, and which allow us-- uh-oh, I spelled that wrong-- which allow us more varied calculations than standard pivot tables. The "Pivot" part, of course, comes from "pivot table." All right. In this video, we saw how to use Power Query, Get Data, From File, Text File, or a folder filled with text files. Import it into the Power Pivot columnar database, so that we could import millions of rows of data into Excel, or simply have a small file size. Now, if you liked that video, be sure to click that thumbs-up, leave a comment, and subscribe, because there's always lots more videos to come from Excel Is Fun, including our next Power Pivot intro number three, where we're going to have a comprehensive introduction to all the features in Power Pivot. All right. We'll see you next video.
Info
Channel: ExcelIsFun
Views: 31,226
Rating: 4.9800501 out of 5
Keywords: Excelisfun, Highline College, Data Analysis, Business Intelligence, BI 348, Mike Girvin, Microsoft Power Tools for Data Analysis 14, MSPTDA 14, Power Query, Power Pivot, Data Model, Excel, Power Pivot Columnar Database, Data Model Columnar Database, Power Pivot Data Model Columnar Database, What is Columnar Database, xVelocity Engine, Vertipaq, Vertipaq Compression, How does Columnar Database work?, Import Millions of Rows into Excel, Columnar Database explained
Id: Nmn6qUJvWWY
Channel Id: undefined
Length: 19min 9sec (1149 seconds)
Published: Fri Oct 12 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.