[MUSIC PLAYING] Welcome to MSPTDA
video number 14. This is our Power
Pivot intro number two. And guess what? We're going to use Power
Query and Power Pivot. I thought this was
a Power Pivot intro! Well, guess what? Power Query is a
part of Power Pivot. Now, it's not really a part. But in the old days, when
Power Pivot first came out, we used the Power
Pivot user interface to go and import data. But we don't do that anymore. Now we use Power Query to go
and import or extract the data, and then we dump it into the
Power Pivot columnar database, and then build our DAX formulas
and data model pivot tables. So in this video, we
want to see a quick tip about how to import millions
of rows of data into Excel. Now, just like last video,
Power Pivot intro number one, this is a trick
simply for Excel. And the reason it's
so important is this-- we Excel people,
for decades, have been getting text files that
we need to import into Excel. Now, we'll go look
at one of these text files in just a second. But I want you to
notice something. There's 2018. Look at the file size-- 38 megabytes. If I were to import just
this one file, just the sales from 2018, into an Excel
file, the file size would be 30 megabytes. Now, look at this down here. This file is only 8.5 megabytes. And it has all of the text
data imported and appended into a single table, then
loaded to the data model, and we made our
pivot table report. And look at the file size. That is simply amazing. That means the
columnar database is very efficient at
taking lots of data and compressing it,
and then allowing us to make pivot table reports. If we go look at
one of these files, all the files have ISO date,
product, sales rep, units, discount, price, and sales. Now, in this video, we simply
want to take text files and append them
into a single table. So all we're going to use
Power Pivot for is a place to load the data. Then, when we make
our pivot table, we'll check, please use the
table in the data model. Now, this is the
name of the folder. It's a zip folder. You can download
it and unzip it. Also, as we've
talked about earlier in this class, when
we did Power Query, Power Query is case sensitive. So we're going to have
to deal with the fact that some of the file
extensions are capitalized. Some of them are not. We also want to filter out
any files that are not TXT. Last thing is, we have
to know the delimiter. The delimiter separating the
columns in the data is Tab. Now, we're over here
in the Start file. I'm going to go to the sheet PT. Now, we go up to
the Data Ribbon tab, and this is Power Query,
Get and Transform. Now, remember, I
have Office 365. So my icons may look
different than yours. But now we want to go to Get and
Transform, Get Data, From File. And because we have lots
of files in the folder, we want to use From Folder. Now, I'm going to
click on From Folder. Now I click Browse. I point Power Query to
the folder that contains all the files I want to import. Click OK. OK. Now we have a
preview of the files and various attributes
for those files. We do not want to click Combine. I do not want to click Load. I want to click Edit. And when I click Edit, the
Power Query editor opens up. This is the column that
contains the files. There's the file name. Now, extension, we need
to right-click, Transform, and Convert to Lower Case,
because Power Query is case sensitive. We want to filter
now to include text. But I only want one filter
that says .txt, lower case. So I'm going to click
the Filter dropdown. Notice this is just like Excel. We're using our Excel skills. Uncheck everything. Now, if you only
have text files, and you still want
to do this, then you have to go to Text Filters
Equals, and type it out. But because we have
more than one file type, I can use this
method right here. And when I click OK, the
same line of code is written. Now, if I use the Expand
button to append all of these, this column will be included. So because I don't
want that column, I'm going to right-click and
say Remove Other Columns. Now, when we click this
downward-pointing Combine Files button, and
we've seen this before. I'm going to click
and open queries. It will create a bunch
of queries for us, including a function. So I'm going to click the
double downward-pointing arrow. It first wants to know
what the delimiter is. So we're going to say Tab. Click OK. And now it'll build, from
the sample file, some M code. It will convert
it to a function. And then, in the table we
want, it invokes that function and does a number
of other steps. Now, we've already learned
all about the specifics of this process right
here earlier in the class. Now we have some transformations
that we want to do. And the first one is,
these are ISO dates. And as we saw two videos ago,
if we get ISO dates as numbers, we first need to convert
it to text, and then to dates, a two-step process. Now, notice, one of
the automatic steps was to change type. So when I click the
Data Type icon there, I'm first going to
convert it to text. And it's going to be polite. It already has a step,
and it's asking me, do I want to replace it? Yes I do. So I'm going to click Replace. Now, that step contains
that conversion to text. Now we click the Data Type
icon, and convert to date. And very importantly, we do
not want to click Replace, because we need that
previous step there, converting it to text. Then I click Add a New Step. And sure enough, we
have our proper dates. Now, one thing about
importing a lot of data is if we have columns we don't
need, we should delete them. If we're never going to
use units sold, discount, and price-- which we're not,
for this particular Excel workbook-- we
should delete them, because it's extra space
in our columnar database that we don't need to use up. So I'm going to
click on UnitsSold, hold Shift, click on Price,
right-click, Remove Columns. And now we have exactly
the data set we want. We've got to come over here, and
I should have done this first. We're going to name this. All text files, one
table, and Enter. Now, as it turns out, we have
over 3 million rows of data. So if I tried to close and
load by clicking this button, it would try to
load it to a sheet and say that it could not load. So we're going to click the
dropdown, Close and Load To. I do not want table. Even if I come down here and
click Add This to the Data Model, it'll try and do both. I have to click Only
Create a Connection. Add This to the Data Model. And when I click
OK, it will actually build a super efficient,
in RAM memory, columnar database, which
is part of the data model in Power Pivot. So I'm going to click OK. And now we can look in our
Queries and Connection pane, and sure enough, 3.1
million rows loaded. If I hover my cursor, I can
read Loaded to the Data Model. It even tells me the source. Now, this is the
Power Query pane. We can see up in
Queries and Connections, that button is highlighted,
which means that pane is open. Now, before we go look
at the data model, I'm going to click
Control-S to save. And if I go look at the file in
Windows Explorer, look at that. Seven megabytes. There's 3 million rows in that
Excel file, in Power Pivot, in the columnar database. Now sometimes, when
you're learning this for the first
time, it's confusing. I thought this was Excel! What are you talking
about, a columnar database? Well, if we go over to the sheet
Where is Columnar Database, this is Excel. Of course, Excel
has many tools-- formulas, Excel tables, standard
pivot tables, Power Query, and much more. Power Pivot is just one feature. Inside of Power Pivot,
there is a data model. And from that data model, we
make data model pivot tables. But of course, as we talked
about for the first time last video, the data
model is made up of the columnar
database, relationships, and DAX formulas. So we can look at a preview
of the columnar database, but we actually can't see
exactly how it's stored. Now, Data Ribbon tab, Manage
Data Model, or, of course, Power Pivot, Manage Data Model. So I'm going to
click, and over here in the Power Pivot for Excel
preview of the data model, we can see our table. And sure enough,
down there, it says that there's 3.1 million rows. Now, this is not the
columnar database. This is just a preview. I'm going to close this window. Now, as I said
just a moment ago, we actually can't go look
at exactly how the columnar database stores data. As a small example to help
us with the concept of how we take a table and compress
it to such a small size, here's a single table. We have three columns. Now, when we import this
table into Power Pivot data model, the actual
columnar database, it takes the individual
columns, and it only stores a unique list of
each one of the columns. It also builds a
sort of map that allows it to take the
data that's stored, that has a really small
size, and rebuild it when it needs to make a calculation. Now, the compression process
to create the columnar database is called Vertipaq compression. And it does all sorts
of amazing things, including proprietary
methods and techniques that only Microsoft knows. So we know that
it takes big data and it compresses it small. But just as a little glance,
let's look over here. If this is our column, one
of the methods it will use is, instead of storing
all this text data, it will actually store a
unique list with an integer value as the first column. And then, instead
of keeping these, it will store the integer
values in a column. This process right here is
called dictionary encoding. And it can take a single column
that takes up lots of space and reduce it down to
a much smaller size. Also, very importantly,
the internal engine, when making DAX calculations,
has a much easier time working on integer values
than it would on text values. Also, another
glimpse of what might happen when it's compressing the
data, here's that same column. But if there's lots
of repeat values, and it's sorted
in a certain way, it might take the
dictionary encoding, and if it detects lots
of repeated values, instead of storing the integer
column in the dictionary, it will do this step first. But then it will store
the dictionary, and just a list of the IDs for
each one of the items, and then how many times
each value is repeated. Now, that's probably
more information than we need to know,
especially at the beginning of this process. But columnar database
inside the data model, totally amazing, because
it takes a lot of data and compresses it down. Now, I want to go
over to the sheet Pivot Table, because we do
want to make a pivot table. Insert Pivot Table, or simply
use the keyboard, Alt-N-V. Because we have something in the
data model-- in fact, for us, a single table-- it assumes we want to use the
data model on the existing sheet. Click OK. Here's our table. I'm going to click to expand. Product down to Rows. And just like that,
hey, wait a second. It got a unique list,
probably because underneath, it had that unique
list already stored. Now I'm going to take SalesRep
and drag it above Products. There's each one of the
sales reps, and the products. Now, we talked about, for
the first time last video, an implicit measure. Well, remember this
is an Excel trick. All we did is take 7
million rows of text data, store it in the
columnar database. Now we're going to make
a simple pivot table, not actually authoring
our own DAX formulas. So when I drag Sales
down to the values, it will quickly calculate
the totals for each. That's called an
implicit measure. And next video, we'll get
to compare and contrast implicit measures to explicit
measures, which is just us creating the DAX formulas. Again, it's perfectly all
right for our job here-- import a lot of data,
make a pivot table-- to use implicit measures. Now I want to show you
one last important thing, and it involves
creating a new workbook. So I'm going to use the keyboard
Control-N. That opens up a new workbook. I'm going to use the F12 key to
open up the Save As dialog box. I'm going to call this file
014 import 1 text file. I'm going to save it. And now, what I
want to do is I want to import one text file
that's 38 megabytes, because Data, over to Get Data. For us earlier, we
had lots of files. And that From Folder is amazing. But remember, in Excel,
we get lots of text data. So from now on, we simply
use From File, Text. And I do not want to dump this
file into the actual worksheet 2020, because if I do, sure,
this particular file will fit. But why not put it
into the data model and have a dramatically
smaller file size? I'm going to come down
here and click Import. We need to make sure
the delimiter is Tab. Edit. I'm going to do my same
two transformations. Text, Replace, Date, Add New,
Delete UnitsSold to Price, right-click Remove Columns. Close and Load, Close and Load
To, Only Create a Connection, Add This to the Data Model. Click OK. Now, there's our 720,000 rows. We easily could have put
this into an Excel sheet. But from now on, don't do it. Put it in the data model. Control-S. We're not even going
to make a pivot table. We just want to go
look at file size. And there it is. It's, like, 1.5 megabytes. That is the power of Power
Query and Power Pivot data model columnar database. Now, over here in
the PDF notes, we can now define a
columnar database. It is a behind-the-scenes, in
RAM memory, efficient, big data analytics database. Now, if we break this down,
the behind-the-scenes simply means, when you
open the Excel file, the columnar database is
opened behind the scenes, and is stored in RAM memory. Efficient big data, that
simply means the database encodes and compresses
data and stores it in a structure
that allows our DAX formulas to calculate quickly,
and produces a small file size. Analytics database, that
simply means the database is specifically designed to work
with the DAX formula language to make calculations
quickly on big data. There are also many synonyms
for columnar database. You might hear columnar
database, Vertipaq engine, SSAS tabular, storage engine,
or XVelocity analytics engines. They all mean, we're
storing a lot of data in a compressed
form, and we're going to try and make calculations
quickly on that database. Also, you might be wondering,
why the name Power Pivot? The name came because
Microsoft wanted to use the same amazing
pivot table user interface to drag
and drop fields, to make reports,
but with more power. Now, the "Power"
part of the name means we can make pivot
tables from big data. We can make pivot tables
from multiple tables. And we can use DAX
formulas, which can process over big
data efficiently, and which allow us-- uh-oh, I spelled that wrong-- which allow us more
varied calculations than standard pivot tables. The "Pivot" part, of course,
comes from "pivot table." All right. In this video, we saw how to
use Power Query, Get Data, From File, Text File, or a
folder filled with text files. Import it into the Power
Pivot columnar database, so that we could import millions
of rows of data into Excel, or simply have a
small file size. Now, if you liked that video,
be sure to click that thumbs-up, leave a comment, and subscribe,
because there's always lots more videos
to come from Excel Is Fun, including our next
Power Pivot intro number three, where we're going to have
a comprehensive introduction to all the features
in Power Pivot. All right. We'll see you next video.