Welcome to Highline BI348,
class video number 29. And if you want to
download this file, BI348 Chapter02.5 Import01
TextFiles start file or the finish file,
and don't forget, for these examples
in this chapter, here's the links at the site. You got to download that
zipped folder that has all the files we're importing. All right. Now, we want to start off by
going and looking at our files we want to import. Now, last video, we
imported these text files. Here we have text files, but
they're comma-separated values. And if I open this,
it looks like Excel, but it's really
comma-separated values. And if I Control down a
little, there's a lot of data. There's 250,000 rows
in this one workbook. So one of the main things
we're going to do in this video is see how easy it is to
get Power Query to import these in just a couple clicks. But we're not going to
import it into Excel, because if we do and I want
to go take a look at the file size, here's an
example of file size. If we imported this and
put it into a sheet, it would be 25 megabytes. If we instead of importing it
into Excel, use Power Query and do only a connection and
dump it into the data model, we'll get something close
to about two megabytes. All right. So we want to import
these using Power Query. I'm going to go over
to my Start file. And actually, this
Start file here, we actually used this last
video to import gray data and build a dashboard. But here on sales report, we're
going to go up to Power Query. And we're going to do the same
thing, From File, From Folder. We're going to
browse to our folder, and when you download, you
download Import-01 Tech CSV files, you've got to
unzip it, and then inside there's a Start file. Now, what we're telling
Power Query right now is to always look
at this folder. So Power Query will try
to import everything. We will filter out
what we don't want. When I click OK, click OK, the
Power Query editor comes up. I'm immediately going
to come over and give it a smart name, Import CSV
Sales Data, and Enter. Now, just as we
did last video, I'm going to come to the
extension column. When I click on the filter,
it gives me a unique list. And this time, I
don't want the TXT. So now, I'm telling Power Query
to always look at that folder and always bring
in extension .csv. Right now, there's
only three, right? But later, there'll be
more, and Power Query will know what to do with them. So I click OK. We filtered. Now I come over to Content. Right-click Remove
Other Columns. Now, because these are text
files, it's easy to expand. We see our double
downward pointing arrows, we click, and instantly,
it looks like we got just what we wanted. We look over here, just
like we did last video, and there's a bunch
of cool steps, including taking the
actual text field names and promoting them as headers. Power Query did all
of that, including changing any data types. If we click here, we
can see under Data Type, it's a whole number. Over for sales, it's
a decimal number. Right here, that's the important
one to look for, date and text. Now, any time you import text
files, remember, the first file had field names that were
promoted to column headers or fields. But there's some
other column headers down there that did
not get promoted and they're still there. One of the tricks we can
do is to pick the column with the fewest number of
unique records, and it is City. Click the filter. Because it's a big data set,
you got to click on Load More, so it will look
through that column and try and find all of the
items for a unique list. And sure enough, City, somewhere
way down at the bottom, there's an entire record
filled with field names. When I uncheck and filter
City on this column, it will properly
remove the field names from the records further
down in our data set. So when I click OK, boom,
and there's that last filter. We have our name, our steps. By the way, we didn't
look at it last video. You can come over to
View, and Advanced Editor, and there's all the code. M is the name of this code. I'm going to click Done,
Home, Close and Load. That will load it to the sheet. I want power over where it goes,
so I say Close and Load To. Especially now, I do not
want to put it into sheet, and I do not want it in a table. I'm saying Only
Create a Connection. And now I check, Add
This to the Data Model. Now we'll go over just a
moment and look at a PowerPoint slide or two about
what the data model is, but it is a behind-the-scenes
columnar database, which is super efficient
at storing data and doing analytics. When I click Load,
and there it is. Import CSV sales data. So it's like almost
500,000 rows of data. Now, there's nothing
in this sheet. That's in that
columnar database. Now, I want to briefly
go look at PowerPoint. What is the data
model in Excel 2013? It's a behind-the-scenes
in-memory columnar database that allows you to import large
data sets, millions of rows, that would not fit
in an Excel sheet. The data model stores the
imported data, not an Excel sheet or a table, but in an
in-memory columnar database. The bottom line is
this columnar database is efficient at storing
large amounts of data. It can dramatically
reduce the file size, as we talked about earlier. Now, once we import
into the data model, we can access the data model
through a pivot table dialogue box. Now, that's what we're
going to do this week. Next week, we'll look
at what most people do. Most people use the data model
in conjunction with PowerPivot, and that allows you to build
relationships between tables and use a whole new formula
language called DAX formula language. Now, I want to click
on the next slide. There's a bunch of synonyms,
PowerPivot database, columnar database, data model, Excel
data model, PowerPivot xVelocity engine. So if you're
reading, you may see any one of these as a synonym
for our data model or columnar database. Now, on this third
slide, this is slide 17, this is in essence what
the columnar database does when it stores the data. It actually takes a data set
with field names and records and it takes each field,
stores it in a separate column with only a unique list. Then it builds a type of map
of how these unique lists are related to the original
transactional data set. Now, let's go back over
to our Excel workbook. Now, notice that our query
here has about 500,000 records. It would fit on a sheet,
but again, the file size would be so big. So here, this would fit, but
we're using the data model simply to store it and have a
dramatically smaller file size. Now, how do we access it? No problem. Insert Pivot Table, or
our keyboard, Alt-N-V. And we want to use an
external data source. And it's not that it's
outside somewhere. It actually is connected
to this Excel workbook. It's just external
to the sheets. It's that in-memory
columnar database that's really attached
behind-the-scenes to this Excel workbook. I click Choose a Connection. I'm going to go over to Tables. I'm going to select
Import CSV Sales Data. That's our method of
getting at that data model and making a pivot
table from it. When I click Open,
and OK, just like that, I get the fields
from that table. We want Store ID Down to Rows. Sales over to values. Right-click Number Formatting,
something like currency, no decimals. Click OK. We need a slicer, so I'm
going to go up to Analyze. Insert Slicer. I want City, click OK. That's not a good label. Click there and Design, Report
Layout, Show in Tabular. Now I want to make
a column chart for this categorical
data, this store ID. Insert Column 2D. Let's see, that's chart junk. I could keep these
here, Right-Click, I'm going to hide
all field buttons. I'm going to click on the chart
title with that solid line. I'm not going to link it to the
cell, I'm just going to type. And click Enter. Now I can close
these tasks panes, and all we're going
to do is that for now. We can certainly filter,
but let's take a look. And in fact, I'm
going to anticipate. I'm going to come up to
Slicer, say two columns, and then pull this
over to the side. All right. So now, what do we want to do? We want to save this and
go over to our files. There's the original CSVs. But when you download this,
you download the Import-01, and there's the Start and New. So we're going to grab the new
one, Seattle and Tacoma new CD data. Control-C to copy, and
Start file, Control-V. Now, when we refresh
over in Excel, remember, the query says look at this
folder and get anything with .csv. So when I come over
here, let's try this. The keyboard is Control-Alt-F5. And sure enough, look at that,
we got Seattle and Tacoma. No data set here,
just our pivot table. So when I filter on Seattle,
there's the new data. Look at that. All right. I'm going to un-filter. So in this video, we saw
how to import multiple CSV comma-separated value files
using Power Query, create a basic report, and then
sure enough, when we update and that folder has new
files, everything updates. Now, in our next
video, we'll see how to import not text files,
but actual Excel files. All right, we'll
see you next video.