Welcome to BI 348
class video number 27. Hey, if you want to
download this file, BI 348 chapter 2.5 Excel
Data Transformation Methods, the Excel file, or you want to
download the PowerPoints, click on the link below the video. Hey, this chapter is
not in the textbook, and it is-- well, wait a second. What's this whole series? It's business analytics. And what do we always start
with when we're doing analytics? Well, we almost always start
with some data set that may or may not be in
a proper data set, and it may or may
not be in Excel. So of course, one of the
most important topics is learning how to import
clean and transformed data so we get it into Excel
and it is in a form that we can use to
do our analytics. Now, in our PowerPoints, I'm
going over just a few slides. I want to jump over to Excel. Hey, our topics-- we're going to
look at Excel built-in features to clean and transform
data, like formulas, the amazing flash fill text
to columns, and power query. Then we'll talk
about importing data. And for importing, cleaning,
and transforming data, there's nothing more amazing
and powerful than power query. And so that will be
the bulk of this week. We'll just be using power query
to get the right type of data so we can do whatever
it is we'd like. And we'll briefly talk
about data dashboards, and we'll introduce the
data model in Excel 2013. Now, here's a list of
all of the power query. Because besides
this first video, all the rest of the
videos are going to be topics about power query. And there they are. There's nine topics. Actually, there's 10,
because the first one will involve two amazing examples. Now let's go over to Excel. And now, this
video is just going to be how to use
regular features, like formulas, text
to columns, flash fills, and even power query to
deal with data that you already have in your Excel workbook. Later videos will
be all about how to go and get data
from various sources in [INAUDIBLE] et cetera. Now, here's our first example. We're given a data set. It's a simple
questionnaire that we've sent out-- where do
you currently live, and where would
you like to live? And if I Control, Down Arrow--
so 100, almost 150 answers. Now, here's how the
data set comes in. If I build a pivot
table, it's not going to have labels that are
very helpful for the viewer of the report. So here's our little legend. BC means Big City. C means City, R, Rural,
S, Suburb, T, Town. And so we would really
like to have these labels in our pivot table. No problem. I'm going to add
some helper columns. And I did a little
formatting there. And simply use--
well, hopefully when you see this situation here,
some stuff-- here's a legend. This is what we really want. You should immediately think
this is a lookup situation. So I'm simply going
to type equals VL Tab. And our lookup value will
be the particular letter. And notice that's a
relative cell reference. When I copy down, it moves. When I copy it to the
side, it moves over there. Comma, the table-- I'm
only going to highlight the first two columns. F4 to lock it, comma,
column index number. The second column
has the thing I want to go and get and
bring back to the cell. So I type a 2 comma. And although this
column is sorted here, we could use approximate. I'm still going to
use exact match just in case this is not sorted. 0 for false, close parentheses. Control, Enter, copy it over to
the side, and then double click and send it down. Control, Down Arrow, and F2 just
to make sure we got it right. It looks like it's working. Control, Home. Now I'm going to
highlight just these two columns from my pivot table,
Control, Shift, Down Arrow, Alt, N, V. And I want
it on a new sheet, so I'm going to hit Enter. Now I'm going to
drop where would you like to live down the rows. There's instantly a unique list. I drag it over to Values. Now it's counting. I come up to Design, Report
Layout, Show in Tabular to get my field names,
over to Analyze. And I want to, in the filter
group, insert a slicer. And I'm going to say currently
live, click OK, and there. Actually, let's do
one better, really. We really should
have percentages. I'm going to drag length to live
down to Values, right click. Show Values has this
amazing list here. Percent of column total
so each one of these is expressed as a percentage
of the column total. And there we have it. When I say big
city for my filter, it looks like 33% say I
want to stay where I am. 24% say I want to go to a town. I click on City. 44-- wow-- in the city. That's a big number. And you can see the
percentages for each. Now I'm going to come down here
and double click this sheet and call this something like
survey report and Enter. So we talked about V
lookup-- very simple. We used a formula to
transform our data into something more useful. Now let's go over to spaces. And here's a common problem. Now, I made an
exaggerated version here where there's all
sorts of extra spaces. And if you didn't know this
and you built a pivot table and you dropped that
field into the row areas, you would get lots of
different Gigi pretties. Because look, there's--
oh my heavens, a bunch of spaces there. There's some spaces
from the beginning. So sometimes, you
have data, and you need to remove extra spaces. Now, what is the
definition of extra spaces? Because look--
right here, it looks like, wow, I have a few spaces
between first and last name. Extra spaces are any spaces
except for a single space between words. Now, one way to do this is to
simply use the Trim function. No, it doesn't
give you a haircut. No, it doesn't
put you on a diet. It simply removes extra spaces. I'm going to click
on the two cells to my left, Control, Enter,
Double Click, and send it down. Click in the top cell, Control
Shift, Down Arrow-- watch this. Right click, and I'm going
to hit the Format painter. Scroll all the
way up to the top, and then paste just
the formatting. Clean Sales Rep will
be our field name. Now I'm going to click
on the D and point to the edge with my move
cursor and click and drag. Because certainly, that
is one awesome option. But watch this. I'm going to copy and paste here
and then delete the formula. Because here's a formula. And I better add
some green here. And the advantage to formulas
is simply that they update. If that data changes, this
will automatically update. Not only that, but
in this situation, sometimes I think I can
type trim faster than I could do some other feature. But again, formulas
are not the only way. And in fact, this is probably
the most amazing way. I simply type clean sales
rep as my field name, and I'm going to
use flash field. Now, a flash field only
works in Excel 2013 or later. I'm going to type the correct
name-- Battalion, space, just a single space, and pretty. Now, what I'm doing if
I'm using Flash Fill-- and in fact, any time you type
next to a dataset like this, Flash Fill will try
to interpret what you type as an example
of the correct data type. And Microsoft calls this
programming by example. Because Excel will try
to, behind the scenes, build a little program
to fix the data. But watch this. When I hit Enter-- and now I'm
going to type Gigi-- instantly, Flash Fill gives
me this ghost list. And because I typed a name
that exists right there with no spaces, with
a single example, it guessed, and it
looks like it's correct. And when I hit Enter,
that is amazing. Now, Flash Fill doesn't always
work with a single example. Sometimes, you have to give
it two or three examples. Control asterisk, and I'm
going to add a border. Now, a lot of the examples we
do in this particular section, I'm not going to build reports. We're just going to see
how to clean the data. So very clearly, there
are two examples. Now let's go over to ISO dates. Now, oftentimes, you get dates
in this form-- year, month, day. And we need to get them to
proper serial number dates. Now, there's a few
ways we can do this. There are a lot of different
formulas that will do this, and I'm going to show
you my favorite one here. Now, this is not
the fastest method. However, again, why
do we have formulas? In case this data changes, the
formula will instantly update. And we're actually going
to use the text function. Now, the text function takes
numbers or numbers that are stored as text, and
with a custom number format, will convert it to that custom
number format as a text item. Now, I'm going to say hey,
look over there, comma. And the number formatting
we're going to use-- and this is really
kind of a hack. We're going to use a number
formatting that is usually only use for Social Security. I'm going to say 0000, because
there are four digits over here that represent year, dash
00, for the month, dash 00. Those are for the days. Now, of course, the text
function converts it to text. And when I enter
this, that is text, and that is not
going to help us. But F2-- any time we have
a number stored as text, we can do any math
operation on it, and it will instantly
convert it back to a number in the
math operation. I'm going to use this plus 0. And check that out. That's the serial number. Control 1, and now we can
apply whatever date number formatting we want. You've got to be kidding me. Now I'm going to actually move
this over here for a second, add some green, double
click and send it down. Highlight that whole column, and
then move it over a little bit. Control, Up Arrow. And we could go to
any one of these and see, sure enough,
it has given us exactly the right date. Now, again, that's
not the fastest way. Here is the fastest way. And I'm not going to
highlight the text. I'm going to highlight
the actual date. Control, Shift, Down Arrow. And it is text to columns. And this has been
around forever-- if data ribbon, text to columns. It doesn't matter. We're just going
to go Next, Next. Because here's the
magic button to click. We want to say ours are stored
as year, month, and day. Now, notice it says A2. And oftentimes, you absolutely
don't want those there. So you'd simply
click Finish, and it would dump all of the serial
number dates into this column, replacing it. But just to illustrate,
I'm going to paste it off to the side, like in E2. And when I click
Finish, you gotta be kidding me-- text to columns. So if you're just in a hurry,
that is the fastest way. And text to columns
goes way back. Now, a more recent way that
we can do this is power query. And again, I wouldn't
use power query if I just have this data set
and I needed to do it quickly. But I want to show you
that power query will do it pretty easily. Because oftentimes, when
you're importing data, right from the outset before you
even get the data into Excel, the dates are coming like this. So power query has no problem. And actually, it's almost
as easy as text to column. So watch this. I'm going to highlight that, and
I'm actually going to copy it over to the side, because you
cannot use power query to get data from Excel
unless it's an XL, which means you have to
go insert and convert it to a table or Control T.
And we'll actually do this in an example coming up. Or you've actually at least
one time turned on the filter. Now, watch this. See it says Control, Shift,
L. I'm just going to Control, Shift, L. And watch this. I'm going to turn it off. So my trick-- if I'm
using power query and I don't need Excel
table feature which would create dynamic
ranges, I just want to quickly do
something in power query. I just Control, Shift, L, L.
For that flash of a second, that whole dataset got
turned into filter. So watch this. When we go up to power query,
we're going to use from table. Because Excel data almost
always comes from a Excel table feature, so the
ranges are dynamic. But no problem. We use this button, and it
will respect that filter. And there they are right there. But notice over here, it
took the actual sheet name and said filter database. That's pretty cool. I'm going to leave it like that. All you have to do in power
query is click on the column, transform. And data type, I
want to say date. And you gotta be kidding me. That is amazing. Now we have to close and load 2. And we're going to do this
a lot in this chapter- close and load 2. I'm selecting the bottom one. Close and load just puts it as
an Excel table on a new sheet. I'm going to say close and load,
because I want to choose Table, and then I can choose
whether to put it on existing sheet
or a new worksheet. Later, we'll see the amazing
only create a connection. Existing worksheet--
I'm going to select, click that button there,
and I want to put it in J1. Click OK. Click Load. And there we have
our workbook query. I'm going to scroll over,
and there are our dates. And now I'm going to close
this, control and roll. Clearly, there are
multiple ways-- formula if you want it, dynamic,
that great-- in essence, a social security-type custom
number formatting plus zero. Text to columns is the
fastest, the absolute coolest. And then, of
course, power query. Now I want to go over to the
sheet-- count yes and no. Now, here we have a field
that's city and region. And we really want a city here. Formula, Flash Fill, Text to
Columns-- they all can do it. Now, if you wanted to
do it with a formula, notice that it looks
like the first eight characters is Oakland. The first four
characters are Reno. So from the left, we need
eight characters and then four. Well, we could use
the left function, but we need a dynamic
number of characters. Search function is the
perfect function for this. Equals s, and there's
a search and a find that will both search
for a particular character, like comma, and tell
you the position. Search happens to not
be case sensitive. So I'm searching for finding
the text in double quotes. I always want to find a comma. End double quotes. Comma within-- boom, that
relative cell reference. Close parentheses. When I hit Control, Enter, it
tells me it found comma at 8. Double click and send it down. Right here, it found comma at 5. Now we can use that formula
result inside of the left, because the left
will take characters from the left-- 8, 5. Oh, wait a second. We need to subtract one,
the active cell at the top. I'm going to hit F2
and amend this formula, because I don't really want 8. I want minus 1. Now I'm going to populate this
edited formula all the way down with Control and Enter. And now we have exactly
from the left 7, 4. That will work. F2, and right at the beginning,
after the equal signs, left. It wants the text. Boom. Comma. There's the number
of characters. I come to the end. Close parentheses. Control, Enter to populate
that all the way down. Now, formulas are
much more complicated than either one
of these methods. But again, that's dynamic. Flash Fill. Oh my heavens. Watch this. Oakland, Enter. As soon as I type in an R, my
ghost Flash Fill list comes up. I have given it two
examples-- a full give me everything before the comma,
and then as soon as I type an r, it's guessing that I want
everything before the comma. And when I hit Enter,
that is amazing. We can also use text to
columns, because remember, Flash Fill is only in
Excel 2013 or later. Control, Shift, Down Arrow. And I'm going to use the
keyboard here for text to columns in 2007 and later. Alt, A, E. Delimited,
we definitely have a comma which separates
the two fields in [INAUDIBLE]. I'm going to click Next. We do not want Tab. We want to say comma. And look at that. Now we're not going to need
this second column here. So when we go Next, I can
simply click on the region. Do not import. Not only that, but if I
replaced it right there-- I'm not going to replace it. I'm going to put it
in Destination M2. And when I click Finish,
exactly the same thing as Flash Fill and R formulas. Earlier versions, you
want a quick and easy way, text to columns. Flash fill 2013 or
later, and dynamic, use that left and search. Now we want to go over and
look at a much more complicated situation. I want to go over
to Description. And here it is. We got a data dump into
our spreadsheet that has only the description
from the transaction. And it looks like we have
the department or category, the actual product. Then we have a date, and
we have a sales number. And we need to break this apart. Now, we can definitely
do this with formulas, but it's going to be
pretty complicated. The real answer here,
if you have power query, is to use power query. However, we do want
to look at formulas to get the date and the number. It's important that we are
familiar with text functions. Now, in the prerequisite for
this class, Business 218, if I go over to
the Note sheet, we covered all these different text
functions for extracting data. These are also over
in the PowerPoints. Now let's try and get
at the date first. I added some field names
and some formatting. Now, if we're going
to get the date, notice the pattern is
definitely dash dash. So if I were to use the Search
function and look for this, I might get that. But you know what? I'm actually going to use
the substitute function and find the second occurrence
of this dash and get rid of it. That'll make the actual
text easier to deal with. So I'm going to look
at the substitute. And all it does is you
say, hey, what the text is, comma, what the old text is
that you're searching for, end double quotes,
then dash, comma, the new text is going to be
double quote, double quote-- that's the syntax for nothing--
inside of substitute, comma. And then look at that. Not all text functions
give you the options for instance number--
instance 1, and instance 2. So I'm going to put 2, close
parentheses, Control, Enter. I'm actually going
to widen this. And then double click
and send it down. And you can see, sure enough,
it's missing that second dash. Now I'm going to
amend this, because I want to use the mid function. And the mid function lets
us take from the middle part of a text string. I'm going to use the
mid, and then it's going to ask me where is
the starting position. And I'll use Search
to find that. And then guess what? The date is always going
to be 10 characters long. So F2, come here, MID function. The text-- that substitute
is our new text, comma. I need to find the dash. And notice the actual dash
is in the same location in the original and
in our new text. So the start number, S Tab,
and for find text for search, I'm going to put
in double quotes a dash, comma, within
this text right here. Notice again, we're
allowed to do that, because the dash is in both
text strings in exactly the same location. Close parentheses. If we were having a start
number, dash is there. We're going to need to add 1
and 2 to get to the 0, plus 2, comma. Number of characters--
it's always going to be 10. I think it's 10. Yeah. That's 10. So 10, close parentheses,
Control, Enter, double click and send it down. We can see it's text, so
I'm going to F2, plus 0. Any math operation will
convert it back to a number. Control, Enter to populate
that edited formula all the way down. Control 1 to open up Format
Cells, Date, click OK. Wow. That was pretty complicated. Now, Sales-- notice it's
always going to be at the end. But really, we can search
for that dollar sign since it's the only one here. And we're going to use not
substitute, but replace. Now, replace is
slightly different. We'll give it that text. It needs a start number
and number of characters. Now, the start
number, I'm always going to start at the beginning. So I'm going to type
just a 1, comma, and this number of characters,
so from 1 all the way up to the dollar sign. So search, and I'm
searching for a dollar sign in double quotes, comma,
within this text right here. And what this will do--
number of characters-- it will say from 1 all
the way up to that, whatever position
that is, and we're going to take that
increment from 1 to whatever and replace it with double
quote, double quote. That will be the syntax
we're using inside of replace to say don't put anything. Close parentheses. Control, Enter. Wow. That's pretty cool. And notice we always have
our default alignment to the left as a visual cue
immediately saying that's text. So F2 plus 0 Control, Enter,
double click, and send it down. Control, 1, and I'm going to
do something like currency. Click OK. Now, that is definitely
quite complicated. And the only time we'd
want to do something like that or
something like this is when we want it to be dynamic. Now we want to see
the power query. I'm going to click
on the top cell, Control Shift, Down Arrow,
Control, C, Arrow, Arrow, Arrow right here, Control,
V. Because remember, if we're going to
use power query, this needs to be either a
filtered data set or a table. I'm in a Control, T to
convert it to a table and hit Enter to
get that OK button. I absolutely want to go
design and up to Properties. I want to name this Or. If you do this a lot and
you're using power query, ALT, J, T, A will get you
right up to the properties. Give me a name, and I'm going
to call this Descriptions, and Enter. Now I can go up to
Power Query from table. And there is our
descriptions name. It took for our query name. We're going to click on this. And hey, I think pretty much,
we can use home split column. We have-- it looks like a
colon and a space delimiter. We've got some dashes. So I think this is
going to be quite easy. We'll say split by delimiter. And for the first one, we're
going to get at that colon, so I'm going to click the
drop down and point to Custom. Right here, I'm going
to say Colon, Space. And certainly, this is pretty
cool, different than text to columns. We have at leftmost
delimiter, at rightmost, and each in currents . We only have one, so I simply
need to put-- actually, let's make them a stake here. Watch this. This is another amazing
feature about PowerPoint. I'm going to put just
a Colon, click OK, and it will obey
me and split it. But whoa. There's an extra space there. And here's something awesome. I don't know how many times I've
used this learning power query. You can simply x
this out, x this out. And notice it did two steps. And it will totally remove
those and get us back. So now when I click the column,
Split, Delimeter, Drop Down, Custom, colon, and a space--
I'm putting two in there-- when I click OK,
that is beautiful. No extra spaces there. Now let's click on this column. And I want to split it
with a space, dash, space. And notice the date
and the sales amount are separated by the same
series of delimiters. Split by Delimiter,
Custom, space, dash, space. And when I click OK, boom. There it is, split apart. Now, we want to make sure
that each one of these has the correct data type. So data is usually the one
that gets us in trouble. Those are both text. Now we can double click
and name the columns. Department, double click. Product, double click. Date, double click. Sales, and Enter. Now we have our steps. We have our name. Close and load to table. We're going to dump it
into our spreadsheet. Later, we'll see this awesome
option here for data model. We're going to put it
on this existing sheet and click the Collapse button. And we'll put it
somewhere like G1. Click OK. Click Load. And sure enough, there it is--
68,000 records split apart into our four fields. Power Query certainly is easier. And we want to
check out the fact Power Query is dynamic also. Now, this dataset over
here-- if I come over and change this first
transaction to 1,000 to 2,000, when I hit Enter, formulas
are instantaneous. There we go. Now, over here, if we
change this to 2 2, oh, it doesn't update. No, no. It's just not instantaneous. You have to come and
right click Refresh. And it actually goes
through all of the steps of taking the data, transforming
it, and bringing it over Here. Now, actually, this is
sort of a silly example, because the only time
you're going to get a data set like this is
from a data dump where this is the description
from a transaction or something like that. But nevertheless, we saw
how to, with formulas, extract data from, in this
case, the middle and end, and then we saw how to take
the same exact descriptions. Power query made it a
lot easier to transform all of the columns. So in this video, this first
video for this cleaning and transforming
data section, we saw how to do a power query and
some formulas on a description. We saw back on the
count yes and no. We saw how to do formulas,
Flash Fill, and text to columns to get just the city
from City and Region. Back on I SO dates,
we saw how to use text to columns, formulas,
and Power Query, noting that if it's a
one-time deal, text to columns is pretty fast and easy. We saw how to use Flash
Fill and the Trim function to get rid of extra spaces. And then back over here on
unhelpful labels, sometimes we get data, and if
we make a pivot table, those are not good labels. So we simply used V Lookup
to help transform our data. Now, next video is going to be
our first of many Power Query videos about
importing, cleaning, and transforming data. We'll see you next video.