Welcome to E-DAB
video number two, Data Analysis and
Business Intelligence Made Easy with
Excel Power Tools. And in this video, we got
to talk about data, tables, sorting, filtering, and the
ever important logical tests. Now, for every
video in this class, you can download the file
and notice it says start. That means the file we're going
to use when we start the video, there is also the same file
name, but it says finished. That's the file you can
download as an example of what it looks like after we've
completed the video. Now, here are our
topics for this video. Also, for every
video in this class, these objectives are
always below the video. This is Excel Basics 1,
not our current video. But if you look below
our current video, you'll find this
show more button. When you click, there's a time
hyperlink table of contents. And we want to start off by
going to the sheet Excel. So I click on the sheet Excel. Now, just in case some
of you joining this class have never used
Excel, we're going to start off with just a brief
introduction to what is Excel. This section of
the video will also help us understand
how Excel interprets the raw data we enter into an
Excel spreadsheet for our data analysis. Now, the essence of Excel
is that there is a column. This is column I.
There is a row. This is row 4. And the intersection
is called a cell. That's the cell I4. Now, cells will allow us to
hold data, make formulas, pivot tables, and things like that. Now, all of the cells together,
that's called the worksheet. Now look down here. There's a bunch of different
worksheets, also known as sheets. I can click with my mouse on
data, data or information, proper data sets. Each one of these
is a new worksheet. Now, all of the
worksheets together, that's called the workbook. Now, the workbook or file
name is up in the title bar. Now, navigating
through these sheets, we can certainly use
our mouse and click. We can also use
this scroll arrow, click, click, and notice it
moves the sheets, but not the active sheet. I can click to scroll back. But here's the cooler trick,
and if you hover your cursor, you can see, it has
two helpful hints. You can use control left click
to scroll to the last sheet, or watch what happens
when we right click. That is amazing. And because we'll have
workbooks with lots of sheets, this is very handy. If I want to jump to homework
problem number one, I click. Click OK. And the active sheet
is now homework one. Right click, scroll up. Excel, click OK. Now, if we're going to start
putting things into the cells, we got to talk about
the amazing default alignment for data in Excel. Text is always
aligned to the left. Numbers, dates and times are
always aligned to the right. Boolean values,
trues and falses, are always centered
and capitalized. Now, if I type Excel,
as soon as I hit Enter, I know that Excel thinks that
this is text, because it's aligned to the left. I type a number, Enter. Look at that,
aligned to the right. I know the default alignment
is signaling that Excel thinks this is a number. Now, this default alignment
can be very helpful. Notice, numbers,
dates and times, all have to be
aligned to the right. If I were to enter 43..15
by mistake, a type in error, when I hit Enter, there's my
visual cue that Excel does not think this is a number. That will not work in formulas,
pivot tables, and so on. Dates and times are similar. If you don't enter
them correctly, Excel and the other tools will
not understand them as dates. So if by mistake,
I type 18/2/2019, let's change the
column width between I and J, hover my cursor,
there's the right cursor to click and drag. My particular regional
settings in my control panel expect month, day, and year. Now, some parts of the
world, the regional setting, would accept this as a date. Time is similar. If I enter 8:00AM and hit
Enter, there is my visual cue. Excel doesn't think
that's a time value. Now, of course, the
correct way to enter these is 43.15, Enter, 2/18/2019,
Enter, and then 8:00 space AM. Now, watch what happens, just
like these Boolean values, if I hit Enter, it
automatically capitalizes. Also if you enter
true or false, all lowercase it automatically
assumes it's a Boolean, capitalizes, and centers it. Now you might be wondering,
I thought we typed in 2019. What happened to the 20? Well, if you select the cell
and look up to the formula bar, look at that. Something's being
displayed on the surface of the spreadsheet that's
different than the actual item in the cell. That's our next important
topic, number formatting. If we go to the Home
ribbon tab, number group, this is all number formatting. Now let's select cell K13, 43. And instead of using
Enter, I'm going to use Control
Enter, because I want to put the thing in the cell
and keep the cell selected. Now, I immediately
want to look up to the formula bar, OK, two
digits, two digits, Enter, 43 Control Enter. Two digits, two digits. Now, I want to use
number formatting to display this number
with two decimals showing. So up in number format
in group, once and twice. We accomplished our goal. We have displayed this number
with a decimal and two zeros. If we look up in
the formula bar, we can see the underlying
number was not changed when we applied number formatting. As another example, if I type
the decimal 0.0145, Control Enter, that's a
tax rate, we would like to use a number formatting
to display it as a percentage. So I select a percentage
number formatting. Now if I decrease the
decimals, it's displayed as 1%, but number formatting,
none of this up here actually changes
the underlying number. We can see up in the formula
bar the 1.45 is still there. That number formatting didn't
change the underlying number. Now, why is understanding
number formatting as a facade, where the
number formatting displays a number in a certain way
without changing the underlying number, why is
that so important? Because if we're
dealing with this data, we have to understand that, yes
that may be displayed as 1%, but it's the 1.45 that will be
used in any calculation or data analysis. Now, that is misleading. So of course, when we're
applying number formatting, we want to make sure that it
is helpful, not misleading. Two other important
number formatting topics. Anytime you enter a date or,
time automatic date number formatting and time number
formatting will be applied. That's so we can do date
math, things like subtract two dates to figure out how
many days between two dates. One last what is Excel topic. You can think of Excel
is doing two main things, making calculations and
performing data analysis. Here's an example
F2 to put the cell in edit mode of someone
created a formula to calculate net income. That's not what we're going
to be doing in this class. We're going to be doing the
second one, converting raw data into useful information. Now, we need to talk about
data and how we store data. Let's go over to the sheet data. What is data? Raw data? It is data stored in
its smallest form that allows Excel data analysis
features and power tools to work. What that means is this
is not good raw data. Why? Because if we needed
to sort by zip code, this data has the full
address in one cell, rather than broken apart
into its smallest form. In this form, we can do things
like sort by zip code easily. As another example of
bad data, look at this. We have date, product, and
sales all in a single cell or a single column here. This is bad data. With this data, we
can't use the awesome built in features, like a pivot
table, to add sales by product. What we need is to
take this bad data and convert it into
data we can use, data stored in its smallest
form, date, product, and sales. Then, features like pivot table
can deal with this data set, and in just a few clicks, we can
have our product sales report. Now, the good news
is this bad data. Well, that's not the good news. But Power Query, as
a tool, that tool was invented to deal
with just this situation. With Power Query, we can
easily split this bad data into three columns. Now, let's go over
to the sheet D or I, because we do not want to
confuse the word data the word information. Data, that means the raw data. Information, that's what we
convert the raw data into. Data is the
unorganized raw data. Alone, it doesn't yield insight. Alone, it's not very useful. Information is organized
and presented data. This helps people to make
decisions, see patterns, and gain insight. Now it may seem obvious,
but lots of people use these words loosely. Data, to us, is the raw data. Information is
the final product. Our next topic is
proper data sets. I'm going to click on the
sheet proper data set. Here are the terms we
would like to define. Down here, we have two
examples of a proper data set. Now proper data
set, the synonyms are table or table of data. In most tools, we just
refer to them as a table. Now, the first most important
thing about proper data sets is fields or column headers
have to be in the first row. So here, we have a
transaction table. But very importantly
in the first row, I list the names of
the column headers, or what are called field
names, date, product, ID, units sold, and sales. Over here in our product
table, product ID, product, flight range,
and retail price. So every single proper
data set or table has field names in the first
row, fields, column headers, field names, columns,
variables, and attributes. Those are all synonyms for
what's in the first row. And if you think about it, they
help us define the data type. We're organizing our data. Date goes in this column,
product ID here, retail price here. Now, this table
and this table are sitting in the Excel worksheet. Later, we'll have to
import these proper data sets into either PowerQuery,
PowerPivot or Power VI desktop. And in those tools,
the data type has to be explicitly defined. Here in the Excel
spreadsheet, I don't have to specifically say this is
an integer, this as a decimal, this is text. But in those other tools we'll
have to explicitly define the data type. The next important
characteristic of a proper data set is we have rows. That row represents
a sales transaction. This row represents a product. Rather than referring
to these as rows, they're referred to as a record. This is a record
of a transaction. This is our record for
the particular product. So a proper data
set has field names in the first row, records
in subsequent rows. In all the other tools, this
table will be sitting alone. However, in an Excel
spreadsheet, notice, I'm allowed to put anything
I want into any cell. So when you have your proper
data set in an Excel worksheet, you have to remember to have
empty cells all the way around. If you have any
little bit of data, like some note
right here, next to, touching the proper
data set, that can create a problem when we're
trying to create a pivot table or filtering. So we don't want anything
around the data set. Now, the next term is delimiter. And I want to go back
over to the sheet data. A delimiter is a
particular character that separates bits of data. The delimiter here is a dash. So we clearly can see that
there are three different things here, date, product
name, and sales, separated by a delimiter. Later in the class, we'll
learn about text files. Text files are how one system
gets data to another system. And in those cases,
the delimiter will be a comma, or a tab. The last important thing about
proper data sets is naming. We want to name all tables,
fields, and other items smartly and consistently. I do not want a
name for this table, like table one or
columns that say column 1, column 2, column 3. A smart name for
a table like this would be sales transactions
or F transactions, a smart name for this table,
something like product or D product. On the sheet T not C,
I want to make sure that we always call
this proper data set a table and this
visual portrayal of these numbers, a chart. Yes, that seems obvious. This is the table. This is a chart. But you would be amazed how
many people call something like this a chart,
not us, table, chart. Let's go over to the
sheet, Excel tables. Here are two proper data sets
stored in an Excel worksheet. Now, any time we have proper
data sets in a worksheet, we want to convert the proper
data set to an official Excel table. Now, some data analysis
features will work when we have just a proper data set. But for example, if we go
to the Data ribbon tab, the get and transform
group is PowerQuery. We can't even get our data
from an Excel worksheet into PowerQuery unless we first
convert the proper data set to an Excel table. Similarly, PowerPivot
we can't use this button add to the PowerPivot
data model, unless they're
official Excel tables. Now, let's scroll down. Here's our two Excel tables. There are four
requirements for us to convert this proper
data set to an Excel table. Now, the first requirement is
that it is a proper data set. The second requirement
is you click in one cell, then you go up to the
Insert Ribbon tab, and then in the tables group,
you click on the Table button. Notice there's a
keyboard, Control T. So I have a single
cell selected, Control T. This brings up
the Create Table dialog box. Make sure it says my
table has headers. Remember, headers are a
synonym for field names. Click OK. The fourth requirement is that
we go to the Table Tools Design ribbon tab. And by the way, if I click
outside the table, that goes away, back inside
the table, there it is. The fourth requirement is that
we give our table a smart name. So in Properties, up in
Table Names text box, type little f and
then transactions. Now, one convention
in data analysis is when we have what's
called a fact table, the numbers in this
transactional sales table are called facts,
because those are the numbers we're going to
use to make our calculations. So one convention is to always
list our sales table or fact table with a little
leading lower case f. Our second table, it
is a proper data set. I click in a single
cell, Control T. I can click Enter to invoke OK. And we go up to
Design Properties. And here, since this is a
lookup table, oftentimes in data analysis, we call
this a dimension table. We're going to use
the naming convention where we have little d, and then
a smart name, like products. And Enter. Now, we have our
two Excel tables. And the reason
that we always want to take our proper data
set from an Excel worksheet and convert it to an Excel table
is because of dynamic ranges. That means if we add any records
or take any records away, anything pointing to
the Excel table object, we'll see those new
records or deleted records and can be updated. That's different than if we
just use a proper data set from an Excel worksheet. In that case, we'd
have to manually go and change our solution
to recognize the new data. Now, that's the reason that
PowerQuery and PowerPivot require that we convert
our proper data sets to Excel tables. As a simple example,
let's create a formula, because it doesn't matter. It could be one of the
power tools or anything else we might do, create a chart,
create a simple formula. They all update when we add or
delete records from our table. So I'm going to
create a formula, equal sum, and because this
is an Excel table, watch this. I can help her my
cursor, and when I see my downward
pointing black arrow, I click right above the
field name, and look at that. It puts in our table
formula nomenclature, the smartly named table, and our
field name in square brackets. Close parentheses and Enter. Now, let's add a new
record below this table, and see if our formula will
recognize the new data. Now, I'm in the last
cell in the last record. So I'm going to hit the Tab
key to add a new record, date. Now, I want to add
sunspot product 2005 Tab, and we sold 16 of them. Now there's our new record. Did our formula update? Yes, indeed it did. I click on the cell, hit F2,
and sure enough, it still looks the same. But it totally expanded. These dynamic ranges
inside an Excel table are like magic for
us data analysts. Now we understand why when
Microsoft invented PowerQuery and PowerPivot, they
forced us to put our data, if it's coming
from an Excel worksheet, into an Excel table. Now, Excel tables also
add some formatting, and very conveniently,
if you're going to work with the data
in the worksheet, we can easily sort
and filter, which we will see just in a moment. But I want to jump
over to our PDF notes and look at some other
examples of proper data sets. This is page 10
in our PDF notes. We want to talk about
data types and proper data sets and the various
tools we're going to use. Now, example number one. If we have a proper data
set in an Excel worksheet, we can declare what type of
data goes into each column by naming the column. But we don't have any way
to force only dates or only text in a particular column. So we're allowed, in
an Excel spreadsheet, to have dates and text
in the same column. The problem is this inconsistent
data will cause problems. For example, if we try to
group dates in a pivot table, we're not allowed because
there's inconsistent data. Remember, in data analysis,
it's all about that raw data. The same thing will happen
if we use an Excel table. If we have inconsistent data,
it is going to cause problems. Just like if we have the data
in the cells, in an Excel table we can name the columns, but we
can't force a single data type on a column. If we bring that same data set
into the PowerQuery editor, in PowerQuery, you are required
to declare a data type, date, whole number, decimal. By declaring a data
type, we are assuring that what we want and
expect in this column is going to be there when
we make a calculation. Here, because it was text,
it gives us an error. That tells us we have to go fix
the data or remove the error. If we take this data inside
the Power Query editor and load it to the
Excel sheet, then we'd get an empty cell,
where that error is. And that empty cell
will cause problems. If we bring that same table into
PowerPivot, look what happens. Everything shows up as text. Now we're going to
have big trouble, because in order for a date
to be used in calculation, it needs to be a number. Now, in PowerPivot,
just like PowerQuery, we'll have to explicitly
declare what type of data is in a column. In Power BI Desktop
we'll have to, just in PowerQuery
and PowerPivot, define the exact data
type for each column. Here, if we import that data,
we're going to get all text. And of course, that
will cause problems. So in all of our tools, when
we have our proper data set, we're going to have to be
careful about consistent data in each column. Two other proper data sets. Text files. This type of text
file, where there's a tab delimeter
between the field names and the bits of raw
data for each record, this is a common file
that we use to get data from one system to another. It is a proper data
set, but it's all text. When we use Power
Query to import this, we'll have to take the
columns, define the data type, and create a proper data set
that the analysis tools can use. Finally, it's common in
data analysis and business intelligence to refer to the
table that has the columns that we're going to
use in calculations. It's common to refer
to these as facts and refer to the whole
table as a fact table. Also, the lookup tables-- these are tables
with unique lists-- oftentimes columns from the
lookup table, or dimension table are used as conditions or
criteria for our calculations. So these types of tables
are called dimension tables. Both of these are
proper data sets. All right, in these PDF
notes from page 10 to 13, we saw eight examples
of proper data sets and talked about the
importance of data types. Now let's jump
back over to Excel and talk about
sorting and filtering. Now, sorting is a feature that
will be in all of our tools, and it can take a
particular column and sort it alphabetically
or numerically. Now, at the top of each
column, the field name will have a dropdown. If I click that,
I see A to Z and Z to A. Now, if you're
sorting for the first time, the fear is always if
I sort this column, will the records remain intact? And yes, in all
of our tools, when we're using our proper
data sets, when we sort-- and let's try sorting this A to
Z, click the dropdown, A to Z, that record remains intact, as
does all of the other records. Now we sorted a single
column, and all the records sorted together. Oftentimes, we want to sort
by more than one column. Now, if our goal
is to sort sales within sales rep, that
means as the final sort, I need alphabetically sales rep. And then for each sales
rep, in the sales column, I need a mini sort,
biggest to smallest. Now, sometimes the goal might
be communicated differently. For example, you might
hear, hey, sort the sales for each sales rep. Same thing. Now the key is that
when we're sorting by two or more columns,
whatever the final sort is, that's called the major sort. So for us, the major
sort is sales rep. If we're using these little
drop downs at the top, we always want to do
the major sort last. So I come to sales, and let's
do it largest to smallest. Instantly, every
number, from biggest to smallest, but notice, once
we jump over to sales rep and sort this A
to Z, for example, Gigi's records are already
in the perfect order from biggest to smallest. So when I come up to
sales rep and do A to Z, just like that Gigi's
records are together. But we have A to Z, and
then within each sales rep, I have biggest to
smallest, biggest to smallest,
biggest to smallest. Now, the next topic we want
to talk about is filtering, so let's go over to
the sheet filter. Now here's our
notes for filtering I'm going to scroll down. Now the goal of filtering
is to take a proper data set and specify some
conditions or criteria for extracting not all
the records, but just some of the records. As our first
example, our goal is to extract only the records
for the sales rep Alma. That means we have to look
through this column here, and only when Alma is the sales
rep do we extract the record, just as for sorting at
the top of the column, next to the field name
there'll be a drop down. And this filter
and sorting feature is in all the tools we use. Now, for filtering, there are
some special filters that we'll look at in just a second. But there will always be
a list of the unique items from the column that we're
allowed to select from. So if I want to select
just Alma, unselect all, select Alma. When I click OK, look at that, I
have just the records for Alma. Now, in an Excel
sheet, the rows that don't match the condition
or criteria are hidden. Over in PowerQuery, they'll
actually be removed. And if we go to Power BI
Desktop and PowerPivot, we don't use filtering as much,
except for as the secret trick to look in a particular
column to see exactly what the unique list is. All right, so here
in Excel, our goal is to copy and paste these. I'm going to click
in a single cell, and I want to use the keyboard
to highlight the entire table, control asterisk. Now, you can use the asterisk on
the number pad or Control Shift 8. Now, I'm going to use the
keyboard for Copy Control C. And look at that,
the dancing ants are dancing around just
the visible records. The hidden rows
will not be copied. Now I come over to the
sheet Alma records. I'm going to click in
some cell and Control V. Our goal has been accomplished. We used filtering based on
one condition on the sales rep column to filter and then
copy and paste our records. Come back over to filter. To unfilter, we click the Filter
icon and say Clear Filter. Now the dancing ants
are still dancing, so we have to click Escape. Now, whenever we
filter, we actually have to run a logical test. Now, for our first example,
we had to ask for each record, is the sales rep equal to Alma? In this case, we got true,
so we extracted the record. Then we went down
and said hey, is this sales rep equal to Alma? False. Now, that's a single condition. A lot of what we
do in data analysis has multiple conditions. So the first
logical test we want to talk about, where we have two
or more conditions or criteria is an and logical test. Our goal is to show records
for the sales rep Alma and the auto Chevy. That means we have to go
through every single record in this table and ask
two questions of two different columns. So for the first record, I say
hey, sales rep, are you Alma? True. Autos, are you Chevy? True. Because I get two trues,
I extract the record. Then I got to go
to the next one. Are you equal to Alma? False. Are you equal to Chevy? True. I only got one true, so that
record is not extracted. Here, I get a false,
false, not extracted. Down here, I get true false. That record is not extracted. So when we run an
and logical test, a record is included for
filtering, later for adding or counting only
when we get trues for all the and logical tests. Now, in our example, we
only have two questions, but we could have two or
three or four or many more. All right, so
filtering, it's easy. I just go to my drop down, and
from my unique list, I uncheck, and say, hey, I
want Alma, click OK. There's Alma. Now I've got to get
rid of all the autos that are not Chevy, so I click
the drop down and select Chevy Only, click OK. And there it is. Only one, two, three records
result from our and logical test. Now, if we're in Excel, and we
have multiple columns selected, we could actually go up
to Data and look at that. There's a clear. Otherwise, I'd have to go inside
each one of the drop downs and clear the filter. Now the next logical test
is an or logical test. And our goal for this
example is to show records for the sales rep Alma or Rina. Now, get this, because we're
asking two questions of one column, I actually
have to ask the two questions of the single cell. Hey, are you equal to Alma? True. Hey, are you equal to Rina? False. That record is
included, because all I need for an or logical
test is one or more trues. Now, when you're running
and/or logical test on a single column,
we're never going to get two trues, because the
sales rep can't simultaneously be Alma and Rina. But I go down to the next one. Are you equal to Alma? False. Are you equal to Rina? False. That one's not included. When I get down to a record with
Rina, are you equal to Alma? False. Are you equal to Rina? True. So for an or logical test,
it's at least one true. Now for filtering, we
uncheck everything, and we simply say, hey,
I want Alma and Rina. Now, I want you to
notice what I did there. That is how we
say it in English. We actually use
the word and here. That's what will confuse
people some time. And in fact, your boss will
come and say, hey, hey, I want to see all the
records for Alma and Rina. So when we're using
filter, we don't explicitly have to think of
the or logical test, because we're just checking and
picking the two items we want. But as you get further
into data analysis, you have to be
clear when you have multiple conditions
if it's an and logical test or an or logical test. So I'm going to
click OK, and just like that, I can see the
column has both Alma and Rina. Those are all the records that
match are or logical test. Now, we'll unfilter, and the
last filter we want to look at is over here for sales. For each data type, there
are specific filters, because it's a number. Look at that. There's a secret drop down. Equals, does not equal, greater
than, greater than or equal to, any of these. But look at this, top 10. And by the way, these filters
will exist in all of our tools, including pivot table reports. So I'm going to say top 10. Click the down arrow
and we'll select five. Click OK. And just like
that, I've filtered hiding all the records, where
the sales are not top five. Now I'm going to unfilter this. Now, that was our first
introduction to logical test. I want to go over to
the sheet logical test. Now for an and logical
test, we used filtering. We ask the question, hey
sales rep Alma and auto Chevy. We got two trues for
three different rows. So when we're filtering,
we show these rows. And then we extract them and
paste them somewhere else. But guess what, an and
logical test is used in lots of other situations. For example, this is
a pivot table report that we're going to build. That number right there
is the sum of sales, but it's not all the sales. It's adding just the sales for
auto Chevy and sales rep Alma. Now, internally,
inside the pivot table, the pivot table will have to
run that same and logical test on the entire data set. But when it does that, it will
find the same three records, but instead of
showing the records, it'll just take the
numbers and add them. Now, we'll see lots of
other and logical tests, or logical tests, and other
types of logical tests throughout this
data analysis class. Wow, that was an epic video. Here are the various topics
we covered in an introduction to what is Excel, what's
data, not confusing data and information, what's
a proper data set. Remember, don't confuse
tables and charts. We learned about the amazing
dynamic Excel tables. We learned how to sort, how
to filter and extract records, and we talked about
logical tests. All right, if you
like that video, be sure to click that
thumbs up, leave a comment, and subscribe, because there's
nine more awesome E-DAB videos. All right, we'll
see you next video.