Hey, on today's agenda, we got the top ten Excel functions
you're going to need as a data analyst. Let's go! Excel is one of the best tools out there
for people getting to grips with data analytics. In this video, we're going to show you how you can boost your efficiency
and productivity by showing you the top ten Excel functions
for working with data analytics. So before we dive into the functions,
let's just take a little look at the dataset we'll be working with. We'll be using the same dataset
for every single function in the series. And what we have in front of us
is a list of rows which describe information about a series of stores
located across America. For each store,
we can see for a variety of different days the temperature,
fuel price, CPI, unemployment rates, as well as a couple of other pieces
of information about the store itself. Well, that's a brief introduction
to the dataset. Now, let's crack on with the functions! Let's start our list of functions
by looking at the "IF" statement. The "IF" function runs a logical test
against one cell or a variety of cells that you can select and returns a "true"
or "false" evaluation of those cells. So it's really handy
if you want to work out if a variety of cells
will meet a condition or not. In this case,
I'm going to write a function in column M that evaluates the temperature for every
single store and every single day to work out whether that temperature
is too hot or not. Because there are laws in the U.S. which regulate
whether employees are allowed to work in stores given particular temperatures. So I'm going to create a new column in column M and I'm going to call it "Too Hot", and I'm going to write an IF statement
in every single row in that column. And that IF statement
takes three parameters. The first parameter is a logical test,
something that evaluates to "true" or "false". In this case, I'm going to check
is the temperature in column G greater than 76? Write a comma
and move on to the next parameter that tells Excel
what to do if the value is true. So if the value is greater than 76, I'm
going to write in a text that this temperature is too hot,
then a comma, and then the value. If it's false and if it's not hotter
than 76, it's okay to work. Then I complete the function
with a bracket and you can see that it's evaluated -
this temperature is okay to work. I can double click on cell in the bottom right
hand corner and will auto complete for me. And we can see that there are indeed
some examples of temperatures which are too hot. So that's the IF statement in practice. See if you can find a use case
for yourself! Next, let's look at the UNIQUE function. The UNIQUE function is a function
that looks through a list or a collection of items and basically
removes all the duplicates from it. So what you get in return
is just a list of unique items in that. So I have a series of rows
in this dataset, but you can see that the store is repeated
several times, one row for each day. So if I wanted to summarize
the store numbers or the store cities or in fact anything about the store, this UNIQUE function is going to come in
super handy. So why don't I create a summarization
of the store numbers over in column O? So I'm going to just title this column
"unique store numbers", and then in column O I'm going to write
=UNIQUE and select the UNIQUE function. You can see this takes one mandatory
parameter, which is the array in question and I'm going to pick column
A, I don't want to pick the entire column because I don't want the column name
"Store" included. So I'm just going to scroll from the top,
from A2, I'm going to scroll right down to the bottom of the dataset,
which is, in this case, 111. Now I can just write a closing bracket
to complete the function. And if I scroll back up to the top,
I'll see that it's auto completed. All the unique store numbers for me. Okay, are you still with me so far? Great! Now let's take a look at the "AVERAGEIF"
statement. The AVERAGEIF statement is a function
that will calculate the average of a series of numbers across one column or multiple columns,
depending on an IF statement. So depending on the evaluation
of a "true false" statement, and we've already worked with the
IF statement, so we should understand how that works. Let's incorporate that logic
into the AVERAGEIF statement. In this case,
I'm going to use the AVERAGEIF statement to calculate the average temperature
for each unique store in our dataset. We've previously calculated unique stores
by using the UNIQUE function so I can create a column next
to this unique store number. I'm just going to go ahead
and maximize that. So the column sizing is right. I'm going to add a new column in P called "Average Temperature Per Store". And this is where the AVERAGEIF statement
will come in handy, I can say AVERAGEIF, go down and select it. Now the first two
parameters are mandatory, as we can see. They are the range and the criteria. The first parameter for AVERAGEIF
is the range parameter and that's the range of values
that I want to use in my IF statement. So in this case, I'm looking to only calculate
the average temperature for each store. So the range of IF values for me
is the store values, right? That's the thing
that I'm filtering my temperature by. The next parameter is the criteria. So for that range of store values, what value will I be picking out to filter
the results of my temperature? And in this case,
I'm going to filter by this store number. And the third and final parameter
is the range that I'm going to use and the average calculation. Because I'm calculating the average temperature,
I could either pick column G or column H, but I'm going to pick column G,
which is the Fahrenheit values. And if I complete that, I will get the average temperature for just store number one,
which is the store in Denver and towards a complete
for the rest of the stores, I can just double click in the bottom
right hand cell. Now if you want those numbers
to look a little neater because they're quite messy right now,
there are lots of decimal places - just go ahead, right
click and format the cell and let's pick a number with two
or one decimal places behind it. In fact, why not just go down to one
to make things a bit easier? And now we have a pretty simple Fahrenheit
average calculation store. Let me just close by saying
I've shown you AVERAGEIF, but it's not the only sort of mathematical calculation
you can do based on an IF statement. You also have functions
like COUNTIF, AVERAGEIF, and SUMIF are hugely useful for data analysts because it allows you
to filter large amounts of numerical data by conditions around that data
and allows you to aggregate based on various features of your dataset,
which allows you to summarize information usefully and present
those findings to your wider business. Great for a data analyst, right? Well done for staying with me
with all those IF statements. Now let's take a look at CONCATENATE. Right? Spoiler alert, concatenate is just a fancy
way of saying join together. So in the CONCATENATE function, we can join together
the values of several text columns into one column, and this is useful
for a variety of reasons. In data
analytics, you'll often find yourself trying to summarize information
or combine information that's stored in relational databases or lots of Excel
tables across multiple columns. So let me give you an example. Here I can see we have the manager's first name
and surname stored in separate columns. So let's
go ahead and make a column in column N, and we'll call that column
the "manager name" and then we can get rid of manager
first name, my manager surname. So I'll go ahead and write a
title in here. Manager name. Double click to resize it. So let's use the CONCATENATE function
and in Excel this is called =CONCAT. I can see that
it takes multiple parameters, the first column you want to combine,
followed by the second column you want to combine, and so on. You can combine
as many columns as you like. So let's go ahead and combine the manager's first name
with the manager's second name. Then we'll see what that looks like. Well, it looks okay,
but you can see that there's no space between the first and last name. So why don't we add our own space
manually in that function? And now when I go to complete,
I can see that the first and last names are spaced
nicely. So don't forget, CONCAT allows you
not just to combine text from various columns,
but also to add in your own pre-written text if you wish. So now we have the manager name in column N
and we could go ahead and delete columns D and E. But don't forget,
we still have a function in column N and that function would error
if we got rid of the original text in columns D and E. So two ways
you can get around that. One would be to copy
and paste column N as values. So that's paste special as values, the other way,
which the way I'm going to do it is just to go ahead and highlight
D and E and hide those columns. All right. The data is still there,
but it's hidden from view, which is handy. Now, let's take a look
at the ISBLANK statement. The ISBLANK function will find all the nulls
or empty cells in your dataset. And this is a really useful function
for data analysts because often data analysts work with dirty data. That means data with missing values
or wrong values. And if you pass those missing values
into several of the upstream mathematical calculations, you get errors. So it's important to spot these problems at the beginning
of your analytics process. I'm going to write an ISBLANK function which will look at missing values
for any one of the columns in my dataset. And I'm going to start by looking at
the temperature in Fahrenheit of the data. So let me add a column next to temperature
Fahrenheit, which will check whether the Fahrenheit column
has any missing values. And I'll call this column missing temp question
mark, double click on it to expand it. In the first cell,
I'm going to write the ISBLANK function. The ISBLANK function takes one value,
which is the cell that you're looking to evaluate for blanks. In this case, I'm
looking at the temperature of Fahrenheit, so I'll just put H2 and then Autocomplete. We can see that this is "false". But if I autocomplete it
in every other cell and then filter by that column, I'll be able to only look at rows where that's true. And in fact, we can see now we do have
four rows where the Fahrenheit is missing. Let's look at these rows
with missing Fahrenheit to see if there are any upstream errors that have been caused by these
missing values being incorrectly applied. Yeah, well, I can quickly
see that in the column that I created using the IF function. Actually,
I don't think this is working correctly. Just because the temperature is missing
doesn't mean that it's okay to work. So why has this
IF statement not worked correctly? Well, let's think about this a little bit. The first argument,
if age 27 is greater than 76, evaluates to "too hot"
if true and "okay to work" if false. Well, clearly, if the value was missing,
then H27 is not going to evaluate to greater than 76
because there's no data. So it clearly evaluates to "false",
in which case we've written okay to work
and this obviously isn't quite right. Okay, so we can go ahead and change this
IF statement so that it pays attention to whether there's missing data before it calculates
whether the temperature is too hot or not. How are we going to change the statement? Well, remember the IF statement,
our first parameter is the logical test. So all I'm going to do here
is I'm going to change a logical test, so that instead of looking
at the temperature directly, the first thing it does is check
whether there is missing data. So I'm going to write a nested function
in Excel. And nested functions are something that you'll use a lot
as a data analyst or a data scientist. So now I'm going to add an
ISBLANK statement as my first parameter. This ISBLANK statement which evaluates
to "true" or "false" is going to check whether the temperature field is missing or not. Now, what happens if it's missing? That's the "true" statement. Well, if it's missing if it's true,
then we want to bring that to the attention of the data analyst. So let's print out
something like "Missing data!!". Now we need to work out what to do
if the value is "false". Well, if the value is false, we're going to go ahead and perform
our original calculation. And that was by using the
IF statement that we looked at originally. So I can now add another
IF statement in here that evaluates whether it's above 76 or below 76. I just need to add one more
closing brackets to this statement. Just so opening brackets and closing
brackets match. I could go ahead and complete that. And then when I auto populate
this data down, you can see that now we have in this column
next to the missing values, we have missing data in exclamation mark. So let's just finish this off
by adding a filter function that goes to check
all the instances of missing data. Great. We've definitely captured all
the instances of missing data, we're good! Okay, now we can look at the "left", "mid" and "right" statements
for passing text and numbers. "The left", "right" and "mid" functions enable you to extract subsections of text
from a text cell. So let's use this to extract
from the address column the zip code of every address. I'm going to set a new column
and I'm going to call this column the "zip code" column, and I'm simply going to use
the "right" function in this case, the "right" function takes two parameters
the text you're going to search through, and then the number of characters
that you want to extract from that text. So in my case, I'm going to look at the address
cell, that's the text I'm looking out. And the number of characters
I want to extract is the number of characters
in the zip code, which is five. So I'm going to say take the right
five most characters from the address, and that does indeed
look to be the zip code. Let me go ahead
and autopopulate the rest of the cells and we can see that that's worked nicely
for all of our addresses. You can also use the "left" in much
the same way as the right, except it works on the left of the cell. Or you can use the "mid" function to
extract text from the middle of your cell. Next, let's
take a look at the "length" function. The "length" function
or "len" function enables you to check the number of characters
in a given cell or column. This can be really, really useful for data
analysts when you're trying to check whether your data input
is standardized or not. In my case, I might find it useful
to check the length of the column "zip code" to make sure that all my zip codes
have five characters in it. So why don't
I just add a column next to zip code and check the length of the "zip
code" column? Once I've done that, I can auto complete
to fill out the rest of the cells. And now why don't I filter or at least check the filter to see
if we have any results outside of five? No, everything is five,
which means that at least zip codes are all five characters long. Before we move on, let's name this column. Now, let's take a look
at one of the most useful functions you're going to need to use in data
analytics: The VLOOKUP function. The VLOOKUP function is one of the most popular
functions for data analysts. It allows you to find things
in a large table or large range by row
based on some conditional value. In our case, I'd like to add some more information
to the average temperature of each store by, in addition to showing the store
number, showing the city of that store. And I can use the VLOOKUP function
to help me do that. Let's scroll over here
to the unique stores and firstly I'll expand the "average
temperature of the store" column out so it's fully formatted. And now I'm going to add another column
which I call "store name", and this will enable me to use VLOOKUP. So let's use the VLOOKUP function. The first parameter is the lookup value. In this case,
I'm going to use the unique store number as my key
to look up the rest of the information. The second parameter is the range of values in which the store and the extra information
that I want to find are located. So I know that I want to find the city
from the store number. So those two pieces of information
are stored right next to each other
at the beginning of our dataset. The next thing I'm going to use
is the "call index number". This basically means
how many columns shifted is the data that I want to show from the data
that I'm looking to find it up. Now because the store
and the city are next to each other, the city is going to have value 2. If I wanted to show the store ID itself, it would have value
1, that's essentially referring to itself. In this case, the nextdoor column
is the one I want, so it has value 2. And finally, the final parameter
is a "true false" statement. You can specify "true"
if you only want an approximate match done or specify "false" if you're only happy
with exact matches of the return value. In our case, it's safer to specify "false"
because we only want exact matches. So let's go ahead and do that. Let's say auto complete the statement
and go back to my original column. You can see that
Denver has been populated here. Simply copy
this formula down into the remaining cells for the VLOOKUP table function
to have been completed. Okay, now I'm going to show you
how you can sort your values using the SORT function. We can use the SORT function to sort any column of text or number
into ascending or descending sorted order. So we've got a lot of numerical values in our data set, for example,
the unemployment rate and column N and that changes all over the shop sometimes
it's lower, sometimes it's higher. Why don't we
look at the sorted unemployment rate by creating a new column W called "sorted unemployment". So let's go ahead and populate this column
with our sorted unemployment values. The SORT function
takes one monetary parameter and that's the array of values
that you want to sort. So I'm going to go ahead and select
all the values in the unemployment column. And once I selected them
and complete this function and now you can see that
we sorted our unemployment values by highest unemployment
to lowest unemployment. Why don't we
close by looking at the FILTER function? The FILTER function allows you to filter
one column of multiple columns by a filter term
and just show the filter results. In our case, why don't we filter
our unique stores that we previously calculated
the average temperature for by condition as to whether they are on average too
hot or not. So, I'm going to create a new table under the "unique store"
table called "Hot Stores", and I'm going to use a FILTER function
in this "hot stores" space. So let's create the FILTER function. The first parameter is the array
that we're going to filter. I want to filter every single unique store
that we've performed calculations on. So that would be my array. The next parameter is what filtering function
we're going to use. In our case. I want to look at whether the average
temperature is over a certain amount. So I'm going to select
the temperature column and I'm going to say
I want that temperature column to be greater than 54 degrees Fahrenheit. And actually,
that's all I need to include. The final if empty parameter is optional, once I select this and autocomplete,
I can see that my original cells have now been filtered
down. And all I'm looking at
is the two stores in Boston and Atlanta
where the average temperature is hot. And there you have it. These are all the ten Microsoft
Excel functions I have for you today. I know these are going to make you
a more efficient data analyst. Let us know in the comments below if there are anything we've missed or
if there's any other Excel related content you'd like to see in the future. For example, pivot tables. We didn't include those. At CareerFoundry, we have a free data
analytics short course, where you can start using these functions
straight away. If you'd like sign up,
the link will be in the description below. Or if you're just starting out
and you'd like to find out more about what a typical data analyst job looks like, why don't you check out this video
that we made a while ago where I take you through a typical day
in the life of a data analyst. It will give you some insights into what
tasks a data analyst might typically carry out, as well as what
some of their favorite tools and tips are. Check it out here and
see you in the next video!