The 10 Most Important Excel Formulas - Made Easy! (2023)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Hey, on today's agenda, we got the top ten Excel functions you're going to need as a data analyst. Let's go! Excel is one of the best tools out there for people getting to grips with data analytics. In this video, we're going to show you how you can boost your efficiency and productivity by showing you the top ten Excel functions for working with data analytics. So before we dive into the functions, let's just take a little look at the dataset we'll be working with. We'll be using the same dataset for every single function in the series. And what we have in front of us is a list of rows which describe information about a series of stores located across America. For each store, we can see for a variety of different days the temperature, fuel price, CPI, unemployment rates, as well as a couple of other pieces of information about the store itself. Well, that's a brief introduction to the dataset. Now, let's crack on with the functions! Let's start our list of functions by looking at the "IF" statement. The "IF" function runs a logical test against one cell or a variety of cells that you can select and returns a "true" or "false" evaluation of those cells. So it's really handy if you want to work out if a variety of cells will meet a condition or not. In this case, I'm going to write a function in column M that evaluates the temperature for every single store and every single day to work out whether that temperature is too hot or not. Because there are laws in the U.S. which regulate whether employees are allowed to work in stores given particular temperatures. So I'm going to create a new column in column M and I'm going to call it "Too Hot", and I'm going to write an IF statement in every single row in that column. And that IF statement takes three parameters. The first parameter is a logical test, something that evaluates to "true" or "false". In this case, I'm going to check is the temperature in column G greater than 76? Write a comma and move on to the next parameter that tells Excel what to do if the value is true. So if the value is greater than 76, I'm going to write in a text that this temperature is too hot, then a comma, and then the value. If it's false and if it's not hotter than 76, it's okay to work. Then I complete the function with a bracket and you can see that it's evaluated - this temperature is okay to work. I can double click on cell in the bottom right hand corner and will auto complete for me. And we can see that there are indeed some examples of temperatures which are too hot. So that's the IF statement in practice. See if you can find a use case for yourself! Next, let's look at the UNIQUE function. The UNIQUE function is a function that looks through a list or a collection of items and basically removes all the duplicates from it. So what you get in return is just a list of unique items in that. So I have a series of rows in this dataset, but you can see that the store is repeated several times, one row for each day. So if I wanted to summarize the store numbers or the store cities or in fact anything about the store, this UNIQUE function is going to come in super handy. So why don't I create a summarization of the store numbers over in column O? So I'm going to just title this column "unique store numbers", and then in column O I'm going to write =UNIQUE and select the UNIQUE function. You can see this takes one mandatory parameter, which is the array in question and I'm going to pick column A, I don't want to pick the entire column because I don't want the column name "Store" included. So I'm just going to scroll from the top, from A2, I'm going to scroll right down to the bottom of the dataset, which is, in this case, 111. Now I can just write a closing bracket to complete the function. And if I scroll back up to the top, I'll see that it's auto completed. All the unique store numbers for me. Okay, are you still with me so far? Great! Now let's take a look at the "AVERAGEIF" statement. The AVERAGEIF statement is a function that will calculate the average of a series of numbers across one column or multiple columns, depending on an IF statement. So depending on the evaluation of a "true false" statement, and we've already worked with the IF statement, so we should understand how that works. Let's incorporate that logic into the AVERAGEIF statement. In this case, I'm going to use the AVERAGEIF statement to calculate the average temperature for each unique store in our dataset. We've previously calculated unique stores by using the UNIQUE function so I can create a column next to this unique store number. I'm just going to go ahead and maximize that. So the column sizing is right. I'm going to add a new column in P called "Average Temperature Per Store". And this is where the AVERAGEIF statement will come in handy, I can say AVERAGEIF, go down and select it. Now the first two parameters are mandatory, as we can see. They are the range and the criteria. The first parameter for AVERAGEIF is the range parameter and that's the range of values that I want to use in my IF statement. So in this case, I'm looking to only calculate the average temperature for each store. So the range of IF values for me is the store values, right? That's the thing that I'm filtering my temperature by. The next parameter is the criteria. So for that range of store values, what value will I be picking out to filter the results of my temperature? And in this case, I'm going to filter by this store number. And the third and final parameter is the range that I'm going to use and the average calculation. Because I'm calculating the average temperature, I could either pick column G or column H, but I'm going to pick column G, which is the Fahrenheit values. And if I complete that, I will get the average temperature for just store number one, which is the store in Denver and towards a complete for the rest of the stores, I can just double click in the bottom right hand cell. Now if you want those numbers to look a little neater because they're quite messy right now, there are lots of decimal places - just go ahead, right click and format the cell and let's pick a number with two or one decimal places behind it. In fact, why not just go down to one to make things a bit easier? And now we have a pretty simple Fahrenheit average calculation store. Let me just close by saying I've shown you AVERAGEIF, but it's not the only sort of mathematical calculation you can do based on an IF statement. You also have functions like COUNTIF, AVERAGEIF, and SUMIF are hugely useful for data analysts because it allows you to filter large amounts of numerical data by conditions around that data and allows you to aggregate based on various features of your dataset, which allows you to summarize information usefully and present those findings to your wider business. Great for a data analyst, right? Well done for staying with me with all those IF statements. Now let's take a look at CONCATENATE. Right? Spoiler alert, concatenate is just a fancy way of saying join together. So in the CONCATENATE function, we can join together the values of several text columns into one column, and this is useful for a variety of reasons. In data analytics, you'll often find yourself trying to summarize information or combine information that's stored in relational databases or lots of Excel tables across multiple columns. So let me give you an example. Here I can see we have the manager's first name and surname stored in separate columns. So let's go ahead and make a column in column N, and we'll call that column the "manager name" and then we can get rid of manager first name, my manager surname. So I'll go ahead and write a title in here. Manager name. Double click to resize it. So let's use the CONCATENATE function and in Excel this is called =CONCAT. I can see that it takes multiple parameters, the first column you want to combine, followed by the second column you want to combine, and so on. You can combine as many columns as you like. So let's go ahead and combine the manager's first name with the manager's second name. Then we'll see what that looks like. Well, it looks okay, but you can see that there's no space between the first and last name. So why don't we add our own space manually in that function? And now when I go to complete, I can see that the first and last names are spaced nicely. So don't forget, CONCAT allows you not just to combine text from various columns, but also to add in your own pre-written text if you wish. So now we have the manager name in column N and we could go ahead and delete columns D and E. But don't forget, we still have a function in column N and that function would error if we got rid of the original text in columns D and E. So two ways you can get around that. One would be to copy and paste column N as values. So that's paste special as values, the other way, which the way I'm going to do it is just to go ahead and highlight D and E and hide those columns. All right. The data is still there, but it's hidden from view, which is handy. Now, let's take a look at the ISBLANK statement. The ISBLANK function will find all the nulls or empty cells in your dataset. And this is a really useful function for data analysts because often data analysts work with dirty data. That means data with missing values or wrong values. And if you pass those missing values into several of the upstream mathematical calculations, you get errors. So it's important to spot these problems at the beginning of your analytics process. I'm going to write an ISBLANK function which will look at missing values for any one of the columns in my dataset. And I'm going to start by looking at the temperature in Fahrenheit of the data. So let me add a column next to temperature Fahrenheit, which will check whether the Fahrenheit column has any missing values. And I'll call this column missing temp question mark, double click on it to expand it. In the first cell, I'm going to write the ISBLANK function. The ISBLANK function takes one value, which is the cell that you're looking to evaluate for blanks. In this case, I'm looking at the temperature of Fahrenheit, so I'll just put H2 and then Autocomplete. We can see that this is "false". But if I autocomplete it in every other cell and then filter by that column, I'll be able to only look at rows where that's true. And in fact, we can see now we do have four rows where the Fahrenheit is missing. Let's look at these rows with missing Fahrenheit to see if there are any upstream errors that have been caused by these missing values being incorrectly applied. Yeah, well, I can quickly see that in the column that I created using the IF function. Actually, I don't think this is working correctly. Just because the temperature is missing doesn't mean that it's okay to work. So why has this IF statement not worked correctly? Well, let's think about this a little bit. The first argument, if age 27 is greater than 76, evaluates to "too hot" if true and "okay to work" if false. Well, clearly, if the value was missing, then H27 is not going to evaluate to greater than 76 because there's no data. So it clearly evaluates to "false", in which case we've written okay to work and this obviously isn't quite right. Okay, so we can go ahead and change this IF statement so that it pays attention to whether there's missing data before it calculates whether the temperature is too hot or not. How are we going to change the statement? Well, remember the IF statement, our first parameter is the logical test. So all I'm going to do here is I'm going to change a logical test, so that instead of looking at the temperature directly, the first thing it does is check whether there is missing data. So I'm going to write a nested function in Excel. And nested functions are something that you'll use a lot as a data analyst or a data scientist. So now I'm going to add an ISBLANK statement as my first parameter. This ISBLANK statement which evaluates to "true" or "false" is going to check whether the temperature field is missing or not. Now, what happens if it's missing? That's the "true" statement. Well, if it's missing if it's true, then we want to bring that to the attention of the data analyst. So let's print out something like "Missing data!!". Now we need to work out what to do if the value is "false". Well, if the value is false, we're going to go ahead and perform our original calculation. And that was by using the IF statement that we looked at originally. So I can now add another IF statement in here that evaluates whether it's above 76 or below 76. I just need to add one more closing brackets to this statement. Just so opening brackets and closing brackets match. I could go ahead and complete that. And then when I auto populate this data down, you can see that now we have in this column next to the missing values, we have missing data in exclamation mark. So let's just finish this off by adding a filter function that goes to check all the instances of missing data. Great. We've definitely captured all the instances of missing data, we're good! Okay, now we can look at the "left", "mid" and "right" statements for passing text and numbers. "The left", "right" and "mid" functions enable you to extract subsections of text from a text cell. So let's use this to extract from the address column the zip code of every address. I'm going to set a new column and I'm going to call this column the "zip code" column, and I'm simply going to use the "right" function in this case, the "right" function takes two parameters the text you're going to search through, and then the number of characters that you want to extract from that text. So in my case, I'm going to look at the address cell, that's the text I'm looking out. And the number of characters I want to extract is the number of characters in the zip code, which is five. So I'm going to say take the right five most characters from the address, and that does indeed look to be the zip code. Let me go ahead and autopopulate the rest of the cells and we can see that that's worked nicely for all of our addresses. You can also use the "left" in much the same way as the right, except it works on the left of the cell. Or you can use the "mid" function to extract text from the middle of your cell. Next, let's take a look at the "length" function. The "length" function or "len" function enables you to check the number of characters in a given cell or column. This can be really, really useful for data analysts when you're trying to check whether your data input is standardized or not. In my case, I might find it useful to check the length of the column "zip code" to make sure that all my zip codes have five characters in it. So why don't I just add a column next to zip code and check the length of the "zip code" column? Once I've done that, I can auto complete to fill out the rest of the cells. And now why don't I filter or at least check the filter to see if we have any results outside of five? No, everything is five, which means that at least zip codes are all five characters long. Before we move on, let's name this column. Now, let's take a look at one of the most useful functions you're going to need to use in data analytics: The VLOOKUP function. The VLOOKUP function is one of the most popular functions for data analysts. It allows you to find things in a large table or large range by row based on some conditional value. In our case, I'd like to add some more information to the average temperature of each store by, in addition to showing the store number, showing the city of that store. And I can use the VLOOKUP function to help me do that. Let's scroll over here to the unique stores and firstly I'll expand the "average temperature of the store" column out so it's fully formatted. And now I'm going to add another column which I call "store name", and this will enable me to use VLOOKUP. So let's use the VLOOKUP function. The first parameter is the lookup value. In this case, I'm going to use the unique store number as my key to look up the rest of the information. The second parameter is the range of values in which the store and the extra information that I want to find are located. So I know that I want to find the city from the store number. So those two pieces of information are stored right next to each other at the beginning of our dataset. The next thing I'm going to use is the "call index number". This basically means how many columns shifted is the data that I want to show from the data that I'm looking to find it up. Now because the store and the city are next to each other, the city is going to have value 2. If I wanted to show the store ID itself, it would have value 1, that's essentially referring to itself. In this case, the nextdoor column is the one I want, so it has value 2. And finally, the final parameter is a "true false" statement. You can specify "true" if you only want an approximate match done or specify "false" if you're only happy with exact matches of the return value. In our case, it's safer to specify "false" because we only want exact matches. So let's go ahead and do that. Let's say auto complete the statement and go back to my original column. You can see that Denver has been populated here. Simply copy this formula down into the remaining cells for the VLOOKUP table function to have been completed. Okay, now I'm going to show you how you can sort your values using the SORT function. We can use the SORT function to sort any column of text or number into ascending or descending sorted order. So we've got a lot of numerical values in our data set, for example, the unemployment rate and column N and that changes all over the shop sometimes it's lower, sometimes it's higher. Why don't we look at the sorted unemployment rate by creating a new column W called "sorted unemployment". So let's go ahead and populate this column with our sorted unemployment values. The SORT function takes one monetary parameter and that's the array of values that you want to sort. So I'm going to go ahead and select all the values in the unemployment column. And once I selected them and complete this function and now you can see that we sorted our unemployment values by highest unemployment to lowest unemployment. Why don't we close by looking at the FILTER function? The FILTER function allows you to filter one column of multiple columns by a filter term and just show the filter results. In our case, why don't we filter our unique stores that we previously calculated the average temperature for by condition as to whether they are on average too hot or not. So, I'm going to create a new table under the "unique store" table called "Hot Stores", and I'm going to use a FILTER function in this "hot stores" space. So let's create the FILTER function. The first parameter is the array that we're going to filter. I want to filter every single unique store that we've performed calculations on. So that would be my array. The next parameter is what filtering function we're going to use. In our case. I want to look at whether the average temperature is over a certain amount. So I'm going to select the temperature column and I'm going to say I want that temperature column to be greater than 54 degrees Fahrenheit. And actually, that's all I need to include. The final if empty parameter is optional, once I select this and autocomplete, I can see that my original cells have now been filtered down. And all I'm looking at is the two stores in Boston and Atlanta where the average temperature is hot. And there you have it. These are all the ten Microsoft Excel functions I have for you today. I know these are going to make you a more efficient data analyst. Let us know in the comments below if there are anything we've missed or if there's any other Excel related content you'd like to see in the future. For example, pivot tables. We didn't include those. At CareerFoundry, we have a free data analytics short course, where you can start using these functions straight away. If you'd like sign up, the link will be in the description below. Or if you're just starting out and you'd like to find out more about what a typical data analyst job looks like, why don't you check out this video that we made a while ago where I take you through a typical day in the life of a data analyst. It will give you some insights into what tasks a data analyst might typically carry out, as well as what some of their favorite tools and tips are. Check it out here and see you in the next video!
Info
Channel: CareerFoundry
Views: 42,925
Rating: undefined out of 5
Keywords: data analytics, data analyst, excel tips and tricks, microsoft excel tutorials, data analytics in excel, microsoft excel for data analysis, microsoft excel for data analytics, data analytics for beginners, data analyst skills, how to become a data analyst, microsoft excel, excel for analysts, excel formulas and functions, data analytics project, excel functions for data analysis, excel data analysis, data analytics job, data analytics career, data analyst day in the life
Id: B4dBMsOT1aA
Channel Id: undefined
Length: 19min 40sec (1180 seconds)
Published: Fri Aug 26 2022
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.