How to Clean, Analyze and Present Data with Excel (FREE Adv. Course)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Microsoft Excel continues to be the number one software for data analysis work so in this video Let Me Explain how we can clean structure analyze and present data using Excel this is a really long video and we are gonna cover quite a bit of width and depth of excel features in this video for the purpose of this exercise I am going to use a fictional data set of awesome chocolates employee data the sample data set is available for you to download in the video description below in this video we are going to learn how to use Excel in general how to work with power query for data management and data cleansing how to use simple as well as complex formulas to analyze data and answer business questions how to use pivot tables to structure your data and quickly calculate various types of outcomes how to create and format Excel charts how to use data validation how to use slicers conditional formatting and a whole bunch of other things along the course of this I will also introduce many powerful shortcuts and Excel tricks that can help you solve problems quickly and elegantly whenever you need to do that as this is a long video I recommend that you set aside time and watch this either in chunks or all in one go but as you are doing this also give time to practice the concepts so that you understand and Implement them for your success let's go this is the sample data file that we will be using in this video here as you can see we have two kinds of data one is the data and another is India staff awesome chocolates primarily operates out of Wellington New Zealand but we also have a branch in India and the data is split into two tabs one for our Wellington staff and this is our India staff data so our first task is to take all of this data consolidate it clean it and make sure that it is in the ready to use form from a data analysis perspective whenever you have data like this it is a big challenge to understand and use the data if the data is not properly maintained here as I can see there is a lot of information and we are not even sure if there is any duplicates missing elements or anything else in the data so what we will do is first take each of these data sets and then turn it into a usable format this is where Excel tables come into picture usually anytime you're analyzing business data the data always tends to be in a tabular format so all you have to do is either select your entire data and this is where you can use the shortcut Ctrl a you must select one of the cells in the data and press Ctrl a and Excel will select the entire range where your data sits together and then press Ctrl t t for table this will prompt you how do you want your table and whether you have a header Row in our case we do have a header row so I'm going to check that and click OK and now we have a table here let's do the same for India staff control a control t so now we do have both data sets formatted in the tabular format what tables make is they quickly apply some simple formatting on your data give some borders and highlights and everything they also add this header row in a different color and set up filters for you so you can quickly eyeball and see what departments we have what is the age range of our employees from 19 all the way up to 46 or what sort of ratings are there let's see this entire data in one view clearly you can see that for example the name column doesn't show the full name likewise the rating kind of spills outside one quick trick that you can apply here is you can select all of these columns place your most pointer somewhere in the middle of the columns and double click and that will resize all the columns now what happened in the case of CES because my header here is quite long this column C also became too big this I can quickly see some problems with our data for example some employees do not have any gender information likewise there might be other issues that are harder to spot but they are still there let's do the same for our India staff and now we do have two parts of the puzzle our New Zealand data is here in the data table this will be in the blue color or India data I'm going to change the color of this for this you can go to the table design button and let's just go with the this orange color for India data so it's easy for us to tell them apart blue is New Zealand orange is Indian but this kind of telling apart is too tricky so this is where a table name comes into picture if you go to the table design ribbon you'll see that the tables have a name option right now the India table is called table two I'm gonna call this as India underscore staff and then this one would be NZ underscore staff so now I have two tables New Zealand staff and India staff neatly arranged let's just say you don't want to combine the data you just want to keep the data as it is in Separate Tables but you would like to do some quick summary analysis of the data like for example you want to know how many employees are there or what is the average salary Etc what you could do with tables is when you have a table you can select any cell in the table go to the table design ribbon and from here enable the total row option this will add a total row at the very bottom and here it tells me we have 100 people in the New Zealand data this in itself looks suspicious to me because I don't think we have that many people in New Zealand but we will investigate that a little later let's say I want to see what the average salary is so I can go to here it will show me a little bit of drop down next to it and I can pick the average option from there so the average salary in New Zealand is 77 472 dollars now it would be good if this is formatted in dollars so I'm gonna select this entire column and then press the shortcut Ctrl Shift 4 or control dollar symbol this is going to apply dollar formatting with two decimal places so that it's easy to read and then for age again I can do an average New Zealand average age is 30.52 for date joint kind of things you don't want to do averages or anything because it doesn't make any sense some of these dates look all right the other dates are looking in number format this is because by default Excel stores data for dates in the number format so if you don't apply date formatting it defaults to this kind of four four four six what it means is this date is actually forty four thousand four hundred and forty six days since the start of excel calendar Excel calendar begins on 1st of January 1900 so every day is one number and this is how many days that date is we don't really care for this kind of a thing so I'm gonna select all of these again and then press Ctrl shift 3 to turn the date formatting correctly so these two shortcuts Ctrl Shift 4 for currency Ctrl shift 3 for dates are really simple to remember and very handy to press and they quickly change the formatting for you so using that total row we can add quick analysis of our information to the tables but that sort of a thing is quite useless and also if the data itself is not clean then these numbers will be wrong so let's do some quick checks here I want to see if there is any duplications of the names if I use the filter here I will not be able to tell apart if a name has repeated or not so this is where we can select an entire column to select an entire column in the table you can place your cursor on the header and when it turns into that black color down arrow symbol click and it will pick the entire column now you can go to home ribbon click on conditional formatting highlight duplicate values this is a great way to check if there are any duplicates and when you click OK you can see that all of these guys have actually appeared many times in the data like Molly Hanway probably appeared somewhere else further down yeah here and then all of these as well so looks like someone when they are preparing this data copy pasted the information and that created a problem one way to fix this problem is you can highlight one set of the duplicates so we do still want to keep one of these rows but the second occurrence we can delete it this kind of a thing can be done manually using the data ribbon remove duplicates button I'm not gonna use this because we need to solve problem plus there are other problems that we would like to fix as well for our next step we are going to take the New Zealand data and India data and then combine everything into one Consolidated table and then that is the table that we will use to do all our data analysis to do this kind of a thing are going to use the data ribbon get and transform data area this is what normally we refer to as power query a quick note here I'm using Excel 365 for this particular video if you are using an older version of excel you can still use tables power query and many of the other things that I will be showing but they might be called different or they might appear on different parts of the screen so refer to the help on the web or somewhere if you get stuck or post a comment so that I can help you out where to find these things so now that we have data and India data in separate tabs let's go ahead and combine these two into one giant table and then do the analysis first up let's take this data and load it into Power query to do this we can just select any cell here and then from the data ribbon click on from table slash range option this will load my NZ staff into the power query as a table if you have never used power query before power query is a data preprocessing engine that we can use to kind of tell Excel how we would like our data to be cleaned what additional steps we want to do on our data and all of that I do have many other detailed videos on power query on my channel so I will not do a introduction to power query here but feel free to watch some of the other power query videos that appear as a card up on the top here or in the video description to learn more about power query so now that our NZ staff data is here we would like to bring the India staff data also into the power query so that we can then combine both of them as we are already here if I try to go back I can't minimize nor I can go back to my data because it won't let me do that while I am in power query the only way is to close and load but that will create additional headaches for us so instead of that what I'm gonna do is because we already have NZ I'm just going to make a copy of this right click on it and then use the duplicate button to create a duplicate of this so it kind of copy pastes so we now have two NZ steps same data really in the second one I am going to look at the formula bar if you do not have the formula bar here you can go to the view ribbon and check the formula bar option so that the formula bar appears and then from here make sure that you are selecting The Source step on the second duplicated one and where it says NZ underscore staff I'm going to change this to India underscore staff and when you hit enter it's gonna load up India data here so now we have two tables the first one is called NZ staff the second one is also called NZ staff two but it shows the India data let's rename this as India staff and both data sets are here before we combine this let's go to the New Zealand staff and add a column here that tells us that this is actually New Zealand people so to do this we are going to use the add column option and then say custom column that column just has to say New Zealand for everybody so we will name this column as country and here the formula will be just double quotes NZ so it will just put the text value of NZ all the way through this is an easy way for us to identify that this data is actually in New Zealand data let's go to the India and then do the same make sure you name the columns exactly same so country and so the second table would have India as the country first table has New Zealand as the country a quick note here when you are observing New Zealand data has a name gender department age in that order whereas India staff has name gender age rating so department is somewhere here in the middle The Columns are jumbled but the names of the columns are exactly same so this is a important thing to keep in mind when you are consolidating the data now I do have two tables but I don't want two tables I want everything to be combined into one single table so this kind of an operation is called appendix where you take two data sets that have similar column structure and then append them stitch them together one underneath another so go to home ribbon and use append queries button click on that under link here and then say append queries as new this will create a new query that combines both my India staff and NZ staff so just pick the tables that you want and it will when you say okay it's gonna combine everything into a new query it will be called append one and we can then just call this table as staff this is all our staff it has all our India and New Zealand people everybody one underneath another one at this point let's do a quick cleaning of the data the first cleanup activity that I would like to do is check the name column and if there is any duplicate values take out those duplicates so to do that we'll just select this column right click and then say remove duplicates what power query will do at this point is it will keep the first occurrence of the name and then subsequent occurrences of the same name will be deleted this sort of a thing could be a little dangerous to do if you have a large set of data and the same name could potentially appear multiple times for example if you take a very large company there might be five people who are called barfoni in awesome chocolates we only have one such person so if you do have such a duplication possible then you want to pick a column like employee ID or some other number that is distinct and use that to remove duplicates so let's do the remove duplicates on this and it will remove some duplicate values from the Consolidated data next up let's deal with the problem of gender values missing uh anytime a gender value is not present what we really want to do is code them as other gender so to do this we can use a replacement type of a step within power query so select the gender column right click and then say replace values the missing genders are called null here so we can say value to find would be null in small letters and then replace with other next up I'm going to look at the date join column here the green bar tells me that potentially there is a blank value because it is it doesn't go all the way to the end there is one empty value I'm just gonna see what's happening there is somebody without a date join I guess this is probably incorrect data again from a data analysis perspective if an employee doesn't have a date join they shouldn't even be in our data so it could be that that is the grand total row I'm just gonna yes see this was the total row that we added in our pre-analysis work that we did so because that total row is there it kind of appears as the data this is not actual data so what we can do is I'm going to undo that step by deleting that and then uncheck the null value so anybody that have null date joined that means they don't have a date of join it would be either total row or incorrect data so we can just delete that as well and that all looks good so we have got name gender age rating date joint Department salary and Country so this is the final data this is the data that we can now take and use to do our data analysis work so at this point what we want is we can go to the home ribbon click on the little arrow next to close and load and use the third option close and load 2. and then for now I'm just gonna say just make a connection don't load anything and click ok so this will load all the three queries that we built engine staff India staff and final staff as connections now let's add a new tab in this tab we will call this as all staff and this is our Master data so I'm going to right click here tab color and then make the color yellow so we can quickly spot it when you have multiple spreadsheets so in this all staff page I want to load my all the steps so right click this one uh and then right click on the staff and then say load 2 and load it as a table in the existing worksheet here so that it will load the all staff data here so we started with the blue table orange table combined and cleaned everything to end up with this data set that has all our employees you might think oh does the staff have same kind of salary in India and New Zealand yes or awesome chocolates even though it has a branch in India the salaries are still paid in dollars so we can combine everything here what about the date the dates look like numbers again we can apply date formatting again on this column but let's just do that in power query so I'm gonna right click on my staff data and edit the query quickly jump back to Power query and select the date joined right click change type to date so that we tell part query that this is actually a date column and click on close and load so that when when the data is loaded into Excel it will appear as a date value perfect so now that we have cleaned our data let's go ahead and answer some business questions I have got five business questions for us to answer as part of this section of the video These are a quick analysis of the data so we have got all our Consolidated employee list here I just want to know who are our these people how many are there what are some quick statistics about them the second question and information finder is where I can put in a name of an employee and I'll know everything about them and then the third one is we want to extend this information finder so that we can ask about a department and get the entire results of that department as a clean view for us and then we also would like to do a male versus female comparison of the data we have got three genders in the data but for this purpose we are just going to look at male versus female and then finally we have had a very good year of chocolate sales in 2022 so to celebrate that in 2023 awesome chocolates would like to give bonus to all our staff so we want to build a bonus calculator using Excel and then understand the process for calculating the bonus based on the employee data so that is the second part of this video in this process we are going to learn some cool Excel functions Excel pivot tables and other simple tricks and Powerful useful things in Excel once we finish this then we are going to do some analysis of the data using a different set of questions we'll go there after we finish this let's go [Music] our first question is a quick analysis of the data to understand what is it that we got here we could do this formulas I recommend using formulas because that way we get to learn some of the cool and helpful functions in Excel for the sake of Simplicity I'm going to build our formulas right next to the data that way it's easier for us to kind of cross check and verify the results but in usual business situations you may want to build these formulas in a separate page or a tab rather than everything being here so the things that we want to calculate are count of employees average salary average age and then average tenure tenure refers to how long an employee has been with us given that we know the date joined and we know what current date is we can calculate how long and a person has been with us and then we would like to calculate female ratio this is percentage of female staff as against the total staff to get a count of employees we just want to count how many values are there in one of the columns let's pick the name column so this one we could use the count a function Excel has two kind of counts count and count a count refers to count of numbers because we are looking at the name column we should refer the count a function so count a of the name column here you can use the mouse pointer and point to the column like using that Black Arrow trick or alternatively you can also type the name here you simply type the table name which is Staff open the square bracket and then point to the name column with n letter and then tab to complete that so we have 183 people in our data make a note that this is after we have removed all the duplicates so if you actually count how many people are there in the original data sets this number will be a little less than that because some people are removed due to duplication next up average of the salary column is using average function average of salary column so select like that average salary is 773. average AG is average of the age column normally when we calculate summaries in business situations like this it's a good idea to not just calculate the average but also calculate the median that way you will understand if the data is a little bit too wild or not average is sum of all the numbers divided by count of the numbers whereas median refers to the midpoint if all the salaries are arranged in ascending or descending order what is the Middle Point looking like so let's calculate median here the median can be done with median function and again select the column and we can see that so for example here my average is 77 000 whereas median is 75 000. this kind of a difference indicates that there are probably a few people who have very high salaries and hence the average is getting pulled towards the other side not by much but a little bit let's do the median for ages as well median of staff h average age and median age are pretty much similar so that distribution looks all right to me how do we calculate the average tenure we only have the date joined but we need to calculate tenure first before we can average it out we could of course write a big complex formula that does all of it in one go but that kind of thing is not recommended it's a good practice to add the tenure column first and then average it out so let's add a tenure column to the table when you have a table if you type a value right next to it and hit enter it will add a column to the table the tenure can be calculated by taking this date and subtracting that from the current date to get the current date we can use the today function and then minus of this that will give you a number unfortunately here Excel has formatted that number as a date but if I select the column and then change the formatting to General we can see how many days each of these people have been with us for example this person deepali Charan has been with us for 879 days now this kind of a number looks a little bit ridiculous so what I want to do is calculation and then divide that with 365. so that we can get that in years and then let's just apply a number formatting here 2.41 years for this person 0.93 for that person now let's calculate the average of that average oh staff in your column and then finally we want to do a female ratio that means count how many females are there and then count that as a percentage of 183. we'll do that in two steps so that it's easier to see the process first we'll do a female count this is where I want to count how many gender values are female so we could use the function countifs to do this kind of things Excel has two sorts of functions count if and countifs I always use the ifs functions wherever possible this way it's one function to remember we don't need to know what the other one does anyway so countifs gender column and then within double quotes we can type female so we have 86 female employees hence the ratio would be 86 divided by 183. 0.4699 to apply percentage formatting to this you can press the Ctrl shift 5 for this Ctrl shift 5 or control percentage symbol on your keyboard and that will apply the percentage formatting to this 47 female ratio pretty good almost one in every two employees is a female at awesome chocolates so this is a quick summary like I said we could use either formulas or pivot tables to build this kind of things as I plan to introduce pivot tables later on in this video I thought we can learn some of the formula versions of this and do that here is one homework problem for you I want to calculate the ratio of ninety thousand dollars or more people so some of our employees have salaries the salaries start from 33 000 all the way up to 119 and we just want to see what percentage of our staff have more than 90 000 as salary so calculate this using a formula either one formula or a bunch of different formulas and then print that value there as your homework so that is the first section of the business question which is doing a quick summary of our data [Music] thank you for our second section we would like to build a information retriever or information finder wherein if I type an employee name I would like to see their entire details printed to me neatly underneath this sort of an operation is called looking up and we can use the lookup functions in Excel to do this work we can for example if I have a name like bar funny and I just want to know where this person works when they have joined our organization how much salary we are paying them Etc I can use the filter button here and then just type bar and you know we can see that the name shows up there and then click OK and then quickly see all the details about them they work in New Zealand 75 000 in the procurement department they have been with us for 0.61 or less than one year but that is apply the filter so this is where we could use the lookup functions in Excel to do this kind of a job and we are going to learn the basics of these lookup functions there are two kinds of lookup functions that we could use we could use either X lookup or vlookup if you have Excel 365 I highly recommend that you use the X lookup function but if you do have any other older version of excel use the vlookup I'm gonna build the formulas with both of them so that you're familiar with it but later on we'll just stick to one of them because it's that way it is simple and the video will move faster so what we want to do is for the cell here this is l17 I'm just gonna print some color there and for that cell we want to look up and get their entire record here so we could use x lookup like this x lookup lookup bar phoning so the l17 and then you need to specify the lookup array where is this name it's in the staff table name column and then what do we want we want their entire record what is their gender age rating everything so I'm going to say staff table and then we can see from gender colon tenure so we want all of these columns for that person and then when you close you're gonna get their entire record here or bar phone in this is the record because there are multiple values returned by Excel X lookup it's gonna spill the values in this direction because the data is also in the horizontal Direction so the X lookup will also go in like this many times this format may not work from a presentation perspective you may want to see the results like this so in that case what we could do is we can say x lookup barfoni in the staff name column and then return array instead of typing it you can also select the column so I'm just gonna select like that and then instead of just doing X lookup hello we can use the transpose function around it so that it will be transposed so the horizontal data becomes vertical and we will get a format like this let's do a quick formatting for this I'm just going to move this here [Music] and we will get the information for bar falling the beautiful thing about this is it is all Dynamic so if I change this and I want to find for example I just type that name and I will get the information for such a person here is that person now let's see how to do the same with vlookup like I mentioned my preference is to use x lookup but if you do not have X lookup then you can use the vlookup so we can say the same thing vlookup look up this name in this time you just say the whole table so staff table and then what is it that we want to return now the lookup will happen on the leftmost column which is the name column but return you have to specify which columns you want so let's just say I want to look up their salary salary will be one two three four five six seventh column so we're gonna say seven and then you have to also specify that you want an exact match so this is where you will need to say false and then you will get the salary for that person as forty nine thousand six thirty same as here if you do want to go into the columns rather than just one column alone you need to specify that as a list of numbers here and then you need to do something else in older versions of excel to be able to see the entire column list this is also why I like xlookup because it's simpler Formula One formula gives me everything that I want foreign we need to build an information finder version 2.0 this is where if I provide a department name I would like to see who is working in the department what are their salaries how old they are and few other details we're gonna do this in a separate tab this way it's easier to work with this without getting confused with so many other details that are on the other page so here if I put a department name in this cell C5 I would like to see who are the people that are working in the department what is their salary and what is their rating just three columns I don't want to see the whole history of them I want to know the names salaries and their employment rating information and I would like this information to be sorted in the descending order of salary so whoever is getting highest amount they'll go on to the top and the lowest salary goes to the bottom apart from that list I also want to know up top how many people are there what is the head count of that department sounds good so to build this kind of a thing we need to use new functions available in Excel 365. so this part of the video will be very hard to follow if you are using a old version of excel here we can use the use the shin film initially we'll just get everything later on we'll figure out how to get only the required columns so staff table include by saying staff table Department is equal to C5 if I say this then this is going to take the staff table filter it down to just the sales department and give me a list of all the employees in the sales department you can see that you know everybody here is sales person only so this is the data like I said we don't want all of this what we want is name salary and then the rating columns so if I see put a number here then essentially we want column one column seven and column four of this data so instead of just filtering what we want is we want to choose the columns column one seven and four this is where a new function called choose columns comes into picture choose columns from the filter result and then we want column one seven and four in that order so this is going to just take out that entire thing shrink it down to these three columns rearrange them in this order once this data is there then what we want is we want to sort this by the salary column so that the highest salaries goes to the top for this we can then send this output to the sort function so sort this entire thing buy the second column second column has my salary in the descending order descending order stands for minus 1 here so like that and then we will get the information so these are our sales people names salaries and their rating quite a few people are making more than 100 000 there but we also have some people that are at thirty thousand forty thousands next up we would like to count how many such people are there so you can say number of people and then we can either directly refer to this or we could do something else so I'm gonna use our good old countifs function countifs staff department is C5 so we have 28 people and these are the 28 people and this is all Dynamic so if I put something else like website here I'm gonna get the website people neatly arranged in the descending order here underneath in all the 54 as the head count there so that is our information finder version 2.0 want you can add some additional layers of formatting or other things on top of this I leave all of that for your imagination and homework for our next question we would like to do a quick male versus female comparison and understand how many people are there by each of these gender groups what is their average age or average salary or some of the other stuff we could again use formulas to do this but we have already used in a formulas so I'm going to introduce the pivot table approach to do this analysis we can go to the data set here select any cell and then use the insert pivot table option to introduce a pivot table to do this analysis so I'll tap on this and you can straight away go and do a new worksheet or pick an existing worksheet since I already have an existing thing set up I'm going to select male versus female and put the pivot table here one additional thing that you may want to do and I personally like to do this every time is add this data to the data model option what this does is it enables some special features in your pivot tables so that you could answer some very interesting questions quickly as well let's click ok and we will get a canvas here for you to build a pivot table you will also have this pivot table Fields option here using which you can construct the pivot table if you have never built a pivot table in Excel think of this like a quick and easy reporting system built into Excel we can use pivot tables to answer many business questions or build reports quickly from your data I do have many other videos on the channel That Go in depth into pivot tables so here I'll build the pivot table and hopefully it all makes sense if you have already been using pivot tables for some other purpose then it all will make sense anyway so here what we want is we want two columns one for mail one for female and then we would like to see how many people are there what is their average age Etc so I'm gonna take the gender and put it into columns because we want one gender per column so this will have female male and other as well as grand total we'll worry about what to do with this other gender and grand total a little later for now what we want next is we want to see how many people are there it could be a count on the name column so I'm going to take the name and drop it into the values and it's just gonna count the name so we have 86 females 89 males Etc you might already remember that number 86 from the earlier portion where we calculated the female percentage fix up will take the age and put it into the values again by default because age is a number Excel is going to sum up the age so some of all female employees age is 2701 and all male employees age is 26.6 this number is useless so I'm gonna right click on the number and change the summary from sum to average 31 is the average female age 29 is the average male age next you can see that this pivot table is kind of growing horizontally what we want is counts in one row averages in next row like that so we'll take this values option here click on it drag and drop it into rows so that you will pivot table gets restructured like this this looks better next up we are going to take salary and put that into this again it sums up the salary right click on it and then say summarize by average so we can see what is the average female salary male salary and other gender salaries we could do the same for tenure and now that the information that we want is there let's see how to take out this other gender so that we can just look at male versus female so for this purpose because we only have few people with other gender that kind of skews the analysis so it's better to focus on the two larger groups when you're doing these kind of comparisons to do that you can click on the column labels arrow and then uncheck the other option and click ok so this is how the information looks across the countries both countries let's apply some quick formatting to format the values in the pivot table you can right click on the number and then use the number format option and select the appropriate format that you want click ok we don't want to see the grand total you can also go to the pivot table designer button and then from here grand total and you can turn that off as I mentioned this is the information for both countries let's say I want this but only for New Zealand what we can also do is right click on the country option here and then add it as a slicer this will give you a slicer which will give you two buttons for one for each country and these are interactive so if I tap on New Zealand I'll see the numbers for New Zealand alone if I tap on India I'll see the numbers for India so this gives me a quick sense of how our staff are in both countries with slicer we get to see one country at a time or if you want to see both you can clear the filter and then select both of them to see how it is happening together with slicers pivot tables become so much more easy to work with and answer any kind of question without writing complex formulas imagine to get these kind of numbers earlier we had to write some not so complex but still complex formulas so this is why I do like pivot tables to do some of the ad hoc analysis that said I like the flexibility of excel functions because that lets me build more complex answers or control things in a better way as well now let's go back to our data and our last thing is to do a bonus calculation the bonus calculation is really simple we would like to give three percent bonus on salary so whatever is your salary your bonus would be three percent of that assuming you have been with the organization for at least two years if not your bonus would be two percent so it's a simple rule at awesome chocolates have you been with us for longer than two years then you get three percent else you get two percent we can calculate the bonus as a new column here because everything is touching each other I'm gonna introduce some columns here so that there is a little bit more room for our table and we can calculate the bonus as an extra column now you can do it in two places you can add a column to your table or if you are comfortable with power query you can also do this as a step in power query too I'm not going to show you the power query option right now but that is something that you can try yourself so let's do the bonus percentage here bonus calculation like I said Boris would be three percentage of salary but we can't just be three percent it would be three percent only if your tenure is more than two years so this is where we could use the IF function if my tenure is greater than 2 because tenure is already in years we just say two then it needs to be three percent else this will be two percent so this if formula tells me what the percentage is this is not enough we need to now take that percentage multiply that with the salary and then that will give me the bonus value if you want you can round this bonus so I'm gonna round it up to the nearest number so round it up 0 means no more decimal points just take it up to the next highest number and then that will give you the bonus let's apply some currency formatting on this little thing and we will get the bonus values as you are looking at this your boss suddenly comes up and then says oh you calculated the bonus good job can you tell me who are the employees that are getting the highest bonuses I just want to know the top 10 names you can do lots of different things you could write a formula you could do a pivot table and all sorts of things but I'll show you the quickest way to get there click on that little arrow number filters and then select top 10. just click OK and you will see the top 10 employees making highest bonus all right so now that we have answered those business questions let's understand how to use charts and other data visualization options in Excel to do some data analysis I have got four topics for you in this section so we are going to look at the salary spread to begin with how the salaries are spread what the distribution looks like are there some extreme values or not so that is the first topic for our data analysis and then the second question that I would like to answer is is there any relationship between salary and employee rating for example are the exceptional people getting paid more or are the poor people getting paid less you know that is is that something that exists within our organization or not and then the third one is how did our company grow from the very first employee up until the 186 or 83 employees that we have now what was the trajectory like how was the our head count over a period of time where is it heading you know can we tell anything about the future based on that information and then finally I would like to prepare a India versus New Zealand report card so that we could compare both countries and understand various key statistics and business metrics from the HR perspective and you know make it fun so that is the scope of this part of the video again we are going to introduce some very fun and useful chart options in Excel as well as conditional formatting in data validation let's go thank you so our first Topic in data analysis section is understand the salary spread and you could do this in few different ways so if you want to do it quick and dirty here is how I would do it just select the salary column and from the home ribbon conditional formatting apply either a color scale or a data bar let's go with color scale because I think that is better suited for this situation and we can pick a color scale like green to Red Green being higher salaries or green color and red being lower salaries or red color if you are thinking that green kind of coincides with my table formatting here you could also use blue to Red which kind of contrasts well with our data so let's go with that and that kind of shows you how the values are spread but this is a quick and dirty one and when you have hundreds of rows of information it might be hard to get a visual sense of your data once you do have this here is one other handy trick that you can apply you can also apply a sort order so if I do smallest to largest I'm going to get a nice Spectrum going on here that shows the spread of our salaries and if you zoom out a bit now you can kind of sense how that spread is looking it looks up fairly even to me but I'm not really sure if there are some extreme values or not so let's go back and try something else the next option is we can select the salary column that entire column and then we can go to the insert ribbon and use one of the statistical charts to quickly do the analysis these charts are newly introduced into Excel 2016 and 19. so if you're using a really old version of excel you will not find them in your Excel so from insert you're looking for this histogram or statistic chart option and from here you could try either histogram or box end whisker plot I'm gonna try the histogram option first and this will give me a histogram of our salary spread it kind of looks a little bit funny that's because the initial bucketing of these groups it starts from 33 to 48 48 to 63 like that this has no meaning so I'm gonna select this axis and then press Ctrl 1 to format it from here instead of automatic binning I'm going to specify the bin width of let's just go with ten thousand dollars at a time and we'll start an underflow bin at thirty thousand what that does is it will begin the lowest salary at thirty thousand thirty thousand doesn't seem to work let's go with 40. so from 40 40 to 50 50 to 60 like that and now we can kind of get a sense of our distribution we have very few people under 40 000 just about seven and then most of the majority of our employees are between 40 to 80 000 and then between 80 and 90 we surprisingly have very few people but then it picks up again and we do have a significant number of people between 110 to 120. so this kind of tells a story of what is happening and with about just 180 people there is not much but if you apply this technique over a large set of data then you can quickly get a sense of how the values are spread let's cut this chart and paste it into our salary spread page so this is our salary spread by ten thousand dollars and you can apply that as a title so this is one analysis now let's go and see what else can we do to understand the salary spread while keeping the salary column selected you can again go to insert and let's try the box plot option which shows the entire salary in a box plot diagram I'm going to cut this as well and put explain what this is trying to do so a box plot shows your entire salary as a box with these two whiskers at the end so the middle shank is the mid half of your people this line refers to the median the midpoint of the salary and then the text Mark is the average and previously when we are looking at the numbers we kind of knew that the midpoint is less than the average so that's where the x is above it and then this box is the 75th percentile and that box edges the 25th percentile Edge so essentially this kind of tells me that middle class or the mid half of our employees their salaries go from maybe around 55 50 000 55 000 all the way up to a hundred thousand and then we do have some people here some people there there seem to be no other extreme values but I'm just gonna select this go here and uh yeah make sure that show or clear points is checked even when that is checked there is no outlier so no nobody has extreme values in our data that's why this looks like this let's see what happens if we kind of artificially change somebody's salary so I'm gonna take uh these two guys salaries and then let's just make this 25 000 uh for here and then 19 320 for this person so we introduced some really extreme values at the bottom Edge I'm gonna take some of these people and then make them hundred and thirty two thousand hundred and forty eight thousand like that not really come even then they are not extreme because I think they kind of still fall within uh the natural expected range but now our histogram change it and it kind of has these extra things at the end but if you do have some very extreme values you will have that as a pointer identified in this box plot so this is our salary spread box plot now let's answer the second question which is is there any relationship between salary and the rating that an employee gets we're gonna try two different ways to understand this particular thing one is through a pivot table and another is through a chart so for our pivot table I'm gonna add a new worksheet salary versus rating and here insert the pivot table remember in the earlier step we have added the pivot table to to add this data to the data model option so now that that pivot table was created and then added to the data model next time you want a pivot table you don't need to go back and select your data you can just go to the place where you want the pivot table and then straight away click pivot table from data model option this will bring up the original data that you have push it to the date model and then it will show you that in fact it's going to show you even the India and New Zealand staff tables we don't need that we only need the staff table so I'm going to use that and from here we are going to put rating into the row label area and then take something like name and then add that so that we can see how many people are there by rating groups a majority of our employees are average rated we have only four people exceptional and very poor so it doesn't really tell you much but you know you can still understand what is happening how many people are there now let's see how well they are paid so I'm going to take salary and put that there right click and then change this to average salary so that we can kind of get a sense of what the average looks like our exceptional staff are paid really high so they have 92 000 but everybody else seem to be around the same ballpark one way to read this better is if we arrange these ratings in a better order right now they are in the alphabetical order and that is not a good way to do it so to fix this problem what we can take do is select the label that is exceptional and we want this to be up top so once you have selected place your cursor on the edge of that box it turns into that black colored arrows click and drag and move it above so exceptional above average average poor very poor so this is the correct way of reading this set of information that is how the distribution looks and that is how the averages look when you have a bunch of numbers like this sometimes it may be hard to kind of see which numbers are big or small given that some of them have decimal points and whatnot so a good idea is to do number formatting and apply consistent formatting for everything so that you can quickly see the numbers better the next trick that I like to do is maybe add conditional formatting data bar on this so when you add a data bar in the pivot table usually it gets applied only to the value that you selected but you will have this extra button here and from this you can select the third one so that the bars are applied consistently and you can see what's happening while the exceptional people have very high salary these other people their salaries are all around the same number in fact very poor and poor people actually have higher average than this group so we can kind of say that probably not much is going on with respect to rating versus salary it's one thing to be rated at a very low level but you might still be paid higher than on average than the people who are rated above you but also because we do have a significant group of people in average rating it might be hard for us to kind of spot and understand what is going on now that this thing is done let's see if there is another way to look at this data and see what is going on for that to happen what we also need to do is for each of our ratings we need to give a numeric value as well because these are text values it will be hard for us to do the analysis correctly we need to assign a number value for each of these ratings so I'm going to first set up a mapping table we can do this in a new page here I'll copy these five words from there and then paste them here as values and then the number for them is let's just go with 5 4 3 2 1. so this is the rating and that's the corresponding number value and now we can come back here rating as a number and then quickly calculate what is the rating as number this way is where we could either use xlookup or vlookup I'm just going to use xlookup x lookup rating and then go to this column here make sure that it is absolute reference so it should be B dollar to B dollar six and then select this column make sure that is also absolute when you have a formula and you need to quickly change the reference style you can press the F4 key and keep pressing it until you get the style you want so now that that is ready we'll close that and we'll get the number representation for all our ratings you might think what is the point of this but you will quickly understand once we make the chart that I want to build now that we have salary as a number and rating as a number a quick way to visually explore this data is to make a scatter plot where rating is on one axis salary is on another axis that way we could see if there is anything funny going on with these dots so I'm going to select my salary column hold down control and select the rating column both columns are selected and then now I can go to insert ribbon and click on the scatter plot option here and then add a scatter plot and I'll control X this and move it right next to our pivot table here so this is how our salary versus rating looks like 3 is the average rating so that is the middle one a lot of people are in the middle some people are on the high like that so earlier when we were looking at the average the average is only 76799 and we thought on average an employee who rated average has lower salary but if you look at the actual spread here you can see that somebody rated average that is three their salary could be anywhere from 19 000 all the way up 248 000. remember I adjusted some of the values earlier to pull up the box plot so that's why these three dots are there so in fact the average group has like a really wide spread whereas the exceptional group has really two salary points 109 000 here and 75 000 and probably this is why their average is 92 000 because maybe there's like two people here two people there and that's why the average is coming up to be 92 000 whereas this spread shows a little bit more Wild Thing so using a scatter plot is a really fun and Powerful way to quickly see how your data is and understand the only way we could do this is if both axes are numbered our salary is already a number but the rating was text that is why we had to have that mapping table create that let's name this as salary versus rating and again because of the way this diagram looks I couldn't really conclude anything meaningful from it it kind of doesn't tell me much but if you do apply this technique for some of your business data you might actually see a meaningful pattern here and then that might help you go and investigate the data a little bit more to see if there is something strange or interesting going on there our third question is to understand how our company has grown in terms of number of people so we would like to see a trend of date joined in how many people have joined on each date as a trend one simple way to do this is if I just take the date join and then plot the head count against it we will get an idea but that's not gonna give us much so we are going to use some pivot tables and build this out slowly we'll start by inserting a pivot table from our data model and go to the staff table take the date join and put that into the rows area whenever you have a date and you put that into pivot table Excel conveniently groups this data into year within the year quarter and then month and the day level let's say for the purpose of this analysis we just want to look at year and maybe at a month level we are not interested in quarter or individual days so what you could do is take out the fields that you don't want from the pivot table just click and drag them drag and drop them out so that they are gone and you will have something like this and for each of those periods I just want to know how many names are there so I'll take the name and put that there so total is 183 in 2020 when we started our company we brought in 37 people so really small 2021 was our big year we hired 82 people then 60 to more and 2023 we're just getting started so we brought in maybe two people so far in at this point you can go and insert a chart like a line chart that shows you that trend and you can see what is going on this chart is linked link table so if your pivot table expands it will show you the same expansion in the visual I'm just going to expand everything out so this is how our company has been growing we had a kind of like a hiring spree between March and September of 2021 we brought in a lot of people then Jan and Feb here we we hired quite a few people between Feb and May and now we are slowing down the problem with this graph is it doesn't show the growth of the company it only tells each point time what is happening so a good way to visualize what we want is if we can get cumulative totals that is running total of these numbers over the entire period of time so to do this you could kind of build some formulas or do complex things or you can right click on the number here and then say show value as and then running total option here if I do a running total I kind of get like these kind of spots where it goes and then it resets at the start of the year and then it shows you those three kind of spikes so 2020 we grew from zero to 37 then we started again from six to 82 and then it kind of went down three two so each year resets that's because we have had we have this kind of a two level structure here so it it will always reset based on one of the fields that you pick in the running total you might think oh this is good but I don't really like this structure how about if I want to just see an entire running total right from the point of time that we started all the way until end and we could do that with two techniques one is because we are already using a data model we could use power pivot here to build a measure that calculates these kind of things but that would be kind of like very advanced stuff in I don't recommend jumping into that straight away the second option is we could also build a page where we can do such calculations ourselves through some simple formulas so let's do the formula option here I'm going to use the same page but move it a little bit side and calculate our running total of employees here so we're gonna have two columns one is month and then headcount the idea is for each month what is our head count and then we can then calculate the running total here so we started our company in 2020 May and then we would like to go up to April of 2023 all the months so I'm gonna say may 2020 and then you can just drag this down if it is a one-off exercise this is fine you can just drag it down until you see April 2023 but that is uh not a fun way to do it the next option is for example from May 2020 I would like to go up to three years which would bring us to 2023 May so this way what we could do is instead of doing it like that just select this and use the sequence function in Excel to generate the months so initially we are going to make a sequence here we could do this everything with one formula but you know it's easier if we have two or three formulas like this so sequence of 36 will generate all the numbers 1 through 36 and then we can have a e date function with the start date being date of 2020 fifth month first day so that is May 2020 and then from that I want to go one month at a time for 36 months so the 36 is here so you can say P3 hash that points to the entire range of sequence values and I think we should start from 4 that way because this is already one then it will add that one there then it will come up doesn't really like that probably because he date was not built for doing spill ranges but once I add a plus 0 to it and fake it it kind of works so we will get all the dates up to April 2023 here this is a dynamic formula so what it means is if I date my data changes I can come back here and for example change this to 48 and I would have all the dates up to 2024 here as well let's just undo that for now and then for each month what is our head count so let's go back to the all staff and see what the problem would be because employees would join on a specific date So within may we hired the one on 7th of May 11th of May 29th of May so I would like to irrespect of when they are hired I want to count them all in the month of May so to do this we could use the countifs function but we'll have to get a little bit creative around it so we can say countifs staff date joined now the date join has to be on or after 1st of May 2020 so we could say greater than or equal to Ampersand that number because that is the number and greater than or equal to user operator we have to use Ampersand here and then it's not enough if they have joined on or after may we will also need to make sure that they are just in the month of May let's just see what happens if I do that alone then I'll get kind of like a reverse running total but this is not the correct one so what we want is they have joined in on or after me but they didn't join any time after they it has to be before the end of May so one more condition this time staff date joined these before the end of so less than or equal to and and then whatever this date is we need to go to the end of the month so this is where we could use the Evo month function of that within the same month so we'll say 0 and close the brackets so what Evo month does is it will take you to the end of the month of 2020 May which would be 31st of May 2020. so three people joined in May one person in June five in this month like that if we didn't hire anybody in a month that will be simply zero so now that all the individual numbers are there if I add all of these up we'll get 183 as you can see in the status bar here so the running total would be just sum of R3 to R3 but we'll make the first part of the sum absolute so it will be dollar R dollar three last part is relative and then when you fill this guy down it will be a running total so now that all the information is here I can just select my month column and then select the running total column and go to insert line chart and add a line chart that kind of shows us how our company is growing so you could see what is happening over a period of time we had some rapid growth but we seem to have stabilizer now at about 183 people or 180 people that's because while people do love to eat chocolates we have also gotten a lot efficient with chocolate production so we didn't need to hire new people constantly all right time for the grand finale making a report card between New Zealand and India without data so maybe we can make a report like this where it kind of has two parts inside India on both sides and here we would like to show similar information so maybe I want to show the head count for example 86 and then female ratio let's just say 48 percent and then average salary 75k and then similar kind of information here and then I want to show Department wise head counts as a graph on both sides and that's it but if you want you could also add other stuff here so let's go ahead and make this in Excel [Music] in a new sheet I'm gonna make a layout for our India versus New Zealand scorecard so we need three sets of information one is head count one is female percentage and one is average salary and then the same thing needs to be repeated here so I'm gonna take these three cells and then make them wide and Tall so each cell will contain one set of information and if you want add an extra column in the middle and make this column narrow this way the boxes will not be touching each other and then we can fill up some color here let's just start with that color but later on we can change this so we want a similar structure on the other side as well but first let's make this for one side and then we can copy paste it onto the other side next up here we would like to make a place for our department spread so for each of the Departments how many people are there that information would go here now that one side is done I'm going to copy this Ctrl C and then go here and Ctrl V it would only paste the thing but it won't adjust the widths but after you paste use this little box here and then use this option to get the exact column width same as on the other side so this is my New Zealand side and on the India side we are going to change the colors to that end for the New Zealand let's go with the blue color that we had originally so that will be New Zealand numbers here India numbers there and then the department spread graph goes here so what we want here is how many people are within New Zealand and this is can be done with a countifs function so we could say countifs and fill up the formula so staff country is NZ 91 people I'm just put it in the middle and let's make this nice and big like that and the female ratio is we need to calculate how many female employees are there within New Zealand and then divide that with 91. so countifs staff country is NZ staff gender is female and then divide that with the 91 that we already have here that will come up with the percentage Ctrl shift 5 we'll get the percentage formatting and that number goes there now I want this to look exactly like that so I'll select this one click on the format painter and paste that there and then apply the Ctrl shift 5 maybe tone that down a little bit 36 points let's go with 28 and then for the salary we could use the sumifs sorry we could use the average ifs function and then say staff salary staff country is NZ TAF I think that's it we don't need to do any more conditions and we'll get the average salary in New Zealand here again Ctrl Shift 4 will apply the currency formatting and we'll put this so let's make I think that looks good we'll just copy these paste here as formulas so only the formula portion comes here and then change the words from NZ to IND all right they're almost neck to neck here but that's because we do have equal amount of staff in both countries and they are kind of paid on par now let's see what is the department level distribution looking like for this we could use some pivot tables so it might not be a good idea to put the paper tables directly here because that look clutter so you may want to just set up the pivot tables in a different page and then copy paste the results here so here insert a pivot table from the date model put Department into row label area and Country into report filter and name as the value so that we will have all the Departments now this shows everybody so if I filter this down to just IND I'll have my India distribution of 92 people I'm going to control C this pivot paste it here and then change this one to New Zealand so we have two pivots one for India one for New Zealand let's just sort this in the descending order of the employee count so for example India has a higher number of people in procurement whereas New Zealand has higher number of people in website well they both are same but that's how that looks once that is there then you can just select this and insert this as a bar chart we'll make a bar chart for this one as well and then the India bar chat I'm just going to change the color because they both look blue it kind of gets confusing so India is orange and New Zealand is blue I'm just gonna select these Ctrl X them and paste them here that looks good but it kind of looks a little bit ugly with all these buttons and Superfluous words so let's clean this up select this one and then go to Pivot chart analyze ribbon and uncheck the field buttons so those buttons are gone likewise I'm going to take out the legend there is only one series value so we don't need the legend and then if you look at the ordering here the highest department is at the bottom lowest department is at the top that kind of looks incorrect so I'm going to select these departments press Ctrl 1 to format them in from the format access options use the categories in reverse order option to flip the order while we are here let's go to the bars themselves and then the Gap width is too high so I'm going to make this 25 percent so that the bars are nice and thick and then the word total is not required so I'm going to take that out and place it like that this looks good let's do the same thing for this one as well and then let's just move them down a little bit and then let's add few more rows in the middle for our title so here I would like to put a title so I'm going to select that entire thing and then thank you all right that looks good let's just tidy It Up by putting a word here that says New Zealand and India and we'll go to the view ribbon take out the grid lines so that we have our scorecard ready if you want you can add a vertical line in the middle you can just go to the insert shape and then select the line and add a line that cuts through the entire thing like that and just make it dull in dotted so there is our kind of simple but also elegant India versus New Zealand comparison like I said there is nothing interesting going on here but it does prove that through this process we discover more about the data and also make it more presentable and elegant and all of this is dynamic so if your data changes for example either you have more people in New Zealand or India and one of these tables change all you have to do is update the data so for example let's just take this person and then make their salary 65 000. so we changed one person's salary and I can come here right click and then refresh this is gonna refresh this table which will update the person's salary here and then we can go to our India versus New Zealand page go to data and refresh all in any of the pivot tables or any calculations that will all be updated of course here because the salary is just directly formula it would have already been updated but if there is any other changes to the department or something that would have also been reflected hey I hope you enjoyed this video If you do give it a like and share this video with some of your friends and colleagues so they can also learn Excel and Excel in their work so much for taking time to watch this video and it's where I take a business data and go in depth into solving various problems and showcasing how to present the data in elegant beautiful manner then to consider checking out some of my detailed courses I have got two courses one on Excel and one on power bi and the links for that are available in the video description check them out and sign up if interested as always it is a pleasure talking to you and I'll catch you again somewhere else bye foreign [Music]
Info
Channel: Chandoo
Views: 78,801
Rating: undefined out of 5
Keywords: chandoo, chandoo.org, Excel, spreadsheets, data cleaning in excel, data analytics, how to use excel, excel formulas and functions, excel tutorial, advanced excel complete course, free excel advanced class, Data analysis with excel, how to use pivot tables, how to use formulas in excel, how to clean data in excel, excel functions, data analyst, data analysis with excel
Id: H6k28jhclwI
Channel Id: undefined
Length: 92min 40sec (5560 seconds)
Published: Tue Jun 13 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.