Excel: Data Cleaning with Excel Part 1

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello welcome to analytics for all today we're doing data cleaning within Excel it's gonna be a workshop and I'll walk you through a typical data cleaning process using Excel as always I uploaded the files you can work on exercise files can be downloaded HTTP analytics for all forward-slash exercise - files - downloads when you come to the site just look for this data cleaning workshop and this green data clean that drop-down that will give you the downloaded excel file we'll be working with you don't need any other software for this project today just from using plain excel data cleaning is a very important part of any data analytics data science work actually it's probably 60 to 80 percent of the time you're going to spend working with your data so taking the time to learn some steps dealing with this is pretty important now there are some great third-party software options out there but some of them rather expensive but really taking the time Excel can handle a whole lot of this and taking a little bit of time to learn some steps will bring you a long way in this in this example today we're going to be teaching you the following handling the following steps of data clean you know obviously every job has its own extras and things you don't use based on the job but here in this example we're going to be doing some very common data cleaning operations or be finding duplicates we're going to be looking for spelling errors we're going to learn how to handle missing data we're going to be converting some data and we're going to merge our datasets together okay well before we go any further let's go ahead let's look at the data we have okay so open up the file I downloaded what we have here is this is an imaginary garage and this garage what I've got here is a list of the work orders the dates the work was performed the license plate of the car the services performed animate Kanaka form the service I've also got second sheet here which provides me with the license plate numbers and what kind of cars they are ok this is from a separate data set and then finally I have a list of the services and the prices they're provided we're going to try to merge all this together into one file so that we can do a little analytics work off it okay so the first thing we're going to focus on is duplicate values because that's one of the biggest problems when you pull data data is never clean there's only and that's having duplicate values as a real problem especially when you start adding up you're doing your aggregates your sums won't come out right your totals are going to come alright you're not going to match the need ledger books if I've got four or five of the same work order in there because we obviously didn't charge the customer for that same work three times so we've got to find these duplicate values and get rid of them so Excel mix is pretty easy to do what you want to do is go across your data set here and highlight it okay and then come up here on your under your Home tab on the ribbon bar you're going to see this thing conditional formatting okay this is pretty easy you just drop down the menu and then we are going to slides over as you can see we're highlight cell rules scroll down look and you see this one here duplicate values just click on it now this option pops up I don't really care for the red text myself I'm going to change it to green okay now say okay bring it over now you look and you see we've got a lot of duplicates now some of these are fine some of these obviously the mechanics are all going to do because these of all people have all done different same jobs where have the same services cars will return but these are the concerns here see these work orders I have to work order let's tear for this guy for this one in 80 and 5 3 & 2 this one eight four six those are duplicate workers we want to get rid of them now in a small data set like this you can just delete them but imagine you run a real grudge you'd have hundreds maybe thousands of these work orders you don't want to go through and try to delete each one individually so one thing you can use is under the same tool you're going to go up to here to the data tab now and there should be a tab here is called remove duplicates okay make sure you still highlighted click on that and it will say select all remove the duplicates and say ok to duplicate values or remove 2627 a unique remain so now you see that removed one each of those duplicates I don't like or have the full duplicates now as far as getting rid of this color now it doesn't matter just go back we highlight I'm going to go back to our home screen conditional formatting and clear rules and we slide this over as you can see it whoops sorry and highlight again and conditional formatting clear rules clear rules from selected cells okay now you see we're back no more color everything looks good again okay so the next thing I list is looking for spelling errors well on this dataset you're not really going to find any but what I want to do is I want to show you how you may find them through typical worker so let's go ahead and click some spelling errors let's go down pick one I'm go to record 8 here i'ma change oil filter to oil falter and I'm going down to record 13 and change tires to TVs okay leave alone don't worry about now we've inflicted these spelling errors I'm going to show you how we're going to find these spelling errors later because a lot of this work really this list is not a guideline of steps down this list you're going to be dealing with iterative you're going to be finding problems and keep going back and doing new things too and fix the problems so let's move on to the next step of handling missing data now this is one of the trickier parts here okay what I've got here is you can see I've got a couple workers here where the mechanics name is missing now this is a situation where you're going to have to refer to your company's policy talk to your boss whatever how do they want to handle missing data some people would want to say well just delete the records because we don't know who did it others would say well let's put in a place value let's get something in there so we don't we don't want to lose this work was done we don't know who did it but work was done the customers charged money was made but we just have no place value so let's put in a missing place five so what we're going to do is we're going to use an if statement to fix this problem okay if statements what's known as a logic statement anybody is used to programming it's done these before well we're stating basically is if something is true then we're going to do this so you know if it is raining bring an umbrella so the syntax used in excel was really simple it's equal to always use equal to start any formula or function if then we put the parentheses in if x equals y comma then the then statement so if x is raining then bringing umbrella else don't bring an umbrella okay so let's go ahead and put this into practice on our excel what we're trying to figure out is we're trying to find a way to fill in these missing spots so we're going to go right next to it and F and we're going to start evil and we're going to say if and then we're going to say if e 2 which is the mechanic thing here is equal to then just put two double quotes with no space in between that means empty then I'm going to use 90 I am putting it in quotes because I want to be a string else just leave this alone so my statement basically says if this is empty then I want to place 99 in place otherwise believe it alone keep the space okay so I'm going to say enter and there you go I got BL so we're just this little square right here on the bottom we bring your cursor over it comes an X and you double-click it fills in everything all the way down your table saves you a lot of time ok so as you see where a name was there good where it wasn't you've got this 99 value here here so good we filtered it now you have two options at this point what you can do is I can right click and cover all these copy and I can do what's called a paste special values and that will paste over top everything there and fill in my spots that's good in this place in this position it actually works a lot of times what I don't like to do that so I end up leaving it a lot and I just hide a column but in this case it's okay we're just going to copy this whole section again right click copy go here scroll down to pace go to paste special and hit this values right down here and there okay that's good now we can just get rid of this cell completely alright so now we fixed it now we have 99s where our empty space is used to be okay our next step is we're going to convert data all right now now often the data you get is not in this it's not how you want to see it you want to see it differently what I've got here is I have this date/time thing I want if I'm curious as to what day of the week this work is done because that that you know you can report on that with Sunday my busiest day is Saturday my busiest day I don't have one of those Rain Man type minds where you can just give me a date and I can tell you what day of the week it was so I'm going to use Excel to show me this so we're going to go here we're going to insert a new cell and we're going to call it weekday and now what we're going to do is we're going to do a conversion of the data we're going to use a converting function okay we're going to convert to text using the equals text function the way this works the syntax behind it is text parenthesis the cell which is the reference data you want to convert and I'm using DD DD this gives me day you could use mmm-mmm for months you could use yyy for years you could use all kinds of things you could use weeks ww for weeks but we're SKUs in DD DD gives us the date okay so going back to our Excel worksheet we're going to go here I want to say text oops I forgot the T text and you notice these things light up below you can see them and you can hit on it there and it gives you hints under what to value format text you know so the value I want is the request on date and in double quotes in Excel if you're used to like sequel use single quotes it messes you up I understand close it out and there we go now we know that day is Friday again hit that little box and you can see the day's the week all this work is done okay now if you're not concerned with the state changing leader you can do that whole copy and paste special again and that'll get rid of the formula cuz you noticed when I bring these up if you look in here it still has the formula the value instead of in this place it just has the value there but if you do do that if you do the copy and paste special if you go and change this date later this will not work again it'll it will not change properly so I'm just going to leave it with the formula we're at the 11 minute mark so I'm going to bring this to a break right now this is an alerts for all we're going to go into a second part to do the merging in there and hope you come back for part two and good welcome welcome to go visit my site analytics for all that or get lots of good videos and lots of good articles I'm working with Excel and all things analytics until next time
Info
Channel: Analytics4All
Views: 55,600
Rating: 4.9183674 out of 5
Keywords: data cleaning, excel, analytics, data munging, vlookup, data prep, data, data merging
Id: WRk9t5yo5Zs
Channel Id: undefined
Length: 12min 5sec (725 seconds)
Published: Sun Apr 03 2016
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.