10 Super Neat Ways to Clean Data in Excel

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello and welcome to the video about Trump Excel I am somewhat bunsen and in this video I will show you ten ways to clean data in Excel here I have the text welcome to Trump Excel written in four different ways first one is the regular way with only one space between words in the second case I have more than one space between words in a third case I have some leading spaces along with a couple of spaces between words and in the fourth case I have trailing spaces you can see there are a couple of space after the last word now this could typically be the case if you get this data from a colleague or you get it from a text file or imported from a database so to clean this data and get rid of these extra spaces you can use the function trim trim function takes one single argument which could either be the text which you type manually or it could be B cell reference in this case I will take the cell reference a 1 and what this function does is it would remove all the leading spaces and trailing spaces and extra spaces between words except one single space that is allowed so if I drag this down you would see that it has corrected all these text it has removed the extra space here between welcome into it has removed the leading spaces and trailing spaces here now this is a formula if you just need to use the text you can convert it into values by using paste special I have student names here and their marks in three subjects you can see that there are gaps in this data set which could be because the student could not appear in the exam now you may not want to leave this data set with blanks you may want to type not appear in all these cells which are blank so to do that you can either go and select each cell manually and type not appear but if you have a huge data set that because this could be very tiresome so to do it at one go select the entire data set go to find and select and select this option go to special this opens the go to special dialog box you can also use the keyboard shortcut f5 and when you do this it opens the go-to dialog box here you have special button click on it and it again opens it equal to special dialogue box click blanks and click okay this would select all the blank cells in your data set at the same time so now you have these cells in gray and the first cell is in white because this is the active cell so to type not appear in all these cells just start typing not appear and hit ctrl enter and as soon as you hit ctrl enter this gets entered in all the cells here I have this number 123 entered in three different ways in the first case it is a number as you can see it is aligned to the right of the cell numbers are always aligned to the right while text gets aligned to the left of a cell and in the other two cases you can see these are text format because these are aligned to the left now to convert all these three back into numbers the first one is already a number but to convert these two back into a number there are two ways to do it the first one is I would go here the formatting box and I would type general and when I hit enter the second one gets converted back into a number because in this case it was it was merely in the text format but the third case is a little more difficult because it has been entered by using a leading apostrophe and a lot of people do this a lot of people enter numbers for starting with an apostrophe so that it gets converted into a text and this could create some problems for example if I have this number 125 here and if I check for a condition that this number is greater than 123 which it obviously is so it should return a true now I would check this condition for all these cells let me lock the first cell now I would check it for all these three conditions for all these three cells and you can see the first two are true which is the right way but the second one is false the reason being that Excel always considered text to be greater than a number so no matter how big your number is if there is a text that text would always be rated higher than a number and in this case you can see there is an apostrophe so this is still a text even after changing the format so to take care of this let me delete this to take care of this a very foolproof method is type 1 in any of the blank cells go to the cell and copy this now select these cells go to paste paste special and this opens the paste special dialog box here you have operation category within this select multiply and click okay so what this does is this multiplies this number with 1 and any number multiplied by 1 unchanged but this also takes care of the apostrophe so now all these three numbers get converted back into a number format here I have a data set of students and their marks in three subjects and there are duplicates in this data so you can see there is a duplicate for Bill and duplicate for Phil now if you want to remove these duplicate values there are two ways to do it first is using conditional formatting so you can select the data set go to home conditional formatting here you have highlight cell rules and go to duplicate values and as soon as you select this it gives you the option to highlight duplicates and the formatting I will keep the formatting as a red fill with dark red text and when I hit OK you can see that this has been highlighted and all those numbers and names that appear more than once it highlighted in red I I can manually see that Phil repeats twice and Bill repeats twice so I can select this data and manually delete this I will control Z to go back so this is the conditional formatting way the other way to remove duplicates is by selecting the entire data set going to data and here I have the option remove duplicates I click on this option and it opens the remove duplicates dialog box here make sure that if your data has headers which in this case it has this option is selected if this is not selected then this is also counted as a part of your data it should not be the case when you have selected this option these names are the names of the columns so I can see that there is a student column math column physics column in chemistry column I can then select ok to remove all those rows or all the data set which is duplicate but in this case it would not remove a number which repeats again rather the entire row has to be an exact duplicate so for example Bill and these three marks has to meet exactly the numbers and name here bill and these three marks and if that is the case then this row gets deleted similar is the case with Phil and these numbers so now when I click OK it says one duplicate values found and removed 11 unique values remain the reason being that in case of Phil in case of Phil you can see that the marks does not match so this entire row is not an exact match and hence it remains but since in case of Bill it was an exact match then the row was removed here I have a data set for five companies I have their revenue number for three years and net income numbers for three years and using these numbers I have calculated the net income margin which is net income by revenue now you can see that there are errors in 2014 data for Company B and Company D the reason being that there is no revenue number for these companies in 2014 and hence I get a division error because I try to divide their net income by nothing now this is a small data set and you can this visually spot these errors but if you have a huge data set these errors could be difficult to spot so to do that you can use two methods first is using conditional formatting so select this entire data set go to home conditional formatting and select new rule within new formatting rule dialog box select format only cells that contain and from this drop-down select errors when you select errors you would get the option to format the cells which has error in this case let me select red and I click OK okay and as soon as I do this all the cells that have errors in it get highlighted in red we control Z to go back the other way to do this would be to select those cells which have errors and you can do this by using the go to special dialog box so to do that press f5 this opens the go to dialog box here you have the special button click on it this opens the go to special dialog box button go to special dialog box here select formulas and within formulas as soon as you selected all these four options get available deselect these first three options and only keep the errors option selected and now click OK when you do this all those cells which have error in it get selected now you can manually either delete all these cells or type something like not oops not available and hit control enter so that it gets entered in all the cells which have error in here I have names written in different ways you can see either it could be all caps it could be all lowercase and in some cases it's a mix-and-match of uppercase lowercase so to make it all consistent you can use one of these three formulas lower upper and proper lower formula takes one argument it could be either the text that you type in or you can use a cell reference in this case if I'll use the cell reference a3 and when I hit ctrl enter this gives me jane fonda the name but all the alphabets have been converted into lowercase and when I drag this down this is the case for all these names all these names now look consistent in lowercase you may want these all in the uppercase so in that case you can use the formula upper and you can see this these are all in uppercase now as I drag this down the most used way is proper case because it would keep the first alphabet of your name as in capital and the rest all would be in the lowercase may show you I will select proper III and I had ctrl enter and you can see J of Jane and Fonda F of Fonda is in caps and rest all the alphabets are in the lowercase and now I drag it down so these are three formulas that can very quickly make your text consistent this could be the case when you are sharing a worksheet or you get it from a text file where a lot of people enter it in different ways these formulas can quickly make these consistent I have these address here in column a and I want to break these address based on their locality city and country so you would see that they follow a consistent format where the locality name is followed by city name and followed by country name and the separator is always a comma so there would be locality comma City comma country now if you have your data in a consistent format then you can use an Excel feature called text to column to get this data in these columns and separate these localities city and country based on the separator which is a comma now I'll show you how to do it so you go to data and here you have the option of text to column click on this and it opens the text to columns wizard and taste it has three steps the first is to identify whether it's a fixed length data fixed width data or delimited in this case it is not fixed width you can see the characters the locality length is different in all these case and same is the case with city and country so I would use a D limited i would go to next and here i need to specify the delimiter so in this case my delimiter is comma as you can see all these three things locality city and country are divided by a comma so it would use the comma to divide these and you can see the preview here i would have the locality in one column city in one column and country in the other column now i hit next in case you have other delimiters save space or tab or semicolon or any other daily limiter then you can specify that here now i hit next and when i hit next it also gives me the option to select the format in this case i want to keep the format as general but in case you are doing this with numbers or dates then you can specify that format also you need to specify the destination if you do not do that it would overwrite this but it is a good idea to select the destination in to it at some other place in this case I have selected b2 and I hit finish okay and as soon as I do this you can see that the locality all for all these address the locality comes here city comes here and country comes here so it's a very good to change your data parse your data using text to column if you have huge data set and you want to only extract a part of it while Microsoft PowerPoint and Microsoft Word have a feature where it would underline if there are any errors grammatical errors or spelling errors Microsoft Excel does not have that feature however you can still a run spellcheck and correct these errors so to do that select the data and press f7 and when you do that it ruins the spellcheck for you and it is the same thing that you see in market Microsoft Word or PowerPoint it will show you the text that it thinks is an spelling error and it will show you the suggestions as well so you can change these and once it is done it will show you that spellcheck is complete and you are good to go if you have a worksheet where there is lot of formatting and you need to clear all the formatting you can quickly do that by selecting the entire data going to home and here with in editing you have this option clear within clear you have the option clear formats you can also use clear all this would remove everything from your sheet including the content you can only clear the content would remain the formatting would remain intact you can clear the comments and the hyperlinks in this case let me show you what happens when you clear the formatting so click on clear formats and as soon as you do this you can see that all the colors including the borders have gone now you can simply introduce the borders if you want and you have your clean data I'll show you four ways using find and replace that can clean your data in the first case you can change your cell references using find and replace so here I have data set with students and their marks in three subjects and I have these two names Jane and Martha and their marks in maths you can see Jane has 69 and Martha is 91 to get this I have used a formula so this is an index formula where I take the math column as input and I match the name with column a and wherever there is a match it fetches the marks for that column for that subject but say for example I need to change this and get the marks for physics now this is this is a small data set with only two student names but if I have a huge data set of hundreds of students and a huge master data set then it would be very difficult to do this manually you can do this by using the find and replace feature to do that press ctrl H this opens the Find and Replace dialog box and in find what change the reference in this case I want to change dollar be to dollar C dollar C in this case right now I have the numbers formats and hence I want to change dollar be $2 C and it would give me the numbers for map physics so I do I go to the place fall and it makes for replacements and now you can see that I have the new numbers for physics for Jane I have 71 and for Martha I have 46 similarly I can make it for chemistry I could change this from dollar dollar C to dollar T and now when I replace all it updates these new numbers you can also use Find and Replace to find specific formats and change these formats so to do that let me show you how to do it I would press ctrl H and this opens the Find and Replace dialog box here I select options and I would clear these boxes and here I have this option of selecting the formats so I would want to find a format either I could specify the format I have in mind and then look for it or I can choose the format from a cell so let me select a format from here and it says this is the preview so this is the format I want to find and I want to change this format to say I want to fill these cells with orange light orange color and I click replace all and you would see that instantly all the formats have been changed so you can also use Find and Replace to identify formats and change those formats Find and Replace can also be used to remove line breaks from your text so for example here I have the address where the locality city and country are separated by a line breaks you can see that it's the same cell but they are in different rows now to remove these line breaks press select the data and press ctrl H this opens the Find and Replace dialog box you cannot use alt enter to find that line break but the trick here would be to use ctrl J so when you are in the find what box press ctrl J and you can see that there is a blinking dot here go to the replace with box and just type a single space bar and replace all and this would make all these replacements and you can see that the line breaks have gone and now I have the entire text in a single line find and replace can also be used with wildcard characters to clean the data so in this case I have these 5 company names along with a code name in parentheses and I want to remove all these parentheses and code names I only want the name of the company so to do that I select the data and I press ctrl H here I would type beginning round bracket yes risk ending round bracket this would identify all those characters which are within parentheses because a stress represent any number of characters so all these code ABC within parentheses would be found by this formula and I would replace it with nothing because I want it to be completely erased now if I go to find all you can see that it has found all these codes within parenthesis and now if I do replace all it would quickly replace all these and my data is all cleaned up so these are ten quick tips to clean your data in Excel if there are any tricks that you use please share it with us I hope you found this video useful thank you and have a nice day
Info
Channel: TrumpExcel
Views: 1,599,817
Rating: 4.9056678 out of 5
Keywords: Clean Data in Excel, Data Cleaning in Excel, Data Cleansing in Excel, cleaning up raw data in excel, how to clean data in excel, data cleaning, Excel Data Cleaning, trim function in excel, top data cleaning tricks in excel, how to do data cleansing in excel, data cleansing, cleaning data in excel, cleaning up excel spreadsheet, Excel, Excel Tips, Excel Data, how to clean raw data in excel, clean raw data in excel, excel tutorial, clean raw data, Data cleaning formulas
Id: e0TfIbZXPeA
Channel Id: undefined
Length: 18min 46sec (1126 seconds)
Published: Sun Aug 24 2014
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.