3 EASY Ways to Find and Remove Duplicates in Excel

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
today we're going to take a look at a very common task when it comes to cleaning data and it's also a very common interview question that you might get if you're applying for a data or financial analyst type of job how can you remove duplicates in your data i'm going to show you three methods it's important that you understand the advantages and disadvantages of the different methods and why one of these methods might return a different result to the other ones let's take a look [Music] okay so i have this table with sales agent region and sales value i want to remove the duplicates that occur in this table but first of all what are the duplicates well if we take a look at this row for example and take a look at this one is this a duplicate no right because the sales value is different but what about this one and this one these are duplicates what i want to happen is that every other occurrence of this line is removed i just keep it once in the end result this means that i need to take a look at the combination of all the columns in order to be able to identify duplicates now the first example that i'm going to show you is going to work in all excel versions but first you need to make a decision do you want to keep the original data set or not if you want to keep it so you can compare before and after you should create a copy of it so i'm going to press ctrl a let's press it again to also include the headers and press ctrl c to copy this now let's go to the side here i'll paste these values and also paste the formatting as well okay so now i can just click somewhere inside this data set go to the data tab under data tools here there is a feature called remove duplicates just click on it it's going to try and pick out your data range in my case it's correct there is an automatic check mark here for me to specify that the data has headers if yours doesn't pick up the headers properly you can just place a check mark here yourself the columns involved are fine so i'm going to click on ok five duplicate values found and removed 14 unique values remain now let's take a look i just have one occurrence of this combination okay so in case you just want to remove the duplicates of one column so let's say you want a distinct list of sales agents you can repeat this process just on the sales agent column so i'll just show you how that's done i'm going to copy this go to the side here let's just paste this in you also don't really need the header in this case you can select your range or just click somewhere inside the range and go to remove duplicates i'm going to place a check mark here my data has headers and click on ok and this time 15 duplicate values were found and removed and i end up with four sales agents okay so that's how you can apply it to a specific column that was method one now let's take a look at method two this method uses formulas and it's something you can use if you have office 365 this of course works on the web version of excel so if you have a free office account you can use these formulas on excel on the web start with equals the function we need is the unique function so this is one of the newer functions in office 365 all we need is the array and the array is our table so just click to the side of the table until you see that tilted arrow then click it's going to pick up your entire table mine is called t sales that's all i need close bracket press enter and i have my unique list i've removed duplicates from this data set now i'm just going to copy and paste the headers here and let's just apply the number formatting to these as well now the advantage of using a function instead of the first method that i showed you is that the second method here is dynamic this means if i add data to the bottom of my table it's automatically going to be reflected with this version let's test this i'm going to copy this and paste it here well right now we're not going to see anything because duplicates are automatically removed but if i change this number to let's say 40 it automatically shows up here whereas with this method i would have to repeat the steps to get my unique list okay so now let's take a look at the third way of doing this i'm just going to press ctrl z a few times to go back to our data set the third way is with power query now this method is semi-dynamic because all you have to do is press refresh let's take a look i'm going to go to the data tab here this time instead of going to the data tools section we're going to go to the get and transform data first of all power query needs to get the data and then we can transform it our data is right here so we're going to go with from sheet this is going to bring up the power query editor we have our source step here which was just our table and power query automatically added a change type step for the headers it just transformed them to text for this and to a whole number for our sales values which is fine now to remove duplicates i'm going to hold down shift so the first column is selected i'm holding down shift and i'm going to click on the last column then right mouse click and remove duplicates okay so everything looks good let's call this t cells report and send this back to the worksheet so close and load two i want to put it on the existing worksheet let's put it right here so we can compare the results and click on ok it says 15 rows were loaded now something is different can you spot the additional row it's right here in this case bridge was written with a small b instead of a capital b so this line was not seen as a duplicate even though it exists up here this is something you have to keep in mind when you use power query and you're removing duplicates there because the formula version sees these as the same thing it's not case sensitive the remove duplicates feature from the data tab is also not case sensitive but power query is case sensitive so if we want to remove the duplicate right here we need to add an additional step in power query to get this to work correctly so let's go back to our query so bring up queries and connections i'm just going to double click to bring up the power query editor and jump back a step to change type so right before removing duplicates i need to make sure that everything has the same case in this example the problem applies to sales agent not to region but if you have multiple columns that could have a different letter case you need to apply this step that i'm going to show you to the other columns as well and that step is to transform the column to make sure the content of the column is in the same case so i clicked on the transform tab i have this already selected i'm going to go to format here and select a format that's common to the whole column we can put everything in uppercase lowercase in this case it makes sense to capitalize each word so i'm gonna go with that it asks me if i want to insert a step in between these two yes i do okay so now our bridge here is capitalized so when we go to remove duplicates it's automatically removed so let's go and send back the result to excel just click on close and load and now we get the same results okay now we can also test how this updates when we add something new to this so i'm gonna copy and paste this here let's change this to 50. formula version automatically updates this version doesn't automatically update i have to right mouse click select refresh and i get the new data show up here as you can see there are different ways you can remove duplicates use the one that suits your excel version and also your process also remember that power query is case sensitive okay so that's it for today give this video a like do subscribe if you aren't subscribed already and i'm gonna see you in the next video you
Info
Channel: Leila Gharani
Views: 182,117
Rating: 4.97644 out of 5
Keywords: XelplusVis, Leila Gharani, Advanced Excel tricks, Excel online course, Excel tips and tricks, Excel for analysts, Microsoft Excel tutorials, Microsoft Excel, Excel 2016, Excel 2013, Excel 2019, XelPlus, Microsoft 365, Excel 365, excel remove duplicates, remove duplicates, excel formula unique list, excel get unique list, excel remove duplicates keep first, excel remove duplicates from column, excel remove duplicates formula, excel remove duplicates multiple columns, duplicates
Id: ADArCWLz55Y
Channel Id: undefined
Length: 8min 56sec (536 seconds)
Published: Thu Jul 22 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.