Delete Duplicate Rows from SQL Table using 3 ways

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello everyone i'm amythest at gates at SQL and in this tutorial we will learn how to delete duplicate rows so in SQL Server there are three ways we can use the lead duplicate rows we'll discuss each and every example here in March today let's start and stay tuned well it happens sometimes like we receive the more than one occurrence of the same row into the database which create a problem with the result sets so we print was in added to delete those duplicate from the data and retain only one existence of red okay so how we can do this let me show you the very first example using the sub query so in sub query I already have an example I would go to sample database and I have created a sample table that I will recreate again just to give you a clear idea a new table is created with four columns you can see ID first name last name and address right where ID is an identity follow okay now table has been created will insert some value into it fine so nine rows are inserted and you will see deliberately I have insert I have inserted few repeated values like let John then say retain then Linda Albert Europe three times then pretty soya India four times right if I show you the table you can see that Rose has been repeated in this table but I want to keep each existence of first name last name and address I don't want the repetition here so how would I delete this fine you will see I have a ID column which is identity enabled and and every row I have a unique number of consecutive number here so based on this I can identify the maximum ID of each row set okay like John Linda retain repeating voice but the maximum idea is to in a same way Linda Albert Europe repeating thrice but the maximum ID for this rosette is 5 in the same way pretty joy India has the maximum aidid 9 so first using correlated sub-query I will identify the maximum ID values and then we'll see how to delete this so I have written up correlated sub-query now what is corrected sub-queries I have already given a few bullet points here that query nested within another query is called sub query you can see one select statement is written here and then the where clause I have written up another select statement right now this is a sub query it is also a coolant at sub query way because the innermost queries output is depend on the outermost query because the outermost query is giving reference to the innermost query so we are the first name last name and address matches only those ID should be matched with the outer query ok but in innermost query I am asking for the maximum ID of each row set for first name last name and address and that should be matching with the outer query only those sets of maximum ID would match with the outer queries ID and then will be produced as an output okay so the John Linda retain it having a maximum ID - so maximum ID for that tag that first name last name and address will be matched here - so that row only for this row set will be produce a now as an output in a same way the fifth and night let me show you in the example query executed and the maximum ID values for each first name last name and address is displaying here right so in a way I have identified the rows now on the same basis if once I have identified the row then I can also delete those but I am this time not deleting the maximum row I am saying retaining those roles and deleting only those from the student where ID should not be maximum for the same row set okay so enormous query an outermost query would run for the same first name last name and address but their ID should not be maximum whatever the maximum ID is produced by the inner query should not match with the ID of outer query and those all rows should be deleted okay so that means nine for pretty fight for Linda and two for John should not be deleted a rest all should be deleted okay cuz ID should not be maximum here should be lesser than the maximum ID that I am trying to say okay once I run this query you can see six row has been deleted and if I go back to the table again you will see maximum ID for each row set has been retained and all deleted so that is the first way now let's see how we can delete repeated values using common table expressions and row number function in this example we'll be deleting rows using row number and common table expression let me first prop that table again and recreate it right and then insert repeated value again now the table is fresh with all new reports and this time you can see the same row set John let's save retain Linda Albert Europe and three TGIF India is repeating again but this time we don't have any unique column to identify the sequence of the rows okay or minimum or maximum values of each row set so this time all roles are very identical with data sets so this time I will use row number function and I will manipulate create a ID in the query itself if the ID column is not given here then I'll by myself using the row number function but create an ID but in the Select statement what I am doing first we need to understand what row number function does so row number function basically assign a consecutive number to each rule based on the over Clause now what this over Clause is doing it is partitioning the complete row set of that data based on first name last name and address so it means that row set would be partitioned separately that rosette would partition again separately and then this would have a separate partition and each partitioning would assign a consecutive value it means this is a separate partition so this row number function will assign a consecutive values 1 2 and then this partition is separate so will reassign a value starting from 1 2 & 3 and then for third partition it would be 1 2 3 4 right let me show you this query I had run this and you can see the new column which is assigning subsequent number to each row set starting from 1 every time it is starting from 1 because this is a separate partition so for each partition it is creating a sequence number okay now this query we can use into CTE as expression and further this CT can be used in the delete statement so what I will be doing I have put this expression into CP expression and will delete the rows from the CDE directly where our n is greater than 1 so if I execute this statement you'll see again 6 was booted and if I play the table you can see only single existence of each rosette has been retained else everything has been deleted so this was the second way of doing this now let's see how we can delete the duplicate roles using temp table deleting rows using the temp table is the most easiest way okay let me just drop and recreate the table again and reassign the duplicate rows instead so again we have a fresh table with the repeated rows now in this method what I will do is to create a temporary table using the distinct keyword okay so I will query this table using the distinct keyword and you will see only the rows without repetition will be produced here and this result set I will dump into a temp table using into statement okay so I am selecting distinct values first name last name and address from the student table and using into statement I am dumping that result set into a temporary table let me make you understand that what the temporary table temporary table is basically a table which remain available only for the session okay whenever we create a table with adding hash with this name then this table becomes our temporary table it means it remains available for the session and as I lost the connection from the session the table will be deleted automatically or otherwise by force using drop statement I can drop that table fine so the Select statement will perform three activities what these three activities are it will first retrieve the distinct row sets from the student table then in the second step it will create a temporary table with the same structure of the student table and in this third step using into keyword it will assign all that distinct values into that temporary table fine so this single selected statement is actually doing three things retrieving the data creating a temp table and dumping the data into that comfortable I will execute this statement the three rows has been inserted now once I have taken the back of of all distinct rows from the table then I can flush out the main table but before that let me show you the main table and temp table you can see the main table here still has repeated values but the temp table has only the unique hose now I will flush out that mean table okay Paul Rowles deleted and now I will insert the distinct backup rows from the temp table into the main table inserted and this time you can see this table has only unique lows not if I wish I can prop that am able otherwise if I block out from the sequel session that table would automatically be deleted so that is all for today's session in this session we have learned that how to delete duplicate rows from the table in three ways first way is using correlated subqueries in the second way deleting rows using row number function and suti expression and in the third way using temporary table we can delete duplicate rows if you wanna read out more about sequel server blocks you can visit my website www.marykay.co.uk/awilliam that this tutorial has added some value into your knowledge please like this video and subscribe my channel I will come soon with more interesting and learning videos till then please take care stay blessed thank you
Info
Channel: GetSet SQL
Views: 4,980
Rating: 4.9452057 out of 5
Keywords: delete duplicate rows in sql, sql server query, sql server basic query, delete duplicate rows from table
Id: 0uQVK_3Q_mU
Channel Id: undefined
Length: 12min 21sec (741 seconds)
Published: Wed Apr 22 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.