Tuesday, 12 May 2015

Interview Question - How to delete duplicate rows from the table using T-SQL ?

In my opinion, every SQL developer is asked about the way to remove one of the duplicate records from a table data, at least once in his/her career. And there are multiple way to achieve this but the easiest of lot is using CTE (Common table expression)
Let's create a table and populate it with duplicate data first


As you can see, we have a duplicate rows for Id and Name combination for (2,'J') and (3,'D') values.

First let us try to identify the duplicate records here. This can be done using ROW_NUMBER() function quite easily. 

;WITH DetectDuplicateRecords AS(SELECT

As you can see ROW_NUMBER() with PARTITION BY is used to identify and number the rows with duplicate data.

Now we know that the rows with RowNum = 2 are the duplicates. We also understand that whenever particular rows has to be removed from the table we use DELETE statement. Interestingly, DELETE can be written for CTE's as well and on the top of that it functions (works) on the underlying table that has featured in CTE.

This means that if we try to delete data from the CTE then it will delete from our source table. But before doing so how how about creating a table to log (hold) the deleted records.

Let's create temp table using 


and then use this temp table to hold the deleted records which are inserted into it using OUTPUT clause. 

;WITH DeleteDuplicateRecords AS(SELECT
OUTPUT deleted.id
,deleted.Name INTO #TEMPWHERE RowNum 2

Let us check whether data has been removed from the table or not


Duplicate data has been removed from the table. But what about temp table that we had created to hold the deleted data ?  Let's check that as well. 



So this way we can Delete and Log the duplicate data from SQL table.

No comments:

Post a Comment