Tuesday 12 May 2015

Interview Question - How to delete duplicate rows from the table using T-SQL ?

In my opinion, every SQL developer is asked about the way to remove one of the duplicate records from a table data, at least once in his/her career. And there are multiple way to achieve this but the easiest of lot is using CTE (Common table expression)
Let's create a table and populate it with duplicate data first

DECLARE @TAB TABLE(Id TINYINT,Name CHAR(1)
)
INSERT INTO @TABSELECT 1,'C' UNION ALLSELECT 2,'J' UNION ALLSELECT 3,'D' UNION ALLSELECT 2,'J' UNION ALLSELECT 3,'D'
 




As you can see, we have a duplicate rows for Id and Name combination for (2,'J') and (3,'D') values.



First let us try to identify the duplicate records here. This can be done using ROW_NUMBER() function quite easily. 

;WITH DetectDuplicateRecords AS(SELECT
  
ID
  
,NAME
  
,ROW_NUMBER() OVER (PARTITION BY ID,NAME ORDER BY IDAS RowNumFROM @TAB)SELECT FROM DetectDuplicateRecords

As you can see ROW_NUMBER() with PARTITION BY is used to identify and number the rows with duplicate data.


Now we know that the rows with RowNum = 2 are the duplicates. We also understand that whenever particular rows has to be removed from the table we use DELETE statement. Interestingly, DELETE can be written for CTE's as well and on the top of that it functions (works) on the underlying table that has featured in CTE.

This means that if we try to delete data from the CTE then it will delete from our source table. But before doing so how how about creating a table to log (hold) the deleted records.

Let's create temp table using 

SELECT INTO #TEMP FROM @TAB WHERE 2

and then use this temp table to hold the deleted records which are inserted into it using OUTPUT clause. 

;WITH DeleteDuplicateRecords AS(SELECT
  
ID
  
,NAME
  
,ROW_NUMBER() OVER (PARTITION BY ID,NAME ORDER BY IDAS RowNumFROM @TAB)DELETE FROM DeleteDuplicateRecords
OUTPUT deleted.id
,deleted.Name INTO #TEMPWHERE RowNum 2

Let us check whether data has been removed from the table or not

SELECT FROM @TAB


Duplicate data has been removed from the table. But what about temp table that we had created to hold the deleted data ?  Let's check that as well. 

SELECT FROM #TEMP


DROP TABLE #TEMP

So this way we can Delete and Log the duplicate data from SQL table.

No comments:

Post a Comment

bloggerwidgets