In my opinion, every SQL developer is asked about the way to remove one of the duplicate records from a table data, at least once in his/her career. And there are multiple way to achieve this but the easiest of lot is using CTE (Common table expression)
Let's create a table and populate it with duplicate data first
As you can see, we have a duplicate rows for Id and Name combination for (2,'J') and (3,'D') values.
Let's create a table and populate it with duplicate data first
DECLARE @TAB TABLE(Id TINYINT,Name CHAR(1)
)
INSERT INTO @TABSELECT 1,'C' UNION ALLSELECT 2,'J' UNION ALLSELECT 3,'D' UNION ALLSELECT 2,'J' UNION ALLSELECT 3,'D'
As you can see, we have a duplicate rows for Id and Name combination for (2,'J') and (3,'D') values.
First let us try to identify the duplicate records here. This can be done using ROW_NUMBER() function quite easily.
;WITH DetectDuplicateRecords AS(SELECT
ID
,NAME
,ROW_NUMBER() OVER (PARTITION BY ID,NAME ORDER BY ID) AS RowNumFROM @TAB)SELECT * FROM DetectDuplicateRecords
ID
,NAME
,ROW_NUMBER() OVER (PARTITION BY ID,NAME ORDER BY ID) AS RowNumFROM @TAB)SELECT * FROM DetectDuplicateRecords
As you can see ROW_NUMBER() with PARTITION BY is used to identify and number the rows with duplicate data.
Now we know that the rows with RowNum = 2 are the duplicates. We also understand that whenever particular rows has to be removed from the table we use DELETE statement. Interestingly, DELETE can be written for CTE's as well and on the top of that it functions (works) on the underlying table that has featured in CTE.
This means that if we try to delete data from the CTE then it will delete from our source table. But before doing so how how about creating a table to log (hold) the deleted records.
Let's create temp table using
SELECT * INTO #TEMP FROM @TAB WHERE 1 = 2
and then use this temp table to hold the deleted records which are inserted into it using OUTPUT clause.
;WITH DeleteDuplicateRecords AS(SELECT
ID
,NAME
,ROW_NUMBER() OVER (PARTITION BY ID,NAME ORDER BY ID) AS RowNumFROM @TAB)DELETE FROM DeleteDuplicateRecords
OUTPUT deleted.id,deleted.Name INTO #TEMPWHERE RowNum = 2
ID
,NAME
,ROW_NUMBER() OVER (PARTITION BY ID,NAME ORDER BY ID) AS RowNumFROM @TAB)DELETE FROM DeleteDuplicateRecords
OUTPUT deleted.id,deleted.Name INTO #TEMPWHERE RowNum = 2
Let us check whether data has been removed from the table or not
SELECT * FROM @TAB
Duplicate data has been removed from the table. But what about temp table that we had created to hold the deleted data ? Let's check that as well.
SELECT * FROM #TEMP
DROP TABLE #TEMP
So this way we can Delete and Log the duplicate data from SQL table.
No comments:
Post a Comment