Deleting duplicate rows from a table without a primary key

I have encountered several scenarios where it has been necessary to clean up duplicate rows in a table. When there is a primary key or some sort of unique identifier this is an easy task. However, when they are truly duplicates and there is no way to distinguish between the rows it becomes a little trickier.

Because of this. I have often seen people take the approach of:

  • Select distinct values from Table A into temporary table B
  • Delete from Table A where row exists in table B
  • Insert values from table B back into table A

While this works, I prefer fewer moving parts. Below are my preferred methods for cleaning up duplicates in both SQL Server and Oracle.

In SQL Server, my weapon of choice in this situation is the ROW_NUMBER window function. Let’s start by creating a table with duplicate rows.

Image 001

The ROW_NUMBER function will allow us to create that unique identifier we were looking for to distinguish between the duplicates. A simple select statement using the ROW_NUMBER function:

Image 003

Looking at the results we can see that deleting everything with a rownum greater than 1 would leave us one unique row each. We will do that next using a derived table and this same window function.

Image 002

And with that we have a duplicate free table.

In Oracle, we could simply use ROWID Pseudocolumn the to clean up the same data. If we were to select all columns from the employees table and the ROWID Pseudocolumn, we would see that we actually do have a unique identifier to work with.

rowidWe would then pick out the min ROWID for each unique row in the table grouping by all columns and delete everything else.

oradeldups

And once again we have a nice clean table.

Leave a Reply

Your email address will not be published. Required fields are marked *