There are various ways to find duplicate records in SQL Server. Like using group and count, partitioning result set etc.
This article talks about finding duplicate records using two of these methods. I will straight away jump into the SQL example for you to understand easily and quickly.
Create a table name with [STG].[TMS_TOP_PROGRAMS] with following entries into it. This table I have all the programs along with some other attributes like ID, Sort Order etc -
PROGRAM_ID | PROGRAM_NAME | SORT_ORDER | BATCH_ID |
11 | Align the Right Customers to the Right Solutions at Optimized cost | 1 | 100 |
11 | Align the Right Customers to the Right Solutions at Optimized cost | 1 | 100 |
12 | Create competitive advantage through business analytics and benchmarking | 6 | 100 |
13 | Deploy future state CSSG operating model | 2 | 100 |
13 | Deploy future state CSSG operating model | 2 | 100 |
14 | Enhance employee engagement | 7 | 100 |
14 | Enhance employee engagement | 7 | 100 |
15 | Expand Direct Sales Capabilities to Grow Revenue | 3 | 100 |
16 | Optimize Service Contacts to Create Sales Opportunities | 4 | 100 |
16 | Optimize Service Contacts to Create Sales Opportunities | 4 | 100 |
17 | Simplify End to End Customer Experience | 5 | 100 |
17 | Simplify End to End Customer Experience | 5 | 100 |
6 | OpEx Enterprise-Wide Programs | 6 | 100 |
6 | OpEx Enterprise-Wide Programs | 6 | 100 |
As you see there are lot of duplicate programs in this table. One ways to find duplicate records is by using Group clause as under -
SELECT
PROGRAM_ID, [PROGRAM_NAME], KEY_PROGRAM,
SORT_ORDER, [BATCH_ID], COUNT(PROGRAM_ID)
FROM [STG].[TMS_TOP_PROGRAMS]
GROUP BY PROGRAM_ID, [PROGRAM_NAME], KEY_PROGRAM, PROGRAM_SORT_ORDER, [BATCH_ID]
HAVING COUNT(PROGRAM_ID) > 1
Another and yet more efficient way to find duplicate by partitioning the table based on the key, here is what I mean..
SELECT PROGRAM_ID, [PROGRAM_NAME], KEY_PROGRAM, PROGRAM_SORT_ORDER, [BATCH_ID], RANK_ID FROM
(
SELECT PROGRAM_ID, [PROGRAM_NAME], KEY_PROGRAM,
SORT_ORDER, [BATCH_ID],
ROW_NUMBER() OVER (
PARTITION BY PROGRAM_ID ORDER BY PROGRAM_ID
) AS RANK_ID
FROM [STG].[TMS_TOP_PROGRAMS]
) AS INTBL WHERE RANK_ID > 1
No comments:
Post a Comment