Sql-server – Working out Duplicated Data in a Customer Data Base

duplicationsql serversql-server-2008

I have a database in SQL2008 of customers, There is an ID number for the customer. What im trying to do is to create a NEWID field.

We have duplicated customers that have different ID fields, I am combining the Surname and Postcode to find the duplicated customers.

As you can see in the image with an example, i need to do the following.

1.Check if PostSur field have a match,

2.Check if there is more than 1 IDNO,

3.Check the next IDNO against the First IDNO and if there is a unique entry in the Name field. Then populate that record with the First IDNO in a new field called NewCode.

4.Otherwise the Next IDNO populate the NewCode with 'Dupe'

5.Populate all other IDNO in the NewCode also.

So i can then hide the Dupes and i should then have unique customers using NewCode.

I have the following SQL which uses ROW_NUMBER but cant figure out how to do the above with it?

   select *
 from (
  select *,
      row_number() over (partition by PostSur  order by Postsur) as rowID 
 from Aron_Reporting.dbo.Customer_NewSort
   ) as rowID
 order by PostSUr

any help would be great

enter image description here

Best Answer

Here are your duplicates, then everything else is your non-duplicate.

with c as (
    select *,
    row_number() over (partition by PostSur, Name  order by ID) as rowID 
    from Aron_Reporting.dbo.Customer_NewSort
select c.*
    from c 
    inner join c c2 on c.PostSur = c2.PostSur AND c.rowID > 1 AND c2.rowID = 1 as rowID
     order by PostSur

This code partitions the row numbers by PostSur but also name, that way every person with a different name, or PostSur starts the numbering over. I joined the results back to the anchor, but you really don't need that necessarily.

I think I understood your requirements, hopefully this at least gets you close.