Sql-server – Identify co-related record between two columns with in the group

sql server

I need to identify and update co-related records under Co_Related_Item column as depicted below. Table name is CoRelated.

Scenario explaination:
In the first group i.e. [GroupID] = 1 we can see b=c, c=d and e=d, hence b=c=d=e and need to be flagged 'YES'. They are co-related. But x is not co-related to any item with in the group and hence it is a different item. So we can ignore this record.

+----------+---------+-------+-------+-----------------+
|                                       Required result|
+----------+---------+-------+-------+-----------------+
| UniqueID | GroupID | Item1 | Item2 | Co_Related_Item |
+----------+---------+-------+-------+-----------------+
|        1 |       1 | x     |       |                 |
|        2 |       1 | b     | c     | YES             |
|        3 |       1 | c     | d     | YES             |
|        4 |       1 | d     |       | YES             |
|        5 |       1 | e     | d     | YES             |
|        6 |       1 | e     |       | YES             |
|        7 |       1 | e     | c     | YES             |
--------------------------------------------------------
|        8 |       2 | a     |       |                 |
|        9 |       2 | b     |       |                 |
--------------------------------------------------------
|       10 |       3 | x     |       |                 |
|       11 |       3 | y     |       |                 |
|       12 |       3 | a     | b     | YES             |
|       13 |       3 | b     |       | YES             |
--------------------------------------------------------
|       14 |       4 | x     |       |                 |
|       15 |       4 | y     |       |                 |
|       16 |       4 | x     |       |                 |
|       17 |       4 | c     |       | YES             |
|       18 |       4 | d     |       | YES             |
|       19 |       4 | d     | c     | YES             |
--------------------------------------------------------
|       20 |       5 | a     | c     | YES             |
|       21 |       5 | b     | c     | YES             |
|       22 |       5 | b     |       | YES             |
|       23 |       5 | c     |       | YES             |
+----------+---------+-------+-------+-----------------+

My first try:

Update a SET a.Co_Related_item = 'Y' from CoRelated a inner join CoRelated b on a.[GroupID] = b.[GroupID] and (a.Item1 = b.Item2 or b.item1 = a.Item2)

My Second try:
I added a new column Flag and performed two step query as posted below.

1)
Update a SET a.Flag = b.Item1 from CoRelated a inner join CoRelated b on a.[GroupID] = b.[GroupID] where (a.Item1 = b.Item2 or b.item1 = a.Item2)

2)
Update a SET a.Flag = b.Item1 from CoRelated a inner join CoRelated b on a.[GroupID] = b.[GroupID] and (a.Item1 = b.Flag or a.item2 = b.Flag)
where a.Flag is null

But I am missing few items with my queries.

Note: I have 6 million records with 300K GroupID's in my table. Please give an efficient query.

Best Answer

Your update statement, is not covering all possibilities. See record id 6 and 22 ; these records are not marked.

To have an efficient query , you need also some indexes on your table. One that starts with column GroupID ...

-- preparation part

create table dbo.tblSource
(UniqueID int not null
,GroupID int not null
,Item1 char(1) not null
,Item2 char(1) null
,Result char(1) null
,Flag char(1) null
,FlagRes char(1) null)

insert into dbo.tblSource(UniqueID,GroupID,Item1,Item2,Result)
 select 1 , 1, 'x',NULL, '' UNION ALL
 select 2 , 1, 'b','c', 'Y' UNION ALL
 select 3 , 1, 'c','d', 'Y' UNION ALL
 select 4 , 1, 'd',NULL, 'Y' UNION ALL
 select 5 , 1, 'e','d', 'Y' UNION ALL
 select 6 , 1, 'e',NULL, 'Y' UNION ALL
 select 7 , 1, 'e','c', 'Y' UNION ALL
 select 8 , 2, 'a',NULL, '' UNION ALL
 select 9 , 2, 'b',NULL, '' UNION ALL
 select 10 , 3, 'x',NULL, '' UNION ALL
 select 11 , 3, 'y',NULL, '' UNION ALL
 select 12 , 3, 'a','b', 'Y' UNION ALL
 select 13 , 3, 'b',NULL, 'Y' UNION ALL
 select 14 , 4, 'x',NULL, '' UNION ALL
 select 15 , 4, 'y',NULL, '' UNION ALL
 select 16 , 4, 'x',NULL, '' UNION ALL
 select 17 , 4, 'c',NULL, 'Y' UNION ALL
 select 18 , 4, 'd',NULL, 'Y' UNION ALL
 select 19 , 4, 'd','c', 'Y' UNION ALL
 select 20 , 5, 'a','c', 'Y' UNION ALL
 select 21 , 5, 'b','c', 'Y' UNION ALL
 select 22 , 5, 'b',NULL, 'Y' UNION ALL
 select 23 , 5, 'c',NULL, 'Y'

your's updates

Update a SET a.Flag = b.Item1 from tblSource a inner join tblSource b on a.[GroupID] = b.[GroupID] where (a.Item1 = b.Item2 or b.item1 = a.Item2)


Update a SET a.Flag = b.Item1 from tblSource a inner join tblSource b on a.[GroupID] = b.[GroupID] and (a.Item1 = b.Flag or a.item2 = b.Flag)
where a.Flag is null

this is what I come with:

update s1
set flagRes = case when exists(select 1 from dbo.tblSource as s2 
                           where s2.GroupID  = s1.GroupID 
                             and s2.UniqueID <> s1.UniqueID 
                             and 
                             (
                             (s1.item2 is null and s1.item1 = s2.item1 and s2.item2 is not null) 
                             or
                             (s1.item2 is null and s1.item1 = s2.item2) 
                             or
                             (s1.item2 is not null and (s1.item2 = s2.item1 or s1.item2 = s2.item2)) 
                             )

                 ) then 'y'else '' end
from tblSource as s1

select of it, with the output:

select UniqueID,GroupID,Item1,Item2,Result as yourResult
,case when exists(select 1 from dbo.tblSource as s2 
                           where s2.GroupID  = s1.GroupID 
                             and s2.UniqueID <> s1.UniqueID 
                             and 
                             (
                             (s1.item2 is null and s1.item1 = s2.item1 and s2.item2 is not null) 
                             or
                             (s1.item2 is null and s1.item1 = s2.item2) 
                             or
                             (s1.item2 is not null and (s1.item2 = s2.item1 or s1.item2 = s2.item2)) 
                             )

                 ) then 'y'else '' end as queryRes
        ,flag
        ,flagRes
from dbo.tblSource as S1


UniqueID    GroupID Item1   Item2   yourResult  queryRes    flag    flagRes                             
1           1       x       null                            null    null
2           1       b       c       Y           y           c       y
3           1       c       d       Y           y           b       y
4           1       d       null    Y           y           c       y
5           1       e       d       Y           y           d       y
6           1       e       null    Y           y           null    y
7           1       e       c       Y           y           c       y
8           2       a       null                            null    null
9           2       b       null                            null    null
10          3       x       null                            null    null
11          3       y       null                            null    null
12          3       a       b       Y           y           b       y
13          3       b       null    Y           y           a       y
14          4       x       null                            null    null
15          4       y       null                            null    null
16          4       x       null                            null    null
17          4       c       null    Y           y           d       y
18          4       d       null    Y           y           c       y
19          4       d       c       Y           y           c       y
20          5       a       c       Y           y           c       y
21          5       b       c       Y           y           c       y
22          5       b       null    Y           y           null    y
23          5       c       null    Y           y           a       y

dbfiddle here