# Sql-server – Identify co-related record between two columns with in the group

sql server

I need to identify and update co-related records under Co_Related_Item column as depicted below. Table name is CoRelated.

Scenario explaination:
In the first group i.e. [GroupID] = 1 we can see b=c, c=d and e=d, hence b=c=d=e and need to be flagged 'YES'. They are co-related. But x is not co-related to any item with in the group and hence it is a different item. So we can ignore this record.

+----------+---------+-------+-------+-----------------+
|                                       Required result|
+----------+---------+-------+-------+-----------------+
| UniqueID | GroupID | Item1 | Item2 | Co_Related_Item |
+----------+---------+-------+-------+-----------------+
|        1 |       1 | x     |       |                 |
|        2 |       1 | b     | c     | YES             |
|        3 |       1 | c     | d     | YES             |
|        4 |       1 | d     |       | YES             |
|        5 |       1 | e     | d     | YES             |
|        6 |       1 | e     |       | YES             |
|        7 |       1 | e     | c     | YES             |
--------------------------------------------------------
|        8 |       2 | a     |       |                 |
|        9 |       2 | b     |       |                 |
--------------------------------------------------------
|       10 |       3 | x     |       |                 |
|       11 |       3 | y     |       |                 |
|       12 |       3 | a     | b     | YES             |
|       13 |       3 | b     |       | YES             |
--------------------------------------------------------
|       14 |       4 | x     |       |                 |
|       15 |       4 | y     |       |                 |
|       16 |       4 | x     |       |                 |
|       17 |       4 | c     |       | YES             |
|       18 |       4 | d     |       | YES             |
|       19 |       4 | d     | c     | YES             |
--------------------------------------------------------
|       20 |       5 | a     | c     | YES             |
|       21 |       5 | b     | c     | YES             |
|       22 |       5 | b     |       | YES             |
|       23 |       5 | c     |       | YES             |
+----------+---------+-------+-------+-----------------+


My first try:

Update a SET a.Co_Related_item = 'Y' from CoRelated a inner join CoRelated b on a.[GroupID] = b.[GroupID] and (a.Item1 = b.Item2 or b.item1 = a.Item2)


My Second try:
I added a new column Flag and performed two step query as posted below.

1)
Update a SET a.Flag = b.Item1 from CoRelated a inner join CoRelated b on a.[GroupID] = b.[GroupID] where (a.Item1 = b.Item2 or b.item1 = a.Item2)

2)
Update a SET a.Flag = b.Item1 from CoRelated a inner join CoRelated b on a.[GroupID] = b.[GroupID] and (a.Item1 = b.Flag or a.item2 = b.Flag)
where a.Flag is null


But I am missing few items with my queries.

Note: I have 6 million records with 300K GroupID's in my table. Please give an efficient query.

Your update statement, is not covering all possibilities. See record id 6 and 22 ; these records are not marked.

To have an efficient query , you need also some indexes on your table. One that starts with column GroupID ...

-- preparation part

create table dbo.tblSource
(UniqueID int not null
,GroupID int not null
,Item1 char(1) not null
,Item2 char(1) null
,Result char(1) null
,Flag char(1) null
,FlagRes char(1) null)

insert into dbo.tblSource(UniqueID,GroupID,Item1,Item2,Result)
select 1 , 1, 'x',NULL, '' UNION ALL
select 2 , 1, 'b','c', 'Y' UNION ALL
select 3 , 1, 'c','d', 'Y' UNION ALL
select 4 , 1, 'd',NULL, 'Y' UNION ALL
select 5 , 1, 'e','d', 'Y' UNION ALL
select 6 , 1, 'e',NULL, 'Y' UNION ALL
select 7 , 1, 'e','c', 'Y' UNION ALL
select 8 , 2, 'a',NULL, '' UNION ALL
select 9 , 2, 'b',NULL, '' UNION ALL
select 10 , 3, 'x',NULL, '' UNION ALL
select 11 , 3, 'y',NULL, '' UNION ALL
select 12 , 3, 'a','b', 'Y' UNION ALL
select 13 , 3, 'b',NULL, 'Y' UNION ALL
select 14 , 4, 'x',NULL, '' UNION ALL
select 15 , 4, 'y',NULL, '' UNION ALL
select 16 , 4, 'x',NULL, '' UNION ALL
select 17 , 4, 'c',NULL, 'Y' UNION ALL
select 18 , 4, 'd',NULL, 'Y' UNION ALL
select 19 , 4, 'd','c', 'Y' UNION ALL
select 20 , 5, 'a','c', 'Y' UNION ALL
select 21 , 5, 'b','c', 'Y' UNION ALL
select 22 , 5, 'b',NULL, 'Y' UNION ALL
select 23 , 5, 'c',NULL, 'Y'


Update a SET a.Flag = b.Item1 from tblSource a inner join tblSource b on a.[GroupID] = b.[GroupID] where (a.Item1 = b.Item2 or b.item1 = a.Item2)

Update a SET a.Flag = b.Item1 from tblSource a inner join tblSource b on a.[GroupID] = b.[GroupID] and (a.Item1 = b.Flag or a.item2 = b.Flag)
where a.Flag is null


this is what I come with:

update s1
set flagRes = case when exists(select 1 from dbo.tblSource as s2
where s2.GroupID  = s1.GroupID
and s2.UniqueID <> s1.UniqueID
and
(
(s1.item2 is null and s1.item1 = s2.item1 and s2.item2 is not null)
or
(s1.item2 is null and s1.item1 = s2.item2)
or
(s1.item2 is not null and (s1.item2 = s2.item1 or s1.item2 = s2.item2))
)

) then 'y'else '' end
from tblSource as s1


select of it, with the output:

select UniqueID,GroupID,Item1,Item2,Result as yourResult
,case when exists(select 1 from dbo.tblSource as s2
where s2.GroupID  = s1.GroupID
and s2.UniqueID <> s1.UniqueID
and
(
(s1.item2 is null and s1.item1 = s2.item1 and s2.item2 is not null)
or
(s1.item2 is null and s1.item1 = s2.item2)
or
(s1.item2 is not null and (s1.item2 = s2.item1 or s1.item2 = s2.item2))
)

) then 'y'else '' end as queryRes
,flag
,flagRes
from dbo.tblSource as S1

UniqueID    GroupID Item1   Item2   yourResult  queryRes    flag    flagRes
1           1       x       null                            null    null
2           1       b       c       Y           y           c       y
3           1       c       d       Y           y           b       y
4           1       d       null    Y           y           c       y
5           1       e       d       Y           y           d       y
6           1       e       null    Y           y           null    y
7           1       e       c       Y           y           c       y
8           2       a       null                            null    null
9           2       b       null                            null    null
10          3       x       null                            null    null
11          3       y       null                            null    null
12          3       a       b       Y           y           b       y
13          3       b       null    Y           y           a       y
14          4       x       null                            null    null
15          4       y       null                            null    null
16          4       x       null                            null    null
17          4       c       null    Y           y           d       y
18          4       d       null    Y           y           c       y
19          4       d       c       Y           y           c       y
20          5       a       c       Y           y           c       y
21          5       b       c       Y           y           c       y
22          5       b       null    Y           y           null    y
23          5       c       null    Y           y           a       y


dbfiddle here