Ubuntu – How can I compare data in 2 files to identify common and unique data?

Tags: command-line, files, text

How can I compare the data in two files to identify common and unique data? I can't do it line by line, because file 1 contains, say, 100 ids/codes/numbers, and I want to compare file 2 against file 1.

The thing is that file 2 contains a subset of the data in file 1, plus data unique to file 2. For example:

file 1      file 2
1            1
2            a
3            2
4            b
5            3 
6            c

How can I compare both files to identify data that is common and data that is unique to each file? diff doesn't seem to do the job.

Best Answer

  • Whether or not file1 and file2 are sorted, you can use awk as follows:

    unique data in file1:

    awk 'NR==FNR{a[$0];next}!($0 in a)' file2 file1
    4
    5
    6
    

    unique data in file2:

    awk 'NR==FNR{a[$0];next}!($0 in a)' file1 file2
    a
    b
    c
    

    common data:

    awk 'NR==FNR{a[$0];next} ($0 in a)' file1 file2
    1
    2
    3
    

    Explanation:

    NR==FNR    - true only while reading the 1st file, so the block that
                 follows runs for that file alone
    a[$0]      - store the whole line ($0) as a key in the associative array a
    next       - skip to the next line, so the remaining pattern is never
                 tested against lines of the 1st file
    ($0 in a)  - for each line of the 2nd file, test whether it is a key in a:
                 '($0 in a)' file1 file2   prints lines common to both files
                 '!($0 in a)' file2 file1  prints lines unique to file1
                 '!($0 in a)' file1 file2  prints lines unique to file2
    
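    As an alternative sketch: if you don't mind sorting the files first, the
    standard comm(1) utility does the same three comparisons directly. The
    file names file1/file2 below mirror the question's example; the sample
    data is recreated inline so the snippet is self-contained.

    ```shell
    # Recreate the example inputs from the question
    printf '%s\n' 1 2 3 4 5 6 > file1
    printf '%s\n' 1 a 2 b 3 c > file2

    # comm requires both inputs to be sorted
    sort file1 > file1.sorted
    sort file2 > file2.sorted

    comm -23 file1.sorted file2.sorted   # unique to file1 (suppress cols 2 and 3)
    comm -13 file1.sorted file2.sorted   # unique to file2 (suppress cols 1 and 3)
    comm -12 file1.sorted file2.sorted   # common to both  (suppress cols 1 and 2)
    ```

    The awk approach has the advantage of preserving the original line order;
    comm is simpler to read but only works on sorted input.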