diff two files and output lines from file 2 not seen in file 1

March 11, 2015

Problem

You need to diff two files and output only the lines of file 2 that do not appear in file 1.

The first file, file1, contains:

1
2
3
4
5

The second file, file2, contains:

6
7
1
2
3
4

Solution

$ awk 'FNR==NR{a[$0]++;next}!a[$0]' file1 file2
6
7
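A minimal sketch that recreates the sample files and runs the command in both directions. It assumes a scratch directory from mktemp, used only so that real files named file1 and file2 are not overwritten. Because the first file argument is the one that gets recorded, swapping the arguments returns the lines unique to file1 instead; with the sample data that is just 5.

```shell
# Scratch directory (an assumption) so existing files are not overwritten
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 3 4 5 > file1
printf '%s\n' 6 7 1 2 3 4 > file2

# Lines of file2 that never appear in file1 (prints 6 and 7)
awk 'FNR==NR{a[$0]++;next}!a[$0]' file1 file2

# Swap the arguments for lines of file1 that never appear in file2 (prints 5)
awk 'FNR==NR{a[$0]++;next}!a[$0]' file2 file1
```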

Explanation of how the code works:

  • If we’re working on file1, track each line of text we see.
  • If we’re reading file2 and have not seen the line’s text in file1, print it.
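The same program can be spread over several lines with a comment for each step above. This is a sketch, not the original one-liner: the array name seen is a hypothetical rename of a, and !a[$0] is swapped for the membership test !($0 in seen), which gives the same result here but does not create empty array entries for file2 lines. The sample files are recreated in a scratch directory (an assumption) so the block runs on its own.

```shell
# Scratch directory (an assumption) so existing files are not overwritten
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 3 4 5 > file1
printf '%s\n' 6 7 1 2 3 4 > file2

awk '
  # FNR == NR holds only while reading the first file argument (file1)
  FNR == NR {
    seen[$0]++   # record this line of file1
    next         # skip the rule below for file1 lines
  }
  # Reached only for file2: print lines never recorded from file1
  !($0 in seen) { print }
' file1 file2
```

For the sample data this prints 6 and 7, the same as the one-liner.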

Explanation of details:

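  • FNR is the record (line) number within the current file; it restarts at 1 for each file
  • NR is the overall record number across all input files read so far
  • FNR==NR is true only while reading file1 (provided file1 is not empty; if it is, the test also holds for every line of file2)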
  • $0 is the current line of text
  • a[$0] is the element of the associative array (hash) a whose key is the current line of text
  • a[$0]++ tracks that we’ve seen the current line of text
  • !a[$0] is true only when the current line’s text was never seen in file1
  • When that pattern is true, awk prints the line: a pattern with no explicit action defaults to printing the current record
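To see why FNR==NR singles out the first file, this sketch prints FILENAME, FNR, NR and the value of FNR==NR (1 or 0) for every line of the sample files, recreated in a scratch directory as an assumption for illustration. It also shows the one case where the test misfires: if the first file is empty, NR never gets ahead of FNR, so every line of file2 is recorded and nothing is printed. The file name empty is hypothetical.

```shell
# Scratch directory (an assumption) so existing files are not overwritten
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 3 4 5 > file1
printf '%s\n' 6 7 1 2 3 4 > file2

# One line per record: file name, FNR, NR, and 1 if FNR==NR else 0
awk '{ print FILENAME, FNR, NR, (FNR == NR) }' file1 file2

# Pitfall: with an empty first file, FNR==NR stays true for file2,
# so every file2 line is stored and nothing is printed
: > empty
awk 'FNR==NR{a[$0]++;next}!a[$0]' empty file2
```

The five file1 lines end in 1 and the six file2 lines end in 0: when file2 starts, FNR restarts at 1 while NR continues at 6.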

Source: http://stackoverflow.com/questions/4717250/extracting-unique-values-between-2-sets-files