Count & Remove Repeating / Duplicate lines in Linux/Unix file



I have a file called dat.txt with a few lines, and I want to count and remove the duplicate lines.

$ cat dat.txt
xml.1-lkj<mn1-1lkjmg1-1w13lg.rec
xml.1-CCJGL1-CCJGL1-CCJGL.rec
xml.1-BSDF0Q1-BW;LKJ1-BWP30Q.rec
xml.1-LKJ<MN1-1LKJMG1-1W13LG.rec
xml.1-2<MBMV1-NVNBVKJH21HMRE.rec
xml.1-2EW*&Y1-(878761-2AJKGY.rec

uniq is the command used to find duplicates. It only compares adjacent lines, so the file has to be sorted before it is passed to uniq:

$ sort dat.txt | uniq
xml.1-2EW*&Y1-(878761-2AJKGY.rec
xml.1-2<MBMV1-NVNBVKJH21HMRE.rec
xml.1-BSDF0Q1-BW;LKJ1-BWP30Q.rec
xml.1-CCJGL1-CCJGL1-CCJGL.rec
xml.1-lkj<mn1-1lkjmg1-1w13lg.rec
xml.1-LKJ<MN1-1LKJMG1-1W13LG.rec
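To see why sorting matters, here is a minimal sketch using a throwaway file called sample.txt (a hypothetical name and made-up data, not part of the original example). Without sorting, uniq misses duplicates that are not next to each other. The sketch also shows sort -u, which sorts and removes duplicates in one step, and sort -o, which is allowed to name the input file itself, so the de-duplicated result can be written back in place.

```shell
# Hypothetical sample data: "apple" appears twice, but not on adjacent lines
printf 'apple\nbanana\napple\n' > sample.txt

# Without sorting, uniq keeps both "apple" lines (prints apple, banana, apple)
uniq sample.txt

# Sorting first makes the duplicates adjacent, so uniq removes them
sort sample.txt | uniq

# sort -u does the sort and the de-duplication in one step
sort -u sample.txt

# Write the result back to the same file; sort -o may name the input file
sort -u sample.txt -o sample.txt
cat sample.txt
```

Avoid `sort -u sample.txt > sample.txt`: the shell truncates the file before sort reads it, so the data is lost.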

Use the -i option to ignore case when comparing lines, so that lines differing only in case count as duplicates:

$ sort dat.txt | uniq -i
xml.1-2EW*&Y1-(878761-2AJKGY.rec
xml.1-2<MBMV1-NVNBVKJH21HMRE.rec
xml.1-BSDF0Q1-BW;LKJ1-BWP30Q.rec
xml.1-CCJGL1-CCJGL1-CCJGL.rec
xml.1-lkj<mn1-1lkjmg1-1w13lg.rec
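uniq -i still only merges lines that are adjacent. Whether plain sort puts case variants next to each other depends on the locale: in the C locale every uppercase letter sorts before every lowercase one, so they can end up apart. Sorting with sort -f (fold case) avoids relying on the locale. A small sketch with hypothetical data in sample.txt; LC_ALL=C is set only to make the behaviour predictable.

```shell
# Hypothetical sample data: "apple" appears twice with different case
printf 'apple\nBanana\nApple\n' > sample.txt

# In the C locale plain sort gives Apple, Banana, apple:
# the two apples are not adjacent, so uniq -i keeps both
LC_ALL=C sort sample.txt | LC_ALL=C uniq -i

# sort -f folds case while sorting, so the case variants become adjacent
# and uniq -i keeps only one of them
LC_ALL=C sort -f sample.txt | LC_ALL=C uniq -i
```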

Use the -c option to prefix each line with the number of times it occurs (combined here with -i):

$ sort dat.txt | uniq -ic
1 xml.1-2EW*&Y1-(878761-2AJKGY.rec
1 xml.1-2<MBMV1-NVNBVKJH21HMRE.rec
2 xml.1-BSDF0Q1-BW;LKJ1-BWP30Q.rec
1 xml.1-CCJGL1-CCJGL1-CCJGL.rec
3 xml.1-lkj<mn1-1lkjmg1-1w13lg.rec
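The counts are often more useful when sorted by frequency, and uniq has two related options: -d prints one copy of each line that occurs more than once, and -u prints only the lines that occur exactly once. A sketch on hypothetical data in sample.txt:

```shell
# Hypothetical sample data: a appears 3 times, b twice, c once
printf 'a\nb\na\nc\na\nb\n' > sample.txt

# Count each distinct line, then list the most frequent lines first
sort sample.txt | uniq -c | sort -rn

# Print one copy of each line that is duplicated (a, b)
sort sample.txt | uniq -d

# Print only the lines that are not duplicated (c)
sort sample.txt | uniq -u
```

Note that uniq -c pads the counts with leading spaces, which sort -n ignores when comparing numbers.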

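The duplicates can also be removed with awk. Unlike sort | uniq, this needs no sorted input and keeps the first occurrence of each line in its original order: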

$ awk '!x[$0]++' dat.txt
xml.1-lkj<mn1-1lkjmg1-1w13lg.rec
xml.1-CCJGL1-CCJGL1-CCJGL.rec
xml.1-BSDF0Q1-BW;LKJ1-BWP30Q.rec
xml.1-LKJ<MN1-1LKJMG1-1W13LG.rec
xml.1-2<MBMV1-NVNBVKJH21HMRE.rec
xml.1-2EW*&Y1-(878761-2AJKGY.rec
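The one-liner works because x[$0]++ returns how many times the current line has been seen so far (0 the first time) and then increments it; !0 is true, so awk prints a line only on its first occurrence. Like plain uniq, it is case-sensitive. The sketch below, on hypothetical data in sample.txt, shows an equivalent longhand version, a case-insensitive variant that uses tolower($0) as the array key, and how to save the result, since POSIX awk has no in-place editing option (sample.tmp is also a hypothetical name).

```shell
# Hypothetical sample data with exact and case-only duplicates
printf 'b\na\nB\nb\na\n' > sample.txt

# Longhand equivalent of awk '!x[$0]++': print a line the first time it is seen
awk '{ if (!($0 in seen)) { print; seen[$0] = 1 } }' sample.txt

# Case-insensitive variant: key the array on the lowercased line (prints b, a)
awk '!x[tolower($0)]++' sample.txt

# Write to a temporary file, then replace the original
awk '!x[$0]++' sample.txt > sample.tmp && mv sample.tmp sample.txt
cat sample.txt
```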
