Count & Remove Repeating / Duplicate Lines in a Linux/Unix File


I have a file called dat.txt with a few lines in it. I want to count and remove the duplicate lines.

$ cat dat.txt

xml.1-lkj<mn1-1lkjmg1-1w13lg.rec
xml.1-CCJGL1-CCJGL1-CCJGL.rec
xml.1-BSDF0Q1-BW;LKJ1-BWP30Q.rec
xml.1-BSDF0Q1-BW;LKJ1-BWP30Q.rec
xml.1-LKJ<MN1-1LKJMG1-1W13LG.rec
xml.1-LKJ<MN1-1LKJMG1-1W13LG.rec
xml.1-2<MBMV1-NVNBVKJH21HMRE.rec
xml.1-2EW*&Y1-(878761-2AJKGY.rec

uniq is the command used to find duplicates. It compares only adjacent lines, so sort the file first and pipe the sorted output to uniq.

$ sort dat.txt | uniq
xml.1-2EW*&Y1-(878761-2AJKGY.rec
xml.1-2<MBMV1-NVNBVKJH21HMRE.rec
xml.1-BSDF0Q1-BW;LKJ1-BWP30Q.rec
xml.1-CCJGL1-CCJGL1-CCJGL.rec
xml.1-lkj<mn1-1lkjmg1-1w13lg.rec
xml.1-LKJ<MN1-1LKJMG1-1W13LG.rec
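
The need for sorting can be seen on a small input. The following is a minimal sketch; demo.txt is a hypothetical file name used only for illustration, not the dat.txt above. It also shows sort -u, which sorts and drops duplicates in one step.

```shell
# Hypothetical sample file (assumption: demo.txt is not part of the original post).
printf '%s\n' b a b a > demo.txt

# Unsorted: no two equal lines are adjacent, so uniq removes nothing.
unsorted_count=$(uniq demo.txt | wc -l | tr -d ' ')

# Sorted: equal lines become adjacent and uniq collapses them.
sorted_count=$(sort demo.txt | uniq | wc -l | tr -d ' ')

# sort -u sorts and removes duplicates in a single command.
sort_u_count=$(sort -u demo.txt | wc -l | tr -d ' ')

echo "$unsorted_count $sorted_count $sort_u_count"   # prints: 4 2 2
```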

Ignore case using the -i option:

$ sort dat.txt | uniq -i
xml.1-2EW*&Y1-(878761-2AJKGY.rec
xml.1-2<MBMV1-NVNBVKJH21HMRE.rec
xml.1-BSDF0Q1-BW;LKJ1-BWP30Q.rec
xml.1-CCJGL1-CCJGL1-CCJGL.rec
xml.1-lkj<mn1-1lkjmg1-1w13lg.rec
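
One caveat: uniq -i can only merge case variants that sort has placed next to each other, and that depends on the locale. The sketch below forces the C locale on a hypothetical demo.txt to show the difference; sort -f folds case while sorting, so the variants end up adjacent.

```shell
# Hypothetical sample file (assumption: demo.txt is for illustration only).
printf '%s\n' a B A > demo.txt

# C locale: uppercase sorts before lowercase (A, B, a), so "A" and "a"
# are not adjacent and uniq -i cannot merge them.
plain=$(LC_ALL=C sort demo.txt | uniq -i | wc -l | tr -d ' ')

# sort -f ignores case when ordering (A, a, B), so uniq -i merges the pair.
folded=$(LC_ALL=C sort -f demo.txt | uniq -i | wc -l | tr -d ' ')

echo "$plain $folded"   # prints: 3 2
```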

Count the occurrences of each line using the -c option (combined here with -i to ignore case):
$ sort dat.txt | uniq -ic
      1 xml.1-2EW*&Y1-(878761-2AJKGY.rec
      1 xml.1-2<MBMV1-NVNBVKJH21HMRE.rec
      2 xml.1-BSDF0Q1-BW;LKJ1-BWP30Q.rec
      1 xml.1-CCJGL1-CCJGL1-CCJGL.rec
      3 xml.1-lkj<mn1-1lkjmg1-1w13lg.rec
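
The counts can be taken a step further. As a rough sketch on a hypothetical demo.txt: a second sort ranks lines by frequency, uniq -d lists only the lines that are repeated, and uniq -u lists only the lines that occur once.

```shell
# Hypothetical sample file (assumption: demo.txt is for illustration only).
printf '%s\n' apple pear apple fig apple pear > demo.txt

# Most frequent lines first: count, then sort numerically in reverse order.
sort demo.txt | uniq -c | sort -rn

# Only the lines that occur more than once (one copy of each).
dups=$(sort demo.txt | uniq -d)
echo "$dups"

# Only the lines that occur exactly once.
singles=$(sort demo.txt | uniq -u)
echo "$singles"
```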

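An awk one-liner removes duplicates without sorting. It keeps the first occurrence of each line in the original order and is case-sensitive: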

$ awk '!x[$0]++' dat.txt
xml.1-lkj<mn1-1lkjmg1-1w13lg.rec
xml.1-CCJGL1-CCJGL1-CCJGL.rec
xml.1-BSDF0Q1-BW;LKJ1-BWP30Q.rec
xml.1-LKJ<MN1-1LKJMG1-1W13LG.rec
xml.1-2<MBMV1-NVNBVKJH21HMRE.rec
xml.1-2EW*&Y1-(878761-2AJKGY.rec
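
How the awk idiom works, and a couple of variations, can be sketched as follows. The file names demo.txt and demo.dedup and the array names are assumptions for illustration.

```shell
# Hypothetical sample file (assumption: demo.txt is for illustration only).
printf '%s\n' B a b a B > demo.txt

# x[$0] is 0 the first time a line is seen, so !x[$0] is true and awk
# prints the line; the ++ then marks the line as seen for later matches.
awk '!x[$0]++' demo.txt > demo.dedup
cat demo.dedup

# Case-insensitive variant: key the array on the lowercased line.
nocase=$(awk '!x[tolower($0)]++' demo.txt)
echo "$nocase"

# Count occurrences without sorting the input first. The order of
# "for (l in c)" is unspecified, so sort the report if order matters.
counts=$(awk '{ c[$0]++ } END { for (l in c) print c[l], l }' demo.txt)
echo "$counts"
```

To actually remove the duplicates from the file, write the result to a new file and then move it over the original (for example, mv demo.dedup demo.txt). Redirecting straight back to the input file would truncate it before awk reads it.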