Question

Remove lines with duplicated values in column 1 based on a condition in column 2

7.3 years ago

waqasnayab ▴ 250

Hi Community,

I have a multi-column file, but I am showing just two columns here:

Contig0.100010-snap.2                                   93.4065934066
Contig0.100014-snap.1                                   67.3611111111
Contig0.100025-snap.1                                   72.9411764706
Contig0.100025-snap.1                                   74.1176470588
Contig0.100025-snap.1                                   82.3529411765
Contig0.100051-snap.1                                   95

Column one contains one value that occurs three times, Contig0.100025-snap.1. When a value in column one is duplicated, I want to keep only the line with the largest value in column 2 and remove the others. My desired output would be:

Contig0.100010-snap.2                                   93.4065934066
Contig0.100014-snap.1                                   67.3611111111
Contig0.100025-snap.1                                   82.3529411765
Contig0.100051-snap.1                                   95

Is there a Unix solution (sed, awk, etc.) or a way to do this in Excel?

Thanks,

Waqas.

SNP sequence alignment blast • 2.9k views

updated 7.3 years ago by Sej Modha 5.3k • written 7.3 years ago by waqasnayab ▴ 250

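datamash -s -g 1 max 2 <input.txt

Here max keeps the largest column 2 value for each group in column 1, and -s sorts the input before grouping. The columns are assumed to be tab-separated; add -W if they are separated by whitespace instead. Datamash is a GNU tool and is available in most Linux distribution repositories.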
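Where datamash is not installed, or where the remaining columns of a multi-column file need to be preserved, the same group-wise maximum can be sketched in awk. This is a minimal sketch, not a tested drop-in for the real file: the temporary sample file and the variable names (input, awk_result) are hypothetical, and it assumes whitespace-separated columns with the key in column 1 and a numeric value in column 2.

```shell
# Hypothetical sample file mirroring the question's data (whitespace-separated).
input=$(mktemp)
cat > "$input" <<'EOF'
Contig0.100010-snap.2 93.4065934066
Contig0.100014-snap.1 67.3611111111
Contig0.100025-snap.1 72.9411764706
Contig0.100025-snap.1 74.1176470588
Contig0.100025-snap.1 82.3529411765
Contig0.100051-snap.1 95
EOF

# For each key in column 1, remember the whole line with the largest column 2.
# Keys are printed in order of first appearance, so the input need not be sorted.
awk_result=$(awk '
    !($1 in best)       { order[++n] = $1; best[$1] = $2 + 0; line[$1] = $0; next }
    ($2 + 0) > best[$1] { best[$1] = $2 + 0; line[$1] = $0 }
    END                 { for (i = 1; i <= n; i++) print line[order[i]] }
' "$input")

printf '%s\n' "$awk_result"
rm -f "$input"
```

Because the whole line is stored rather than just the two fields, any additional columns travel with the winning row.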

7.3 years ago by cpad0112 21k

score 7 · Accepted Answer · 2017-08-14

7.3 years ago

Sej Modha 5.3k

Simple bash sort solution:

 sort -k1,1 -k2,2nr test_file | sort -u -k1,1
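To see why the pipeline works: the first sort groups lines by column 1 and, within each group, orders column 2 numerically from largest to smallest; the second sort then keeps one line per column 1 key. The sketch below runs it on the question's sample data. The temporary file and variable names are hypothetical, LC_ALL=C is set only to make the ordering reproducible, and the result relies on sort -u keeping the first line of each run of equal keys, which is how GNU sort behaves; other implementations may differ.

```shell
# Hypothetical sample file mirroring the question's data (whitespace-separated).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Contig0.100010-snap.2 93.4065934066
Contig0.100014-snap.1 67.3611111111
Contig0.100025-snap.1 72.9411764706
Contig0.100025-snap.1 74.1176470588
Contig0.100025-snap.1 82.3529411765
Contig0.100051-snap.1 95
EOF

# Use the C locale so both sorts compare keys byte by byte.
export LC_ALL=C

# Step 1: sort by key, then by value descending, so the maximum comes first per key.
# Step 2: keep only the first line for each distinct key.
sort_result=$(sort -k1,1 -k2,2nr "$sample" | sort -u -k1,1)

printf '%s\n' "$sort_result"
rm -f "$sample"
```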

7.3 years ago by Sej Modha 5.3k

Great, thanks. It worked perfectly for me.

Regards,

Waqas.

7.3 years ago by waqasnayab ▴ 250