Remove lines with duplicated values in column 1 based on a condition on column 2
7.3 years ago
waqasnayab ▴ 250

Hi Community,

I have a multi-column file; only two of its columns are shown here:

Contig0.100010-snap.2                                   93.4065934066
Contig0.100014-snap.1                                   67.3611111111
Contig0.100025-snap.1                                   72.9411764706
Contig0.100025-snap.1                                   74.1176470588
Contig0.100025-snap.1                                   82.3529411765
Contig0.100051-snap.1                                   95

Column one contains one value, Contig0.100025-snap.1, that appears three times. Whenever a value in column one is duplicated, I want to keep only the line with the largest value in column 2 and remove the lines with the smaller values. My desired output would be:

Contig0.100010-snap.2                                   93.4065934066
Contig0.100014-snap.1                                   67.3611111111
Contig0.100025-snap.1                                   82.3529411765
Contig0.100051-snap.1                                   95

Is there a way to do this with Unix tools (sed, awk, etc.) or in Excel?

Thanks,

Waqas.

SNP sequence alignment blast • 2.9k views
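datamash -s -g 1 max 2 < input.txt

This assumes the columns are tab-separated. The max operation keeps the largest value in column 2 for each value in column 1, and -s sorts the input first so that grouping also works on unsorted files. Datamash is a GNU tool and is available in the repositories of most Linux distributions.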
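
Note that datamash prints only the grouping column and the computed value, so any further columns of a multi-column file are dropped. If the whole line should be kept, a one-pass awk sketch along the following lines may help. It assumes tab-separated input; the sample file and the array names (best, line, order, seen) are illustrative and not part of the original post.

```shell
# Sample input (tab-separated), taken from the question.
input=$(mktemp)
printf '%s\t%s\n' \
  Contig0.100010-snap.2 93.4065934066 \
  Contig0.100014-snap.1 67.3611111111 \
  Contig0.100025-snap.1 72.9411764706 \
  Contig0.100025-snap.1 74.1176470588 \
  Contig0.100025-snap.1 82.3529411765 \
  Contig0.100051-snap.1 95 > "$input"

# For each key in column 1, remember the whole line with the largest column 2.
# Keys are printed in the order in which they first appear in the input.
result=$(awk -F'\t' '
  !($1 in best) || $2 + 0 > best[$1] { best[$1] = $2 + 0; line[$1] = $0 }
  !seen[$1]++ { order[++n] = $1 }
  END { for (i = 1; i <= n; i++) print line[order[i]] }
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Because the full line is stored rather than the parsed number, the values in column 2 are printed exactly as they appear in the input.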

7.3 years ago
Sej Modha 5.3k

Simple bash sort solution:

 sort -k1,1 -k2,2nr test_file | sort -u -k1,1
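
The pipeline above relies on sort -u keeping the first line of each run of equal keys. GNU sort behaves this way, but POSIX does not specify which line is kept. A hedged variant that replaces the second stage with awk, run here on the sample data from the question, might look like this. The explicit tab delimiter, LC_ALL=C, and the temporary file are assumptions added for the sketch.

```shell
# Sample input (tab-separated), taken from the question.
input=$(mktemp)
printf '%s\t%s\n' \
  Contig0.100010-snap.2 93.4065934066 \
  Contig0.100014-snap.1 67.3611111111 \
  Contig0.100025-snap.1 72.9411764706 \
  Contig0.100025-snap.1 74.1176470588 \
  Contig0.100025-snap.1 82.3529411765 \
  Contig0.100051-snap.1 95 > "$input"

tab=$(printf '\t')

# Stage 1: sort by column 1, then by column 2 numerically, largest first.
# LC_ALL=C makes sort treat "." as the decimal point regardless of locale.
# Stage 2: keep only the first line seen for each value of column 1.
result=$(LC_ALL=C sort -t "$tab" -k1,1 -k2,2nr "$input" |
  awk -F'\t' '!seen[$1]++')

printf '%s\n' "$result"
rm -f "$input"
```

Unlike the one-pass approaches, this returns the keys in sorted order rather than in their original order, which makes no difference for the sample data.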

Great, thanks. It worked perfectly for me.

Regards,

Waqas.
