7.3 years ago
waqasnayab
Hi Community,
I have a multi-column file, but I am showing just two of its columns here:
Contig0.100010-snap.2 93.4065934066
Contig0.100014-snap.1 67.3611111111
Contig0.100025-snap.1 72.9411764706
Contig0.100025-snap.1 74.1176470588
Contig0.100025-snap.1 82.3529411765
Contig0.100051-snap.1 95
Column one contains a duplicated value: Contig0.100025-snap.1 appears three times. When a value in column one is duplicated, I want to keep only the line with the largest value in column 2 and remove the lines with smaller values. My desired output would be:
Contig0.100010-snap.2 93.4065934066
Contig0.100014-snap.1 67.3611111111
Contig0.100025-snap.1 82.3529411765
Contig0.100051-snap.1 95
Can this be done with Unix tools (sed, awk, etc.) or in Excel?
Thanks,
Waqas.
The columns are tab-separated. Datamash is a GNU tool and is available in most Linux distribution repositories.
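For anyone who would rather script this, here is a minimal Python sketch of the same idea: for each value in column one, keep only the line whose column-two value is largest. It is not the Datamash approach mentioned above. The function name `keep_max_per_key` and its parameters are made up for illustration. It assumes tab-separated input with a numeric second column, and on ties it keeps the first line seen.

```python
def keep_max_per_key(lines, key_col=0, value_col=1, sep="\t"):
    """Return one line per key: the one with the largest numeric value.

    Hypothetical helper for illustration. Output follows the order in
    which each key first appears; whole lines are kept, so any extra
    columns survive unchanged.
    """
    best = {}  # key -> (value, original line); dicts keep insertion order
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        key = fields[key_col]
        value = float(fields[value_col])
        # Strict '>' means the first line wins when values tie.
        if key not in best or value > best[key][0]:
            best[key] = (value, line)
    return [line for _, line in best.values()]


# Sample data from the question, written with explicit tab separators.
sample = [
    "Contig0.100010-snap.2\t93.4065934066",
    "Contig0.100014-snap.1\t67.3611111111",
    "Contig0.100025-snap.1\t72.9411764706",
    "Contig0.100025-snap.1\t74.1176470588",
    "Contig0.100025-snap.1\t82.3529411765",
    "Contig0.100051-snap.1\t95",
]

result = keep_max_per_key(sample)
for row in result:
    print(row)
```

On the sample this prints four lines, with Contig0.100025-snap.1 paired with 82.3529411765. To run it on a real file, pass an open file handle instead of the list, since iterating a file yields its lines. If the key or value sits in a different column, adjust `key_col` and `value_col` (both zero-based).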