How to remove rows based on column values
7.8 years ago
zqwu

Dear all,

I have a tab-delimited file with over 30,000 rows, and I want to remove some of the rows based on the values in certain columns.

For example:

Name         Len   Name2  Order
KCNQ2_32937  2535  KCNQ2  32937
KCNQ2_32938  2733  KCNQ2  32938
KCNQ2_32939  2616  KCNQ2  32939
KCNQ2_32940  2544  KCNQ2  32940
KCNQ2_32941  1809  KCNQ2  32941
...

The filter I need: among rows that share the same value in the Name2 column, keep only the row with the largest value in the Len column. For the example above, the result should be:

Name         Len   Name2  Order
KCNQ2_32938  2733  KCNQ2  32938
...

How can I do this?

TJ

7.8 years ago

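Sort on column 3 and then on column 2 (reverse numeric order), then do a stable unique sort on column 3. The unique sort keeps only the first row of each Name2 group, which after the first sort is the row with the largest Len: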

sort -t $'\t' -k3,3 -k2,2rn input.tsv | sort -t $'\t' -k3,3 -u --stable
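The one-liner above passes the header line (Name, Len, Name2, Order) through both sorts like any data row, so it ends up wherever "Name2" falls among the gene names instead of at the top. A minimal sketch that prints the header first and filters only the data rows, assuming exactly one header line and GNU sort (or another sort that supports -s). The function name keep_longest_per_gene is hypothetical.

```shell
# keep_longest_per_gene (hypothetical name): for each value in column 3 (Name2),
# keep the row with the largest number in column 2 (Len).
# Assumes a tab-delimited file with exactly one header line.
keep_longest_per_gene() {
    # A literal tab for -t; printf is used so this also runs under plain sh.
    tab=$(printf '\t')
    # Print the header unchanged so it is not sorted in with the data.
    head -n 1 "$1"
    # Skip the header, sort by Name2 and then by Len (largest first),
    # then keep only the first line of each Name2 group.
    # -s (--stable) keeps input order within a group; LC_ALL=C gives
    # byte-wise comparison, which is typically faster than locale collation.
    tail -n +2 "$1" \
        | LC_ALL=C sort -t "$tab" -k3,3 -k2,2rn \
        | LC_ALL=C sort -t "$tab" -k3,3 -u -s
}

# Example: keep_longest_per_gene input.tsv > filtered.tsv
```

In bash, $'\t' works just as well as the printf form, as in the answers.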

Thanks. It is fast and exactly what I need.


If this answer solved your problem, go ahead and "accept" it (green check mark). @5heikki's answer, which appears to have been written at almost the same time, may also be fine and can be accepted in addition to @Pierre's.

7.8 years ago
5heikki
sort -t $'\t' -k3,3 -k2,2gr file | sort -t $'\t' -u -k3,3

Also: man sort
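For comparison, here is a different technique that does not come from either answer: a single awk pass that remembers, for each Name2 value, the row with the largest Len seen so far. It avoids sorting the whole file twice and outputs the groups in the order in which each Name2 value first appears. This is a sketch assuming a tab-delimited file with exactly one header line; the function name keep_longest_awk is hypothetical.

```shell
# keep_longest_awk (hypothetical name): one pass over the file, keeping for each
# Name2 value (column 3) the row with the largest Len (column 2).
# Assumes a tab-delimited file with exactly one header line.
keep_longest_awk() {
    awk '
        BEGIN { FS = "\t" }
        NR == 1 { print; next }            # pass the header line through unchanged
        {
            key = $3
            if (!(key in best)) {
                order[++n] = key           # remember first-seen order of Name2 values
            } else if ($2 + 0 <= len[key]) {
                next                       # not larger than the stored Len: skip
            }
            best[key] = $0                 # store the whole row
            len[key] = $2 + 0              # and its Len as a number
        }
        END { for (i = 1; i <= n; i++) print best[order[i]] }
    ' "$1"
}

# Example: keep_longest_awk input.tsv > filtered.tsv
```

On ties, this keeps the first row seen with the largest Len; the sort pipelines break ties by their own ordering, so they may pick a different row when two rows share the maximum Len. Memory use grows with the number of distinct Name2 values rather than with the number of rows.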
