Question

how to choose the biggest from multiple lines

0

5.5 years ago

wu.zhiqiang.1020 ▴ 50

Hi all, I have gene isoform data like the following. The columns are GeneName, Isoform, and Length:

Zm00001d000001  T001    438

Zm00001d000001  T002    1842

Zm00001d000001  T005    1842 

Zm00001d000001  T006    1503

Zm00001d000002  T001    5025

Zm00001d000002  T002    5034

Zm00001d000002  T005    4551

Zm00001d000002  T007    3432

For each gene, I want to choose the longest isoform, for example:

Zm00001d000002  T002    5034

But some isoforms of the same gene have the same length, for example:

Zm00001d000001  T002    1842

Zm00001d000001  T005    1842

In that case I would choose the one with the smallest value in the second column (or pick one at random):

Zm00001d000001  T002    1842

Is there a good way to do this?

Thanks.

gene • 1.3k views

ADD COMMENT • link updated 5.5 years ago by GenoMax 150k • written 5.5 years ago by wu.zhiqiang.1020 ▴ 50

1

What have you tried? This can be done in a straightforward way with R or python, and in a more complicated way with awk. Please tell us what you've tried and the exact problem you're facing, and we can help you solve it. Without that, this is just asking us to do your work for you.

ADD REPLY • link 5.5 years ago by Ram 45k

0

The input looks like this:

Zm00001d000001 T001 438

Zm00001d000001 T002 1842

Zm00001d000001 T005 1842

Zm00001d000002 T001 5025

Zm00001d000002 T002 5034

Zm00001d000002 T005 4551

The final result should look like this:

Zm00001d000001 T002 1842

Zm00001d000002 T002 5034

For each gene, I just want to choose the longest isoform. I hope that makes it clear.

ADD REPLY • link updated 5.5 years ago by GenoMax 150k • written 5.5 years ago by wu.zhiqiang.1020 ▴ 50

0

Your requirements were clear. What was not clear was what you'd tried by yourself. That is not a point addressed in your question or your comment. Please be informed that it is good practice to try and solve something by yourself before asking for help.

ADD REPLY • link 5.5 years ago by Ram 45k

0

Thanks. I am not good at this kind of computing yet; I am just starting out.

ADD REPLY • link 5.5 years ago by wu.zhiqiang.1020 ▴ 50

score 5 · Accepted Answer · 2019-09-22

5

5.5 years ago

Pierre Lindenbaum 165k

Assuming a tab-delimited file:

sort -t $'\t' -k1,1 -k3,3rn input.tsv | sort -t $'\t' -k1,1 -u --stable

Output:

Zm00001d000001  T002    1842
Zm00001d000002  T002    5034
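
For readers new to `sort`, here is a minimal, self-contained sketch of the same two-pass idea. It is an illustration, not the answer's exact command: the sample file and the variable names `input` and `result` are made up for the example, and the extra `-k2,2` key is an assumption added to make the tie-break on the isoform column explicit (the question asks for the smallest isoform ID when lengths are equal).

```shell
# Build a small tab-delimited sample (gene, isoform, length) based on the question.
# T005 is deliberately listed before T002 to show that the tie-break is applied.
input=$(mktemp)
printf '%s\t%s\t%s\n' \
  Zm00001d000001 T001 438 \
  Zm00001d000001 T005 1842 \
  Zm00001d000001 T002 1842 \
  Zm00001d000001 T006 1503 \
  Zm00001d000002 T001 5025 \
  Zm00001d000002 T002 5034 \
  Zm00001d000002 T005 4551 \
  Zm00001d000002 T007 3432 > "$input"

tab=$(printf '\t')

# Pass 1: group by gene (column 1), longest first (column 3, numeric, reversed),
#         then smallest isoform ID (column 2) among equal lengths.
# Pass 2: keep only the first line seen for each gene; -u compares on key 1 only
#         and -s (stable) preserves the order produced by pass 1.
result=$(LC_ALL=C sort -t "$tab" -k1,1 -k3,3nr -k2,2 "$input" |
         LC_ALL=C sort -t "$tab" -k1,1 -u -s)

printf '%s\n' "$result"
rm -f "$input"
```

The second `sort` never looks at the length column; it relies entirely on the first pass having placed the wanted line first within each gene, which is why the stable option matters.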

ADD COMMENT • link 5.5 years ago by Pierre Lindenbaum 165k

0

Yes, this is exactly what I want: keep only the longest one. Thanks.

ADD REPLY • link 5.5 years ago by wu.zhiqiang.1020 ▴ 50

0

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work.

ADD REPLY • link 5.5 years ago by Ram 45k