5.2 years ago
wu.zhiqiang.1020
Hi all, I have some gene data like this, with the columns GeneName, Isoform and Length:
Zm00001d000001 T001 438
Zm00001d000001 T002 1842
Zm00001d000001 T005 1842
Zm00001d000001 T006 1503
Zm00001d000002 T001 5025
Zm00001d000002 T002 5034
Zm00001d000002 T005 4551
Zm00001d000002 T007 3432
For each gene, I want to choose the longest isoform, e.g.
Zm00001d000002 T002 5034
But some isoforms have the same length, e.g.
Zm00001d000001 T002 1842
Zm00001d000001 T005 1842
In that case, I want to choose the one with the smallest isoform ID in the second column (or pick one at random):
Zm00001d000001 T002 1842
Is there a good way to do this?
Thanks
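A minimal sketch of one way to do this in Python, using only the standard library. It assumes whitespace-separated lines of GeneName, Isoform and Length, and that isoform IDs are zero-padded (T001, T002, ...) so that plain string comparison gives the "smallest" one. Each isoform is ranked by the key (-length, isoform): the longest wins, and ties go to the smallest ID. The names `SAMPLE` and `longest_isoforms` are illustrative, not from any library.

```python
from typing import Dict, Iterable, Tuple

# The sample rows from the question (GeneName, Isoform, Length).
SAMPLE = """\
Zm00001d000001 T001 438
Zm00001d000001 T002 1842
Zm00001d000001 T005 1842
Zm00001d000001 T006 1503
Zm00001d000002 T001 5025
Zm00001d000002 T002 5034
Zm00001d000002 T005 4551
Zm00001d000002 T007 3432
"""


def longest_isoforms(lines: Iterable[str]) -> Dict[str, Tuple[str, int]]:
    """Return {gene: (isoform, length)}, keeping the longest isoform per gene.

    Ties on length are broken by the smallest isoform ID (string comparison,
    which assumes zero-padded IDs such as T001, T002).
    """
    best: Dict[str, Tuple[str, int]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        gene, isoform, length = fields[0], fields[1], int(fields[2])
        current = best.get(gene)
        # Smaller (-length, isoform) is better: longer first, then smaller ID.
        if current is None or (-length, isoform) < (-current[1], current[0]):
            best[gene] = (isoform, length)
    return best


if __name__ == "__main__":
    for gene, (isoform, length) in sorted(longest_isoforms(SAMPLE.splitlines()).items()):
        print(gene, isoform, length)
```

On the sample this keeps T002 (1842) for Zm00001d000001 and T002 (5034) for Zm00001d000002. To use a real file, pass an open file handle instead, e.g. `with open("isoforms.txt") as fh: longest_isoforms(fh)` (the file name is hypothetical). Input order does not matter here because every gene's current best is kept in a dictionary.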
What have you tried? This can be done in a straightforward way with R or Python, and in a more complicated way with awk. Please tell us what you've tried and the exact problem you're facing, and we can help you solve it. Without that, this is just asking us to do your work for you.
The input is like this:
The final result should look like this:
For each gene, I just want to choose the longest one. This is what I want. I hope I made it clear.
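The line-by-line style that the reply associates with awk can also be sketched in Python. This version is an illustrative alternative, not the thread's method: it assumes all rows for a gene are contiguous, as in the sample (otherwise sort by the first column first, e.g. with `sort -k1,1`). It then uses `itertools.groupby` to look at one gene at a time and `min` with the key (-length, isoform) to pick the longest isoform, breaking ties by the smallest ID. The function names `parse` and `longest_per_gene` are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple

# (gene, isoform, length)
Record = Tuple[str, str, int]


def parse(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (gene, isoform, length) from whitespace-separated lines."""
    for line in lines:
        fields = line.split()
        if len(fields) == 3:  # ignore blank or malformed lines
            yield fields[0], fields[1], int(fields[2])


def longest_per_gene(records: Iterable[Record]) -> Iterator[Record]:
    """Yield one record per gene; assumes each gene's rows are contiguous.

    The key (-length, isoform) makes min() choose the longest isoform and,
    on equal length, the smallest isoform ID.
    """
    for _gene, group in groupby(records, key=lambda r: r[0]):
        yield min(group, key=lambda r: (-r[2], r[1]))


if __name__ == "__main__":
    demo = [
        "Zm00001d000001 T001 438",
        "Zm00001d000001 T002 1842",
        "Zm00001d000001 T005 1842",
        "Zm00001d000002 T002 5034",
        "Zm00001d000002 T005 4551",
    ]
    for gene, isoform, length in longest_per_gene(parse(demo)):
        print(gene, isoform, length, sep="\t")
```

Compared with a dictionary of best-so-far values, this only holds one gene's rows at a time. The trade-off is that it silently gives wrong results if the input is not grouped by gene.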
Your requirements were clear. What was not clear was what you'd tried by yourself. That is not a point addressed in your question or your comment. Please be informed that it is good practice to try and solve something by yourself before asking for help.
Thanks. I am not good at this computing stuff; I am just starting out. Thanks.