I extracted ORFs from an initial FASTA file and now I want to get the longest ORF for each transcript.
After extracting the ORF sizes with faSize and sorting them by size, the command I usually use is:
perl -ane'print unless $x{$F[0]}++'
This time I have a problem using the perl command.
After extracting the sizes and sorting the ORFs, I have something like this:
Singlet_1000_61 3844
Singlet_2000_73 3508
Singlet_1000_62 3081
Singlet_2000_62 3008
Singlet_3500_48 2973
Singlet_4000_48 2964
Singlet_3500_54 2863
What I want is:
Singlet_1000_61 3844
Singlet_2000_73 3508
Singlet_3500_48 2973
...
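To make the grouping explicit: I am assuming the transcript name is the ORF ID with its trailing _<number> removed (so Singlet_1000_61 and Singlet_1000_62 both belong to Singlet_1000), and that the input is already sorted by size in descending order. Under those assumptions, a rough sketch of the logic I am after could look like this. The file name orf_sizes.txt and the function name longest_orf are just placeholders, and I am not sure this is robust for other ID formats:

```shell
# Sample input (placeholder file name): ORF ID and length, sorted by length, largest first.
cat > orf_sizes.txt <<'EOF'
Singlet_1000_61 3844
Singlet_2000_73 3508
Singlet_1000_62 3081
Singlet_2000_62 3008
Singlet_3500_48 2973
Singlet_4000_48 2964
Singlet_3500_54 2863
EOF

# Keep only the first line seen for each transcript.
# The key is column 1 with the trailing _<number> (assumed to be the ORF index) stripped.
longest_orf() {
    awk '{ key = $1; sub(/_[0-9]+$/, "", key) } !seen[key]++' "$@"
}

longest_orf orf_sizes.txt
```

Because the input is sorted by size, the first line kept for each transcript is its longest ORF.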
The perl command is not working in this case.
Do you have any suggestions on how I can make it work?
Or an awk command?
Thanks for your help.