I have an input file containing multiple nucleotide sequences in FASTA format. I am running the standalone Linux version of ORFfinder with the following command:
/Path/ORFfinder -in /Path_to_input_file/input_file.fasta -s 0 -out output_fasta -outfmt 0
This generates an output file containing all ORFs, but I am only interested in the longest ORF (the one with the maximum length) for each sequence. For example, one of the nucleotide sequences in my input file is:
BnaA03g18710D ACCAACATCTATTTTCCATCTTTTCCGATCAAAATCTCTCTCTCTCTCTCAGCTTTTTGTGTGACGCAACACTCGTGGGGAAATGGCCGCCGCAGTTTCCACCGTCGGTGCCATCAACAGAGCTCCGTTGAGCTTGAACGGGTCAGGAGCAGGAGCTGCTTCAGTCCCAGCTACGACCTTCTTGGGAAAGAAAGTTGTAACCGCGTCGAGATTCACACAGAGCAACAACAAGAAGAGCAACGGATCATTCAAAGTGGTCGCTGTCAAAGAAGACAAACAAACCGATGGAGACAGATGGAGGGGACTTGCCTACGACACGTCTGATGATCAACAAGACATCACCAGAGGCAAAGGTATGGTTGACTCTGTCTTCCAAGCTCCCATGGGAACCGGAACTCACAATGCCGTTCTTAGCTCCTATGAGTACATTAGCCAAGGTCTTAAGCAGTACAACTTGGACAACATGATGGATGGGCTTTACATTGCTCCTGCATTCATGGACAAGCTTGTTGTTCACATCACCAAGAACTTCTTGACTTTACCTAACATCAAGGTTCCACTTATTTTGGGTATTTGGGGAGGCAAAGGTCAAGGTAAATCCTTCCAGTGTGAGCTTGTCATGGCCAAGATGGGCATTAACCCAATCATGATGAGTGCTGGAGAGCTTGAGAGTGGAAACGCAGGAGAACCAGCCAAGCTGATCCGTCAAAGGTACCGTGAAGCAGCAGACATGATCAAAAAGGGAAAAATGTGTTGTCTATTCATCAACGATCTCGACGCTGGTGCTGGTCGTATGGGTGGTACTACTCAGTACACAGTCAACAACCAGATGGTTAACGCAACCCTCATGAACATTGCTGATAACCCAACCAACGTCCAGCTCCCGGGAATGTACAACAAGGAAGAAAACGCACGTGTCCCCATCATCGTCACCGGTAACGATTTCTCCACTCTCTACGCACCTCTCATCCGTGACGGGCGTATGGAGAAATTCTACTGGGCACCCACACGTGAGGACCGTATTGGTGTCTGCAAGGGTATCTTCAGGACTGATAACGTTAAGGATGAAGACATTGTCACGCTTGTTGACCAGTTCCCTGGACAATCTATCGATTTCTTTGGTGCATTGAGGGCGAGAGTGTACGATGATGAAGTGAGGAAGTTCGTTGAGGGACTTGGAGTTGAGAAGATAGGAAAGAGGCTGGTGAACTCTAGGGAAGGTCCTCCAGTGTTCGAGCAGCCAGCGATGACTCTTGAGAAGCTTATGGAGTACGGAAACATGCTTGTGATGGAACAAGAGAACGTCAAGAGAGTCCAACTTGCTGACCAATACCTTAACGAGGCTGCCTTGGGAGACGCAAACGCGGACGCCATTGGCCGCGGAACTTTCTATGGGAAAGCAGCACAGCAAGTGAACCTCCCTGTTCCAGAAGGGTGTACTGATCCTCAAGCAGACAACTTTGATCCAACAGCTAGAAGTGATGATGGAACTTGTGTCTACAACTTTTGAGTTTCCCCTTTGTTAAGTTGCTGTGTTTCTACTACTGTCTCTTTTTTTTGTTGCCTTTTGTGTAATTTTGGATTGCTTCATGTACTCTCTTTTTTTGTGATCATGTGCAAACATTAATATTGTAAGATTCCCTTGTCATAAACCATTTCTCAACTTTTTGTTTGCTTTATTAAGTAGATGGCATTCCAACTATAGTTCTTTGGCCATAGTCTCGGAA
For this sequence, ORFfinder reports the following 5 ORFs:
lcl|ORF1_BnaA03g18710D:82:1509 unnamed protein product
MAAAVSTVGAINRAPLSLNGSGAGAASVPATTFLGKKVVTASRFTQSNNKKSNGSFKVVAVKEDKQTDGD RWRGLAYDTSDDQQDITRGKGMVDSVFQAPMGTGTHNAVLSSYEYISQGLKQYNLDNMMDGLYIAPAFMD KLVVHITKNFLTLPNIKVPLILGIWGGKGQGKSFQCELVMAKMGINPIMMSAGELESGNAGEPAKLIRQR YREAADMIKKGKMCCLFINDLDAGAGRMGGTTQYTVNNQMVNATLMNIADNPTNVQLPGMYNKEENARVP IIVTGNDFSTLYAPLIRDGRMEKFYWAPTREDRIGVCKGIFRTDNVKDEDIVTLVDQFPGQSIDFFGALR ARVYDDEVRKFVEGLGVEKIGKRLVNSREGPPVFEQPAMTLEKLMEYGNMLVMEQENVKRVQLADQYLNE AALGDANADAIGRGTFYGKAAQQVNLPVPEGCTDPQADNFDPTARSDDGTCVYNF
lcl|ORF2_BnaA03g18710D:284:466 unnamed protein product
METDGGDLPTTRLMINKTSPEAKVWLTLSSKLPWEPELTMPFLAPMSTLAKVLSSTTWTT
lcl|ORF3_BnaA03g18710D:1623:1522 unnamed protein product
MFAHDHKKREYMKQSKITQKATKKRDSSRNTAT
lcl|ORF4_BnaA03g18710D:1373:993 unnamed protein product
MASAFASPKAASLRYWSASWTLLTFSCSITSMFPYSISFSRVIAGCSNTGGPSLEFTSLFPIFSTPSPST
NFLTSSSYTLALNAPKKSIDCPGNWSTSVTMSSSLTLSVLKIPLQTPIRSSRVGAQ
lcl|ORF5_BnaA03g18710D:926:807 unnamed protein product
MMGTRAFSSLLYIPGSWTLVGLSAMFMRVALTIWLLTVY
But I am only interested in the longest ORF, which is:
lcl|ORF1_BnaA03g18710D:82:1509 unnamed protein product
MAAAVSTVGAINRAPLSLNGSGAGAASVPATTFLGKKVVTASRFTQSNNKKSNGSFKVVAVKEDKQTDGD RWRGLAYDTSDDQQDITRGKGMVDSVFQAPMGTGTHNAVLSSYEYISQGLKQYNLDNMMDGLYIAPAFMD KLVVHITKNFLTLPNIKVPLILGIWGGKGQGKSFQCELVMAKMGINPIMMSAGELESGNAGEPAKLIRQR YREAADMIKKGKMCCLFINDLDAGAGRMGGTTQYTVNNQMVNATLMNIADNPTNVQLPGMYNKEENARVP IIVTGNDFSTLYAPLIRDGRMEKFYWAPTREDRIGVCKGIFRTDNVKDEDIVTLVDQFPGQSIDFFGALR ARVYDDEVRKFVEGLGVEKIGKRLVNSREGPPVFEQPAMTLEKLMEYGNMLVMEQENVKRVQLADQYLNE AALGDANADAIGRGTFYGKAAQQVNLPVPEGCTDPQADNFDPTARSDDGTCVYNF
How can I output or extract only the longest ORF?
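One workaround I can think of is to post-process the ORFfinder output rather than change the ORFfinder call. Below is a minimal Python sketch of that idea, not a tested solution. It assumes the output file is ordinary protein FASTA, that each header starts with ">" followed by an ID of the form lcl|ORF<n>_<sequence ID>:<start>:<end> as in the listing above, and that "longest" means the most amino acids. The function names and the script name longest_orf.py are made up for illustration.

```python
import re
import sys

# Assumed ID layout, taken from the output shown in the post:
#   lcl|ORF1_BnaA03g18710D:82:1509
ORF_ID = re.compile(r"^lcl\|ORF\d+_(.+):\d+:\d+$")


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Drop any spaces inside a sequence line.
            chunks.append(line.replace(" ", ""))
    if header is not None:
        yield header, "".join(chunks)


def source_id(header):
    """Return the input sequence ID that an ORF header refers to."""
    first = header.split()[0]
    match = ORF_ID.match(first)
    # Fall back to the whole ID if the header does not fit the assumed layout.
    return match.group(1) if match else first


def longest_orfs(lines):
    """Map each input sequence ID to its longest (header, protein) record."""
    best = {}
    for header, seq in read_fasta(lines):
        key = source_id(header)
        # Strict '>' keeps the first ORF when two have the same length.
        if key not in best or len(seq) > len(best[key][1]):
            best[key] = (header, seq)
    return best


def write_fasta(records, handle, width=70):
    """Write (header, sequence) records as FASTA, wrapped at `width`."""
    for header, seq in records:
        handle.write(">" + header + "\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fh:
        best = longest_orfs(fh)
    write_fasta(best.values(), sys.stdout)
```

Saved as longest_orf.py, this could be run as `python longest_orf.py output_fasta > longest_orfs.fasta`. Ranking by the coordinates in the header instead of the protein length would need extra parsing, and ties are resolved in favour of the ORF that appears first in the file.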
Any help will be highly appreciated.