Extract longest ORF
4.7 years ago

I have an input file containing multiple nucleotide sequences in FASTA format. I am running the standalone Linux version of ORFfinder with the following command:

/Path/ORFfinder -in /Path_to_input_file/input_file.fasta -s 0 -out output_fasta -outfmt 0

This generates an output file containing all ORFs, but I am only interested in the longest ORF for each input sequence. For example, one of the nucleotide sequences in my input file is:

BnaA03g18710D ACCAACATCTATTTTCCATCTTTTCCGATCAAAATCTCTCTCTCTCTCTCAGCTTTTTGTGTGACGCAACACTCGTGGGGAAATGGCCGCCGCAGTTTCCACCGTCGGTGCCATCAACAGAGCTCCGTTGAGCTTGAACGGGTCAGGAGCAGGAGCTGCTTCAGTCCCAGCTACGACCTTCTTGGGAAAGAAAGTTGTAACCGCGTCGAGATTCACACAGAGCAACAACAAGAAGAGCAACGGATCATTCAAAGTGGTCGCTGTCAAAGAAGACAAACAAACCGATGGAGACAGATGGAGGGGACTTGCCTACGACACGTCTGATGATCAACAAGACATCACCAGAGGCAAAGGTATGGTTGACTCTGTCTTCCAAGCTCCCATGGGAACCGGAACTCACAATGCCGTTCTTAGCTCCTATGAGTACATTAGCCAAGGTCTTAAGCAGTACAACTTGGACAACATGATGGATGGGCTTTACATTGCTCCTGCATTCATGGACAAGCTTGTTGTTCACATCACCAAGAACTTCTTGACTTTACCTAACATCAAGGTTCCACTTATTTTGGGTATTTGGGGAGGCAAAGGTCAAGGTAAATCCTTCCAGTGTGAGCTTGTCATGGCCAAGATGGGCATTAACCCAATCATGATGAGTGCTGGAGAGCTTGAGAGTGGAAACGCAGGAGAACCAGCCAAGCTGATCCGTCAAAGGTACCGTGAAGCAGCAGACATGATCAAAAAGGGAAAAATGTGTTGTCTATTCATCAACGATCTCGACGCTGGTGCTGGTCGTATGGGTGGTACTACTCAGTACACAGTCAACAACCAGATGGTTAACGCAACCCTCATGAACATTGCTGATAACCCAACCAACGTCCAGCTCCCGGGAATGTACAACAAGGAAGAAAACGCACGTGTCCCCATCATCGTCACCGGTAACGATTTCTCCACTCTCTACGCACCTCTCATCCGTGACGGGCGTATGGAGAAATTCTACTGGGCACCCACACGTGAGGACCGTATTGGTGTCTGCAAGGGTATCTTCAGGACTGATAACGTTAAGGATGAAGACATTGTCACGCTTGTTGACCAGTTCCCTGGACAATCTATCGATTTCTTTGGTGCATTGAGGGCGAGAGTGTACGATGATGAAGTGAGGAAGTTCGTTGAGGGACTTGGAGTTGAGAAGATAGGAAAGAGGCTGGTGAACTCTAGGGAAGGTCCTCCAGTGTTCGAGCAGCCAGCGATGACTCTTGAGAAGCTTATGGAGTACGGAAACATGCTTGTGATGGAACAAGAGAACGTCAAGAGAGTCCAACTTGCTGACCAATACCTTAACGAGGCTGCCTTGGGAGACGCAAACGCGGACGCCATTGGCCGCGGAACTTTCTATGGGAAAGCAGCACAGCAAGTGAACCTCCCTGTTCCAGAAGGGTGTACTGATCCTCAAGCAGACAACTTTGATCCAACAGCTAGAAGTGATGATGGAACTTGTGTCTACAACTTTTGAGTTTCCCCTTTGTTAAGTTGCTGTGTTTCTACTACTGTCTCTTTTTTTTGTTGCCTTTTGTGTAATTTTGGATTGCTTCATGTACTCTCTTTTTTTGTGATCATGTGCAAACATTAATATTGTAAGATTCCCTTGTCATAAACCATTTCTCAACTTTTTGTTTGCTTTATTAAGTAGATGGCATTCCAACTATAGTTCTTTGGCCATAGTCTCGGAA

For this sequence, ORFfinder reports the following 5 ORFs:

lcl|ORF1_BnaA03g18710D:82:1509 unnamed protein product

MAAAVSTVGAINRAPLSLNGSGAGAASVPATTFLGKKVVTASRFTQSNNKKSNGSFKVVAVKEDKQTDGD RWRGLAYDTSDDQQDITRGKGMVDSVFQAPMGTGTHNAVLSSYEYISQGLKQYNLDNMMDGLYIAPAFMD KLVVHITKNFLTLPNIKVPLILGIWGGKGQGKSFQCELVMAKMGINPIMMSAGELESGNAGEPAKLIRQR YREAADMIKKGKMCCLFINDLDAGAGRMGGTTQYTVNNQMVNATLMNIADNPTNVQLPGMYNKEENARVP IIVTGNDFSTLYAPLIRDGRMEKFYWAPTREDRIGVCKGIFRTDNVKDEDIVTLVDQFPGQSIDFFGALR ARVYDDEVRKFVEGLGVEKIGKRLVNSREGPPVFEQPAMTLEKLMEYGNMLVMEQENVKRVQLADQYLNE AALGDANADAIGRGTFYGKAAQQVNLPVPEGCTDPQADNFDPTARSDDGTCVYNF

lcl|ORF2_BnaA03g18710D:284:466 unnamed protein product

METDGGDLPTTRLMINKTSPEAKVWLTLSSKLPWEPELTMPFLAPMSTLAKVLSSTTWTT

lcl|ORF3_BnaA03g18710D:1623:1522 unnamed protein product

MFAHDHKKREYMKQSKITQKATKKRDSSRNTAT

lcl|ORF4_BnaA03g18710D:1373:993 unnamed protein product

MASAFASPKAASLRYWSASWTLLTFSCSITSMFPYSISFSRVIAGCSNTGGPSLEFTSLFPIFSTPSPST

NFLTSSSYTLALNAPKKSIDCPGNWSTSVTMSSSLTLSVLKIPLQTPIRSSRVGAQ

lcl|ORF5_BnaA03g18710D:926:807 unnamed protein product

MMGTRAFSSLLYIPGSWTLVGLSAMFMRVALTIWLLTVY

But I am only interested in the longest ORF, which is:

lcl|ORF1_BnaA03g18710D:82:1509 unnamed protein product

MAAAVSTVGAINRAPLSLNGSGAGAASVPATTFLGKKVVTASRFTQSNNKKSNGSFKVVAVKEDKQTDGD RWRGLAYDTSDDQQDITRGKGMVDSVFQAPMGTGTHNAVLSSYEYISQGLKQYNLDNMMDGLYIAPAFMD KLVVHITKNFLTLPNIKVPLILGIWGGKGQGKSFQCELVMAKMGINPIMMSAGELESGNAGEPAKLIRQR YREAADMIKKGKMCCLFINDLDAGAGRMGGTTQYTVNNQMVNATLMNIADNPTNVQLPGMYNKEENARVP IIVTGNDFSTLYAPLIRDGRMEKFYWAPTREDRIGVCKGIFRTDNVKDEDIVTLVDQFPGQSIDFFGALR ARVYDDEVRKFVEGLGVEKIGKRLVNSREGPPVFEQPAMTLEKLMEYGNMLVMEQENVKRVQLADQYLNE AALGDANADAIGRGTFYGKAAQQVNLPVPEGCTDPQADNFDPTARSDDGTCVYNF

How can I output or extract only the longest ORF?

Any help will be highly appreciated.

RNA-Seq • 1.9k views
4.7 years ago
Fatima ▴ 1000

These might help:

C: Trouble Finding ORFs in DNA Sequence

How to extract the longest orf?

You might be able to modify the script at the link below so that it picks the longest ORF among those whose headers share the same source sequence ID (the part after ORF<n>_ in the header):

https://stackoverflow.com/questions/29953448/python-finding-longest-sequence-from-fasta-file
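
As a rough illustration of that idea, here is a minimal Python sketch that keeps only the longest ORF per input sequence. It is not the script from the link. It assumes the ORFfinder output is a protein FASTA file whose header lines start with ">" and follow the pattern shown in the question (lcl|ORF<n>_<sequence ID>:<start>:<end>), and it measures length in amino acids. The function names and the script name longest_orf.py are made up for this example.

```python
import re
import sys

# Header pattern assumed from the output in the question, e.g.
# "lcl|ORF1_BnaA03g18710D:82:1509" -> source sequence ID "BnaA03g18710D".
HEADER_RE = re.compile(r"^lcl\|ORF\d+_(?P<src>.+):(?P<start>\d+):(?P<end>\d+)$")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines inside or between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.replace(" ", ""))
    if header is not None:
        yield header, "".join(chunks)


def longest_orf_per_sequence(lines):
    """Return {source ID: (header, protein)} holding the longest ORF per source.

    Ties keep the ORF that appears first in the file.
    """
    best = {}
    for header, seq in read_fasta(lines):
        first_token = header.split()[0]
        match = HEADER_RE.match(first_token)
        # Fall back to the whole first token if the header has another shape.
        src = match.group("src") if match else first_token
        if src not in best or len(seq) > len(best[src][1]):
            best[src] = (header, seq)
    return best


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        longest = longest_orf_per_sequence(handle)
    for header, seq in longest.values():
        print(">" + header)
        # Wrap the protein at 70 residues per line.
        for i in range(0, len(seq), 70):
            print(seq[i:i + 70])
```

With the command from the question, it could be run as: python longest_orf.py output_fasta > longest_orfs.fasta. If you want the longest ORF by nucleotide coordinates instead, the start and end groups captured by the pattern could be compared in place of the protein length.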
