I am using getorf from EMBOSS and want the STOP codon included in the nucleotide output. But no matter what I try, the output comes without the STOP codon. Any suggestions? Example: my input file (input.fa) is:
>a
TAGATCCTTTCTTTCTTGTCTCTATATTACAAGGGAGTACAAAAAAGGATTATGAATATATGAAAAGAAAATTTTGAAGATAAATAAAACGGCAATTTACGTACCTAGAACAATGGCAGGACGTACTCAGGCCCTGTCCCAATCAAACATGCCATGCACATACATCTATTCTTGAAATCTCAAGGGAGACTATTTTCTAAAAAGCACCAGATTTTTTCAATTGAACATAACAGGCAACAAATGGAATAGCAAATCCAACAGCGAAGAACCCAAAATGAGAAAGAGCATAGGGCGTCTTTCTACCTTTGACTTTGAATGGAATATTCTCATACACACCATCTTTGAAATGGACTGATCTCTTCATGATAATTGGTCTAGCCATAATATTGCTACTTCTCTTTGCTGCGGTTCTAATCATTTGTTGGCACAACATTGTGTATTGAGTATGACTTCTTCTCTATTTAATTGATATGTTGTATGCTTTCTTGAAATCAGTAGACTATAAGATCGTTCTTGTAAATCATTAATCTAACCTTATGAGTTATGCTGTGGTCAATCTTTATTTTCTGTTTTTCTTGATCCCCTAGCTCTTCCGTAAACACCGAACACTTTCTCTCACATGATTGGTGCAAA
The output file (longest.fa) looks like this:
>a_3 [433 - 200] (REVERSE SENSE)
ATGTTGTGCCAACAAATGATTAGAACCGCAGCAAAGAGAAGTAGCAATATTATGGCTAGA
CCAATTATCATGAAGAGATCAGTCCATTTCAAAGATGGTGTGTATGAGAATATTCCATTC
AAAGTCAAAGGTAGAAAGACGCCCTATGCTCTTTCTCATTTTGGGTTCTTCGCTGTTGGA
TTTGCTATTCCATTTGTTGCCTGTTATGTTCAATTGAAAAAATCTGGTGCTTTT
As you can see, there is no stop codon at the end.
The command I used is:
hmmer2go getorf -i input.fa -o longest.fa -t3
I want the stop codon to be included.
Is there an option for that?
I couldn't find such an option either. Weird that they didn't provide one.
I doubt many people use it for protein prediction these days.
If there is another ORF finder that can do this, please suggest it.
I haven't done this for quite a few years, but back then I swore by https://github.com/hyattpd/Prodigal
Are you looking for true ORFs, as in genes, or simply for ORFs between two stop codons?
Perhaps get_longest_orf, TransDecoder, FrameD, ... can be of use (most will look for true genes though)
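If plain start-to-stop ORFs are enough (no gene model), a few lines of Python can report them with the stop codon attached. This is only a minimal sketch, not what any of the tools above do internally: it assumes the standard stop codons (TAA, TAG, TGA), ATG as the only start codon, and a 30 nt minimum length. The names `orfs_with_stop` and `min_len` are made up for illustration.

```python
# Minimal six-frame ORF scan that keeps the stop codon in the output.
# Assumptions: standard genetic code stops, ATG as the only start codon.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_with_stop(seq, min_len=30):
    """Yield (strand, frame, orf) for each ATG...stop ORF, stop codon included.

    min_len is the minimum ORF length in nucleotides, counting the stop codon.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    orf = s[start:i + 3]  # i + 3 keeps the stop codon
                    if len(orf) >= min_len:
                        yield strand, frame, orf
                    start = None


if __name__ == "__main__":
    for strand, frame, orf in orfs_with_stop("CCATGAAATAGGG", min_len=9):
        print(strand, frame, orf)  # + 2 ATGAAATAG
```

ORFs that run off the end of the sequence without reaching a stop are skipped here, which may or may not be what you want for fragmented transcripts.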
I tried get_longest_orf.pl, but it does not include the STOP codon in the output either.
You are correct. As it turns out, it is itself based on EMBOSS, so that is the expected behavior (my bad for listing that one).
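Since the EMBOSS-based tools all drop the stop codon, one workaround is to post-process the output: take the coordinates from each header (e.g. `[433 - 200]`) and pull the ORF plus the next three bases from the original input sequence. The sketch below assumes the header coordinates are 1-based positions on the input sequence and that start > end means reverse sense, which matches the example above (433 - 200 + 1 = 234 nt, the length of the reported ORF). The function names are hypothetical, not part of EMBOSS or HMMER2GO.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_getorf_header(header):
    """Extract (start, end) from a header like '>a_3 [433 - 200] (REVERSE SENSE)'."""
    match = re.search(r"\[(\d+)\s*-\s*(\d+)\]", header)
    if match is None:
        raise ValueError(f"no coordinates found in header: {header!r}")
    return int(match.group(1)), int(match.group(2))


def orf_with_stop(source, start, end):
    """Return the ORF at 1-based [start, end] plus the following stop codon.

    start > end is treated as a reverse-sense ORF. If the next three bases
    are missing or are not a stop codon, the ORF is returned unchanged.
    """
    if start <= end:
        orf = source[start - 1:end]
        next_codon = source[end:end + 3]
    else:
        orf = reverse_complement(source[end - 1:start])
        # On the reverse strand the stop codon lies just before 'end'.
        next_codon = reverse_complement(source[max(end - 4, 0):end - 1])
    if len(next_codon) == 3 and next_codon.upper() in STOP_CODONS:
        return orf + next_codon
    return orf


if __name__ == "__main__":
    start, end = parse_getorf_header(">toy_1 [11 - 6] (REVERSE SENSE)")
    print(orf_with_stop("CCCTATTTCATGG", start, end))  # ATGAAATAG
```

For real data you would read input.fa and longest.fa (with Biopython or a small FASTA parser), match each ORF to its source sequence by ID, and write out the extended sequence.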
Prodigal is a great tool, but it is meant for microbes.
Why do you think it's just for microbes?
It's not just the translation table; it's also how it identifies the TSS and other features. Take a look at the paper.