Include STOP codon in getorf output
5.8 years ago
lokdeep17 • 0

I am using getorf from EMBOSS and want the STOP codon included in the nucleotide output file. No matter what I try, the output lacks the STOP codon. Any suggestions? For example, my input file (input.fa) is:

>a
TAGATCCTTTCTTTCTTGTCTCTATATTACAAGGGAGTACAAAAAAGGATTATGAATATATGAAAAGAAAATTTTGAAGATAAATAAAACGGCAATTTACGTACCTAGAACAATGGCAGGACGTACTCAGGCCCTGTCCCAATCAAACATGCCATGCACATACATCTATTCTTGAAATCTCAAGGGAGACTATTTTCTAAAAAGCACCAGATTTTTTCAATTGAACATAACAGGCAACAAATGGAATAGCAAATCCAACAGCGAAGAACCCAAAATGAGAAAGAGCATAGGGCGTCTTTCTACCTTTGACTTTGAATGGAATATTCTCATACACACCATCTTTGAAATGGACTGATCTCTTCATGATAATTGGTCTAGCCATAATATTGCTACTTCTCTTTGCTGCGGTTCTAATCATTTGTTGGCACAACATTGTGTATTGAGTATGACTTCTTCTCTATTTAATTGATATGTTGTATGCTTTCTTGAAATCAGTAGACTATAAGATCGTTCTTGTAAATCATTAATCTAACCTTATGAGTTATGCTGTGGTCAATCTTTATTTTCTGTTTTTCTTGATCCCCTAGCTCTTCCGTAAACACCGAACACTTTCTCTCACATGATTGGTGCAAA

The output file (longest.fa) looks like this:

>a_3 [433 - 200] (REVERSE SENSE) 
ATGTTGTGCCAACAAATGATTAGAACCGCAGCAAAGAGAAGTAGCAATATTATGGCTAGA
CCAATTATCATGAAGAGATCAGTCCATTTCAAAGATGGTGTGTATGAGAATATTCCATTC
AAAGTCAAAGGTAGAAAGACGCCCTATGCTCTTTCTCATTTTGGGTTCTTCGCTGTTGGA
TTTGCTATTCCATTTGTTGCCTGTTATGTTCAATTGAAAAAATCTGGTGCTTTT

As you can see, there is no stop codon at the end. The command I used is:

hmmer2go getorf -i input.fa -o longest.fa -t3

I want the stop codon to be included.

emboss getorf orf STOP • 2.7k views
hmmer2go getorf --man

Does it have such an option?


I couldn't find one either. Weird that they didn't provide the option.


I doubt many people use it for protein prediction these days.


If there's some other ORF finder to do this job, please suggest it to me.


I haven't done this for quite a few years, but back then I swore by https://github.com/hyattpd/Prodigal


Are you looking for true ORFs, as in genes, or simply ORFs between two stop codons?

Perhaps get_longest_orf, TransDecoder, FrameD, ... can be of use (most will look for true genes though)


I tried get_longest_orf.pl; it also does not include the STOP codon in the output.


You are correct. As it turns out, it is itself based on EMBOSS, so this is expected behavior (my bad for listing that one).


Prodigal is a great tool, but it is meant for microbes.


Why do you think it's just for microbes?

-g:  Specify a translation table to use (default 11).

It's not just the translation table, it's identifying TSS and other stuff. Take a look at the paper

24 months ago
pmcarlton ▴ 30

I was just searching for the same question, and I'll post my solution here in case it helps anyone. It uses extractseq from EMBOSS and perl.

The output of getorf gives each sequence a name that includes the start and stop locations within the original file.

So, if you add 3 to the stop location, and send those coordinates to extractseq you should get the sequence plus the stop codon.

My code is here, assuming "ORFs.fa" is the fasta file generated by getorf and dna.fa is the file you originally passed to getorf:

grep '>' ORFs.fa | tr -d '>[]' | perl -ane '$n=$F[0];$n =~ s/_\d+$//; $s=$F[1];$e=$F[3]+3;print "extractseq -seq dna.fa:$n -reg $s,$e -out stdout\n"' > orfplusstop-get.sh

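Then executing "sh orfplusstop-get.sh > orfsplusstop.fa" should get what you want. Note that adding 3 to the second coordinate only suits forward-sense ORFs. For a reverse-sense ORF such as a_3 [433 - 200] in the question, the stop codon lies in the three bases just below the second coordinate, so you would subtract 3 instead and reverse-complement the extracted sequence.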
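For anyone who prefers to avoid generating a shell script, the same coordinate arithmetic can be sketched in plain Python. This is only an illustration, not part of EMBOSS or hmmer2go: the function names (`read_fasta`, `orf_with_stop`) are made up for this example, and it assumes the headers look like the getorf output shown in the question, e.g. ">a_3 [433 - 200] (REVERSE SENSE)", with 1-based coordinates on the forward strand. It extends forward ORFs by 3 bases past the end, extends reverse ORFs by 3 bases below the end and reverse-complements them, and clamps at the sequence boundaries. It does not check that the added bases really are a stop codon.

```python
import re

# Assumed getorf-style header: ">parent_N [start - end] ..." (hypothetical parser).
HEADER_RE = re.compile(r"^>(\S+)_(\d+) \[(\d+) - (\d+)\]")

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a plain A/C/G/T sequence (other symbols pass through)."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines):
    """Return {sequence_id: sequence} from an iterable of FASTA lines."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif line and name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def orf_with_stop(header, genome):
    """Return the ORF named by a getorf-style header plus the 3 bases after it."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError("unrecognised header: " + header)
    parent = match.group(1)
    start, end = int(match.group(3)), int(match.group(4))
    seq = genome[parent]
    if start <= end:
        # Forward sense: the stop codon follows the ORF at end+1 .. end+3.
        hi = min(end + 3, len(seq))
        return seq[start - 1:hi]
    # Reverse sense: start > end, and the stop codon sits at end-3 .. end-1.
    lo = max(end - 3, 1)
    return reverse_complement(seq[lo - 1:start])
```

A possible way to use it, with the file names from the answer above: build `genome = read_fasta(open("dna.fa"))`, then call `orf_with_stop(line, genome)` for every line of ORFs.fa that starts with ">" and write each result under its original header.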
