Program For Finding Orf And Corresponding Reading Frame
13.9 years ago
Woa ★ 2.9k

Hello All,

Can somebody recommend a standalone program that finds the open reading frames (ORFs) in all six reading frames of a DNA sequence and also reports which frame each ORF comes from? I think this can be parsed from the FASTA header of EMBOSS SIXPACK's output. Please let me know if there are any better alternatives.

Thanks in advance

orf • 12k views

This question could be a candidate for another "code golf", couldn't it? :-)


Sure, just need to open a new question and name it 'Code golf: Finding ORFs'. Want to add one? If you don't, I sure will ;)


Why is sixpack not suitable? Do you want the ORF DNA seq in FASTA with frame info in header?


Thanks all for the answers. I think SIXPACK is OK for me, as it gives the ORF as well as the frame info in the FASTA header, as follows:

>X13776_5_ORF15 Translation of X13776 in frame 5, ORF 15, threshold 1, 19aa
QPTRNRTPRLRMKSSAHSR

However, although GETORF gives the ORFs in all frames, the frame information is missing:

>V00294_3 [465 - 49] (REVERSE SENSE) E. coli laci gene (codes for the lac repressor).
RRNISAGSFHSNGILVIQRIVNDQPTDALREKIVHRRFTGFDAASFYHRHHHAGTQLIGA
RFNRRDNLRRRVQGQTGGGNANQQRLFARQLLCHAVGNVIQLRHRRFHFFPRFRRNVAGL
VHHAGNGLIRDTGILCDIV


Well, it's true that it is not given, but it would be redundant. The reading frame is trivially computed from the start and stop positions. The difference between sixpack and getorf is that sixpack is for pretty-printing short sequences, while getorf is for extracting the ORF sequences of, e.g., a whole genome.
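For anyone who wants to pull the start and stop positions out of getorf headers like the one quoted above, here is a minimal Python sketch. The function name `parse_getorf_header` and the regular expression are my own (hypothetical), and the pattern only assumes the `ID [start - end] (REVERSE SENSE)` layout seen in that example; other getorf versions or options may format the header differently.

```python
import re

# Assumed layout, taken from the example header in this thread:
#   V00294_3 [465 - 49] (REVERSE SENSE) E. coli laci gene ...
HEADER_RE = re.compile(
    r"^>?(?P<orf_id>\S+)\s+\[(?P<start>\d+)\s*-\s*(?P<end>\d+)\]"
    r"\s*(?P<rev>\(REVERSE SENSE\))?"
)


def parse_getorf_header(header):
    """Return (orf_id, start, end, strand) from a getorf-style header line."""
    m = HEADER_RE.match(header.strip())
    if m is None:
        raise ValueError(f"not a getorf-style header: {header!r}")
    # The strand is taken from the explicit marker, not from start > end
    strand = "-" if m.group("rev") else "+"
    return m.group("orf_id"), int(m.group("start")), int(m.group("end")), strand
```

Reading the strand from the explicit marker rather than from `start > end` avoids misclassifying an ORF that spans the origin of a circular sequence.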


Thanks, Michael. Can it be computed from the start position alone, i.e. start_position % 3 = 1 -> first frame, 2 -> second frame and 0 -> third frame? The problem with SIXPACK is that it processes one sequence at a time, and I have several thousand of them. Rather than creating thousands of files, maybe I'll use GETORF and my FASTA library.


I would use the stop position, see my edit in my answer.

13.9 years ago
Michael 55k

Or use getORF from the EMBOSS package, available as an executable, web-service, or website.

There is nothing fancy about ORF finding. ORFs don't need to be predicted (genes are predicted); they are simply found, either as any sequence that does not contain a stop codon and ends with a stop codon, or alternatively as any sequence between a start and a stop codon (in frame). ORF finding therefore automatically takes all six frames into account. getorf supports both modes. Make sure to select the appropriate genetic code.
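To illustrate how little is involved, here is a minimal Python sketch of the second mode (start codon to in-frame stop codon) over all six frames. It is not how getorf is implemented. It assumes the standard genetic code (ATG start; TAA, TAG, TGA stops), and the names `find_orfs` and `revcomp` are made up for this example. Frames on the reverse strand are numbered -1 to -3 from the 5' end of the reverse complement, which is only one of several conventions in use.

```python
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
START = "ATG"


def revcomp(seq):
    """Reverse complement; characters other than ACGT are left unchanged."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=1):
    """Yield (frame, start, end, orf_dna) for ATG..stop ORFs in six frames.

    frame is 1..3 on the given strand and -1..-3 on its reverse complement.
    start/end are 0-based, end-exclusive offsets on the strand that was
    read; the stop codon itself is not included in orf_dna.
    """
    seq = seq.upper()
    for sign, strand in ((1, seq), (-1, revcomp(seq))):
        for offset in range(3):
            orf_start = None
            # Walk the strand codon by codon in this frame
            for i in range(offset, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if orf_start is None and codon == START:
                    orf_start = i
                elif orf_start is not None and codon in STOPS:
                    if (i - orf_start) // 3 >= min_codons:
                        yield sign * (offset + 1), orf_start, i, strand[orf_start:i]
                    orf_start = None
```

For example, `list(find_orfs("CATGAAATAG"))` gives `[(2, 1, 7, "ATGAAA")]`: a single ORF in frame 2. An ORF that is still open when the end of the sequence is reached is not reported in this sketch.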

Edit: one simple way to calculate the frame in pseudocode, given start and stop:

 if the ORF is on the + strand (use the info in the header)
   # start < stop would also work, except for a circular genome with an ORF spanning the origin
   frame := (stop mod 3) + 1
 else
   frame := -(stop mod 3)
   # actually, for the minus strand I am not 100% sure this is the best way
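A runnable variant of the same idea in Python, offered as a sketch rather than a definitive answer. It uses the start position instead of the stop, which gives the same result on the + strand whenever the ORF length is a multiple of 3. It assumes 1-based coordinates on the forward strand of a linear sequence, written in reading order as in the getorf header (so start > end on the reverse strand). For the minus strand it assumes frames are numbered from the 5' end of the reverse complement, which requires the sequence length; tools differ on this convention, so check it against the tool whose numbering you want to match. The name `orf_frame` is hypothetical.

```python
def orf_frame(start, end, seq_len=None):
    """Return the reading frame (1..3, or -1..-3) of an ORF.

    start/end are 1-based positions on the forward strand, in reading
    order, so start > end means the ORF lies on the reverse strand.
    Assumes a linear sequence (no ORF spanning a circular origin).
    """
    if start <= end:
        # Forward strand: positions 1, 4, 7, ... open frame 1
        return (start - 1) % 3 + 1
    if seq_len is None:
        raise ValueError("seq_len is required for reverse-strand ORFs")
    # Reverse strand: count from the last base of the sequence instead
    return -((seq_len - start) % 3 + 1)
```

With this, `orf_frame(1, 9)` is 1, `orf_frame(2, 10)` is 2 and `orf_frame(3, 11)` is 3, which matches the `start_position % 3` mapping suggested in the comments above.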

I think pretty much the same and use scripts of my own, including selenocysteine alternatives. But things can get quite tricky if you're using eukaryotic genomic DNA.

13.9 years ago

Try ORFinder at http://www.bioinformatics.org/sms2/orf_find.html. This seems to give what you're requesting but I don't have test sequences handy to run a check.

13.9 years ago

It might not be exactly what you want, but to find genes in prokaryotic DNA, glimmer3 works wonders. On eukaryotes...not so much.

Actually, this page has a pretty nifty list of candidates.


It's a bit more than requested: Glimmer is a gene prediction program, and only a small fraction of all ORFs are really protein coding.

13.9 years ago

ORF Finder will give all six possible protein translations of your DNA sequence. These are only candidates for coding regions. But if you want to know the exact protein-coding region of your gene sequence, run a blastx search against the nr or Swiss-Prot database.

http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastx&BLAST_PROGRAMS=blastx&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&LINK_LOC=blasthome

13.9 years ago
Elena ▴ 250

You can use GeneScan: a context-independent gene-finding program.


No, this is a gene prediction program, not an ORF finder.
