Hello All,
Can somebody recommend a standalone program that predicts the open reading frames (ORFs) in all six reading frames of a DNA sequence and also reports which frame each ORF is derived from? I think it can be parsed from the FASTA header of EMBOSS SIXPACK's output. Please let me know if there are any better alternatives.
Thanks in advance
This question could be a candidate for another "code golf", couldn't it? :-)
Sure, just need to open a new question and name it 'Code golf: Finding ORFs'. Want to add one? If you don't, I sure will ;)
Why is sixpack not suitable? Do you want the ORF DNA seq in FASTA with frame info in header?
Thanks all for the answers. I think SIXPACK is OK for me, as it gives the ORF as well as the frame info in the FASTA header, as follows:
However, although GETORF gives the ORFs in all frames, the frame information is missing:
Well, it's true that it is not given, but it is redundant: the reading frame is trivially computed from the start and stop positions. The difference between sixpack and getorf is that sixpack is for pretty-printing short sequences, while getorf is for extracting the ORF sequences of, e.g., a whole genome.
Thanks Michael. Can it be computed from the start position alone, like start_position % 3 = 1 -> first frame, 2 -> second frame and 0 -> third frame? The problem with SIXPACK is that it processes one sequence at a time, and I have several thousand of them to process. Rather than creating thousands of files, maybe I'll use GETORF and my FASTA library.
I would use the stop position, see my edit in my answer.
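For reference, the modulo idea from the comments can be sketched in Python roughly as follows. This is only an illustration under several assumptions: the header layout (`>ID [start - end]`, optionally followed by `(REVERSE SENSE)`) is assumed to match what getorf writes, the function names are made up for this example, and the reverse-strand numbering (frame 1 begins at the last base of the forward strand) is one possible convention that may not match SIXPACK's numbering.

```python
import re

# Assumed getorf-style header, e.g. ">seq1_1 [2 - 70] description" or
# ">seq1_2 [70 - 2] (REVERSE SENSE) description". Check against real output.
HEADER_RE = re.compile(r"^>?(\S+)\s+\[(\d+)\s*-\s*(\d+)\]\s*(\(REVERSE SENSE\))?")


def forward_frame(start):
    """Frame (1-3) of a forward-strand ORF from its 1-based start position.

    Equivalent to the mapping in the comment above:
    start % 3 == 1 -> frame 1, == 2 -> frame 2, == 0 -> frame 3.
    """
    return (start - 1) % 3 + 1


def reverse_frame(start, seq_len):
    """Frame (1-3) of a reverse-strand ORF.

    `start` is the forward-strand coordinate of the ORF's first base (the
    larger of the two coordinates). Assumed convention: frame 1 starts at
    the last base of the forward strand, so the sequence length is needed.
    """
    return (seq_len - start) % 3 + 1


def frame_from_header(header, seq_len=None):
    """Return (orf_id, strand, frame) parsed from one FASTA header line."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError("header does not match the assumed layout: " + header)
    orf_id = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    is_reverse = match.group(4) is not None or start > end
    if not is_reverse:
        return orf_id, "+", forward_frame(start)
    if seq_len is None:
        raise ValueError("seq_len is required for reverse-strand ORFs")
    return orf_id, "-", reverse_frame(start, seq_len)
```

With something like this, one getorf run over a multi-sequence FASTA file could be post-processed header by header, with the parent sequence lengths looked up from the input file for the reverse-strand ORFs.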