I need to write a program to find open reading frames in a DNA sequence. The program should take as input the provided sequences in FASTA format (“sequence_A.fa” and “sequence_B.fa”), and supply as output:
(1) The sizes of the potential ORFs longer than 30 amino acids in all 3 forward reading frames.
(2) The protein translations of those ORFs.
(3) An ORF does not have to begin with an ATG; it can be any stretch of nucleotides that encodes a polypeptide of more than 30 amino acids.
(4) One peptide per line, in this format: frame #: length_of_peptide sequence_of_peptide
I have a code snippet that sets up a Python dictionary of codons in a file called “codondictionary.py”, which I can copy into the program.
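A minimal sketch of the input side, assuming plain-text FASTA files and a codon dictionary that maps uppercase DNA codons to one-letter amino acids, with '*' for stop codons. The real codondictionary.py isn't shown here, so CODON_TABLE below is only a stand-in built from the standard genetic code, and parse_fasta and translate_frame are hypothetical helper names. Each forward frame is read by starting at offset 0, 1 or 2 and stepping three bases at a time; a trailing incomplete codon is dropped.

```python
import sys

# Stand-in for codondictionary.py (assumption: its real contents aren't shown).
# Standard genetic code in TCAG order; '*' marks the stop codons TAA, TAG, TGA.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def parse_fasta(lines):
    """Return {header: sequence} from an iterable of FASTA lines (e.g. an open file)."""
    records = {}
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks).upper()
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks).upper()
    return records


def translate_frame(seq, offset):
    """Translate seq from offset (0, 1 or 2); codons missing from the table become 'X'."""
    protein = []
    for i in range(offset, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # Example: python orf_sketch.py sequence_A.fa sequence_B.fa
    for path in sys.argv[1:]:
        with open(path) as handle:
            for header, seq in parse_fasta(handle).items():
                for offset in range(3):
                    print(header, "frame", offset + 1, translate_frame(seq, offset))
```

If the codon dictionary from codondictionary.py uses lowercase keys or a different stop symbol, the lookups or the '*' convention would need adjusting to match it.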
I would very much appreciate any help writing a Python script.
In answer to some questions:
The ORF does not have to begin with an ATG; I just want any sequence of nucleotides that encodes a polypeptide of more than 30 amino acids. The input is 1040 nucleotides long. Three reading frames are fine; I only want to see the ones in the forward direction. Thanks!
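Without an ATG requirement, one reasonable reading is that an ORF is any run of codons between two stop codons (or between either end of the sequence and the nearest stop), so splitting a frame's translation at each '*' yields every candidate. With 1040 nucleotides, each of the three forward frames holds 346 complete codons, so this is cheap. The sketch below assumes the translation is a string of one-letter amino acids with '*' for stops (for instance built with the codondictionary.py dictionary) and that frames are numbered 1 to 3; find_orfs and format_orfs are hypothetical names. It also counts the stretch after the last stop even though it runs off the end without a stop codon, which is a judgment call.

```python
def find_orfs(protein, min_length=30):
    """Split one frame's translation at stops ('*') and keep stretches
    longer than min_length residues; no ATG start is required."""
    return [segment for segment in protein.split("*") if len(segment) > min_length]


def format_orfs(frame_number, protein, min_length=30):
    """Return output lines in the 'frame #: length_of_peptide sequence_of_peptide' format."""
    return [
        f"{frame_number}: {len(orf)} {orf}"
        for orf in find_orfs(protein, min_length)
    ]


# Toy translation: a 31-residue stretch, a 10-residue stretch, then 40 residues.
example = "M" * 31 + "*" + "A" * 10 + "*" + "G" * 40
for line in format_orfs(1, example):
    print(line)
```

Calling format_orfs once per frame (1, 2, 3) on each sequence's translations gives one peptide per line; a stretch of exactly 30 residues is excluded because the requirement is more than 30.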