Question

How can I make an open reading frame finder with Python?

0

8.7 years ago

Kevin_Smith ▴ 10

I need to write a program to find open reading frames in a DNA sequence. The program should take as input the provided sequences in FASTA format (“sequence_A.fa” and “sequence_B.fa”), and supply as output:

(1) The sizes of the potential ORFs greater than 30 amino acids from all 3 forward reading frames.
(2) The translations of the ORFs into protein.
(3) The ORF does not have to begin with an ATG; it can be any sequence of nucleotides that encodes a polypeptide of >30 amino acids.
(4) One peptide per line, in this format: frame #: length_of_peptide sequence_of_peptide

I have a code snippet that sets up a Python dictionary of codons in a file called “codondictionary.py”, which I can copy into the program.

I would very much appreciate any help with a Python script.

sequence gene python homework • 13k views

ADD COMMENT • link updated 8.7 years ago by WouterDeCoster 47k • written 8.7 years ago by Kevin_Smith ▴ 10

2

What have you tried? Post a link to your code.
You can do all of this with biopython.

ADD REPLY • link 8.7 years ago by Devon Ryan 104k

1

Some questions:

Why do you not consider an ATG important? I'm afraid you'll get a bunch of false positives.
Why only 3 reading frames and not 6?
Essentially, you are looking for stretches of more than 90 nucleotides before a stop codon and want these translated?
How big is your input data?

ADD REPLY • link 8.7 years ago by WouterDeCoster 47k
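
To make the two points above concrete: a peptide of more than 30 amino acids needs at least 31 codons, i.e. 93 nucleotides, and the three reverse frames are the forward frames of the reverse complement. A minimal sketch using only the standard library; the helper names `reverse_complement` and `six_frames` are made up for illustration and do not come from any library:

```python
# Hypothetical helpers, for illustration only.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

MIN_AA = 31          # "> 30 amino acids" means at least 31
MIN_NT = MIN_AA * 3  # which corresponds to 93 nucleotides


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frames(dna):
    """Return (label, subsequence) pairs for 3 forward and 3 reverse frames."""
    rc = reverse_complement(dna)
    forward = [("+{}".format(i + 1), dna[i:]) for i in range(3)]
    reverse = [("-{}".format(i + 1), rc[i:]) for i in range(3)]
    return forward + reverse
```

If only the forward direction is of interest, the first three entries returned by `six_frames` are enough.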

0

The ORF does not have to begin with an ATG; I just want to obtain any sequence of nucleotides that encodes a polypeptide of >30 amino acids. The size of the input is 1040 nucleotides. Three reading frames is fine, since I only want to see the ones in the forward direction. Thanks.

ADD REPLY • link 8.7 years ago by Kevin_Smith ▴ 10

1

8.7 years ago

Daniel ★ 4.0k

Check out the "Python for Bioinformatics" book (look here for the section on protein translations) or Python for biologists here.

ADD COMMENT • link 8.7 years ago by Daniel ★ 4.0k

score 7 · Accepted Answer · 2016-03-29

7

8.7 years ago

WouterDeCoster 47k

I have my doubts about your biological question, but here you go. Save the script as e.g. ORFfinder.py and run it as python ORFfinder.py yourinput.fasta

import sys
from Bio import SeqIO

# Read the single FASTA record given on the command line
record = SeqIO.read(sys.argv[1], "fasta")

# Translate the three forward reading frames (offsets 0, 1, 2)
results = []
for offset in range(3):
    frame_seq = record.seq[offset:]
    # Drop a trailing partial codon so Biopython does not warn about it
    frame_seq = frame_seq[:len(frame_seq) - len(frame_seq) % 3]
    protein = frame_seq.translate(table='Standard', stop_symbol='*')
    for peptide in protein.split('*'):  # Split translation at stop codons
        if len(peptide) > 30:
            results.append((offset + 1, peptide))

# Use PotentialORFs.txt as output, can be changed
# Write frame number, length and translation to file
with open('PotentialORFs.txt', 'w') as output:
    for frame, peptide in results:
        output.write("{}: {}\t{}\n".format(frame, len(peptide), peptide))

ADD COMMENT • link 8.7 years ago by WouterDeCoster 47k

0

I got the error "No module named Bio". What does this mean? Thanks

ADD REPLY • link 8.7 years ago by Kevin_Smith ▴ 10

0

Is it possible to write the program without Biopython?

ADD REPLY • link 8.7 years ago by Kevin_Smith ▴ 10
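
For reference, the same idea can be sketched without Biopython, using only the standard library. This is a rough sketch under a few assumptions: the codon dictionary is rebuilt here from the standard genetic code (the poster's codondictionary.py is not shown, so its contents are not assumed), only the first record of the FASTA file is read, and the function names are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def read_fasta_sequence(path):
    """Return the sequence of the first FASTA record in a file, uppercased."""
    chunks = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # stop at the second record
                seen_header = True
            elif line:
                chunks.append(line.upper())
    return "".join(chunks)


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def find_orfs(dna, min_length=30):
    """Return (frame, peptide) pairs for forward peptides longer than min_length."""
    orfs = []
    for offset in range(3):
        # Split each frame's translation at stop codons
        for peptide in translate(dna[offset:]).split("*"):
            if len(peptide) > min_length:
                orfs.append((offset + 1, peptide))
    return orfs


def main(path):
    """Print 'frame: length peptide' for every qualifying peptide."""
    for frame, peptide in find_orfs(read_fasta_sequence(path)):
        print("{}: {} {}".format(frame, len(peptide), peptide))
```

Calling `main("sequence_A.fa")` would print one peptide per line. As in the Biopython version, a peptide here is any stretch between stop codons (or the sequence ends), so it need not start with ATG.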

1

You absolutely require Biopython for this code to work.

If you have pip installed, you can use it to install Python packages:

(sudo) pip install biopython

Alternatively, have a look here: http://biopython.org/wiki/Download

ADD REPLY • link 8.7 years ago by WouterDeCoster 47k

1

FYI, pip install --user biopython is usually preferable.

ADD REPLY • link 8.7 years ago by Devon Ryan 104k
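
A quick way to check whether the installation worked, i.e. whether the "No module named Bio" error should be gone, is to ask Python whether it can locate the package. A small sketch; `has_module` is a made-up helper name, while `importlib.util.find_spec` is part of the standard library:

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module or package can be located."""
    return importlib.util.find_spec(name) is not None


if has_module("Bio"):
    print("Biopython is available")
else:
    print("Biopython is missing; try: pip install --user biopython")
```

Run this with the same Python interpreter that runs the ORF script, since each interpreter has its own set of installed packages.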