Question

Python Script To Translate Rna Sequences To Protein Sequences

4

14.8 years ago

Studentguy ▴ 70

Write a Python script that translates two genes in an RNA sequence into their protein sequence and prints them. Each gene begins with an AUG from the left and ends in UAG and has a length that is a multiple of three. However, the RNA sequence length may not be a multiple of three and there may be more than one "UAG" or "AUG" in the sequence.

For example, if the input is

human ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA

then the output should be

MLE MY

I have this so far (http://dpaste.org/v2e9/). Can anyone help out?

with open ("p:/dna.txt", "r") as myfile:
    data=myfile.readlines()

map = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
    "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
    "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
    "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
    "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
    "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
    "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
    "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
    "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
    "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
    "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
    "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
    "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
    "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
    "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
    "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}

DNA=data[1]
flag = 1
while flag:
    start = DNA.find('AUG')
    if start == -1:
        flag = 0
    else:
        done = 0
        while done!= 0:
            i = start
            codon = DNA[1:i+3]
            if codon == "UAG":
                stop = i
                protein = translate(DNA(start))
                DNA = DNA[stop:]
                done = 1
                print(protein)

python biopython translation • 71k views

updated 11.9 years ago by viv_bio ▴ 50 • written 14.8 years ago by Studentguy ▴ 70

2

@Simon: while I sometimes feel irked when I see a question that seems to be taken right out of a homework assignment, I think in the end it is not our job to police this. Plus, we may be wrong in our assumptions. So I would leave it up to each individual whether they want to answer it or not. A great answer lives on and will continue to provide value beyond the original poster's needs.

14.8 years ago by Istvan Albert 102k

1

I do agree; however, for that question there is a partial solution if you follow the link to the OP's 'dpaste' page. Here is a thought: with enough googling, StudentGuy will come up with a ready-made solution anyway, most probably using Biopython, which he will likely not understand and which will be too high-level (using a ready-made package) to have much teaching value. At least here the OP did part of the work and is ready to interact with people who will likely teach him something. A better-developed question that included the code right here might have been better. Cheers

14.8 years ago by Eric Normandeau 11k

0

I'd be interested in other moderators' opinions on homework questions. I think proof of a reasonable stab at a solution would be a good thing, rather than 'do my homework for me' style questions.

14.8 years ago by Simon Cockell 7.4k

0

just need to find an otherwise permissive license that prohibits copy-paste use into a homework solution

14.8 years ago by brentp 24k

0

@brentp: The people who are copy-pasting homework probably aren't reading enough to look at the licenses anyways.

14.8 years ago by Will 4.6k

0

Some consolation: I appreciate the frankness in presenting it as HW!!

14.8 years ago by Rm 8.3k

0

I am not really looking for handouts, but I am new to programming and am occasionally very confused by Python. Most universities now require all bio majors to be thrown into the pit with a BNFO class, since even limited knowledge of the subject is a valuable skill to have. That being said, all of our teachers here are pretty seasoned vets in several languages, and sometimes it's hard to connect to "my first programming class".

I posted more below in the answers section :D

14.8 years ago by Studentguy ▴ 70

Ram · Answer 1 · 2010-10-14

Homework? Well, I'll answer anyway, using Biopython.

from Bio.Seq import Seq

# Add your own logic here to parse the RNA sequence from the file.
# Split on the start codon and drop the part preceding the first start codon;
# then translate each chunk up to its stop codon, join the results and print.
print(" ".join(str(Seq("AUG" + rest).translate(to_stop=True))
               for rest in "ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA".split("AUG")[1:]))
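The file-parsing step that the snippet above leaves open could look roughly like this. It is only a sketch using the standard library: the function name read_rna is made up for illustration, and it assumes the file holds a single record, either FASTA style (a ">" header line followed by the sequence) or a single "name SEQUENCE" line as in the question.

```python
def read_rna(path):
    """Return the RNA sequence stored in a small one-record text file.

    Hypothetical helper: assumes either a FASTA-style file (">" header,
    then sequence lines) or a single "name SEQUENCE" line.
    """
    with open(path) as handle:
        # Drop blank lines and surrounding whitespace.
        lines = [line.strip() for line in handle if line.strip()]
    if not lines:
        raise ValueError("no sequence found in " + path)
    if lines[0].startswith(">"):
        # FASTA style: skip the header and join wrapped sequence lines.
        return "".join(lines[1:]).upper()
    # "name SEQUENCE" style: the sequence is the last whitespace-separated field.
    return lines[0].split()[-1].upper()
```

The returned string can then be used in place of the hard-coded sequence, for example `read_rna("dna.txt").split("AUG")[1:]`.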

Ram · Answer 2 · 2010-10-14

I'm not a Python guy, but the following script does the job with dna.txt containing:

>Human
ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA

Source: you just need to check that at least 3 bases are available after the current position:

with open ("dna.txt", "r") as myfile:
    data=myfile.readlines()

map = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
       "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
       "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
       "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
       "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
       "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
       "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
       "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
       "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
       "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
       "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
       "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
       "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
       "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
       "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
       "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}

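DNA=data[1].strip()  # data[0] is the ">Human" header line
proteins = []
start = DNA.find('AUG')
while start != -1:
    protein = ""
    # only read a codon while at least 3 bases remain
    while start+2 < len(DNA):
        codon = DNA[start:start+3]
        start += 3
        if codon == "UAG": break
        protein += map[codon]
    proteins.append(protein)
    start = DNA.find('AUG', start)  # look for the next gene after this one
print(" ".join(proteins))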
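The "at least 3 bases available" check can also be expressed through range(), which never yields a position whose codon would run off the end. The following is a sketch rather than a definitive implementation: translate_genes and the compact CODON_TABLE construction are names introduced here for illustration, stop codons other than UAG are rendered as "*", and an AUG with no in-frame UAG after it is assumed not to be a gene and is skipped.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}


def translate_genes(rna, stop="UAG"):
    """Translate every AUG...UAG gene found when scanning left to right."""
    proteins = []
    start = rna.find("AUG")
    while start != -1:
        protein = []
        next_search = start + 3  # used if no in-frame stop codon is found
        # The last value range() yields is at most len(rna) - 3,
        # so every slice below is a complete codon.
        for pos in range(start, len(rna) - 2, 3):
            codon = rna[pos:pos + 3]
            if codon == stop:
                proteins.append("".join(protein))
                next_search = pos + 3
                break
            protein.append(CODON_TABLE[codon])
        start = rna.find("AUG", next_search)
    return proteins


print(" ".join(translate_genes("ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA")))  # MLE MY
```

Returning a list instead of printing inside the loop keeps the function usable for sequences with more than two genes.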

score 2 · Answer 3 · 2010-10-14

Thank you all for the fast responses; to those who helped, I greatly appreciate it. I ended up with an alternate solution.

I am not really looking for handouts, but I am new to programming and am occasionally very confused by Python. Most universities now require all bio majors to be thrown into the pit with a BNFO class, since even limited knowledge of the subject is a valuable skill to have. That being said, all of our teachers here are pretty seasoned vets in several languages, and sometimes it's hard to connect to "my first programming class".

Here's the code I ended up using. It did away with that loop structure I couldn't get and kind of cheats by using find/rfind to narrow down the two sequences. It won't work for more than two sequences, but it does the job nonetheless.

http://dpaste.org/RyUA/
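The linked code is not reproduced here, so the following is only a guess at what a find/rfind approach could look like, not the poster's actual script. The names translate_from and translate_two_genes are hypothetical. It assumes the leftmost AUG starts the first gene and the rightmost AUG starts the second, which breaks if the second gene contains an AUG of its own.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)}


def translate_from(rna, start):
    """Translate in frame from `start` until UAG or the end of the sequence."""
    protein = []
    for pos in range(start, len(rna) - 2, 3):
        codon = rna[pos:pos + 3]
        if codon == "UAG":
            break
        protein.append(CODON_TABLE[codon])
    return "".join(protein)


def translate_two_genes(rna):
    """Translate at most two genes: one from the first AUG, one from the last."""
    first = rna.find("AUG")   # leftmost start codon
    last = rna.rfind("AUG")   # rightmost start codon
    if first == -1:
        return []
    if first == last:
        # Only one start codon in the whole sequence.
        return [translate_from(rna, first)]
    return [translate_from(rna, first), translate_from(rna, last)]


print(" ".join(translate_two_genes("ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA")))  # MLE MY
```

As the post says, this shortcut cannot handle more than two genes; scanning left to right from each AUG is the more general approach.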

Ram · Answer 4 · 2013-09-12

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

def make_trans_record(record):
    """Return a new SeqRecord holding the translated sequence."""
    # The slice [350:-103] is hard-coded for one particular input;
    # adjust it to the coding region of your own sequences.
    return SeqRecord(seq=record.seq[350:-103].translate(),
                     id=record.id,
                     description="")

in_path = input("Enter file location of the nucleotide sequence: ")
out_path = input("Output file location and name: ")

# Translate every FASTA record lazily and write the results out as FASTA.
records = map(make_trans_record, SeqIO.parse(in_path, "fasta"))

SeqIO.write(records, out_path, "fasta")

Ram · Answer 5 · 2011-10-27

0

13.8 years ago

User 4133 ▴ 150

Probably this can help.

It is in Italian, but the Python code is in English.

Bye Wollo

updated 5.9 years ago by Ram 45k • written 13.8 years ago by User 4133 ▴ 150