Python Script To Translate Rna Sequences To Protein Sequences
5
4
14.1 years ago
Studentguy ▴ 70

Write a Python script that translates two genes in an RNA sequence into their protein sequence and prints them. Each gene begins with an AUG from the left and ends in UAG and has a length that is a multiple of three. However, the RNA sequence length may not be a multiple of three and there may be more than one "UAG" or "AUG" in the sequence.

For example, if the input is

human ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA

then the output should be

MLE MY

I have this so far (http://dpaste.org/v2e9/). Can anyone help out?

with open("p:/dna.txt", "r") as myfile:
    data = myfile.readlines()

map = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
    "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
    "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
    "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
    "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
    "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
    "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
    "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
    "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
    "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
    "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
    "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
    "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
    "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
    "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
    "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}

DNA=data[1]
flag = 1
while flag:
    start = DNA.find('AUG')
    if start == -1:
        flag = 0
    else:
        done = 0
        while done!= 0:
            i = start
            codon = DNA[1:i+3]
            if codon == "UAG":
                stop = i
                protein = translate(DNA(start))
                DNA = DNA[stop:]
                done = 1
                print(protein)

python biopython translation • 70k views
ADD COMMENT
2

@Simon: While I sometimes feel irked when I see a question that seems to be taken straight out of a homework assignment, I think in the end it is not our job to police this. Plus, we may be wrong in our assumptions. So I would leave it up to each person to decide whether they want to answer or not. A great answer lives on and will continue to provide value beyond the original poster's needs.

ADD REPLY
1

I do agree; however, for this question there is a partial solution if you follow the link to the OP's dpaste page. Here is a thought: with enough googling, Studentguy will come across a ready-made solution anyway, most probably using Biopython, which he will likely not understand and which will be too high-level (a ready-made package) to have much teaching value. At least here the OP did part of the work and is ready to interact with people who will likely teach him something. A better developed question, with the code included right here, might have been better. Cheers

ADD REPLY
0

I'd be interested in other moderators' opinions on homework questions. I think proof of a reasonable stab at a solution would be a good thing, rather than "do my homework for me" style questions.

ADD REPLY
0

We just need to find an otherwise permissive license that prohibits copy-paste use in a homework solution.

ADD REPLY
0

@brentp: The people who are copy-pasting homework probably aren't reading enough to look at the licenses anyways.

ADD REPLY
0

Some consolation: I appreciate the frankness of labeling it as HW!

ADD REPLY
0

I am not really looking for handouts, but I am new to programming and am occasionally very confused by Python. Most universities now require all bio majors to be thrown into the pit with a BNFO class, since even limited knowledge of the subject is a valuable skill to have. That being said, all of our teachers here are pretty seasoned vets in several languages, and sometimes it's hard for them to connect with "my first programming class".

I posted more below in the answers section :D

ADD REPLY
3
14.1 years ago
brentp 24k

Homework? Well, I'll answer anyway, using Biopython.

from Bio.Seq import Seq

# Add your own logic here to parse the RNA sequence from the file.
# Split on the start codon and drop the part preceding the first start codon.
# Then translate each chunk up to its first stop codon, join, and print.
rna = "ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA"
print(" ".join(str(Seq("AUG" + rest).translate(to_stop=True))
               for rest in rna.split("AUG")[1:]))
ADD COMMENT
2
14.1 years ago

I'm not a Python guy, but the following script does the job with this dna.txt:

>Human
ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA

Source: you just need to check that at least 3 bases are available after the current position:

with open("dna.txt", "r") as myfile:
    data = myfile.readlines()

# Standard genetic code: RNA codon -> one-letter amino acid
codon_table = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
       "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
       "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
       "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
       "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
       "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
       "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
       "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
       "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
       "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
       "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
       "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
       "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
       "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
       "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
       "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}

DNA = data[1].strip()  # data[0] is the ">Human" header line
proteins = []
start = DNA.find('AUG')
while start != -1:
    protein = ""
    # Translate codon by codon while a full codon is still available
    while start + 2 < len(DNA):
        codon = DNA[start:start+3]
        start += 3
        if codon == "UAG":
            break
        protein += codon_table[codon]
    proteins.append(protein)
    # Look for the next gene after the stop codon
    start = DNA.find('AUG', start)
print(" ".join(proteins))
ADD COMMENT
0

Hi, I ran your code, but there seems to be a problem. When I read the data file, the \n characters are also read, and the start variable always seems to get the value -1, even when the substring is present in DNA.
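
One possible explanation, sketched under assumptions since the commenter's file is not shown: find('AUG') returns -1 if the sequence is in lowercase, uses DNA letters (T instead of U), or is wrapped over several lines so that data[1] holds only part of it. The helper name read_rna below is made up for illustration; it takes the list that readlines() returns, strips the newline characters, skips a ">" header line, and normalizes the letters.

```python
def read_rna(lines):
    """Join the sequence lines of a single-record FASTA-style file into one RNA string."""
    seq_parts = []
    for line in lines:
        line = line.strip()              # drops the trailing "\n" and stray spaces
        if not line or line.startswith(">"):
            continue                     # skip blank lines and the header line
        seq_parts.append(line)
    # Normalize case and convert DNA letters to RNA so find("AUG") can match
    return "".join(seq_parts).upper().replace("T", "U")

# Lines as readlines() would return them: wrapped, lowercase DNA (assumed input)
example_lines = [">Human\n", "acatgctagaat\n", "agccgcatgtactagttaa\n"]
rna = read_rna(example_lines)
print(rna)              # ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA
print(rna.find("AUG"))  # 2
```

If the file really contains a plain RNA sequence on its second line, data[1].strip() is enough and this helper is not needed.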

ADD REPLY
0

when I run your code I get this error:

start, end = next_transcript(mRNA, cur_pos)
TypeError: 'NoneType' object is not iterable

I also noticed that cur_pos isn't defined until later, on line 43. Could this be the problem? I am also not sure why DNA=data[1].strip() takes the second item (index 1) of the list. How is your dna.txt formatted? Clarification would be much appreciated. Thanks

ADD REPLY
2
14.1 years ago
Studentguy ▴ 70

Thank you all for the fast responses, and to those who helped, I greatly appreciate it. I ended up with an alternate solution.

Here's the code I ended up using. It does away with the loop structure I couldn't get working and kind of cheats by using find/rfind to narrow down the two sequences. It won't work for more than two sequences, but it does the job nonetheless.

http://dpaste.org/RyUA/
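
The pasted code is not reproduced in this thread, so the following is only a sketch of what a find/rfind approach could look like, not the poster's actual script. It assumes exactly two genes, uses an abbreviated codon table that covers just the example input, and the helper name translate_until_uag is made up for illustration.

```python
# Abbreviated table: only the codons that occur in the example input
# (assumption: a real script would use the full 64-codon table)
CODONS = {"AUG": "M", "CUA": "L", "GAA": "E", "UAC": "Y"}

def translate_until_uag(rna, start):
    """Translate in-frame codons from start until UAG or the end of the sequence."""
    protein = ""
    for i in range(start, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon == "UAG":
            break
        protein += CODONS[codon]
    return protein

rna = "ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA"
first = rna.find("AUG")    # leftmost start codon: first gene
last = rna.rfind("AUG")    # rightmost start codon: second gene
two_genes = translate_until_uag(rna, first) + " " + translate_until_uag(rna, last)
print(two_genes)           # MLE MY
```

Besides the two-gene limit, this shortcut picks the wrong start if the second gene contains an internal AUG (a methionine), because rfind returns the rightmost match.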

ADD COMMENT
5

Nice work with this. Your logic is sound and the major remaining issue to tackle is repetitiveness in the code. Whenever you repeat code with a few changes, you should focus on making that code into a function. Here's a next version of your code generalized with functions and a while loop. It's a combination of your first and second attempts. Notice how the functions allow you to avoid re-writing the same code multiple times: http://gist.github.com/626765
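
The gist is not reproduced in this thread. As a rough sketch of what "functions plus a while loop" might look like (the names find_gene, translate and translate_genes are made up here and are not taken from the gist), assuming every gene runs from an AUG to the first in-frame UAG:

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code built in UCAG order; "*" marks the three stop codons
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def find_gene(rna, pos):
    """Return (start, stop) of the next AUG...UAG gene at or after pos, else None."""
    start = rna.find("AUG", pos)
    while start != -1:
        # Scan in-frame codons after the start codon for the UAG stop
        for stop in range(start + 3, len(rna) - 2, 3):
            if rna[stop:stop + 3] == "UAG":
                return start, stop
        start = rna.find("AUG", start + 1)   # no in-frame UAG, try the next AUG
    return None

def translate(rna, start, stop):
    """Translate the codons from start up to, but not including, stop."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(start, stop, 3))

def translate_genes(rna):
    """Collect the protein for every gene found from left to right."""
    proteins = []
    pos = 0
    while True:
        gene = find_gene(rna, pos)
        if gene is None:
            break
        start, stop = gene
        proteins.append(translate(rna, start, stop))
        pos = stop + 3                       # resume after the stop codon
    return proteins

print(" ".join(translate_genes("ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA")))  # MLE MY
```

Because the search and the translation live in their own functions, the same loop handles two genes or twenty. Under the assumption above, an in-frame UAA or UGA inside a gene is not treated as a stop and would show up as "*" in the output.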

ADD REPLY
1
11.2 years ago
viv_bio ▴ 50
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

def make_trans_record(record):
    """Return a new SeqRecord holding the translated sequence."""
    # The slice [350:-103] appears to be specific to the poster's data;
    # adjust or drop it for your own sequences.
    return SeqRecord(seq=record.seq[350:-103].translate(),
                     id=record.id,
                     description="")

in_path = input("Enter file location of the nucleotide sequence: ")
out_path = input("Output file location and name: ")

# Translate each FASTA record lazily and write the results as FASTA
records = (make_trans_record(rec) for rec in SeqIO.parse(in_path, "fasta"))

SeqIO.write(records, out_path, "fasta")
ADD COMMENT
0
13.1 years ago
User 4133 ▴ 150

Probably this can help.

It is in Italian, but the python code is in English.

Bye Wollo

ADD COMMENT
