Write a Python script that translates two genes in an RNA sequence into their protein sequences and prints them. Each gene begins with the first AUG found when reading from the left, ends in UAG, and has a length that is a multiple of three. However, the RNA sequence length may not be a multiple of three, and there may be more than one "UAG" or "AUG" in the sequence.
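For example, if the input is
human ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA
then the output should be
MLE MY
I have this so far (http://dpaste.org/v2e9/):
with open("p:/dna.txt", "r") as myfile:
    data = myfile.readlines()

# RNA codon -> one-letter amino acid code ("map" renamed so it no longer shadows the builtin)
codon_table = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
    "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
    "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
    "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
    "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
    "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
    "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
    "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
    "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
    "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
    "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
    "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
    "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
    "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
    "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
    "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G"}

def translate(gene):
    # gene runs from AUG through the closing UAG; the final stop codon is not translated
    return "".join(codon_table[gene[i:i+3]] for i in range(0, len(gene) - 3, 3))

rna = data[1].strip()  # the sequence is read from the second line of the file
proteins = []
while True:
    start = rna.find("AUG")
    if start == -1:
        break
    # step through codons in frame with the AUG until the first UAG
    # (per the task, only UAG ends a gene)
    stop = -1
    for i in range(start, len(rna) - 2, 3):
        if rna[i:i+3] == "UAG":
            stop = i
            break
    if stop == -1:
        rna = rna[start + 3:]  # no in-frame UAG for this AUG; keep searching
        continue
    proteins.append(translate(rna[start:stop + 3]))
    rna = rna[stop + 3:]  # continue after the gene just translated
print(" ".join(proteins))
Can anyone help out?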
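The task can also be approached with a regular expression instead of an explicit loop. The following is only a minimal sketch under the rules stated in the question (a gene starts at an AUG, is read in triplets, and ends at the first in-frame UAG). The names `find_proteins`, `GENE_PATTERN` and `CODON_TABLE` are made up for illustration, the sequence is passed in directly rather than read from a file, and an in-frame UAA or UGA would simply show up as `*` because the question only treats UAG as the end of a gene.

```python
import re
from itertools import product

# Standard genetic code, built from the conventional U, C, A, G ordering;
# "*" marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# AUG, then as few whole codons as possible (lazy match), then UAG.
# Matching in triplets keeps the search in frame with the AUG, so a UAG
# that straddles two codons is ignored.
GENE_PATTERN = re.compile(r"AUG(?:[ACGU]{3})*?UAG")


def find_proteins(rna):
    """Return one protein string per AUG...UAG gene, scanning left to right."""
    proteins = []
    for match in GENE_PATTERN.finditer(rna):
        gene = match.group()
        # Skip the last three letters: the closing UAG is not translated.
        codons = (gene[i:i + 3] for i in range(0, len(gene) - 3, 3))
        proteins.append("".join(CODON_TABLE[c] for c in codons))
    return proteins


print(" ".join(find_proteins("ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA")))  # MLE MY
```

Because `finditer` returns non-overlapping matches, the search for the second gene resumes right after the first gene's UAG, which is the same behaviour as cutting the sequence at the stop codon and searching again.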
@Simon: while I sometimes feel irked when I see a question that seems to be taken right out of a homework assignment, I think in the end it is not our job to police this. Plus, we may be wrong in our assumptions. So I would leave it up to everyone's individual opinion whether they want to answer or not. A great answer lives on and will continue to provide value beyond the original poster's needs.
I do agree; however, for this question there is a partial solution if you follow the link to the OP's 'dpaste' page. Here is a thought. With enough googling, the StudentGuy will come up with a ready-made solution anyway, most probably using Biopython, which he will likely not understand and which will be too high-level (using a ready-made package) to have much teaching value. At least here the OP did part of the work and is ready to interact with people who will likely teach him something. A better-developed question that included the code right here would have been better. Cheers
I'd be interested in other moderators' opinions on homework questions. I think proof of a reasonable stab at a solution would be a good thing, rather than 'do my homework for me' style questions.
just need to find an otherwise permissive license that prohibits copy-paste use into a homework solution
@brentp: The people who are copy-pasting homework probably aren't reading enough to look at the licenses anyways.
Some consolation: I appreciate the frankness of labeling it as HW!!
I am not really looking for handouts, but I am new to programming and am occasionally very confused by Python. Most unis now require all bio majors to be thrown into the pit with a BNFO class, since even limited knowledge of the subject is a valuable skill to have. That being said, all of our teachers here are pretty seasoned vets in several languages, and sometimes it's hard to connect to "my first programming class".
I posted more below in the answers section :D