Hello everyone, I have to write a function that takes a sequence of nucleotides and outputs a dictionary with the translation in all possible reading frames. The keys of the dictionary should be x1, x2, x3 for the forward frames and y1, y2, y3 for the reverse reading frames, and the value of each key should be the translation of the sequence in the corresponding reading frame. I do not need to complement the sequence when computing the reverse reading frames, just reverse it, and I should use a * to represent stop codons. I think I am on the right path, but I have been trying to create a loop to go through each key of the dictionary and I'm struggling a lot. Any help would be very much appreciated. I am working with a small sequence and will create the function at the end.
This is what I have got:
sequence = "ATGACAGTAGACAGATAGGGGACAGT"
position = 0
protein= ""
dictionary= {}
gencode = {
'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
'TAC':'Y', 'TAT':'Y', 'TAA':'*', 'TAG':'*',
'TGC':'C', 'TGT':'C', 'TGA':'*', 'TGG':'W'}
list_dna = list(sequence)
x1 = "".join(list_dna)
x2 = "".join(list_dna[1:])
x3 = "".join(list_dna[2:])
list_dna.reverse()
y1 = "".join(list_dna)
y2 = "".join(list_dna[1:])
y3 = "".join(list_dna[2:])
dictionary[x1]=""
dictionary[x2]=""
dictionary[x3]=""
dictionary[y1]=""
dictionary[y2]=""
dictionary[y3]=""
while position +3<=len(sequence): #not sure to proceed from here
I think you may be overthinking/complicating the task, but it appears this is an assignment so I'm afraid we can't give fully functional code ;)
You need to approach the problem in logical steps, which will broadly look like:
Iterate over the string, capturing 3 characters at a time, with an offset (for i in range(0, len(seq), 3): ...).
Slice out the codon for each frame (x1 = seq[i:i+3], x2 = seq[i+1:i+4], ...etc).
Look up each codon in the table (gencode[x1]).
You will need to do some extra fiddling to deal with stop codons and with what happens when you reach the end of the sequence and its length isn't an exact multiple of 3.
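For anyone landing here later, the steps above can be sketched roughly as follows. This is only one possible approach, not a definitive solution: the names translate_frame and six_frames are made up for illustration, the codon table is rebuilt compactly from the standard genetic code instead of being typed out, and it assumes that translation continues past stop codons (marking them with *) and that leftover bases at the end of a frame are simply dropped.

```python
from itertools import product

# Standard genetic code, built in TCAG order (same content as the gencode
# table in the question, just generated instead of typed out).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
GENCODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(seq, offset):
    """Translate seq starting at offset (0, 1 or 2); hypothetical helper."""
    protein = []
    # Stopping at len(seq) - 2 guarantees every slice is a full codon,
    # so trailing bases that do not fill a codon are ignored.
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        protein.append(GENCODE[codon])  # stop codons map to '*'
    return "".join(protein)


def six_frames(seq):
    """Return translations keyed x1..x3 (forward) and y1..y3 (reversed)."""
    reversed_seq = seq[::-1]  # reversed only, not complemented (per the question)
    frames = {}
    for offset in range(3):
        frames[f"x{offset + 1}"] = translate_frame(seq, offset)
        frames[f"y{offset + 1}"] = translate_frame(reversed_seq, offset)
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("ATGACAGTAGACAGATAGGGGACAGT").items():
        print(name, protein)
```

Using the frame names as the dictionary keys (rather than the sequences themselves) means a single loop over the three offsets fills in all six entries, so no separate loop over the dictionary is needed.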
It's worth mentioning too that this is quite a common assignment/challenge so you should find no shortage of solutions on here or stackoverflow.
That makes sense, thank you very much.