Reading frames in python
4.1 years ago
Gonçalo • 0

Hello everyone, I have to write a function that takes a nucleotide sequence and outputs a dictionary with its translation in all possible reading frames. The keys of the dictionary should be x1, x2, x3 for the forward frames and y1, y2, y3 for the reverse frames, and the value of each key should be the translation of the sequence in that reading frame. I do not need to complement the sequence when computing the reverse reading frames, just reverse it, and I should use a * to represent stop codons. I think I am on the right path, but I have been trying to write a loop that goes through each key of the dictionary and I'm struggling a lot. Any help would be very much appreciated. For now I am working with a small sequence and will wrap everything in a function at the end.

This is what I have got:

sequence = "ATGACAGTAGACAGATAGGGGACAGT"
position = 0
protein = ""
dictionary = {}
gencode = {
    'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
    'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
    'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
    'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
    'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
    'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
    'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
    'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
    'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
    'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
    'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
    'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
    'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
    'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
    'TAC':'Y', 'TAT':'Y', 'TAA':'*', 'TAG':'*',
    'TGC':'C', 'TGT':'C', 'TGA':'*', 'TGG':'W'}

list_dna = list(sequence)

x1 = "".join(list_dna)
x2 = "".join(list_dna[1:])
x3 = "".join(list_dna[2:])

list_dna.reverse()

y1 = "".join(list_dna)
y2 = "".join(list_dna[1:])
y3 = "".join(list_dna[2:])

dictionary[x1] = ""
dictionary[x2] = ""
dictionary[x3] = ""
dictionary[y1] = ""
dictionary[y2] = ""
dictionary[y3] = ""


while position + 3 <= len(sequence):  # not sure how to proceed from here

I think you may be overthinking/complicating the task, but it appears this is an assignment so I'm afraid we can't give fully functional code ;)

You need to approach the problem in logical steps, which will broadly look like:

Iterate over a string, capturing 3 characters at once, with an offset.

  • This might look something like for i in range(0, len(seq), 3): ...
  • Capture the codons: x1 = seq[i:i+3], x2 = seq[i+1:i+4]...etc
  • Use the codons to look up the translation (gencode[x1])
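As a rough illustration of the first two steps only, here is a minimal sketch of splitting a sequence into codons from a given offset. The helper name `codons` and its `offset` parameter are made up for this example and are not part of the original question; the table lookup is left out on purpose.

```python
def codons(seq, offset=0):
    """Return the complete codons of seq, starting at offset (0, 1 or 2)."""
    # Stopping the range at len(seq) - 2 means a trailing fragment of
    # one or two bases is never returned as a "codon".
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


sequence = "ATGACAGTAGACAGATAGGGGACAGT"
for offset in range(3):
    print(offset, codons(sequence, offset))
```

Each codon in the returned list can then be looked up in a table such as `gencode`.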

You will need to do some extra fiddling to deal with stop codons and what happens when you reach the end of the sequence and it isn't an exact multiple of 3.
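For readers who want to see how the pieces could fit together, below is one possible sketch, not a definitive solution. It rests on a few assumptions: translation continues through stop codons and writes `*` for them (as the question asks), incomplete codons at the end of a frame are dropped, and the reverse frames use the reversed (not complemented) sequence. The names `translate` and `reading_frames` are invented for this example, and the codon table is built compactly from the standard genetic code, which gives the same mapping as the `gencode` dictionary in the question.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENCODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq, offset=0):
    """Translate seq from offset, ignoring a trailing incomplete codon."""
    protein = []
    for i in range(offset, len(seq) - 2, 3):
        # A base outside A/C/G/T raises KeyError here; how to handle
        # ambiguous bases is left open in this sketch.
        protein.append(GENCODE[seq[i:i + 3]])
    return "".join(protein)


def reading_frames(seq):
    """Return a dict with keys x1..x3 (forward) and y1..y3 (reversed)."""
    seq = seq.upper()
    rev = seq[::-1]  # reversed only, not complemented (per the question)
    frames = {}
    for offset in range(3):
        frames[f"x{offset + 1}"] = translate(seq, offset)
        frames[f"y{offset + 1}"] = translate(rev, offset)
    return frames


for name, protein in sorted(reading_frames("ATGACAGTAGACAGATAGGGGACAGT").items()):
    print(name, protein)
```

Using the frame names as dictionary keys, rather than the sequences themselves, avoids collisions when two frames happen to produce the same string and makes the result easier to read.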

It's also worth mentioning that this is quite a common assignment/challenge, so you should find no shortage of solutions on here or on Stack Overflow.


That makes sense, thank you very much.


Only add answers when you're answering the principal question. Otherwise, use Add Comment or Add Reply as appropriate.
