I have an exercise where we need to write code that takes a protein FASTA file and a protein sequence identifier, and counts all the possible RNA combinations that could encode the sequence in the FASTA file, with the condition that the total number of combinations must be less than 5000.
I started by making an RNA codon dictionary, then wrote a function that puts the elements of the FASTA file (the amino acids) into a list, and then tried to build the combinations from that list. I get an error and have not been able to work out where the problem is. If anyone can check the code and tell me what's wrong, I would be grateful.
import itertools
from Bio import SeqIO
RNA_codon_table = {
'A': ('GCU', 'GCC', 'GCA', 'GCG'),
'C': ('UGU', 'UGC'),
'D': ('GAU', 'GAC'),
'E': ('GAA', 'GAG'),
'F': ('UUU', 'UUC'),
'G': ('GGU', 'GGC', 'GGA', 'GGG'),
'H': ('CAU', 'CAC'),
'I': ('AUU', 'AUC', 'AUA'),
'K': ('AAA', 'AAG'),
'L': ('UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG'),
'M': ('AUG',),
'N': ('AAU', 'AAC'),
'P': ('CCU', 'CCC', 'CCA', 'CCG'),
'Q': ('CAA', 'CAG'),
'R': ('CGU', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'),
'S': ('UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC'),
'T': ('ACU', 'ACC', 'ACA', 'ACG'),
'V': ('GUU', 'GUC', 'GUA', 'GUG'),
'W': ('UGG',),
'Y': ('UAU', 'UAC'),}
def protein_fasta(protein_file):
    protein_sequence = []
    protein = SeqIO.parse(protein_file, format='fasta')
    for Seqrecord in protein:
        protein_sequence.append(Seqrecord.seq)
    print(protein_sequence)
    for seq in protein_sequence:
        codons = [list(RNA_codon_table[key]) for key in protein_sequence]
        print(list(itertools.product(codons)))
I don't know how to attach a FASTA file, but this is the sequence inside:
>seq_compl complete sequence
IEEATHMTPCYELHGLRWVQIQDYAINVMQCL
This is the error I get:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-65-3dd46947c505> in <module>
----> 1 all_combinations ('short_protein.fasta')
<ipython-input-64-45a50fffc1d9> in all_combinations(protein_file)
5 protein_sequence.append(Seqrecord.seq)
6
----> 7 codons = [ list(RNA_codon_table[key]) for key in protein_sequence ]
8 print(list(itertools.product(codons)))
<ipython-input-64-45a50fffc1d9> in <listcomp>(.0)
5 protein_sequence.append(Seqrecord.seq)
6
----> 7 codons = [ list(RNA_codon_table[key]) for key in protein_sequence ]
8 print(list(itertools.product(codons)))
KeyError: Seq('IEEATHMTPCYELHGLRWVQIQDYAINVMQCL')
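Reading the traceback, the KeyError seems to come from the fact that protein_sequence is a list of Seq objects (one per FASTA record), so each key in the list comprehension is a whole sequence rather than a single amino acid letter, and no such key exists in RNA_codon_table. A second problem would probably show up next: itertools.product(codons) receives the list as a single argument, whereas it needs the per-position codon lists unpacked with *. Below is a minimal sketch of both changes, assuming the sequence is passed in as a plain string (for example str(Seqrecord.seq)). The names rna_candidates and SMALL_CODON_TABLE are hypothetical, and the table is trimmed to three entries only to keep the example short.

```python
import itertools

# Trimmed table for illustration only (assumption); in practice the full
# RNA_codon_table from the question would be passed in instead.
SMALL_CODON_TABLE = {
    'M': ('AUG',),
    'C': ('UGU', 'UGC'),
    'I': ('AUU', 'AUC', 'AUA'),
}


def rna_candidates(protein, codon_table):
    """Yield every RNA string that encodes `protein` (a str of one-letter codes)."""
    # Look up one codon tuple per residue: iterate over the letters of a
    # single sequence, not over the list of records.
    codons = [codon_table[aa] for aa in protein]
    # The * unpacks the list, so product() picks one codon from each position.
    for combo in itertools.product(*codons):
        yield ''.join(combo)


candidates = list(rna_candidates('MCI', SMALL_CODON_TABLE))
print(len(candidates))   # 1 * 2 * 3 = 6
print(candidates[0])     # AUGUGUAUU
```

Building the full list like this is only practical for short proteins, which is presumably why the exercise caps the total at 5000.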
Thank you
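On the 5000-combination condition: the total can be obtained without generating any RNA strings, because each residue contributes its number of codons independently and the total is the product of those numbers. The sketch below assumes the standard genetic code with stop codons left out (as in the table in the question) and a sequence given as a plain string. CODON_COUNTS, count_rna_sequences and within_limit are hypothetical names, and since the question does not say how the limit should be enforced, the code only reports whether the count is below it.

```python
import math

# Number of codons per amino acid in the standard genetic code
# (stop codons left out, matching the table in the question).
CODON_COUNTS = {
    aa: n
    for letters, n in (("MW", 1), ("CDEFHKNQY", 2), ("I", 3),
                       ("AGPTV", 4), ("LRS", 6))
    for aa in letters
}

MAX_COMBINATIONS = 5000  # limit stated in the question


def count_rna_sequences(protein):
    """Return how many RNA sequences encode `protein` (str of one-letter codes)."""
    # Multiply the codon counts of all residues; no strings are built.
    return math.prod(CODON_COUNTS[aa] for aa in protein)


def within_limit(protein, limit=MAX_COMBINATIONS):
    """Return True if the number of encoding RNA sequences is below `limit`."""
    return count_rna_sequences(protein) < limit


print(count_rna_sequences("MCI"))  # 1 * 2 * 3 = 6
print(within_limit("MCI"))         # True
print(within_limit("IEEATHMTPCYELHGLRWVQIQDYAINVMQCL"))  # False
```

math.prod needs Python 3.8 or later; on older versions functools.reduce with operator.mul does the same job.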