4.8 years ago
annushreekurmi
Hello everyone,
I am a novice in Python and I have been trying to compute the number of A's occurring in 3rd codon position in a multifasta file. I have written the following code but it does not work.
input_file = open('MG1655.FAS', 'r')
output_file = open('A3_counts.tsv','w')
output_file.write('Gene\tA3\n')
from Bio import SeqIO
for cur_record in SeqIO.parse(input_file, "fasta") :
    A3count = 0
    gene_name = cur_record.name
    sequence = cur_record.seq
    for i in range(0, len(sequence), 3):
        if sequence[i:i+3]=='a':
            A3count = A3count+1
            output = '%s\t%i\t' % (gene_name, A3count)
            output_file.write(output)
output_file.close()
input_file.close()
Please help me with this problem.
Currently you're asking if the whole codon is "a" (why lowercase?). You need to ask len(sequence[i:i+3]) == 3 and sequence[i:i+3][2].upper() == 'A'. Also, right now you're printing A3count each time you encounter a codon ending with 'A'.
Thanks for the reply, but it does not work. I also tried the following:
DNA="ATGCCCGTG"
start = 0
codons=[DNA[start:start+3]]
for start in range(0,len(DNA),3):
    print(codons)
But the output I get is ATG, ATG, ATG. Can you help me with this?
Use a list comprehension instead, i.e. enclose the whole expression, including the for clause, in square brackets, not just the DNA[start:start+3].
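To tie the two replies together, here is a minimal sketch in plain Python of what they describe: the for clause goes inside the square brackets so that every codon is collected, and each codon is then tested on its third base only. The function names split_codons and count_a3 are made up for illustration and are not part of the original thread or of Biopython.

```python
def split_codons(dna):
    """Return the complete codons of dna, read in frame from position 0."""
    # The for clause sits inside the brackets, so every start offset is used.
    codons = [dna[start:start + 3] for start in range(0, len(dna), 3)]
    # Drop a trailing partial codon (length 1 or 2), if there is one.
    return [codon for codon in codons if len(codon) == 3]


def count_a3(dna):
    """Count codons whose third base is A, ignoring case."""
    return sum(1 for codon in split_codons(dna) if codon[2].upper() == "A")


print(split_codons("ATGCCCGTG"))  # ['ATG', 'CCC', 'GTG']
print(count_a3("ATGCCAGTAtaa"))   # 3 (CCA, GTA and taa end in A)
```

With the Biopython loop from the question, one would presumably call count_a3(str(cur_record.seq)) once per record and write a single line per gene after the inner loop has finished, ending it with '\n' rather than '\t'.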