A3 count (count of A occurring in 3rd codon position) in multifasta file
4.8 years ago

Hello everyone,

I am a novice in Python and I have been trying to count the number of A's occurring in the 3rd codon position for each gene in a multi-FASTA file. I have written the following code, but it does not work.

input_file = open('MG1655.FAS', 'r')
output_file = open('A3_counts.tsv','w')
output_file.write('Gene\tA3\n')
from Bio import SeqIO
for cur_record in SeqIO.parse(input_file, "fasta") :
      A3count = 0
      gene_name = cur_record.name
      sequence = cur_record.seq
      for i in range(0, len(sequence), 3):
             if sequence[i:i+3]=='a':
                     A3count = A3count+1
                     output = '%s\t%i\t' % (gene_name, A3count)
                     output_file.write(output)
output_file.close()
input_file.close()

Please help me with this problem.

sequence • 1.1k views

Currently you're testing whether the whole codon equals "a" (why lowercase?). Instead, you need to check that len(sequence[i:i+3]) == 3 and sequence[i:i+3][2].upper() == 'A'.


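Also, right now you're writing A3count to the output file each time you encounter a codon ending with 'A'. Move the write out of the inner loop so that each gene gets a single line, and end that line with '\n' rather than '\t'.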
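Putting the two comments above together, one possible sketch is shown below. It assumes Biopython is installed and that every record is an in-frame coding sequence; `count_a3` and `write_a3_counts` are illustrative names, not part of Biopython. Slicing with a step of 3 is used here in place of the explicit codon loop.

```python
def count_a3(sequence):
    """Count codons whose third base is A (case-insensitive)."""
    seq = str(sequence).upper()
    # Indices 2, 5, 8, ... are the third positions of successive codons.
    # An incomplete trailing codon has no third base, so it is skipped.
    return seq[2::3].count("A")


def write_a3_counts(fasta_path, tsv_path):
    """Write one 'gene<TAB>count' line per record in a multi-FASTA file."""
    from Bio import SeqIO  # Biopython, as used in the question

    with open(tsv_path, "w") as out:
        out.write("Gene\tA3\n")
        for record in SeqIO.parse(fasta_path, "fasta"):
            # One write per gene, after the whole sequence has been counted.
            out.write(f"{record.name}\t{count_a3(record.seq)}\n")


# Example call, using the file names from the question:
# write_a3_counts("MG1655.FAS", "A3_counts.tsv")
```

The `with` block closes the output file automatically, so the explicit `close()` calls are no longer needed.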


Thanks for the reply, but it does not work. I also tried the following:

DNA="ATGCCCGTG"
start = 0
codons=[DNA[start:start+3]]
for start in range(0,len(DNA),3):
    print(codons)

But the output I get is ATG, ATG, ATG. Can you help me with this?


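Use

codons = [DNA[start:start+3] for start in range(0, len(DNA) - 2, 3)]

instead, i.e. enclose the whole expression, including the for clause, in square brackets, not just the DNA[start:start+3]. In your version, codons is built once while start is 0, so the loop prints the same one-element list on every pass. Stopping the range at len(DNA) - 2 keeps every complete codon and drops an incomplete one at the end.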
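As a small illustration of the list comprehension, here is a sketch using the test string from the comment above. The helper name `split_codons` is made up for this example.

```python
def split_codons(dna):
    """Return the complete codons of dna, read in frame from position 0."""
    # range stops at len(dna) - 2 so that every start has three bases after it.
    return [dna[start:start + 3] for start in range(0, len(dna) - 2, 3)]


DNA = "ATGCCCGTG"
codons = split_codons(DNA)
print(codons)  # ['ATG', 'CCC', 'GTG']

# The third base of each codon is at index 2 within the codon.
third_bases = [codon[2] for codon in codons]
print(third_bases)  # ['G', 'C', 'G']
```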
