How To Count Amino Acids From A Text File In Biopython?
13.1 years ago
Hpk ▴ 60

Hi,

How can I count the amino acids in a text file? My file contains 1200 sequences. After getting the amino acid composition, I need to create a bar plot. How can I do this in Biopython?

I parsed the sequence file into Biopython and tried the following code to get the amino acid composition. I need the composition for each sequence, but I only got the composition for the first sequence.

from Bio.SeqUtils.ProtParam import ProteinAnalysis

x = ProteinAnalysis("cc.fasta")  # cc.fasta is my sequence file. I have already parsed it into Biopython.

print(x.count_amino_acids())

Please help to solve this problem.

biopython amino-acids • 12k views
The Biopython tutorial is a great place to get started, specifically the section on parsing sequences: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc47

For plotting, matplotlib is a popular choice: http://matplotlib.sourceforge.net/

13.1 years ago

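Your updated code does not actually parse the sequence file. ProteinAnalysis expects a protein sequence string, so passing "cc.fasta" analyses the letters of the file name rather than the sequences stored in the file. The Tutorial section I linked above is a useful way to become more familiar with SeqIO. Here is a short snippet that uses SeqIO and a loop to apply count_amino_acids to each sequence in a file, and sums the counts over all sequences in all_aas: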

import collections
from Bio import SeqIO
from Bio.SeqUtils.ProtParam import ProteinAnalysis

# Running total of each amino acid across all sequences
all_aas = collections.defaultdict(int)
for rec in SeqIO.parse("cc.fasta", "fasta"):
    x = ProteinAnalysis(str(rec.seq))
    counts = x.count_amino_acids()  # counts for this sequence only
    print(rec.id, counts)
    for aa, count in counts.items():
        all_aas[aa] += count

Thank you very much for your answer. Do you know how to calculate the total number of each amino acid from all sequences? My aim is to create a bar chart.


You need a global dictionary where you increment the counts for each amino acid in each sequence. I edited the code with an example.
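To illustrate the idea of one dictionary that accumulates counts across sequences, here is a minimal sketch in plain Python. The two toy sequences and the count_residues helper are made up for illustration; the helper only stands in for Biopython's count_amino_acids() and, unlike it, reports only residues that actually occur.

```python
from collections import defaultdict

def count_residues(seq):
    """Hypothetical stand-in for ProteinAnalysis(seq).count_amino_acids()."""
    counts = defaultdict(int)
    for aa in seq:
        counts[aa] += 1
    return dict(counts)

# Made-up toy sequences; in practice these would be str(rec.seq) values
# taken from SeqIO.parse("cc.fasta", "fasta").
sequences = ["MKVA", "AACK"]

# One dictionary shared by all sequences holds the running totals.
all_aas = defaultdict(int)
for seq in sequences:
    for aa, count in count_residues(seq).items():
        all_aas[aa] += count

print(dict(all_aas))
```

The per-sequence dictionary is rebuilt on every pass through the loop, while all_aas lives outside the loop and so keeps growing until every sequence has been counted.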


I tried your code, but I got the same output as with the previous code. I would like to get output like this: A=45 (the total number of alanine residues in my dataset), C=12, etc.


The change collects the total counts that you want in the 'all_aas' dictionary. Try adding a print(all_aas) line at the end, after the loop. Depending on your Python experience, it might be worth becoming more familiar with Python data structures; this is a good book to get started with: http://learnpythonthehardway.org/book/
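As a rough sketch of the last two steps (printing the totals in the A=45, C=12 style and drawing the bar chart), something like the following could work once the totals dictionary is filled. The function names, the output file name, and the example values are assumptions for illustration; A=45 and C=12 echo the example in the comment above, and D=30 is invented.

```python
def format_totals(totals):
    """Return the totals as 'A=45, C=12, ...', sorted by amino acid letter."""
    return ", ".join(f"{aa}={totals[aa]}" for aa in sorted(totals))

def plot_totals(totals, path="aa_totals.png"):
    """Save a bar chart of the totals; path is a hypothetical file name."""
    # Imported here so the formatting helper works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, writes straight to a file
    import matplotlib.pyplot as plt

    labels = sorted(totals)
    values = [totals[aa] for aa in labels]
    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_xlabel("Amino acid")
    ax.set_ylabel("Total count")
    fig.savefig(path)
    plt.close(fig)

# Invented example values standing in for the real all_aas dictionary.
example_totals = {"A": 45, "C": 12, "D": 30}
summary = format_totals(example_totals)
print(summary)
```

With the real data, calling format_totals(all_aas) and plot_totals(all_aas) after the loop should give the text summary and a PNG file of the chart.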


Thank you very much.
