How to count the length of a FASTA sequence and the number of occurrences of a particular letter in the sequence from an input file?
5.1 years ago
d.kaur • 0

I have a list of protein sequences in a FASTA file, where each entry has a header line followed by the sequence itself and is separated from the next entry by a blank line. I want a script that reads this file and outputs the length of each sequence and the number of times the letter A occurs in that sequence. I found a script on Biostars from onuralp (see: Sequence length from Fasta), which reads the file and gives me the length of each sequence. How can I modify that script so that it also reports the number of 'A's in each sequence?

fasta protein-sequence • 1.5k views

Is this a homework question?


No, I'm doing a project on the analysis of the amino acids in the proteins of interest.

5.1 years ago
Mensur Dlakic ★ 28k

You already have the ID and amino acid sequence extracted by this script. It is simply a matter of going through the sequence and incrementing a counter each time a residue equals A. That would be 3-4 additional lines of code.
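Since the referenced script is not reproduced in this thread, here is a minimal standalone sketch of the idea rather than a modification of that script. It assumes plain Python with only the standard library; the function names `parse_fasta` and `residue_stats` are made up for illustration. It uses `str.count`, which gives the same result as looping over the residues and incrementing a counter.

```python
import io


def parse_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines between entries
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def residue_stats(handle, residue="A"):
    """Return a list of (id, length, residue count) tuples, one per entry."""
    residue = residue.upper()
    stats = []
    for header, seq in parse_fasta(handle):
        fields = header.split()
        seq_id = fields[0] if fields else ""
        # Case-insensitive count of the chosen residue in this sequence.
        stats.append((seq_id, len(seq), seq.upper().count(residue)))
    return stats


if __name__ == "__main__":
    # Small in-memory example; replace with open("proteins.fasta") for a
    # real file (the file name here is only a placeholder).
    example = io.StringIO(">seq1 demo protein\nMAAKL\nAGA\n\n>seq2\nMKV\n")
    for seq_id, length, n_res in residue_stats(example):
        print(seq_id, length, n_res, sep="\t")
```

To count a different amino acid, pass its one-letter code as `residue`, for example `residue_stats(handle, residue="K")`.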
