I am very new to BioPerl and am trying to write a script that counts the number of sequences and the number of characters, computes the %GC content, and, if possible, looks for leucine zipper motifs.
Here is what I have come up with so far:
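#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
my $seqfile = "t4.fasta";
# Open the FASTA file as a stream of sequence objects
my $in = Bio::SeqIO->new(-format => 'fasta',
                         -file   => $seqfile);
my $count = 0;
while ( my $seq = $in->next_seq ) {
    $count++;    # count each record as it is read
}
print "There are $count sequences\n";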
I was wondering if anyone could give me some pointers on how to get the %GC content and how to find the motif?
Thanks a million!
Sorry to drag up such an old thread, but I've just found this as I need to do the same thing! I don't have much experience with Python, but I have a specific question: if you passed this script a multi-FASTA file, would the totalBP counter reflect the total base pairs of the WHOLE file, or would it be counted separately for each 'sub-FASTA' when calculating the GC content of each? I realise this is somewhat academic, because you could just as easily pass each FASTA to the script in a loop to be sure, but I thought I'd ask.
The script runs on the entire file and produces the total counts across all sequences.
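To make the difference between whole-file and per-record counts concrete, here is a minimal Python sketch. It is not the script discussed above, only a rough illustration: the names `read_fasta`, `summarize`, `find_leucine_zippers` and the `total_bp` counter are made up for this example. It assumes a plain FASTA file and uses only the standard library. For the motif, it assumes the PROSITE-style leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L, which applies to protein sequences, so nucleotide records would need translating first.

```python
import re


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(handle):
    """Return (record count, total length, overall GC fraction, per-record GC)."""
    n_records = 0
    total_bp = 0   # hypothetical counter: summed over ALL records in the file
    total_gc = 0
    per_record = {}
    for header, seq in read_fasta(handle):
        seq = seq.upper()
        gc = seq.count("G") + seq.count("C")
        n_records += 1
        total_bp += len(seq)
        total_gc += gc
        # GC fraction of this record alone
        per_record[header] = gc / len(seq) if seq else 0.0
    overall = total_gc / total_bp if total_bp else 0.0
    return n_records, total_bp, overall, per_record


# Assumed pattern: L-x(6)-L-x(6)-L-x(6)-L; the lookahead allows overlapping hits.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein):
    """Return (0-based start, matched text) for each hit in a protein string."""
    return [(m.start(), m.group(1))
            for m in LEUCINE_ZIPPER.finditer(protein.upper())]


if __name__ == "__main__":
    # "t4.fasta" is the file name used in the question.
    with open("t4.fasta") as fh:
        n, bp, gc, per_record = summarize(fh)
    print(f"There are {n} sequences, {bp} characters, overall GC {gc:.1%}")
    for name, frac in per_record.items():
        print(f"{name}\t{frac:.1%}")
```

Note that the overall figure is the length-weighted GC content of the whole file, which generally differs from a simple average of the per-record values when the records have different lengths.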