I would like to write a Perl script that counts all occurrences of a word in all multi-FASTA files in a directory, with output that looks like this:
Number of occurrences in >name_sequence is : #
I have tried this:
for file in *.fasta; do perl -lne 'if($_ =~ /word/) { $a++;} ; END { print "No of words in the file:"; print $a}' $file; done
I would also like the script to report the length of each sequence. I put the following script in a shell for loop:
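#!/usr/bin/perl
use strict;
use warnings;

# Print the name and total length of every sequence in a FASTA file
open( my $fasta, '<', $ARGV[0] ) or die "Cannot open fasta file: $!\n";
my $total = 0;     # running length of the current sequence
my $name  = '';    # header line of the current sequence
while ( <$fasta> )
{
    chomp;
    if ( /^>/ )    # a header line starts a new sequence
    {
        # report the previous sequence before starting the next one
        if ( $total > 0 )
        {
            print "$name $total\n";
        }
        $total = 0;    # reset the counter
        $name  = $_;
    }
    else
    {
        $total += length( $_ );    # add this line to the length
    }
}
# report the last sequence, which no later header line triggers
if ( $total > 0 )
{
    print "$name $total\n";
}
close( $fasta );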
I saved the above code in a text file named count.pl and ran the following:
for file in *.fna; do ./count.pl $file; done
I'd like to do these two tasks in a single script, but I don't know how to do it.
Looks like an assignment exercise. Is it one, erick_rc93?
Your Perl one-liner doesn't really count the number of occurrences of word; it counts the number of lines on which word occurs. For example, if a test file (say test.fasta) contains a line in which the pattern ATCG occurs twice, it will be counted only once.
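One way to get both numbers in a single pass is sketched below. It is only a sketch under some assumptions: the helper names count_occurrences and scan_fasta and the file name count_words.pl are made up for illustration, the word is treated as a literal string, and matches are counted without overlap. Each sequence's lines are joined before counting, so a match that spans a line break is counted too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count non-overlapping occurrences of a literal word in a string.
# The "() =" forces list context so that every match is counted,
# not just whether the string matched at all.
sub count_occurrences {
    my ( $seq, $word ) = @_;
    my $count = () = $seq =~ /\Q$word\E/g;
    return $count;
}

# Read one FASTA file and return a list of [ name, length, count ]
# records, one per sequence, in file order.
sub scan_fasta {
    my ( $path, $word ) = @_;
    open( my $fh, '<', $path ) or die "Cannot open $path: $!\n";
    my ( @records, $name, $seq );
    while ( my $line = <$fh> ) {
        $line =~ s/\s+$//;    # drop the newline and any trailing whitespace
        if ( $line =~ /^>/ ) {
            # a new header: store the previous sequence, if there was one
            push @records,
              [ $name, length($seq), count_occurrences( $seq, $word ) ]
              if defined $name;
            ( $name, $seq ) = ( $line, '' );
        }
        elsif ( defined $name ) {
            $seq .= $line;    # join the lines of the current sequence
        }
    }
    # store the last sequence, which no later header triggers
    push @records, [ $name, length($seq), count_occurrences( $seq, $word ) ]
      if defined $name;
    close($fh);
    return @records;
}

# Command line: first argument is the word, the rest are FASTA files
my ( $word, @files ) = @ARGV;
warn "Usage: $0 WORD file.fasta [file.fasta ...]\n" unless @files;
for my $file (@files) {
    for my $rec ( scan_fasta( $file, $word ) ) {
        my ( $name, $length, $count ) = @$rec;
        print "Number of occurrences in $name is : $count (length $length)\n";
    }
}
```

Saved as count_words.pl, it could be run as perl count_words.pl ATCG *.fasta, which also removes the need for the shell loop. If overlapping matches should be counted as well, the pattern can be wrapped in a lookahead, /(?=\Q$word\E)/g, instead.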