Hi all. I have a question regarding a perl script I have already made. I need it to be able to read fasta formatted files with multiple sequences. My script below can tell me the GC percent for one whole file not taking into account the headers sadly, needed some advice on going in the right direction (somewhat new to scripting).
#!/usr/bin/perl
use strict;
use warnings;
my $DNA_filename;
my @DNA;
my $cg;
my $total;
my $e;
my $DNA;
my $num;
my $outputfile;
print "type the filename of the DNA sequence data: ";
$DNA_filename = <STDIN>;
chomp $DNA_filename;
unless ( open(DNAFILE, $DNA_filename) ) {
print "Cannot open file \"$DNA_filename\"\n\n";
exit;
}
@DNA = <DNAFILE>;
close DNAFILE;
$DNA = join( '', @DNA);
$DNA =~ s/\s//g;
$cg = 0;$total = 0;$e = 0;
while ($DNA =~ /c/ig){$cg++}
while ($DNA =~ /g/ig){$cg++}
while ($DNA =~ /a/ig){$total++}
while ($DNA =~ /c/ig){$total++}
while ($DNA =~ /g/ig){$total++}
while ($DNA =~ /t/ig){$total++}
while ($DNA =~ /[^acgt]/ig){$e++}
print "CG=$cg total nucleotides = $total errors=$e\n";
$outputfile = "percent cg";
unless ( open(PERCENT_CG, ">$outputfile") ) {
print "Cannot open file \"$outputfile\"\n\n";
exit;
}
print PERCENT_CG "CG =$cg total nucleotides=$total errors=$e\n\n";
$num = ($cg / $total) *100 ;
print "$num\n\n";
close(PERCENT_CG);
exit;
I haven't tested any of this code for functionality but it should be close enough to get you started.
Break your GC content calulator into a sub routine and read through your files getting one record at a time. once you have a complete record in , pass it to your GC calculator. Something like this:
OR you can leverage a special variable in perl $\ , Go read perl var for more infor but something like this should get you started: