I am trying to write a Perl script that counts the Gs and Cs at the third codon position. I have started using BioPerl, but I cannot get it to output the names of the sequences in the FASTA file.
Also, I have been unsuccessful in extracting the third position so far. I have got to the point of splitting the sequence into three bases at a time. Here is my code:
#!/usr/bin/perl
use Bio::SeqIO;
$seqio_obj = Bio::SeqIO->new( -file => "./all_ORFS.fasta" , '-format' => 'Fasta');
while ($seq_obj = $seqio_obj->next_seq){
#print the sequence
$seq1 = $seq_obj->seq,"\n";
$gc = 0;
foreach ($seq1){
@chunk = unpack("A3" x (length($seq1)/3), $seq1);
while ($chunk){
if(/\w{2}(\w)/){
if($1 = "g"){
$gc + 1;
}
print $gc;
}
}
#print join("\n",@chunk)."\n";
}
}
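Well, for starters, $1 = "g" is an assignment, not a comparison. To compare strings in Perl you want $1 eq 'g' (the == operator compares numbers). $gc + 1 is not right either: it computes a value and throws it away, so use $gc++ or $gc += 1. And while ($chunk) isn't going to work, because you never initialized $chunk; you filled the array @chunk, which is a different variable. If you started your script with use strict and use warnings, that would be caught.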
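Putting those fixes together, here is a minimal sketch of how the counting could look. It is not the only way to do it. It assumes every sequence starts in frame and that a trailing partial codon should be ignored. The name gc3_count and taking the FASTA path from the command line are my own choices for illustration, not something from your script. display_id is the Bio::Seq method that returns the sequence name from the FASTA header.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count G or C bases at the third position of each complete codon.
# Assumption: the sequence starts in frame (base 1 is a first codon position).
# gc3_count is a hypothetical helper name chosen for this example.
sub gc3_count {
    my ($seq) = @_;
    my $gc = 0;
    # '(A3)*' splits the string into consecutive 3-character chunks
    for my $codon (unpack '(A3)*', $seq) {
        next if length($codon) < 3;                  # skip a trailing partial codon
        $gc++ if substr($codon, 2, 1) =~ /[GC]/i;    # third base, case-insensitive
    }
    return $gc;
}

# Driver: runs only when a FASTA path is given on the command line,
# so Bio::SeqIO is loaded only when it is actually needed.
if (@ARGV) {
    require Bio::SeqIO;
    my $seqio = Bio::SeqIO->new(-file => $ARGV[0], -format => 'fasta');
    while (my $seq_obj = $seqio->next_seq) {
        # Print the sequence name and its third-position G/C count
        print $seq_obj->display_id, "\t", gc3_count($seq_obj->seq), "\n";
    }
}
```

If you save this as, say, gc3.pl, you would run it as perl gc3.pl ./all_ORFS.fasta and get one line per sequence with the name and the count separated by a tab.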