Hi All
I have been trying to write a script to take IUPAC compliment primers and pull out the corresponding subsequence from a full length (Aiming to select sequence regions from an amplicon).
The code works for direct matches but I am having trouble implementing the IUPAC module for generating a regexp on the fly. The code below only includes generating the forward primer regex at the moment. I also want to change the trimming portion to a substring argument but that's for when I've worked out how to find the character position of the match.
I have been mainly using Bio::Tools::IUPAC and Degenerate_primers but I have hit a brick wall, cant find any similar example scripts and would really appreciate any advice.
Thanks.
This is where I am so far:
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::Tools::IUPAC;
my $infile = $ARGV[0];
my $primer_f = $ARGV[1];
my $primer_r = $ARGV[2];
my $in = Bio::SeqIO->new(-file => "$infile", -format => 'fasta');
my $out = Bio::SeqIO->new(-file => ">out.fasta", -format => 'fasta');
#####
##Generate forward matching regex
#####
my $ambiseq_f = Bio::Seq->new(-seq => "$primer_f", -alphabet => 'dna');
# Create all possible non-degenerate sequences
my $iupac = Bio::Tools::IUPAC->new(-seq => $ambiseq_f);
while (my $uniqueseq = $iupac->next_seq()) {
# process the unique Bio::Seq object. <---This is from the module documentation, Dont know what referring to.
print $uniqueseq->seq . "\n";
}
my $regexp = $iupac->regexp();
#print "\n$regexp\n";
#####
##Trim to matching region
#####
while ( my $seq = $in->next_seq() ){
my $trim5 = 0;
my $trim53 = 0;
if ( $seq->seq =~ m/^.*$primer_f(.*)/i ){
$trim5 = $1;
}
if ($trim5 =~ m/^(.*)$primer_r.*/i ){
$trim53 = $1;
}
#print ">" . $seq->id . "\n" . $trim53 . "\n";
#$out->write_seq($trim53);
}
EDIT: removed an array - pretty sure that's not right but forgot to remove it before.