Hi , all I ask in other forum how to align two sequences in PERL, basically what I need is to find the complementary matching region between one sequence and the reverse complentary of the other one
Here is the code that I'm using
#!/usr/bin/perl
use warnings;
use strict;
sub complement {
$_[0] =~ y/CGAT/GCTA/;
return $_[0];
}
sub match {
my ($s1, $s2) = @_;
$s2 = reverse $s2;
complement $s2;
print "$s1\n";
my $s2l = length $s2;
for (my $length = $s2l; $length; $length--) { # start from the longest possible substring
for my $start (0 .. $s2l - $length) { # starting position of the matching substring
my $substr = substr $s2, $start, $length;
my $pos = index $s1, $substr;
if ($pos + 1) {
return ('-' x $pos) . complement "$substr" . ('-' x ($s2l - $length - $pos));
}
}
}
}
print match('CGTAAATCTATCTT',
'CATGCGTCTTTACG')
,"\n";
My problem is that using this I only get one result:
CGTAAATCTATCTT
GCATTT------A-
and my idea is to find the best complementary match between the two (OK, in this case this is the best one, but you have to imagine when I am dealing with sequences of hundreds and maybe thousands nucleotides), also i'm considering that the sequences will be of different length. I tried to use one approach similar to Smith and Waterman algorithm changing the matrix for complementary matrix: C-G/G-C amd T-A/A-T 1 and the rest 0
Thanks in advance
SW will not make the gaps needed in this case unless you can penalise them less. Perhaps increase the match values from 1 to something larger and reduce the gap open and extension penalties.