Matching Complementary Regions Of 2 Dna Sequences
2
0
Entering edit mode
13.0 years ago
Tonig ▴ 440

Hi , all I ask in other forum how to align two sequences in PERL, basically what I need is to find the complementary matching region between one sequence and the reverse complentary of the other one

Here is the code that I'm using

#!/usr/bin/perl
use warnings;
use strict;

sub complement {
    $_[0] =~ y/CGAT/GCTA/;
    return $_[0];
}

sub match {
    my ($s1, $s2) = @_;
    $s2 = reverse $s2;
    complement $s2;
    print "$s1\n";
    my $s2l = length $s2;
    for (my $length = $s2l; $length; $length--) { # start from the longest possible substring
            for my $start (0 .. $s2l - $length) {     # starting position of the matching substring
            my $substr = substr $s2, $start, $length;
            my $pos = index $s1, $substr;
            if ($pos + 1) {
                return ('-' x $pos) . complement "$substr" . ('-' x ($s2l - $length - $pos));
            }
        }
    }
}

print match('CGTAAATCTATCTT',
            'CATGCGTCTTTACG')
    ,"\n";

My problem is that using this I only get one result:

CGTAAATCTATCTT
GCATTT------A-

and my idea is to find the best complementary match between the two (OK, in this case this is the best one, but you have to imagine when I am dealing with sequences of hundreds and maybe thousands nucleotides), also i'm considering that the sequences will be of different length. I tried to use one approach similar to Smith and Waterman algorithm changing the matrix for complementary matrix: C-G/G-C amd T-A/A-T 1 and the rest 0

Thanks in advance

perl alignment sequence • 5.9k views
ADD COMMENT
0
Entering edit mode

SW will not make the gaps needed in this case unless you can penalise them less. Perhaps increase the match values from 1 to something larger and reduce the gap open and extension penalties.

ADD REPLY
4
Entering edit mode
13.0 years ago
Gustavo ▴ 530

You might find it computationally more efficient to use an external tool for doing the sequence comparison. For example, you could write your sequences out to separate files (if they're not already in files to start with!) and use FASTA to do the comparison.

FASTA has a variety of parameters that will make the job easy. In your case, you can specify that you want only comparisons to the reverse strand of the other sequence (-i parameter). You of course have full control of all alignment parameters like matrix, gap penalties, significance cutoffs. You can also specify which format you want to get, and it has some output formats that are extremely easy to parse.

ADD COMMENT
0
Entering edit mode

Thanks Gustavo, I'll try it I finally did it using BLAT, but it will be nice trying to compare with FASTA

ADD REPLY
0
Entering edit mode
13.0 years ago

Using Smith and Waterman and changing the matrix sounds reasonable.

Is this for homework? If it's not, why not just convert one of the two sequences into the reverse complement and use one of the many available aligners. So in your example where you are aligning:

CGTAAATCTATCTT
CATGCGTCTTTACG

Change the second sequence to its reverse complement:

CGTAAATCTATCTT
CGTAAAGACGCATG

And use [?]clustal[?] to align the sequence. You can then transform it back to the original sequence after you get a satisfactory alignment.

ADD COMMENT
0
Entering edit mode

no, it is not for my homework, i need to find complmentary regions between sequences coming from an experiment. I don't need to align the sequences, only find complementary regions between them, not to find alignments

ADD REPLY

Login before adding your answer.

Traffic: 1758 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6