Extracting Matching Subsequence From Pairwise Alignment
0
0
Entering edit mode
11.6 years ago
Woa ★ 2.9k

Hello All,

I've a pairwise global alignemnet of two DNA sequences generated by the program NEEDLE of EMBOSS package. I wish to extract the sub-sequence that matches/aligns to a given region of the other sequence. In this alignment (Pastebin Link) the given region (actually the CDS) falls between base number 24:485 in the original sequence with ID 'XM_001005073.' I wish to extract the sub-sequence in the sequence ID 'Homolog' that aligns with that 24:485 region of the other sequence.

I'm using Bioperl to parse the alignment. I find out the the alignment column numbers corresponding to 24:485 region in the particular sequence, using 'column_from_residue_number'. Then I extract the sub-sequence from the 'aligned' other sequence(containing gaps) using the corresponding column numbers. Finally I remove the gap characters.

Am I doing this thing correctly and are there any pitfalls ? Is there any better way to do it by (Bio)Perl/Python? The code goes here:

use strict;
use warnings;
use Bio::AlignIO;

# read in an alignment generated by the EMBOSS program Needle
my $in = new Bio::AlignIO(-format => 'emboss',
                  -file   => 'test_needle.aln');

while( my $aln = $in->next_aln ) {
          #Seqnames: 'XM_001005073.'(CDS:24-485),'Homolog'
          my ($cds_start,$cds_end)=(24,485);#
          my $col_cdsstart = $aln->column_from_residue_number( 'XM_001005073.', $cds_start);
          my $col_cdsend= $aln->column_from_residue_number( 'XM_001005073.', $cds_end);

          foreach my $seq ($aln->each_seq) {
                    if($seq->id() eq 'Homolog'){
                        my $homolog_cds=$seq->subseq($col_cdsstart,$col_cdsend);
                         $homolog_cds=~s/\-//g;
                        print $homolog_cds,"\n";
                    }

        }
}

Thanks in advance

alignment bioperl • 3.4k views
ADD COMMENT
0
Entering edit mode

I do not get the question, could you provide an small input and output example ?

ADD REPLY
0
Entering edit mode

The input is the alignment itself,as given in the pastebin link. The region 24:485 in the Sequence XM_001005073 is highlighted in the Genbank record http://img708.imageshack.us/img708/2469/cds24485.png. I wish to extract the subsequence from the Sequence named "Homolog' that aligns to the 24:485 region.In the alignment that corresponds to base number 1:308 in the 'Homolog' sequence. The output would therefore be subsequence 1:308.

ADD REPLY
0
Entering edit mode

The typical output of gapped and gaps-removed sequence http://pastebin.com/pfPn23Ud I've to use the gaps-removed sequence only.

ADD REPLY

Login before adding your answer.

Traffic: 1858 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6