Hi All,
I wanted to write a script to remove certain columns which are in array from an alignment fasta file. I wrote a perl script using bioperl which is like this
#!/usr/bin/perl
use Bio::Seq;
use Bio::SeqIO;
use Bio::AlignIO;
use Bio::Align::AlignI;
use Bio::SimpleAlign;
my $str = Bio::AlignIO->new('-file' => 'aligned.fasta');
my $aln = $str->next_aln();
$out = $aln->remove_columns([4,5]);
my $aio = Bio::AlignIO->new(-fh=>\*STDOUT,-format=>"fasta");
$aio->write_aln($out);
Input data (example file)
>a_1
MEKFGMNFGGGPSKKDLL
>a_2
MEKFGMNFGGGPSKKDLL
>a_3
MEKFGMNFGGGPSKKDLL
But I am having problem that the remove _columns function only works for 2 columns only when I try to put say (e.g [4,5,6]) it doesn't work. So any one can tell me to come up with a solution may be in perl not using bioperl or something else.
Will you reformat your code so it can be read? Also can you provide a small piece of data to test?
formatting fixed.
looks like remove_columns function is used to remove certain types of alignments. From the API docs:
"Creates an aligment with columns removed corresponding to the specified criteria: 'match'|'weak'|'strong'|'mismatch'|'gaps'"
It doesn't remove column by position. There is a slice function to get subcolumns of the alignment. You can probably use that to come up with something.
The perl solutions below will work if you want to remove the same character by position from each sequence. But you should consider if you want to account for gaps that potentially could be inserted in the sequences after alignment.
this post was cited in