Here is a short script which uses the Bioperl module Bio::AlignIO. This script will read your alignment file (alignment_file.aln) and parse each column of the alignment at a time and then extracts the aligned nucleotide sequence from every alignment creating a tab separated table. The table is printed in an output file (table_ouput.txt).
The table will look like this (note: the number in the table indicates the column number):
1 A A A T
2 A A A T
3 A A A A
4 A A A A
5 A A A A
6 T T A A
Then you can write another script to parse the table and do any type of calculation you want to do !!!
Using Sequence Logo can also give you a rough idea of your over all sequence. Here is the sequence logo for your alignment above:
Here is the script to parse the alignment:
#!usr/bin/perl/ -w
use strict;
use warnings;
use Bio::AlignIO;
use Bio::LocatableSeq;
my $align_file = 'alignment_file.aln';
my $out_file = 'table_ouput.txt';
my $str = Bio::AlignIO->new('-file' => $align_file);
my $aln = $str->next_aln();
my $seq1 = $aln->get_seq_by_pos(1);
my $seq2 = $aln->get_seq_by_pos(2);
my $seq3 = $aln->get_seq_by_pos(3);
my $seq4 = $aln->get_seq_by_pos(4);
open(OUTPUT, ">$out_file") or die "error";
for (my $col = 1; $col<=$aln->length; $col++)
{
my $char_seq1 = $seq1->subseq($col, $col);
my $char_seq2 = $seq2->subseq($col, $col);
my $char_seq3 = $seq3->subseq($col, $col);
my $char_seq4 = $seq4->subseq($col, $col);
print OUTPUT $column, "\t";
print OUTPUT $char_seq1, "\t";
print OUTPUT $char_seq2, "\t";
print OUTPUT $char_seq3, "\t";
print OUTPUT $char_seq4, "\n";
}
close OUTPUT;
exit;
Thanks Raunak, I can always transpose the output file to get the results as intended (rows to columns) using Unix. I looked into sequence logo as Niek de Klein mentioned, and I liked the visualization.