Multiple Sequence Alignment Score
13.7 years ago
Lee Katz ★ 3.2k

Does anyone have a script to determine the score of a multiple sequence alignment? Hopefully using BioPerl?

multiple bioperl scoring scripting • 9.0k views

I thought someone had already asked this until I read the question, but it looks like it hasn't come up yet. Related: Pairwise Local And Global Alignment

13.7 years ago
Lee Katz ★ 3.2k

Per Alastair's comment, I tried MstatX, at https://github.com/gcollet/MstatX.

The program is easy to use and offers several ways of calculating the score. However, it bothers me a little that it doesn't output a final score to the terminal. It prints a score for each column to a file, which I was able to sum up. It also bothers me that it doesn't come packaged with a simple matrix for DNA, so out of the box it is only set up for protein. I quickly made up a DNA matrix on the spot, which may not be technically correct.

tar zxvf gcollet-MstatX-31481c6.tar.gz
cd gcollet-MstatX-31481c6
make
./mstatx -ma path/to/file -b -sp data/dna.mat
perl -e 'while(<>){$score+=$_;}print "$score\n";' < file.cons
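
For anyone who prefers Python over the Perl one-liner, here is a minimal sketch of the same summation. It assumes, as the one-liner does, that file.cons holds exactly one numeric column score per line; the function name summarize_column_scores is made up for illustration and is not part of MstatX. It also returns the mean per column, which is the sum divided by the number of columns.

```python
def summarize_column_scores(text):
    """Return (total, mean) for per-column scores given as one number per line.

    Assumption: every non-blank line is a single numeric score, which is
    what the Perl one-liner in this post relies on as well.
    """
    scores = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        scores.append(float(line))
    if not scores:
        raise ValueError("no column scores found")
    total = sum(scores)
    return total, total / len(scores)
```

Usage would look something like `summarize_column_scores(open("file.cons").read())`, where file.cons is the per-column output mentioned above.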

The matrix file I made (probably not the most correct thing I could have made):

H DNA matrix
D DNA matrix by Lee Katz
R LIT:1902106 PMID:1438297
A Henikoff, S. and Henikoff, J.G.
T Amino acid substitution matrices from protein blocks
J Proc. Natl. Acad. Sci. USA 89, 10915-10919 (1992)
* matrix in 1/3 Bit Units
M rows = ATCGN-, cols = ATCGN-
      2.
     -1.      2.
     -1.     -1.      2.
     -1.     -1.     -1.      2.
     -1.     -1.     -1.     -1.     -2.
     -2.     -2.     -2.     -2.     -2.     -2.
//
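
To show how a matrix like this turns into per-column scores, here is a rough sketch of a plain, unweighted sum-of-pairs calculation in Python. This is an illustration only: it is not taken from MstatX and may differ from what MstatX actually computes (weighting, gap handling, normalization). The values are copied from the lower triangle above, and the function names are hypothetical.

```python
from itertools import combinations

SYMBOLS = "ATCGN-"
# Lower triangle copied from the matrix file above (rows/cols = ATCGN-).
LOWER_TRIANGLE = [
    [2],
    [-1, 2],
    [-1, -1, 2],
    [-1, -1, -1, 2],
    [-1, -1, -1, -1, -2],
    [-2, -2, -2, -2, -2, -2],
]


def build_matrix():
    """Expand the lower triangle into a symmetric lookup keyed by symbol pairs."""
    matrix = {}
    for i, row in enumerate(LOWER_TRIANGLE):
        for j, value in enumerate(row):
            a, b = SYMBOLS[i], SYMBOLS[j]
            matrix[(a, b)] = value
            matrix[(b, a)] = value
    return matrix


def sum_of_pairs_column_scores(aligned_seqs, matrix):
    """Score each column as the sum of matrix values over all sequence pairs.

    Assumption: all sequences are already aligned (equal length) and use
    only the symbols in SYMBOLS; gap-gap pairs are scored like any other pair.
    """
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*(seq.upper() for seq in aligned_seqs)):
        scores.append(sum(matrix[pair] for pair in combinations(column, 2)))
    return scores
```

Summing the returned list gives one overall score for the alignment; dividing by the number of columns gives a per-column average.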

I also tried alistat from the squid package, but it does not give a score.


MstatX now gives a global score for an alignment: the sum of the column scores divided by the number of columns.

12.6 years ago

muscle 3.8 has an 'spscore' option that computes the SP (sum-of-pairs) objective score for a multiple sequence alignment, e.g. path/to/muscle -spscore file_name

For example, to extract just the score into a variable (pseudocode):

Compute the SP score with muscle (e.g. path/to/muscle -spscore file_name -log <log_file>)
Read the log file
For each line of the file
    If the line contains the string 'SP=' (perl e.g. if (/SP=/))
        Print the segment after the match (perl e.g. print $')
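
The pseudocode above could be sketched in Python roughly as follows. The only thing assumed about the log is what the pseudocode already relies on: some line contains 'SP=' immediately followed by the numeric score. The function name extract_sp_score is hypothetical.

```python
import re

# 'SP=' followed by a number (optional sign, decimals, exponent).
SP_PATTERN = re.compile(r"SP=\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def extract_sp_score(log_text):
    """Return the first SP score found in the log text as a float, or None.

    Assumption: the score appears on a line as 'SP=<number>'.
    """
    for line in log_text.splitlines():
        match = SP_PATTERN.search(line)
        if match:
            return float(match.group(1))
    return None
```

After running muscle with the -spscore and -log options shown above, read the log file into a string and pass it to extract_sp_score to get the score as a number.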

NB: You could even extract the matching line with Unix 'grep'.

Download muscle from: http://www.drive5.com/muscle/

