Similarity Score Of Multiple Sequence Alignment
13.9 years ago
Ananth ▴ 70

Hello,

I have a file with protein sequences for which I would like to know the similarity score of the multiple sequence alignment.

I have aligned these sequences using ClustalW, but all I get is the pairwise identity score!

I am not looking for the pairwise identity or similarity score, but the similarity score of the multiple sequence alignment, so that I can conclude that "this group of sequences are x% similar with each other".

Is there any tool that gives a measure of similarity of the sequences? Or any method for calculating this score?

Please help!

Thank you, Ananth

multiple • 25k views

The similarity score depends on the substitution matrix used. So you should never say "this group of sequences are x% similar with each other" but rather "this group of sequences are x% similar with each other given this specific substitution matrix". Also, check that you are doing a global alignment and not a local one.


Thank you Giovanni,

As you correctly pointed out, the score depends on the substitution matrix. So, for a specific substitution matrix and a global alignment, is there a way to calculate this similarity score for an MSA?


How can I run MstatX?

I tried this command, but it does not work:

Mstat -m test.fa -s trident

Could you please give me an example command?


./mstatx -i test.fa -s trident -g

13.9 years ago
Bilouweb ★ 1.1k

I have made a tool to derive statistics from a multiple alignment. It gives a score for each column of the multiple alignment, given a substitution matrix. Here is the link (GitHub): MstatX (use the -s trident option).

Hope it can help. If you need any help, just ask.

EDIT: The question "How to measure the conservation (or similarity) in a multiple alignment?" is quite difficult, as discussed in these questions: Conservation Score Of Amino Acid Positions In Human Proteins and Entropy From A Multiple Sequence Alignment With Gaps

A first measure can be calculated by the following algorithm (the famous sum of pairs):

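Msa msa;  // the alignment, indexed as msa[column][row]
float total = 0.0;
for (int c = 0; c < nb_column; ++c) {
  // sum the scores of all pairs of residues in column c
  float sum = 0.0;
  for (int r = 0; r < nb_row - 1; ++r) {
    for (int s = r + 1; s < nb_row; ++s) {
      sum += similarity_score(msa[c][r], msa[c][s]);
    }
  }
  // average over the nb_row * (nb_row - 1) / 2 pairs of sequences
  total += sum / (nb_row * (nb_row - 1) / 2);
}
// average over all columns
total /= nb_column;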

Here, similarity_score looks up the score for a pair of residues in your scoring matrix.
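For concreteness, the sum-of-pairs average can be sketched as a self-contained C++ function. This is only an illustration under stated assumptions, not MstatX's code: the alignment is assumed to be stored row-wise as equal-length strings, and `identity_score` is a hypothetical stand-in for a real substitution matrix (1 for identical residues, 0 otherwise, and 0 for any pair involving a gap '-').

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a substitution matrix lookup:
// 1 for identical residues, 0 otherwise; gaps ('-') always score 0.
double identity_score(char a, char b) {
    if (a == '-' || b == '-') return 0.0;
    return a == b ? 1.0 : 0.0;
}

// Mean sum-of-pairs score: for each column, average the score over all
// pairs of sequences, then average those column scores over the alignment.
// The alignment is stored row-wise here: msa[row][column].
double mean_sum_of_pairs(const std::vector<std::string>& msa,
                         const std::function<double(char, char)>& score) {
    assert(msa.size() >= 2);
    const std::size_t nb_row = msa.size();
    const std::size_t nb_column = msa[0].size();
    assert(nb_column > 0);
    for (const std::string& row : msa) {
        assert(row.size() == nb_column);  // all rows must be aligned
    }

    const double nb_pairs = nb_row * (nb_row - 1) / 2.0;
    double total = 0.0;
    for (std::size_t c = 0; c < nb_column; ++c) {
        double sum = 0.0;
        for (std::size_t r = 0; r + 1 < nb_row; ++r) {
            for (std::size_t s = r + 1; s < nb_row; ++s) {
                sum += score(msa[r][c], msa[s][c]);
            }
        }
        total += sum / nb_pairs;
    }
    return total / nb_column;
}
```

With the identity scoring used here, the result can be read as a mean pairwise identity between 0 and 1. With a real substitution matrix the scale depends on the matrix, so the value is a score rather than a percentage.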


I added two links to related questions.


Thanks, Bilouweb! It was helpful. :)


Is it possible for MstatX to output a final MSA score? When I ran it, I could only find ways to output per-column scores. Thank you for the software package!


Thanks for using MstatX! I can add a total score computed as the mean of all column scores. I will also add a DNA matrix for multiple alignments of DNA sequences.


Thanks! I think it would be helpful to have a total score too, similar to the one that Clustal or MUSCLE would output.


What is the difference between the wentropy and trident statistics?

13.9 years ago

I think the answer is "no". The reason is that I cannot think of a meaningful way to define the % identity of a multiple sequence alignment.

If one defines it as the fraction of aligned positions that are identical across all sequences, the % identity would automatically be lower the more sequences you have in the alignment. It would thus not be comparable between different alignments.
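A small C++ sketch illustrates this effect. The function name, the row-wise string storage, and the choice to treat any column containing a gap ('-') as not conserved are assumptions made for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Fraction of columns in which every sequence carries the same residue.
// A column containing a gap ('-') is counted as not conserved (an assumption).
double fully_conserved_fraction(const std::vector<std::string>& msa) {
    assert(!msa.empty());
    const std::size_t nb_column = msa[0].size();
    assert(nb_column > 0);
    std::size_t conserved = 0;
    for (std::size_t c = 0; c < nb_column; ++c) {
        const char first = msa[0][c];
        bool same = (first != '-');
        // Stop scanning the column at the first mismatch.
        for (std::size_t r = 1; r < msa.size() && same; ++r) {
            assert(msa[r].size() == nb_column);  // rows must be aligned
            same = (msa[r][c] == first);
        }
        if (same) ++conserved;
    }
    return static_cast<double>(conserved) / nb_column;
}
```

For the two sequences ACDE and ACDF this gives 0.75. Adding a third sequence, ACGE, drops it to 0.5, even though each sequence still differs from the first by at most one residue. An extra sequence can only keep the value the same or lower it.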

13.9 years ago

It depends on what you mean by 'measure of similarity'. A PAM value, if it is a protein alignment? Global % identity?

Look at Sean Eddy's tools. alistat (built from the SQUID package) might meet your needs. It is also installed as part of the HMMER package.


Thank you, Alastair.

For a pairwise sequence alignment, ClustalW reports the sequence identity as a score: the percentage identity shared between the 2 sequences.

By "measure of similarity" I meant: instead of having a score for 2 sequences, can we have a score that gives an idea of the similarity of the whole multiple sequence alignment?
