Question

Local Alignment Statistical Significance

11.0 years ago

Maria ▴ 170

I want to align two sequences locally with the Smith-Waterman algorithm. The output will be several alignments, only some of which are significant. I want to test their significance with a randomization/permutation test. My approach is below:

Align S1 to S2
==> obtain, for example, 10 alignments (Ai) longer than some threshold
for each Ai, test whether it is significant:
- Permute S1 and S2 and align the permutations
- Obtain a score for each of these alignments
- Calculate the probability (fraction) of those alignments that have a score > the score of the initial alignment

Question: how do I continue? How do I decide whether the alignment is significant? And can the probabilities obtained be considered p-values? (My statistical knowledge is humble.) Thanks in advance.
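The procedure above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the scoring values (match 2, mismatch -1, linear gap -2), the function names, the number of permutations, and the example sequences are all hypothetical and not from the post. It tests only the single best local score rather than each Ai, and it uses `>=` together with a +1 correction in numerator and denominator, a common variant that keeps the estimate from being exactly zero. A fraction computed this way is usually called an empirical (Monte Carlo) p-value.

```python
import random


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty).

    The scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def shuffled(seq, rng):
    """Return a random permutation of seq (composition is preserved)."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def permutation_p_value(s1, s2, n_perm=200, seed=0):
    """Empirical p-value of the best local score under sequence shuffling."""
    rng = random.Random(seed)
    observed = sw_score(s1, s2)
    # Count shuffled pairs that score at least as high as the real pair.
    hits = sum(
        sw_score(shuffled(s1, rng), shuffled(s2, rng)) >= observed
        for _ in range(n_perm)
    )
    # +1 correction: the observed pair counts as one draw from the null.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical example sequences.
    s1 = "TTGACCGTAGGCTAGCTAACGGATCC"
    s2 = "AACGTAGGCTAGCTTTGCA"
    score, p = permutation_p_value(s1, s2)
    print(f"observed score = {score}, empirical p = {p:.4f}")
```

With `n_perm` permutations the smallest value this estimate can take is 1 / (n_perm + 1), so the number of permutations limits how small a p-value can be reported.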

test statistics • 3.4k views

updated 4.4 years ago by Biostar 20 • written 11.0 years ago by Maria ▴ 170

score 2 · Answer 1 · 2013-12-03

I am not clear on exactly what you are asking. Here are some ideas; they may or may not make sense to you:

1) SW uses dynamic programming, so it gives you the best alignment, i.e. the alignment with the highest alignment score. So there is no need for permutation testing to check whether the best alignment is the best one.

2) Sometimes there can be more than one best alignment, as the alignment score depends on the match, mismatch and gap penalties.

3) Shuffling or randomising the sequences would not help to test the significance of the alignment, as you are messing up the order of the nucleotides in each sequence. You should have a constraint that preserves the order of the nucleotides. So instead of randomizing the sequences, the best thing you can do is slide the two sequences against each other (preserving the order of nucleotides in each sequence) and calculate an alignment score at each position.

4) A p-value can be generated as: (number of alignment scores from step 3 > the alignment score given by Smith-Waterman in step 1) / total number of alignments compared.

The p-value from step 4 for the best alignment generated by Smith-Waterman should always be zero.
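Steps 3 and 4 could be sketched roughly like this in Python. It is a minimal illustration, not a definitive implementation: the function names, the scoring values (match 2, mismatch -1, linear gap -2) and the example sequences are assumptions, and both sequences are assumed to be non-empty. Each slide position is scored as the ungapped sum over the overlapping region. Because every such overlap is itself a candidate local alignment, none of them can exceed the Smith-Waterman optimum under the same scoring, which is why the fraction comes out as zero.

```python
def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty).

    The scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def slide_scores(a, b, match=2, mismatch=-1):
    """Ungapped overlap score of b against a at every relative offset."""
    scores = []
    # Offsets run from "last base of b under first base of a"
    # to "first base of b under last base of a".
    for offset in range(-(len(b) - 1), len(a)):
        total = 0
        for j, ch_b in enumerate(b):
            i = offset + j
            if 0 <= i < len(a):
                total += match if a[i] == ch_b else mismatch
        scores.append(total)
    return scores


def sliding_fraction(a, b):
    """Step 4: fraction of slide scores strictly above the SW score."""
    observed = sw_score(a, b)
    scores = slide_scores(a, b)
    exceed = sum(s > observed for s in scores)
    return observed, exceed / len(scores)


if __name__ == "__main__":
    # Hypothetical example sequences.
    s1 = "TTGACCGTAGGCTAGCTAACGGATCC"
    s2 = "AACGTAGGCTAGCTTTGCA"
    observed, frac = sliding_fraction(s1, s2)
    print(f"SW score = {observed}, fraction above = {frac}")
```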