Question

Pairwise vs. Multiple Sequence Alignments: Which has better accuracy?

1

10.3 years ago

weslfield ▴ 90

I am aligning many similar sequences from a BLAST result and looking for mutations at certain positions. My inclination is that an MSA (Clustal Omega) is the best approach, but my PI is worried about misalignments and believes that pairwise alignments against a reference sequence would be better. Assuming that all the sequences to be aligned are homologs, which method would be more accurate, and why? I need to convince her that I am right, i.e. that more information will produce better alignments. Thanks!

msa alignment sequence blast pairwise • 6.6k views

updated 3.0 years ago by Ram 44k • written 10.3 years ago by weslfield ▴ 90

0

It'd be helpful if you provided more information. Are you performing local realignment around indels in the BLAST results before calling variants (doing this should produce similarish results to using MSA on those regions)? How certain are you that the section of the reference that you're interested in matches the sequences you're BLASTing? If this data was derived from a PCR that you strongly believe is specific, then MSA might work OK. If you have much in the way of off-target sequences, however, then you're going to run into problems.

10.3 years ago by Devon Ryan 105k

0

These are sequences extracted from a metagenomic sample targeting a gene of interest using BLASTp with a relatively high bit score and identity cut-off, so all of the sequences to be aligned are very similar. I need to create an alignment to check for mutations based on their position in a reference sequence. So I either do a pairwise alignment of each sequence with the reference, or do an MSA that includes the reference and then use the reference row to identify which MSA columns to iterate over. This is all being done within a script because there are thousands of sequences to examine.
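
The MSA route described here depends on translating an ungapped reference position into an MSA column. A minimal sketch of that bookkeeping, assuming the alignment has already been loaded as a dict of equal-length gapped strings (the sequence names, toy sequences, and function names below are hypothetical, not taken from any particular tool):

```python
def reference_columns(aligned_ref, gap="-"):
    """Map 1-based ungapped reference positions to 0-based MSA column indices."""
    columns = {}
    position = 0
    for column, residue in enumerate(aligned_ref):
        if residue != gap:
            position += 1
            columns[position] = column
    return columns


def residues_at(msa, ref_id, position):
    """Return {sequence id: residue} for the column holding reference `position`."""
    column = reference_columns(msa[ref_id])[position]
    return {name: row[column] for name, row in msa.items()}


# Toy aligned rows (hypothetical data); every row has the same length.
msa = {
    "ref":  "MK-TAYIA",
    "seq1": "MKQTAYIA",
    "seq2": "MK-TGYIA",
}

# Reference position 4 sits in column 4 because of the gap in column 2.
calls = residues_at(msa, "ref", 4)
print(calls)
```

Columns where the reference row has a gap have no reference coordinate, so insertions relative to the reference would need separate handling.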

updated 3.0 years ago by Ram 44k • written 10.3 years ago by weslfield ▴ 90

Ram · Answer 1 · 2014-10-06

The outputs of the two methods are radically different: we perform multiple sequence alignment when we are looking for conserved regions across all the sequences. MSAs are not well suited to characterizing differences unless these also form conserved blocks.

What you most likely need is both methods: compile the differences versus a reference genome, and produce an MSA across all sequences.
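
As an illustration of "compiling differences versus a reference", here is a small self-contained sketch that globally aligns one query to the reference and lists the positions where they differ. It assumes a plain Needleman-Wunsch alignment with identity scoring and a linear gap penalty, which is a simplification: for real protein data a substitution matrix such as BLOSUM62 and affine gap penalties from an established aligner would be the usual choice. The function names and scoring values are illustrative only.

```python
def global_align(ref, query, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    def pair(i, j):
        return match if ref[i - 1] == query[j - 1] else mismatch

    rows, cols = len(ref) + 1, len(query) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align two residues
                score[i - 1][j] + gap,             # gap in the query
                score[i][j - 1] + gap,             # gap in the reference
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_ref, aligned_query = [], []
    i, j = len(ref), len(query)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            aligned_ref.append(ref[i - 1])
            aligned_query.append(query[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_ref.append(ref[i - 1])
            aligned_query.append("-")
            i -= 1
        else:
            aligned_ref.append("-")
            aligned_query.append(query[j - 1])
            j -= 1
    return "".join(reversed(aligned_ref)), "".join(reversed(aligned_query))


def differences(ref, query):
    """List (reference position, reference residue, query residue) mismatches.

    Positions are 1-based; an insertion in the query is reported at the last
    reference position before it (0 if it precedes the reference).
    """
    aligned_ref, aligned_query = global_align(ref, query)
    diffs, position = [], 0
    for r, q in zip(aligned_ref, aligned_query):
        if r != "-":
            position += 1
        if r != q:
            diffs.append((position, r, q))
    return diffs


print(differences("MKTAYIA", "MKTGYIA"))  # [(4, 'A', 'G')]
```

Because every query is aligned to the same reference, the reported positions are directly comparable across thousands of sequences without a shared MSA.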

Also note that the word homolog doesn't actually imply any threshold of similarity, only a shared ancestry.

score 1 · Answer 2 · 2014-10-06

To identify mutations in your sequences, pairwise alignment against the reference genome is the best approach, because:

An MSA of your sequences will produce many more mismatches, which may not be true mutations.
If you align your sequences to the genome, many more sequences will align at any particular position. From that aligned data, you can easily find the variants between your sequences and the reference genome. In this case, you can claim true variants/mutants because more sequences support each one (high depth).
MSA is not a good approach for finding variants, as it will not give good coverage for your dataset.
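
The depth argument above can be made concrete by counting how many sequences carry each difference from the reference. A rough sketch, assuming the per-sequence differences have already been extracted as (reference position, reference residue, observed residue) tuples; the input data and the `min_support` threshold are hypothetical and would need tuning for a real dataset:

```python
from collections import Counter


def tally_variants(per_sequence_diffs):
    """Count how many sequences carry each (position, ref, alt) difference."""
    counts = Counter()
    for diffs in per_sequence_diffs.values():
        counts.update(set(diffs))  # count each variant once per sequence
    return counts


def supported(counts, min_support=2):
    """Keep only variants seen in at least `min_support` sequences."""
    return {variant: n for variant, n in counts.items() if n >= min_support}


# Hypothetical differences from three pairwise alignments to one reference.
per_sequence_diffs = {
    "seq1": [(4, "A", "G")],
    "seq2": [(4, "A", "G"), (6, "I", "V")],
    "seq3": [],
}

counts = tally_variants(per_sequence_diffs)
print(supported(counts))  # {(4, 'A', 'G'): 2}
```

A variant seen in only one sequence, like the one at position 6 here, is filtered out as weakly supported rather than declared false.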