Pairwise vs. Multiple Sequence Alignments: Which has better accuracy?
10.1 years ago
weslfield ▴ 90

I am aligning many similar sequences from a BLAST result and looking for mutations at certain positions. My inclination is that an MSA (Clustal Omega) is the best approach, but my PI is worried about misalignments and believes that pairwise alignments against a reference sequence would be better. Assuming that all the sequences to be aligned are homologs, which method would be more accurate, and why? I need to convince her that I am right, i.e. that more information will produce better alignments. Thanks!

msa alignment sequence blast pairwise • 6.4k views

It'd be helpful if you provided more information. Are you performing local realignment around indels in the BLAST results before calling variants (doing this should produce results similar to using an MSA on those regions)? How certain are you that the section of the reference you're interested in matches the sequences you're BLASTing? If this data was derived from a PCR that you strongly believe is specific, then an MSA might work OK. If you have much in the way of off-target sequences, however, then you're going to run into problems.


These are sequences extracted from a metagenomic sample targeting a gene of interest using BLASTp with relatively high bit score and identity cut-offs, so all of the sequences to be aligned are very similar. I need an alignment to check for mutations at given positions in a reference sequence, so I can either pairwise-align each sequence with the reference, or build an MSA that includes the reference and use the reference sequence to identify which MSA columns to iterate over. This is all being done within a script because there are thousands of sequences to examine.
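For the MSA route, the column lookup might look roughly like this. It is a minimal sketch, assuming the alignment is already loaded into a Python dict of sequence ID to aligned sequence (for example parsed from Clustal Omega's FASTA output), that gaps are written as `-` or `.`, and that positions are 1-based in the ungapped reference. The function names are hypothetical. It maps each reference position to its MSA column and reads the residue every other sequence has in that column.

```python
def ref_pos_to_columns(aligned_ref: str, gap_chars: str = "-.") -> dict[int, int]:
    """Map 1-based ungapped reference positions to 0-based MSA column indices."""
    mapping = {}
    pos = 0
    for col, ch in enumerate(aligned_ref):
        if ch not in gap_chars:  # only non-gap reference characters get a coordinate
            pos += 1
            mapping[pos] = col
    return mapping


def residues_at_ref_positions(
    msa: dict[str, str], ref_id: str, positions: list[int]
) -> dict[str, dict[int, str]]:
    """For every non-reference sequence, report the residue in the column aligned to each reference position."""
    if len({len(s) for s in msa.values()}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    cols = ref_pos_to_columns(msa[ref_id])
    result = {}
    for seq_id, aligned in msa.items():
        if seq_id == ref_id:
            continue
        result[seq_id] = {p: aligned[cols[p]] for p in positions}
    return result


# Toy alignment (hypothetical data): seq1 has an insertion, seq2 a substitution and a deletion.
msa = {
    "ref":  "MKT-AYIAK",
    "seq1": "MKTQAYIAK",
    "seq2": "MRT-A-IAK",
}
print(residues_at_ref_positions(msa, "ref", [2, 5]))
# prints {'seq1': {2: 'K', 5: 'Y'}, 'seq2': {2: 'R', 5: '-'}}
```

Note that residues inserted relative to the reference (the `Q` in `seq1`) sit in columns where the reference has a gap, so they never map to a reference position; if insertions matter, those columns have to be checked separately.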

10.1 years ago

The outputs of the two methods are radically different. We perform multiple sequence alignment when we are looking for conserved regions across all the sequences. MSAs are not well suited to characterizing differences unless these also form conserved blocks.

What you most likely need is both methods: compile the differences of each sequence versus the reference, and produce an MSA across all sequences.
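As a rough illustration of "compiling differences versus a reference", here is a sketch that assumes you already have the reference and one query as a gapped, equal-length pair (from a pairwise alignment, or two rows pulled out of an MSA). The function name and the output notation (`K2R`, `Y5del`, `ins3_4Q`) are a simplified, hypothetical format chosen for readability, not HGVS nomenclature.

```python
def compile_differences(aligned_ref: str, aligned_query: str, gap: str = "-") -> list[str]:
    """List differences of an aligned query relative to the reference, in 1-based reference coordinates.

    Hypothetical notation: substitution 'K2R', deletion 'Y5del',
    insertion 'ins3_4Q' (residues inserted between reference positions 3 and 4).
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    diffs = []
    ref_pos = 0
    pending_ins = ""  # query residues seen in columns where the reference has a gap
    for r, q in zip(aligned_ref, aligned_query):
        if r == gap:
            if q != gap:  # gap/gap columns (common in MSA rows) carry no information
                pending_ins += q
            continue
        if pending_ins:  # flush an insertion before the next reference residue
            diffs.append(f"ins{ref_pos}_{ref_pos + 1}{pending_ins}")
            pending_ins = ""
        ref_pos += 1
        if q == gap:
            diffs.append(f"{r}{ref_pos}del")
        elif q != r:
            diffs.append(f"{r}{ref_pos}{q}")
    if pending_ins:  # insertion after the last reference residue
        diffs.append(f"ins{ref_pos}_{ref_pos + 1}{pending_ins}")
    return diffs


print(compile_differences("MKT-AYIAK", "MRT-A-IAK"))  # prints ['K2R', 'Y5del']
print(compile_differences("MKT-AYIAK", "MKTQAYIAK"))  # prints ['ins3_4Q']
```

Running this per sequence gives the per-sample difference list, while the MSA is still what you would look at for conserved blocks across all sequences.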

Also note that the word homolog doesn't imply any threshold of similarity, only a shared ancestry.

10.1 years ago
Renesh ★ 2.2k

To identify mutations in your sequences, pairwise alignment against the reference genome is the best approach, because:

  1. The sequences you are going to use for the MSA will produce many more mismatches, and these may not be true mutations.
  2. If you align your sequences to the genome, many sequences will align to any particular position. From that aligned data, you can easily find the variants between your sequences and the reference genome. In this case, you can claim true variants/mutations because you will have a larger number of sequences covering each position (high depth).
  3. MSA is not a good approach for finding variants, as it will not give good coverage for your dataset.
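A rough sketch of the pairwise-against-reference approach with per-position counts ("depth"): each query is globally aligned to the reference with a textbook Needleman-Wunsch, and the residue observed at each requested reference position is tallied. The scoring here (match +1, mismatch -1, linear gap -2) is a toy assumption; for proteins you would normally use a substitution matrix such as BLOSUM62 with affine gaps, e.g. via Biopython's `PairwiseAligner` or EMBOSS `needle`. The function names are illustrative only, and the pure-Python O(nm) loop is written for clarity rather than speed over thousands of sequences.

```python
from collections import Counter


def global_align(ref: str, query: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty; returns (aligned_ref, aligned_query)."""
    n, m = len(ref), len(query)

    def sub(i: int, j: int) -> int:
        return match if ref[i - 1] == query[j - 1] else mismatch

    # score[i][j] = best score for aligning ref[:i] with query[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right cell, preferring the diagonal on ties
    a_ref, a_q = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            a_ref.append(ref[i - 1])
            a_q.append(query[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a_ref.append(ref[i - 1])  # deletion in the query
            a_q.append("-")
            i -= 1
        else:
            a_ref.append("-")  # insertion in the query
            a_q.append(query[j - 1])
            j -= 1
    return "".join(reversed(a_ref)), "".join(reversed(a_q))


def tally_positions(ref: str, queries: list[str], positions: list[int]) -> dict[int, Counter]:
    """Align each query to the reference and count residues seen at each 1-based reference position."""
    counts = {p: Counter() for p in positions}
    for q in queries:
        a_ref, a_q = global_align(ref, q)
        ref_pos = 0
        for r, c in zip(a_ref, a_q):
            if r == "-":
                continue  # inserted residue: it has no reference coordinate
            ref_pos += 1
            if ref_pos in counts:
                counts[ref_pos][c] += 1  # '-' is counted as a deletion
    return counts


# Hypothetical toy data: two queries carry K at position 2, one carries R.
print(tally_positions("MKTAYIAK", ["MKTAYIAK", "MRTAYIAK", "MKTAYIAK"], [2]))
```

Because each query is aligned independently to the same reference, one badly aligned sequence cannot shift the columns of the others, which is the PI's concern about MSA misalignments; the trade-off is that no information is shared between the queries.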
