Question

bwa mem multiple alignments doesn't work

0

Entering edit mode

9.7 years ago

chongchu.cs ▴ 10

Hi,

I am using bwa mem to do alignments, and I want to get all the alignments in the output. So I use "-a" option of "bwa mem" to do this. But from the output, it seems not all the alignments are reported.

To validate, I run a simple test, I put one query A with length 600bp in a file "read.fq", and then I put three string A1, A2, B in the subject file "ref.fa", where A1 and A2 are exactly same as A, and B (length 2800bp) contained A (not exactly, with some mismatches). Then I run "bwa mem -a ref.fa read.fq" (ref.fa is indexed firstly), but only two alignments(A and A1 as primary, A and A2 as supplementary) are reported.

Any one can help to explain why the alignment with A and B is not reported? Does "bwa mem -a" only reports the higher scored alignments? Or I did some thing wrong? Thank you.

Best,
Chong

sequencing bwa alignment • 4.9k views

ADD COMMENT • link updated 2.4 years ago by Ram 44k • written 9.7 years ago by chongchu.cs ▴ 10

1

Entering edit mode

Would be good if you can provide the example. EDIT: actually no need. That is because the hit to B has been removed at the seeding phase. Try "-D.1".

ADD REPLY • link 9.7 years ago by lh3 33k

0

Entering edit mode

Hi @lh3,

Thank you for the reply. It works when using -D 0.1.

ADD REPLY • link updated 2.4 years ago by Ram 44k • written 9.7 years ago by chongchu.cs ▴ 10

0

Entering edit mode

Will it be that the number of mismatch cause the alignment score lower than the default threshold?

The default threshold is 30 (-T) and the mismatch score is defined using -A and -B options. So maybe because of the long length of the query, the alignment score of the A2 are lower than the threshold, which will causes the alignment to not be reported

ADD REPLY • link 9.7 years ago by Sam ★ 4.8k

Ram · Accepted Answer · 2015-04-08

Unfortunately, short-read aligners are in a speed race. So if an aligner can identify some site as strictly worse than the currently best-known site, the old site can be discarded, which saves a huge amount of wasted computation on considering sites that are known to be inferior to the most likely source of the read. This may involve probability, and thus may not be strictly deterministic.

To determine empirically how likely it is that a read came from some specific location requires exponential compute time, meaning that in practice, it is impossible. So, people rely on heuristics which are not completely accurate, but allow the computations to finish in finite time.

If you run some aligner and it does not report inferior location X as a possible alignment, don't be surprised. In the general case, it is impossible to report all possible alignments (in finite time). It is similarly impossible to report all alignments with scores above X.

No matter what aligner you run, it cannot not give you what you are asking for in general, though in special cases, it may. I suggest you ask a different question.