Question

BWA aln returns a smaller number of hits when specifying a higher number of mismatches

2

Entering edit mode

2.9 years ago

fortin946 ▴ 190

I've come across an peculiarity when aligning a 19bp sequence (AGCATGGGGAGCTCCCGGG) to the human reference genome (hg38) using bwa aln.

When aligning with 6 mismatches allowed (-n 6), I obtained a smaller number of hits compared to an alignment with 5 mismatches allowed (-n 5).

I used the following configurations for the alignments:

bwa aln -n 6 -k 0 -l 100 -o 0 -N hg38Index toyExample.fastq > out_n5.sai
bwa aln -n 6 -k 0 -l 100 -o 0 -N hg38Index toyExample.fastq > out_n6.sai

where toyExample.fastq is a dummy fastq file containing my sequence:

@AGCATGGGGAGCTCCCGGG
AGCATGGGGAGCTCCCGGG
+AGCATGGGGAGCTCCCGGG
~~~~~~~~~~~~~~~~~~~

I set the seed length to a large number to ignore seed-specific constraints. Anyone has any idea what's going on? I would expect all the hits stored in out_n5.sai to be a subset of the hits stored in out_n6.sai, but that's not the case. This phenomenon does not happen for smaller number of mismatches (that is all hits found for -n 4 mismatches are also found with -n 5 mismatches, for instance).

bwa-aln bwa • 680 views

ADD COMMENT • link updated 2.9 years ago by ATpoint 88k • written 2.9 years ago by fortin946 ▴ 190

0

Entering edit mode

Wildly guessing, but could 6 mismatches result in more multimappers, rendering the sequence as unmapped. Please define how you determined that a sequence was mapped (or not), or rather what a "hit" is?

ADD REPLY • link 2.9 years ago by ATpoint 88k