BWA aln returns a smaller number of hits when specifying a higher number of mismatches
2.4 years ago
fortin946 ▴ 190

I've come across a peculiarity when aligning a 19 bp sequence (AGCATGGGGAGCTCCCGGG) to the human reference genome (hg38) using bwa aln.

When aligning with 6 mismatches allowed (-n 6), I obtained a smaller number of hits compared to an alignment with 5 mismatches allowed (-n 5).

I used the following configurations for the alignments:

bwa aln -n 5 -k 0 -l 100 -o 0 -N hg38Index toyExample.fastq > out_n5.sai
bwa aln -n 6 -k 0 -l 100 -o 0 -N hg38Index toyExample.fastq > out_n6.sai

where toyExample.fastq is a dummy fastq file containing my sequence:

@AGCATGGGGAGCTCCCGGG
AGCATGGGGAGCTCCCGGG
+AGCATGGGGAGCTCCCGGG
~~~~~~~~~~~~~~~~~~~

I set the seed length (-l 100) to a value larger than the read length to rule out seed-specific constraints. Does anyone have any idea what's going on? I would expect all the hits stored in out_n5.sai to be a subset of the hits stored in out_n6.sai, but that's not the case. This does not happen for smaller numbers of mismatches (for instance, all hits found with -n 4 are also found with -n 5).
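The subset check described above could be sketched as follows. This is only a minimal sketch: it assumes the hits from each run have already been exported to plain-text files with one hit per line (for example "chr1:12345:+"), which is not something the post specifies. The helper name missing_hits and the file names in the usage note are hypothetical.

```shell
# Hypothetical helper: print hits that are in the first list but not in the second.
# Assumes each input file holds one hit per line, e.g. "chr1:12345:+".
missing_hits() {
    sorted_a=$(mktemp)
    sorted_b=$(mktemp)
    # comm requires sorted input; -u also drops duplicate hits
    LC_ALL=C sort -u "$1" > "$sorted_a"
    LC_ALL=C sort -u "$2" > "$sorted_b"
    # -23 suppresses lines unique to the second file and lines common to both
    LC_ALL=C comm -23 "$sorted_a" "$sorted_b"
    rm -f "$sorted_a" "$sorted_b"
}
```

Calling `missing_hits hits_n5.txt hits_n6.txt` would print nothing if every -n 5 hit were also found with -n 6; any line it prints is a hit that was lost when the mismatch limit was raised.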

bwa-aln bwa • 580 views

Wildly guessing, but could 6 mismatches result in more multimappers, rendering the sequence unmapped? Please define how you determined that a sequence was mapped (or not), or rather, what a "hit" is.
