Bwa Indels
1
2
Entering edit mode
11.9 years ago
j_waterfall ▴ 20

Comparing bwa 0.6.2 to bwa 0.5.9 I'm seeing very different behavior around indels. I'm using default options in both cases and as far as I can tell they haven't changed. I'm aligning Illumina paired end 2x80bp reads. For dbSNP indels of just a few nucleotides bwa 0.5.9 aligns them great - perfect matches for entire read length (plus the indel, except when the indel is near the end of the read) but 0.6.2 soft clips a large fraction of the read and then has a very high number of mismatches plus multiple indels in the remaining portion. Any suggestions?

bwa indel • 5.0k views
ADD COMMENT
3
Entering edit mode
11.9 years ago
Erik Garrison ★ 2.4k

I noticed the same issue. It seems that BWA is more aggressively soft-clipping reads around indels in recent releases. This is fine, provided you can realign the problematic reads.

To resolve issues with indel alignment in a relatively uniform and aligner-agnostic way, I developed a repeat and entropy-aware local realignment system, ogap. In our detection pipeline we use it like this:

bamtools merge -in file1.bam -in file2.bam -in file3.bam \
    | ogap -z -R 25 -C 20 -Q 20 -S 0 -f $reference \
    | bamleftalign -f $reference \
    | samtools calmd -EAru - $reference 2>/dev/null \
    | freebayes -f $reference >results.vcf

In this setting, it picks up reads which have soft clipped or mismatched bases summing to more than Q20, and realigns using an extended Smith-Waterman-Gotoh pairwise alignment algorithm that adjusts gap opening and extension penalties on the basis of local sequence context. To do this, it uses Shannon entropy and perfect repeats up to 12bp. This may be too sensitive for most use, so adjusting -C and -Q higher may improve performance.

ADD COMMENT
0
Entering edit mode

Thanks, I'll check out ogap. I'm pretty sure the GATK realigner ignores soft-clipped bases so it's not very useful at these positions.

ADD REPLY
0
Entering edit mode

Eric, is it possible to force ogap to realign only soft-clipped reads (all of them if possible) but nothing else.

ADD REPLY

Login before adding your answer.

Traffic: 1612 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6