Why does inclusion of decoy sequences cause more BWA alignments to an autosome?
6.5 years ago

I am seeing strange behavior when comparing alignments generated by bwa mem 0.7.12 to two different reference genomes: hg19 and hs37d5 (basically hg19 plus additional decoy sequences). We have DNA-seq data. I noticed that when using hg19, there are very few alignments to the MHC region on chromosome 6, and even fewer of these have nonzero mapping quality. When using hs37d5, there are dramatically more alignments to the region and these have mostly high mapping quality scores. I have not observed this phenomenon anywhere else I've looked in the genome. The behavior is robust to multiple different choices of BWA parameters. Can anyone explain why the inclusion of the additional 35Mb of decoy sequences in hs37d5 would drastically improve the number and quality of alignments to this region of chr6?
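The comparison described above (number of alignments in a region, and how many have nonzero mapping quality) can be sketched as follows. This is a minimal illustration and not the poster's actual workflow: the function name `count_region_alignments`, the example reads, and the region bounds are all made up for the example, and the input is assumed to be uncompressed SAM text. Note that hs37d5 names the chromosome `6` while UCSC hg19 names it `chr6`, so the `chrom` argument has to match the reference in use.

```python
def count_region_alignments(sam_lines, chrom, start, end):
    """Count mapped alignments whose leftmost position lies in [start, end] on chrom.

    Returns (total, nonzero_mapq). Positions are 1-based, as in SAM.
    """
    total = 0
    nonzero_mapq = 0
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        flag = int(fields[1])
        if flag & 0x4:                    # read is unmapped
            continue
        rname, pos, mapq = fields[2], int(fields[3]), int(fields[4])
        if rname == chrom and start <= pos <= end:
            total += 1
            if mapq > 0:
                nonzero_mapq += 1
    return total, nonzero_mapq


# Made-up records for illustration only (not real data).
EXAMPLE_SAM = [
    "@SQ\tSN:chr6\tLN:171115067",
    "\t".join(["r1", "0", "chr6", "29900000", "60", "5M", "*", "0", "0", "ACGTA", "IIIII"]),
    "\t".join(["r2", "0", "chr6", "31000000", "0", "5M", "*", "0", "0", "ACGTA", "IIIII"]),
    "\t".join(["r3", "0", "chr1", "29900000", "60", "5M", "*", "0", "0", "ACGTA", "IIIII"]),
    "\t".join(["r4", "4", "*", "0", "0", "*", "*", "0", "0", "ACGTA", "IIIII"]),
]

# The bounds below are a rough placeholder window around the MHC, not exact coordinates.
total, nonzero = count_region_alignments(EXAMPLE_SAM, "chr6", 28_000_000, 34_000_000)
print(total, nonzero)
```

Running the same count on the alignments against each reference, with the appropriate chromosome name, gives the two pairs of numbers being compared.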

BWA hs37d5 • 2.5k views

I don't know why the decoy is doing this, but I wonder whether there is something particular about chromosome 6.

Have you loaded the region in IGV? Are the reads all mapping to a very small region? Recently, I also saw very strange behaviour in which reads from both ATAC-seq and ChIP-seq data mapped to a small area of chr6. The reads were also highly enriched for very long motifs (20+ bases). I suspect these regions were missed by repeat masking because they occurred only within short regions bridging very large repeat-masked regions.


Thanks for this idea. I've looked at the alignments in IGV. They do map to several punctate peaks, leaving most of the region uncovered. We would expect coverage of the entire region to be fairly even. I'm guessing we are seeing alignments to regions that are easier to sequence, but still don't know why these alignments disappear when using hg19.

6.5 years ago

I figured this out. It's because I was using the UCSC version of hg19 that includes some additional assemblies of MHC region haplotypes. Reads were preferentially mapping to those extra contigs instead of chr6.
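One way to check for this is to look at per-contig read counts, for example from `samtools idxstats`, which prints tab-separated lines of reference name, length, mapped count, and unmapped count. The sketch below is an illustration under assumptions, not what was actually run here: the function name `split_chr6_counts` is hypothetical, the regular expression assumes UCSC-style haplotype names such as `chr6_cox_hap2`, and the haplotype lengths and all counts in the example text are made up. Reads that place equally well on chr6 and on a haplotype contig also tend to receive mapping quality near zero, which would be consistent with what was seen on chr6.

```python
import re

# Assumed naming scheme for the extra MHC haplotype contigs in UCSC hg19.
HAP_PATTERN = re.compile(r"^chr6_.+_hap\d+$")


def split_chr6_counts(idxstats_text):
    """Split mapped-read counts between chr6 and chr6 haplotype contigs.

    idxstats_text: tab-separated lines of name, length, mapped, unmapped.
    Returns (mapped_on_chr6, {haplotype_contig: mapped}).
    """
    primary = 0
    per_hap = {}
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "chr6":
            primary += int(mapped)
        elif HAP_PATTERN.match(name):
            per_hap[name] = int(mapped)
    return primary, per_hap


# Made-up numbers for illustration only (not real data).
EXAMPLE_IDXSTATS = "\n".join([
    "chr1\t249250621\t5000\t0",
    "chr6\t171115067\t1200\t0",
    "chr6_apd_hap1\t1000000\t300\t0",
    "chr6_cox_hap2\t1000000\t900\t0",
    "*\t0\t0\t42",
])

primary, per_hap = split_chr6_counts(EXAMPLE_IDXSTATS)
print(primary, per_hap)
```

If the haplotype contigs carry a large share of the reads, aligning to a reference without them (or one that handles alternate contigs explicitly) should bring those reads back onto chr6.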
