Question

Bwa-mem alignment on duplicated regions

1


9.2 years ago

annecarol7 ▴ 10

I have aligned ChIP-seq data using BWA-MEM. The reference genome has a duplicated region on two different chromosomes. When I count the reads aligned to each copy, I get different numbers. If BWA-MEM assigns reads to duplicated regions at random, shouldn't I get about the same number of reads for each copy?

bwa mem -B 40 -O 60 -E 10 -L 50 -M -R '@RG\tID:WTCHG_261086_294\tSM:WTCHG_261086_294' -t 10 /home/plxacb/Fixed_fasta/Ade_fasta/Cen8Ade6.fa WTCHG_261086_294.unmapped_ecoli_1.fastq WTCHG_261086_294.unmapped_ecoli_2.fastq | samtools view -bS - > WTCHG_261086_294.strict_bwamem.bam

samtools view WTCHG_261086_294.strict_bwamem.sorted.bam Not76_Chr2_ref:125168-133320 | wc -l

222754

samtools view WTCHG_261086_294.strict_bwamem.sorted.bam Not76_Chr4_ref:1619693-1627846 | wc -l

271696

ChIP-Seq alignment • 3.3k views

updated 9.2 years ago by harold.smith.tarheel ★ 5.0k • written 9.2 years ago by annecarol7 ▴ 10

0


To be pedantic, your Chr2 region is 1 bp shorter than the Chr4 region (end minus start is 8152 vs 8153). It shouldn't make much difference, but who knows...
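For reference, samtools region strings are 1-based and include both endpoints, so a region's length is end minus start plus one. A minimal shell sketch applied to the two regions from the question (the helper name `region_len` is made up for illustration):

```shell
# Length of a 1-based, inclusive region such as chr:START-END.
# region_len is a hypothetical helper, not part of samtools.
region_len() {
    echo $(( $2 - $1 + 1 ))
}

region_len 125168 133320     # Not76_Chr2_ref region
region_len 1619693 1627846   # Not76_Chr4_ref region
```

Counted this way the regions are 8153 and 8154 bp, so the 1 bp difference holds either way.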

9.2 years ago by dariober 15k

0


When I remove this extra 1 bp from chr4, the result is 271688, only 8 reads fewer.

9.2 years ago by annecarol7 ▴ 10

score 2 · Answer 1 · 2016-06-07

2


9.2 years ago

harold.smith.tarheel ★ 5.0k

It looks like you're using paired-end reads. BWA-MEM disambiguates a multi-mapping read if its mate is uniquely aligned. Your results suggest that more uniquely aligned mates fall on chr4 than on chr2, which would explain the discrepancy.
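One way to probe this, sketched under assumptions: BWA-MEM typically reports MAPQ 0 for a read it cannot place unambiguously, so splitting each region's records by MAPQ shows how many were placed with some confidence (for example through a mate) and how many were placed arbitrarily. The helper name `split_by_mapq` is made up for illustration; it reads SAM records on standard input, as produced by `samtools view`.

```shell
# Hypothetical helper: count SAM records with MAPQ 0 versus MAPQ > 0.
# Reads SAM text on stdin; column 5 is MAPQ.
split_by_mapq() {
    awk -F '\t' '
        /^@/    { next }                 # skip header lines if present
        $5 == 0 { ambiguous++; next }    # no confident placement
                { placed++ }             # MAPQ above 0
        END     { printf "mapq0=%d mapq_gt0=%d\n", ambiguous, placed }
    '
}

# Assumed usage with the file and regions from the question:
# samtools view WTCHG_261086_294.strict_bwamem.sorted.bam Not76_Chr2_ref:125168-133320 | split_by_mapq
# samtools view WTCHG_261086_294.strict_bwamem.sorted.bam Not76_Chr4_ref:1619693-1627846 | split_by_mapq
```

If the MAPQ 0 counts are similar between the two copies and the difference sits in the MAPQ > 0 reads, that would be consistent with mate-based placement.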


0


That makes sense, but the regions are 8 kb long and flanked by the same sequences, where my precipitated protein doesn't bind, so I would not expect a unique alignment for any of the reads.
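This expectation can be checked, as a rough sketch, by counting how many reads in each region have a mate that aligned outside it, since those are the pairs that could have been anchored to one copy. The helper `mates_outside` and its two arguments (region start and end) are hypothetical; it reads SAM records on standard input and uses the FLAG, RNEXT and PNEXT columns (2, 7 and 8).

```shell
# Hypothetical helper: classify reads by where their mate aligned
# relative to the region START-END given as $1 and $2.
# Reads SAM text on stdin.
mates_outside() {
    awk -F '\t' -v s="$1" -v e="$2" '
        /^@/ { next }                                   # skip header lines
        (int($2 / 8) % 2 == 1) || ($7 == "*") {         # flag 0x8: mate unmapped
            no_mate++; next
        }
        ($7 != "=" && $7 != $3) || ($8 + 0 < s + 0) || ($8 + 0 > e + 0) {
            outside++; next                             # other sequence or beyond the region
        }
        { inside++ }
        END { printf "outside=%d inside=%d mate_unmapped=%d\n", outside, inside, no_mate }
    '
}

# Assumed usage with the file and regions from the question:
# samtools view WTCHG_261086_294.strict_bwamem.sorted.bam Not76_Chr2_ref:125168-133320 | mates_outside 125168 133320
# samtools view WTCHG_261086_294.strict_bwamem.sorted.bam Not76_Chr4_ref:1619693-1627846 | mates_outside 1619693 1627846
```

A clearly larger "outside" count for one copy would point to mates reaching into sequence that differs between the two chromosomes.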

9.2 years ago by annecarol7 ▴ 10