Question

BWA cigar not corresponding to reality

4.1 years ago

ADDNOTHIING ▴ 10

Hi everyone,

I am mapping reads of different sizes and have run into an issue concerning BWA and Bowtie2.

I have a 93 bp read that has been mapped to the reference. I simulated these reads, so I know they mapped to their correct position in both cases, with the SNPs and mutations incorporated.

To start, I mapped the read using BWA. The corresponding CIGAR string is the following:

8=1X32=1X27=1X14=1X3=1X4=

Here we can see that 5 SNPs would be present in this read.

Then I mapped the read using Bowtie2, and here is the CIGAR string:

8=1M32=1M27=1M14=1X3=1M4=

Here we can see that only 1 SNP would be present.
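To compare the two strings position by position, a minimal sketch in Python follows. It only tallies the operations in the two CIGAR strings quoted above and reports the read offsets not covered by `=`. The helper names (`parse_cigar`, `summarize`, `non_identity_offsets`) are made up for this illustration and are not part of BWA, Bowtie2, or any library.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject input that the regex would otherwise skip silently.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops

def summarize(cigar):
    """Total number of bases per operation code."""
    totals = Counter()
    for n, op in parse_cigar(cigar):
        totals[op] += n
    return dict(totals)

def non_identity_offsets(cigar):
    """0-based read offsets (with op code) of bases not covered by '='."""
    offsets, pos = [], 0
    for n, op in parse_cigar(cigar):
        if op in "MIS=X":  # operations that consume read bases
            if op != "=":
                offsets.extend((pos + i, op) for i in range(n))
            pos += n
    return offsets

bwa = "8=1X32=1X27=1X14=1X3=1X4="
bowtie2 = "8=1M32=1M27=1M14=1X3=1M4="

print(summarize(bwa))                 # {'=': 88, 'X': 5}
print(summarize(bowtie2))             # {'=': 88, 'M': 4, 'X': 1}
print(non_identity_offsets(bwa))      # offsets 8, 41, 69, 84, 88, all 'X'
print(non_identity_offsets(bowtie2))  # same offsets; only 84 is 'X'
```

Both strings cover 93 read bases and flag the same five offsets; they differ only in the operation code used at four of them.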

However, when I take a closer look at the alignment in IGV, this is what I can see:

[Image: IGV screenshot of the alignment]

Both of them show only one SNP, but for whatever reason the BWA CIGAR shows 5. The behaviour is the same with BWA-MEM. Does anyone have an idea of what is happening? I checked, and there are no clipped nucleotides in this read either.

Thanks :)

ADDNOTHIING

alignment bwa bowtie2 cigar mapping • 1.1k views

ADD COMMENT • link updated 4.1 years ago by ATpoint 85k • written 4.1 years ago by ADDNOTHIING ▴ 10

Are IUPAC codes used anywhere? In the reads, or in the assembly?

ADD REPLY • link 4.1 years ago by Juke34 8.9k

In fact, in this specific analysis, yes: the reference uses IUPAC characters. Could that affect the CIGAR without showing up as a difference in the alignment in IGV?

ADD REPLY • link 4.1 years ago by ADDNOTHIING ▴ 10

BWA and Bowtie2 might handle IUPAC codes differently.
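To illustrate why an ambiguity code in the reference can be scored in more than one way, here is a small Python sketch. It is not a description of what BWA, Bowtie2, or IGV actually do internally; it only shows that a literal character comparison and an IUPAC-aware comparison can disagree at the same position. The sequences and the function names (`strict`, `iupac_aware`) are hypothetical, and the `~` label is an invention of this sketch, not a CIGAR operation.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def strict(read, ref):
    """Literal comparison: any differing character is a mismatch (X)."""
    return "".join("=" if r == f else "X" for r, f in zip(read.upper(), ref.upper()))

def iupac_aware(read, ref):
    """'=' for identical unambiguous bases, '~' when the read base is one of
    the bases allowed by an ambiguous reference code, 'X' otherwise."""
    labels = []
    for r, f in zip(read.upper(), ref.upper()):
        if f in "ACGT":
            labels.append("=" if r == f else "X")
        elif r in IUPAC.get(f, ""):
            labels.append("~")  # compatible with the ambiguity code
        else:
            labels.append("X")
    return "".join(labels)

# Hypothetical 5 bp example: the reference carries R (A or G) at offset 2.
read = "ACGTA"
ref = "ACRTA"
print(strict(read, ref))       # ==X==
print(iupac_aware(read, ref))  # ==~==
```

If the positions where the two CIGAR strings disagree coincide with IUPAC characters in the reference, that would be consistent with this explanation, but it would need to be checked against the reference sequence itself.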

ADD REPLY • link 4.1 years ago by Juke34 8.9k

What were the commands to map the reads? Did you process the alignments somehow? If memory serves me right, BWA uses old-style CIGAR strings, with only M, D and I.

In addition, M means alignment match (no insertions or deletions), but the sequence may be either match or mismatch. The codes = and X represent unambiguous identities or differences between read and reference.
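As a rough sketch of that relationship, the Python snippet below collapses `=` and `X` into `M` and merges adjacent runs, which is one way to compare an extended CIGAR with an old-style one. The function name `collapse_to_m` is made up for this example; it is not a BWA, Bowtie2, or samtools function.

```python
import re
from itertools import groupby

def collapse_to_m(cigar):
    """Rewrite '=' and 'X' as 'M', then merge adjacent runs of the same op."""
    ops = [
        (int(n), "M" if op in "=X" else op)
        for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    ]
    merged = []
    # groupby clusters consecutive tuples that share the same operation code.
    for op, run in groupby(ops, key=lambda t: t[1]):
        merged.append(f"{sum(n for n, _ in run)}{op}")
    return "".join(merged)

print(collapse_to_m("8=1X32=1X27=1X14=1X3=1X4="))  # 93M
print(collapse_to_m("8=1M32=1M27=1M14=1X3=1M4="))  # 93M
```

Both strings from the question reduce to `93M`, so in old-style terms the two alignments are the same ungapped placement; they differ only in how individual columns are labelled.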

ADD REPLY • link 4.1 years ago by h.mon 35k