Hi everyone,
I am mapping reads of different sizes and I ran into an issue concerning BWA and Bowtie2.
I have a read of 93 bp that has been mapped to the reference. I simulated these reads, so I know they mapped to their correct position in both cases and that the SNPs and mutations are incorporated.
To start, I mapped the read using BWA. The corresponding CIGAR string is the following:
8=1X32=1X27=1X14=1X3=1X4=
Here we can see that 5 SNPs would be present in this read.
Then I mapped the read using Bowtie2, and here is the CIGAR string:
8=1M32=1M27=1M14=1X3=1M4=
Here we can see that only 1 SNP would be present.
However, when I take a closer look at the alignment in IGV, this is what I can see:
Both of them show only one SNP, but for some reason the BWA CIGAR shows 5. The behaviour is the same for BWA-MEM. Does anyone have an idea of what is happening? I checked, and there are no clipped nucleotides on this read either.
Thanks :)
Are IUPAC codes used anywhere? In the reads, or in the assembly?
In fact, in this specific analysis, yes: the reference uses IUPAC characters. Could that affect the CIGAR without showing up as a difference in the alignment in IGV?
BWA and Bowtie2 might deal differently with IUPAC codes.
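To illustrate how that could produce different mismatch counts, here is a minimal Python sketch. It does not reproduce what BWA or Bowtie2 actually do internally; it only contrasts a strict character comparison with one that accepts any base compatible with an IUPAC ambiguity code. The function names and the toy sequences are made up for illustration.

```python
# Standard IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def strict_mismatches(read, ref):
    """Count positions where the read base differs from the reference character."""
    return sum(1 for r, g in zip(read, ref) if r != g)


def iupac_mismatches(read, ref):
    """Count positions where the read base is not among the bases
    allowed by the (possibly ambiguous) reference character."""
    return sum(1 for r, g in zip(read, ref) if r not in IUPAC.get(g, g))


# Hypothetical toy sequences: the reference has an ambiguous R (A or G)
# at index 2 and a genuine difference at the last position.
read = "ACGTAC"
ref = "ACRTAG"

print(strict_mismatches(read, ref))  # 2: the R and the final base both count
print(iupac_mismatches(read, ref))   # 1: only the final base counts
```

If one tool counts an ambiguous reference base as a difference and the other tool (or the viewer) does not, you would see exactly this kind of discrepancy.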
What were the commands to map the reads? Did you process the alignments somehow? If memory serves me right, BWA uses old-style CIGAR strings, with only M, D and I.
In addition, M means an alignment match (no insertion or deletion), but the aligned bases may be either a sequence match or a mismatch. The codes = and X unambiguously represent identities and differences between read and reference.
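To compare the two CIGAR strings from the question side by side, a small Python sketch like the one below can tally the bases under each operation. The helper name `cigar_op_lengths` is just something I made up, and the regular expression assumes the standard SAM operation letters.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by a SAM operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume bases of the read.
CONSUMES_QUERY = set("MIS=X")


def cigar_op_lengths(cigar):
    """Return the total number of bases under each CIGAR operation."""
    totals = Counter()
    for length, op in CIGAR_OP.findall(cigar):
        totals[op] += int(length)
    return totals


def query_length(cigar):
    """Return the read length implied by the CIGAR string."""
    totals = cigar_op_lengths(cigar)
    return sum(n for op, n in totals.items() if op in CONSUMES_QUERY)


bwa = cigar_op_lengths("8=1X32=1X27=1X14=1X3=1X4=")
bowtie2 = cigar_op_lengths("8=1M32=1M27=1M14=1X3=1M4=")

print(dict(bwa))      # {'=': 88, 'X': 5}
print(dict(bowtie2))  # {'=': 88, 'X': 1, 'M': 4}
```

Both strings cover all 93 bases of the read. The four positions that BWA reports as X are reported as M in the Bowtie2 string, and M by itself does not say whether the base matches or not.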