Hi, I'm using bowtie2 to align reads to a genome. When I build the index from chr1~chr12 only and then map, the read pair 'V100002715L1C001R026000072' is assigned to chr6 with good quality (mapping quality 33). But if I add chrC and chrM, build a new index, and map again, 'V100002715L1C001R026000072' is assigned to chrM with a lower mapping quality of 1. The command line I use is:
bowtie2 -x bowtie2_index -1 read_1.fa.gz -2 read_2.fa.gz --very-sensitive-local -p 10 -S result.sam
The full records for the 'V100002715L1C001R026000072' read pair are (for the index that includes chrM and chrC):
V100002715L1C001R026000072 83 ChrM 193580 1 50M = 193461 -169 TTGTTTTTCTTGTTCTTCTTTCTCGAAGAGATGGGTGCACCGCCTTGGAG 7G@F4>FFG=FBFBFG*FBEF5AF?FFEEE?CD1D@CED=GFFGFG<FFF AS:i:100 XS:i:100 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:50 YS:i:100 YT:Z:CP
V100002715L1C001R026000072 163 ChrM 193461 1 50M = 193580 169 GGACAATGGTTTTCTAGGTTGTTTCACCAATCTGTTGAATTGGAATGGAG D<AFEEBEFF8C>FFF/BCF>?FFFGFGCEFCFFFF6FE9FGF>F>FF>9 AS:i:100 XS:i:100 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:50 YS:i:100 YT:Z:CP
(for the index with just chr1~chr12)
V100002715L1C001R026000072 83 Chr06 8173354 33 50M = 8173235 -169 TTGTTTTTCTTGTTCTTCTTTCTCGAAGAGATGGGTGCACCGCCTTGGAG 7G@F4>FFG=FBFBFG*FBEF5AF?FFEEE?CD1D@CED=GFFGFG<FFF AS:i:100 XS:i:100 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:50 YS:i:100 YT:Z:CP
V100002715L1C001R026000072 163 Chr06 8173235 33 50M = 8173354 169 GGACAATGGTTTTCTAGGTTGTTTCACCAATCTGTTGAATTGGAATGGAG D<AFEEBEFF8C>FFF/BCF>?FFFGFGCEFCFFFF6FE9FGF>F>FF>9 AS:i:100 XS:i:70 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:50 YS:i:100 YT:Z:CP
Could someone please tell me why the index that includes chrM and chrC gives a lower mapping quality? Do the reads really map to chrM? Thank you very much!
Aifu.
These are probably mitochondrial homologs: they map to the mitochondrial genome as well as to other sequences in the genome with equal identity (no mismatches, identical sequence). This is a good example of why it makes sense to always include all chromosomes in the alignment reference. Such reads are called multimappers and are assigned a mapping quality of 1 or 0 (not exactly sure how bowtie2 does it; bwa uses 0 for them AFAIK). Either way, you cannot determine the true origin of these reads in the genome, so they should be treated carefully when quantifying reads over a given region. Typically they are removed outright. What do you plan to do with these data?
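To make the tie concrete, here is a minimal Python sketch that reads the AS:i (score of the reported alignment) and XS:i (score of the best alternative alignment) tags from the records posted above and checks whether both mates have an alternative hit as good as the reported one. The function names are made up for illustration, SEQ and QUAL are replaced by `*` and most tags are dropped to keep the lines short, and the pair-level rule is only a heuristic that happens to agree with the MAPQ values in this thread; it is not how bowtie2 actually computes MAPQ.

```python
def parse_tags(sam_line):
    """Return the optional tags (field 12 onward) of one SAM line as a dict."""
    fields = sam_line.split()
    tags = {}
    for field in fields[11:]:
        name, kind, value = field.split(":", 2)
        tags[name] = int(value) if kind == "i" else value
    return tags


def mate_has_tie(sam_line):
    """True if an alternative alignment scores at least as well (XS >= AS)."""
    tags = parse_tags(sam_line)
    return "XS" in tags and tags["XS"] >= tags["AS"]


def pair_is_ambiguous(mate1_line, mate2_line):
    """Heuristic (an assumption, not bowtie2's rule): both mates are tied."""
    return mate_has_tie(mate1_line) and mate_has_tie(mate2_line)


# Records from the thread, with SEQ/QUAL shortened to '*' and tags trimmed.
q = "V100002715L1C001R026000072"
chrM_mate1 = q + " 83 ChrM 193580 1 50M = 193461 -169 * * AS:i:100 XS:i:100 YT:Z:CP"
chrM_mate2 = q + " 163 ChrM 193461 1 50M = 193580 169 * * AS:i:100 XS:i:100 YT:Z:CP"
chr06_mate1 = q + " 83 Chr06 8173354 33 50M = 8173235 -169 * * AS:i:100 XS:i:100 YT:Z:CP"
chr06_mate2 = q + " 163 Chr06 8173235 33 50M = 8173354 169 * * AS:i:100 XS:i:70 YT:Z:CP"

print(pair_is_ambiguous(chrM_mate1, chrM_mate2))    # True  (MAPQ 1 in the thread)
print(pair_is_ambiguous(chr06_mate1, chr06_mate2))  # False (MAPQ 33 in the thread)
```

Note that with the chr1~chr12 index mate 1 on its own is already tied (XS = AS = 100); only mate 2 has a clearly worse alternative (XS = 70), which is why the sketch looks at the pair rather than at single mates.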
Thank you ATpoint. I would like to keep these multi-mapped reads, as recommended in this post: Bowtie 2 - is there a way to discard reads mapping to multiple locations?
I added the -a option to report all alignments.
For index with chrM and chrC,
For index without chrM and chrC,
So, for the index with chrM and chrC, bowtie2 actually finds 3 positions. chr6 is the primary one (inferred from the flag), but all the mapping qualities are 1. Why not 33? And why was chrM chosen?
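For anyone following along, the primary/secondary status mentioned here is encoded in the SAM FLAG field. Below is a small Python sketch for decoding it. The bit values come from the SAM specification, while the function and label names are just illustrative. Flags 83 and 163 are the ones in the records earlier in the thread; 339 is simply 83 with the secondary bit (256) added as an example and is not a value taken from the actual -a output.

```python
# Bit values as defined in the SAM specification; labels are informal.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def is_primary(flag):
    """A record is primary if neither the secondary nor supplementary bit is set."""
    return (flag & (0x100 | 0x800)) == 0


print(decode_flag(83))   # ['paired', 'proper_pair', 'reverse', 'first_in_pair']
print(decode_flag(163))  # ['paired', 'proper_pair', 'mate_reverse', 'second_in_pair']
print(is_primary(83))    # True
print(is_primary(339))   # False (83 + 256, illustrative only)
```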