Hello,
I got a very weird output from BWA-mem2 - most of the reads have mapping quality of 0, even though there is no secondary alignment or anything else suspicious.
I got sequencing data that was aligned with Novoalign to hg18, the data was bam files. I needed to realign the data to hg19, so I extracted two fastq files using "bedtools bamtofastq" and run a standard alignment using bwa mem2. The commands I used were:
samtools sort -n -12 -o in_file.bam.sortedByName in_file.bam
bedtools bamtofastq -i in_file.bam.sortedByName -fq in_reads.r1.fq -fq2 in_reads.r2.fq
~/tools/bwa2/bwa-mem2-2.0pre2_x64-linux/bwa-mem2 mem -t 12 hg19.fa in_reads.r1.fq in_reads.r2.fq |\
samtools sort -12 -o out_file.bam -
When inspecting the original bam file there was hardly any reads with 0 mapping quality, but in the new bam file from bwa mem2 (using the same reads as in the original bam file) most reads had 0 mapping quality for an unknown reason.
This greatly impacts my next analysis steps which is variant calling since it skips all reads with 0 mapping quality.
An output from the new bam file is attached below.
Any help/suggestions would be appreciated.
HWI-ST218:512:C7GP4ANXX:1:2115:10195:61746#GCGGAC 145 chr1 17030 0 5S95M = 16819 -306 NNNNNGGCACATAGAAGTAGTTCTCTGGGACCTGCAAGATTAGGCAGGGACATGTGAGAGGTGACAGGGACCTGCAGGGGCAGCCAACAAGACCTTGTGT F</F/FFFFBFB<FFFFFFFFFFBBFFB<BFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFF<FFFFFFFFFFFFFFFFFFFBBBBBB NM:i:0 MD:Z:95 AS:i:95 XS:i:95
HWI-ST218:512:C7GP4ANXX:1:2313:14727:77282#GGAACT 147 chr1 17031 0 8S92M = 16971 -152 NNNNNNNCGCACATAGAAGTAGTTCTCTGGGACCTGCAAGATTAGGCAGGGACATATGAGAGGTGACAGGGACCTGCAGGGGCAGCCAACAAGACGTTGT F<7B/7<//BFFF</B/FF<BF<<FFFFFFFFFFFFFFFFFFFFFF<FFFB<<FB/FFFFFFFFFFFFFFFFFFFFF/FFFFFBB</FFFFFFBFBBBB/ NM:i:2 MD:Z:47G39C4 AS:i:82 XS:i:82
HWI-ST218:512:C7GP4ANXX:1:2303:3877:27572#GCGGAC 163 chr1 17038 0 100M = 17094 156 GAAGTAGTTCTCTGGGACCTGCAAGATTAGGCAGGGACATGTGAGAGGTGACAGGGACCTGCAGGGGCAGCCAACAAGACCTTGTGTGCACCTCCCATGG BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:100 AS:i:100 XS:i:100
HWI-ST218:512:C7GP4ANXX:1:1307:10764:27549#TGACAT 163 chr1 17067 0 100M = 17146 179 GGCAGGGACATGTGAGAGGTGACAGGGACCTGCAGGGGCAGCCAACAAGACCTTGTGTGCACCTCCCATGGGTGGAATAAGGGGCCCAACAGCCTTGACT BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:100 AS:i:100 XS:i:100
HWI-ST218:512:C7GP4ANXX:1:1213:19344:15127#CGTACG 147 chr1 17074 0 100M = 17010 -164 ACATGTGAGAGGTGACAGGGACCTGCAGGGGCAGCCAACAAGACCTTGTGTGCACCTCCCATGGGTGGAATAAGGGGCCCAACAGCCTTGACTGGAGAGG FFFFBFFFFFFBFFFFFFFBFFFFFFFBFFFFFFFFFFFF<FFBFF<FFFFFFFFBFFFFFFFFFFBBFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB NM:i:0 MD:Z:100 AS:i:100 XS:i:100
HWI-ST218:512:C7GP4ANXX:1:2308:10967:94378#TTGACT 83 chr1 17074 0 100M = 16996 -178 ACATGTGAGAGGTGACAGGGACCTGCAGGGGCAGCCAACAAGACCTTGTGTGCACCTCCCATGGGTGGAATAAGGGGCCCAACAGCCTTGACTGGAGAGG FFFFF<F<///FFF<<FBBB<FFBFFFFFFFFFFFFFBFFFF<FFFBFFFBF<FFB<BFFFFFFFFFFBF<FBFFFFFFFFFFFFFFFBFFFFFFBBBBB NM:i:0 MD:Z:100 AS:i:100 XS:i:100
HWI-ST218:512:C7GP4ANXX:1:2303:3877:27572#GCGGAC 83 chr1 17094 0 100M = 17038 -156 NNNNGCAGGGGCAGCCAACAAGACCTTGTGTGCACCTCCCATGGGTGGAATAAGGGGCCCAACAGCCTTGACTGGAGAGGAGCTCTGGCAAGGCCCTGGG BFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB NM:i:4 MD:Z:0A0C0C0T96 AS:i:96 XS:i:96
HWI-ST218:512:C7GP4ANXX:1:1307:10764:27549#TGACAT 83 chr1 17146 0 100M = 17067 -179 AGGGGCCCAACAGCCTTGACTGGAGAGGAGCTCTGGCAAGGCCCTGGGCCACTGCACCTGTCTCCACCTCTGTCCCACCCCTCCCACCTGCTGTTCCAGC FFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB NM:i:0 MD:Z:100 AS:i:100 XS:i:100
HWI-ST218:512:C7GP4ANXX:1:2307:19743:14016#CTCTAC 163 chr1 17150 0 100M = 17267 217 GCCCAACAGCCTTGACTGGAGAGGAGCTCTGGCAAGGCCCTGGGCCACTGCACCTGTCTCCACCTCTGTCCCACCCCTCCCACCTGCTGTTCCAGCTGCT BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:100 AS:i:100 XS:i:100
HWI-ST218:512:C7GP4ANXX:1:1101:13442:62440#GGCCAC 99 chr1 17159 0 100M = 17217 158 CCTTGACTGGAGACGAGCTCTGGCAAGGCCCTGGGCCACTGCACCTGTCTCCACCTCTGTCCCACCCCTCCCACCTGCTGTTCCAGCTGCTCTCTCTTGC BBBBBFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFBFFFFFFF<FFFBFFFFFFF NM:i:1 MD:Z:13G86 AS:i:95 XS:i:95
HWI-ST218:512:C7GP4ANXX:1:2301:14412:20516#GCGGAC 99 chr1 17209 0 100M = 17283 174 CCACCTCTGTCCCACCCCTCCCACCTGCTGTTCCAGCTGCTCTCTCTTGCTGATGGACAAGGGGGCATCAAACAGCTTCTCCTCTGTCTCTGCCCCCAGC BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFF NM:i:0 MD:Z:100 AS:i:100 XS:i:100
HWI-ST218:512:C7GP4ANXX:1:2303:18559:70135#TGACAT 99 chr1 17209 0 100M = 17275 166 CCACCTCTGTCCCACCCCTCCCACCTGCTGTTCCAGCTGCTCTCTCTTGCTGATGGACAAGGGGGCATCAAACAGCTTCTCCTCTGTCTCTGCCCCCAGC BBBBBFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFF/FFF<FFFFF/FFBFFFF/BBFB//BB<BFFFFFFFFFFFFFFFFFF/FFBFFFFFFBFBFF NM:i:0 MD:Z:100 AS:i:100 XS:i:100
HWI-ST218:512:C7GP4ANXX:1:1304:5501:35478#TTGACT 81 chr1 17311 0 100M chr9 17178 0 CACATGGGTCTTTGTTACAGCACCAGCCAGGGGGTCCAGGAAGACATACTTCTTCTACCTACAGAGGCGACATGGGGGTCAGGCAAGCTGACACCCGCTG #BFFF/BFFBF7<7BBFFFFFBF<FB<FFB<//BFFFFFB</FFFFFFFFFFFBBFF<FFFF<FFFFFFFFFFFFFFFFFFFFFFFFFBFBFFFB/<<B< NM:i:0 MD:Z:100 AS:i:100 XS:i:100
HWI-ST218:512:C7GP4ANXX:1:1214:20018:83329#CCACTC 163 chr1 17321 40 100M = 17412 184 TTTGTTACAGCACCAGCCAGGGGGTCCAGGAAGACATACTTCTTCTACCTACAGAGGCGACATGGGGGTCAGGCAAGCTGACACCCGCTGTCCTGAGCCC BBBBBFFFFFFFFFFFFFFFFFFFFFF/FF<FFFFFFFFFFFFFFFFFFFFFFFFF<FF<FFF<FBFFFFFFFFBFFB/FBFBBF############### NM:i:0 MD:Z:100 AS:i:100 XS:i:100
HWI-ST218:512:C7GP4ANXX:1:2108:2814:58060#CCACTC 163 chr1 17321 40 100M = 17412 184 TTTGTTACAGCACCAGCCAGGGGGTCCAGGAAGACATACTTCTTCTACCTACAGAGGCGACATGGGGGTCAGGCAAGCTGACACCCGCTGTCCTGAGCCC BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:100 AS:i:100 XS:i:100
HWI-ST218:512:C7GP4ANXX:1:1107:6789:55675#CGTACG 163 chr1 17346 40 100M = 17402 156 CCAGGAAGACATACTTCTTCTACCTACAGAGGCGACATGGGGGTCAGGCAAGCTGACACCCGCTGTCCTGAGCCCATGTTCCTCTCCCACATCATCAGGG BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:100 AS:i:100 XS:i:100
HWI-ST218:512:C7GP4ANXX:1:2102:6821:92451#CTCTAC 129 chr1 17370 0 100M chr15 102513436 0 TACAGAGGCAACATGGGGGTCAGGCAAGCTGACACCCGCTGTCCTGAGCCCATGTTCCTCTCCCACATCATCAGGGGCACAGCGTGCACTGTGGGGTCCC BBBBBFFFFBBFBF<FBFFBFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFB<BFFFFFFBFFFFFFFFFFFFFFB<B<FFFFFBFFFFFBFBFBBFBFF NM:i:1 MD:Z:9G90 AS:i:95 XS:i:95
HWI-ST218:512:C7GP4ANXX:1:1107:6789:55675#CGTACG 83 chr1 17402 45 100M = 17346 -156 CACCCGCTGTCCTGAGCCCATGTTCCTCTCCCACATCATCAGGGGCACAGCGTGCACTGTGGGGTCCCAGGCCTCCCGAGCCGAGCCACCCGTCACCCCC <7<B7/BFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFBFBBBBB NM:i:0 MD:Z:100 AS:i:100 XS:i:92
HWI-ST218:512:C7GP4ANXX:1:1202:2230:41463#CGAAAC 81 chr1 17412 21 7S93M chr9 17270 0 NNNNNNNCCTGAGCCCATGTTCCTCTCCCACATCATCAGGGGCACAGCGTGCACTGTGGGGTCCCAGGCCTCCCGAGCCGAGCCACCCGTCACCCCCTGG FFFFFF<FFFFBFFFFFBFFBFF<F/FFFBB<//7<<BB/FBFFFFFFBFFFFFFFBFFFFFFFBBFBFFBFFFFBFFBFFFFFFFFFFFFFFFFBBBBB NM:i:0 MD:Z:93 AS:i:93 XS:i:85 XA:Z:chr15,+102513657,12M2D81M7S,2;chrX,+155252226,12M2D81M7S,2;chrY,+59355232,12M2D81M7S,2;
HWI-ST218:512:C7GP4ANXX:1:1214:20018:83329#CCACTC 83 chr1 17412 45 7S93M = 17321 -184 NNNNNNNCCTGAGCCCATGTTCCTCTCCCACATCATCCGGGGCACAGCGTGCACTGTGGGGTCCCAGGCCTCCCGAGCCGAGCCACCCGTCACCCCCTGG FFFFFBBFFBFFFFFBFFFFFFFFFFBFFFFFFBB/</FFFFFFFFFFFFFFFBFFFFFF<FFFFFFFFFF<FBFFFFBF/BFFFFFFFFFFFFFBBBB< NM:i:1 MD:Z:30A62 AS:i:88 XS:i:80
HWI-ST218:512:C7GP4ANXX:1:2108:2814:58060#CCACTC 83 chr1 17412 45 7S93M = 17321 -184 NNNNNNNCCTGAGCCCATGTTCCTCTCCCACATCATCAGGGGCACAGCGTGCACTGTGGGGTCCCAGGCCTCCCGAGCCGAGCCACCCGTCACCCCCTGG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB NM:i:0 MD:Z:93 AS:i:93 XS:i:85
HWI-ST218:512:C7GP4ANXX:1:1214:4075:22725#GGAACT 163 chr1 17431 41 100M = 17507 176 CCCACATCATCAGGGGCACAGCGTGCACTGTGGGGTCCCAGGCCTCCCGAGCCGAGCCACCCGTCACCCCCTGGCTCCTGGCCTATGTGCTGTACCTGTG BBBBBBFFFFFFFFFFFBFFFFFFFF<FFFFFFFFFFFF<FF<BFFFFFFFFFFFFFFFFF<BFFFB/FFFFFFFF<FFFFFFFFFFFFFFFFFF##### NM:i:0 MD:Z:100 AS:i:100 XS:i:92 XA:Z:chrX,-155252200,38M2D62M,2;chrY,-59355206,38M2D62M,2;chr15,-102513631,38M2D62M,2;chr9,+17542,62M2D38M,3;chr12,-88093,38M2D62M,4;
Flagstat from the old bam file:
81003600 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
79673553 + 0 mapped (98.36% : N/A)
81003600 + 0 paired in sequencing
40501800 + 0 read1
40501800 + 0 read2
78475118 + 0 properly paired (96.88% : N/A)
79450142 + 0 with itself and mate mapped
223411 + 0 singletons (0.28% : N/A)
513576 + 0 with mate mapped to a different chr
504939 + 0 with mate mapped to a different chr (mapQ>=5)
Flagstat from the new bam file:
81128792 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
125192 + 0 supplementary
0 + 0 duplicates
81023339 + 0 mapped (99.87% : N/A)
81003600 + 0 paired in sequencing
40501800 + 0 read1
40501800 + 0 read2
74405544 + 0 properly paired (91.85% : N/A)
80839590 + 0 with itself and mate mapped
58557 + 0 singletons (0.07% : N/A)
691704 + 0 with mate mapped to a different chr
531479 + 0 with mate mapped to a different chr (mapQ>=5)
hum reads are always badly aligned on 5' of chr1. What is the output of `samtools flagstats your.bam'
Thanks for your suggestions, the strange thin is that I inspected the old file (which was produced from the same reads) on the same region and I saw only rare reads with 0 MAPQ. I also compared the same reads between the two bam files, and in the old file they had MAPQ>0 and the new file had MAPQ of 0. Anyway, I attach the flagstat from the old and the new files, I hope it will help.
Okay, now I checked a different region (chr3:100000) and I see that all/most reads are properly aligned with high MAPQ. Still strange that the variant caller I used produced an empty file gatk's mutec2.
so your assertion was wrong isn't it ?