I have paired-end whole-genome sequencing data that I am trying to map to the human genome primary assembly, hg38.
I mapped the reads with bwa mem after indexing the genome with the bwa index command. Downstream, I am interested in variant calling with VarScan2 or another variant caller.
However, the mapping rate is very low. Here is the output of samtools flagstat:
681299381 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
6190495 + 0 supplementary
0 + 0 duplicates
217048997 + 0 mapped (31.86% : N/A)
675108886 + 0 paired in sequencing
337554443 + 0 read1
337554443 + 0 read2
161718456 + 0 properly paired (23.95% : N/A)
178076376 + 0 with itself and mate mapped
Do you have any idea how to improve the mapping rate? Is this mapping rate expected in general, or am I doing something wrong? I usually work with exome or RNA-seq data, where the mapping rate is above 90%.
You can take a sample of the reads that are not mapping and check them with BLAST at NCBI. This should help you confirm or rule out contamination.
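One way to draw that sample is sketched below in Python. It is only an illustration under a few assumptions: the input is SAM text (for example, the output of `samtools view -f 4`, which keeps reads with the unmapped flag set), the function names `sample_unmapped` and `to_fasta` are made up for this example, and the three records in the demo are invented placeholders rather than real reads. It uses reservoir sampling, so the whole file never has to fit in memory.

```python
import random


def sample_unmapped(sam_lines, k=100, seed=0):
    """Reservoir-sample up to k unmapped primary reads from SAM text lines.

    Returns a list of (read_name, sequence) tuples.
    """
    rng = random.Random(seed)
    reservoir = []
    seen = 0
    for line in sam_lines:
        if line.startswith("@"):      # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:          # not a complete alignment record
            continue
        flag = int(fields[1])
        if not flag & 0x4:            # 0x4 = read unmapped
            continue
        if flag & 0x900:              # skip secondary/supplementary records
            continue
        # Tag mates so read1 and read2 get distinct FASTA names.
        suffix = "/1" if flag & 0x40 else "/2" if flag & 0x80 else ""
        record = (fields[0] + suffix, fields[9])
        seen += 1
        if len(reservoir) < k:
            reservoir.append(record)
        else:
            j = rng.randrange(seen)   # keep the new record with probability k/seen
            if j < k:
                reservoir[j] = record
    return reservoir


def to_fasta(records):
    """Format (name, sequence) tuples as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Demo on invented records (placeholders, not real data).
demo = [
    "@HD\tVN:1.6\n",
    "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",           # unmapped, read1
    "r1\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\n",          # unmapped, read2
    "r2\t99\tchr1\t100\t60\t4M\t=\t200\t104\tGGGG\tIIII\n",  # mapped
]
print(to_fasta(sample_unmapped(demo, k=10)), end="")
```

On real data you would pass `sys.stdin` (or an open file) instead of the demo list, write the FASTA to a file, and paste or upload it to BLAST. A few hundred reads is usually enough to see whether the hits are mostly human, bacterial, or something else.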