I have some reads from a targeted capture kit to which bwa assigns a mapping quality of 0. I am fairly sure the reads are mapped correctly, because they do indeed map to the captured gene. Also, when I BLAST the 101 bp read, it maps only to the gene it should map to. The problem comes when I go to call variants: GATK filters these reads out because their mapping quality is 0. I could try using -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 with the UnifiedGenotyper, but I don't think this is ideal.
What I would really like is a more informative mapping quality score, since I'm pretty sure it's mapped correctly. However, I can't find much documentation on the web about mapping qualities of 0.
Since BLAST returned only one position for my 101 bp read, I am assuming the mapping quality score doesn't really come from the entire read. Perhaps the seed is the main driver of the mapping score, and the seed maps to multiple locations in the genome while the entire read does not? Does anyone know whether simply increasing the seed size would work? Can I even increase the seed size with bwa sampe?
Thanks for the help
I should have stated that I realize a mapping quality of 0 means the read maps to multiple locations, but I don't think that is true here, because when I BLAST the read via NCBI it maps only to my captured gene, not to multiple genomic locations. I realize I am not BLASTing against my actual reference sequence (hg19), but that shouldn't really matter: if there were a location that was a very close match, it should have come up in the NCBI BLAST. That is why I was asking about the seed. It doesn't make sense to me why bwa would treat my sequence as mapping to multiple locations when BLAST does not find them.
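To make the "MAPQ 0 means multiple locations" rule concrete, here is a toy sketch. It is a deliberate simplification, not bwa's actual MAPQ formula (which also weighs second-best hits, mismatches and gaps). The function name `toy_mapq`, its input (a list of alignment scores for every candidate location of one read) and the `unique_q` value are all hypothetical, chosen only for illustration.

```python
from typing import Sequence


def toy_mapq(hit_scores: Sequence[int], unique_q: int = 37) -> int:
    """Toy mapping-quality rule (NOT bwa's real formula).

    hit_scores: hypothetical alignment scores of every candidate location
    found for a single read; bwa does not expose such a list directly.
    unique_q: arbitrary illustrative value returned for a unique best hit.
    """
    if not hit_scores:
        raise ValueError("read has no candidate alignments")
    best = max(hit_scores)
    if list(hit_scores).count(best) > 1:
        # Two or more equally good locations: the aligner cannot tell which
        # one is correct, so the placement gets MAPQ 0 (the XT:A:R case).
        return 0
    # A single best location: simplified here to one fixed confidence value.
    return unique_q
```

The point of the sketch is that MAPQ depends on how many equally good placements the aligner finds in the reference it was given, which is why a search against a different database (or with different sensitivity) can disagree with it.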
Check the XA tag in your SAM alignment (it is among the optional fields at the end of the line): does it list the alternative locations that the read was matched against?
Also, feel free to add the read sequence to your post; it could be an interesting case.
Here is the sequence
Here is the SAM record; as you can see, the sequence is the reverse complement.
I don't see the XA tag, but if I am reading the SAM spec correctly, the read does have a mapping quality of 0.
It does have the XT:A:R tag, which indicates reads with multiple alignments (see "ignore reads in bwa"). It should also have an XA tag that actually lists these positions, but it also seems that you have passed this data through other steps and it is not direct output from bwa.
Hello huskerjeff492, please note the following:
(1) By default, bwa mem outputs the XA tag only if there were fewer than 5 hits (this can be changed with the -h parameter). It is impossible to list all the alignments if there were thousands of hits.
(2) According to this sequence, it really has only one hit in the human genome. However, I'm not sure whether this is the raw bwa output. You should go back to the raw data, take a batch of reads (for example, 100 or 1000) with MapQ=0, and BLAST them. I've tried ~160 reads with MapQ=0 in my BAM file, and all of them had more than 5 hits.
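The batch check suggested in (2) can be sketched as follows, assuming plain SAM text as input (for example the output of `samtools view` on your BAM). It keeps mapped reads with MAPQ 0, skips unmapped reads (which also carry MAPQ 0), reverse-complements reads with FLAG bit 0x10 so BLAST sees the original read orientation, and emits FASTA. The function names and the default limit of 100 are illustrative choices; note it takes the first N matches rather than a random sample.

```python
from typing import Iterable, Iterator

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (IUPAC codes other than N are left as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def mapq0_fasta(sam_lines: Iterable[str], limit: int = 100) -> Iterator[str]:
    """Yield FASTA records for up to `limit` mapped reads with MAPQ 0."""
    emitted = 0
    for line in sam_lines:
        if emitted >= limit:
            break
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 0x4 or mapq != 0:  # skip unmapped and confidently placed reads
            continue
        seq = fields[9]
        if flag & 0x10:
            # SAM stores reverse-strand reads reverse-complemented; undo that
            # so the FASTA holds the read as sequenced.
            seq = reverse_complement(seq)
        yield f">{fields[0]}\n{seq}\n"
        emitted += 1
```

For example, `sys.stdout.writelines(mapq0_fasta(sys.stdin))` in a small script fed by `samtools view` would write a FASTA file ready for a BLAST search against the genome.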