I am doing bwa mem alignment and want to use the -c
in a best way. As, I understand -c INT
is meant to discard a MEM reads if it has more than INT occurrences in a genome. The default -c
value is 10000, so a read may be potentially aligned to at maximum of 10000 places in the genome. But, reporting of an alignment is also controlled by -T
[default value 30], i.e. no alignment will be reported that has value below 30 (1st Question - so is this value of 30 referring to mapQ?).
I want to do a variant calling on my population samples using genome re-seq data that has coverage above 10x (10 to 70). The population should not vary more than 1%.
I previously did bwa mem alignment using default parameter, and I choose uniquely mapped reads using -f 2
flag in samtools, and/or used mapQ values to select for the reads that show good mapping quality (I think mapQ > 39, is good, since anything that maps to more places will have its mapQ value exponentially reduced).
So, if I chose to do -c 1
, any reads that maps uniquely won't be affected in the report. But, the reads that potentially can map to multiple places will be selected for the best MEM, right? Since -c
is set to 1.
But will this multiple mapping reads have the high mapQ values since its alignment to other places in the genome is restricted?
Additionally, what will happen if the reads can map equally to two different places with equal mismatches and gaps?
Finally, is the setting -c 1
as good way of identifying polymorhisms? - I am kind of in a grey area here and would appreciate any suggestions - I understand it all depends upon if the reads has repetitive bases in it and/or the amount of mismatches/gaps it may have compared to one region of the genome vs. the other one, But, my guess if I set -c 1
and restrict the multiple mapping reads to only one alignment the chances that the alignment and identified polymorphisms is right is high - since the reference genomes are generated using highly inbred lines which are mostly homozygous. My guess is setting -c 1
(or possibly 2) will help use capture valuable polymorphisms, but I also plan on using mapQ to supplement the selection.
Thanks
To add to Stefan's comment. I think most SNP callers will not take a multi-mapping read seriously and will give it poor quality. So there is no need to tweak the alignment, rather you can play with the quality of called SNPs.