I am working with non-model systems, so I have followed GATK's workflow of using high-confidence SNPs discovered in a first round of SNP calling as the known-sites panel for base quality score recalibration (BQSR). This is followed by a second round of SNP calling on the recalibrated BAM files. However, I am now analyzing SNPs in a sister species of the focal species, and the annotated reference genome comes from the focal species. I know that BQSR looks for mismatches with the reference and adjusts quality scores while accounting for the SNP panel I provide. My main questions are the following:
- What kinds of biases could be introduced during BQSR when the reference genome is from a sister taxon of the samples?
- Should I be more lenient when building my high-confidence SNP set, since I should expect more mismatches between the samples and the reference genome during BQSR?
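To make the concern concrete, here is a rough sketch of how I understand the mismatch counting to work. It is not GATK's actual implementation: the function name, the +1/+2 smoothing, and all of the numbers are assumptions chosen only for illustration. The idea is that any real variant missing from the known-sites panel is counted as a sequencing error, which pulls the empirical quality down.

```python
import math


def empirical_quality(n_bases, n_true_errors, n_unmasked_variant_bases):
    """Phred-scaled empirical quality for one covariate bin.

    Bases at real variant sites that are NOT in the known-sites panel
    are indistinguishable from sequencing errors, so they are added to
    the mismatch count. The +1/+2 smoothing is a simplifying assumption.
    """
    mismatches = n_true_errors + n_unmasked_variant_bases
    error_rate = (mismatches + 1) / (n_bases + 2)
    return -10 * math.log10(error_rate)


# Hypothetical bin: one million bases with a true error rate of about 1e-3.
n_bases = 1_000_000
n_true_errors = 1_000

# Case 1: every real variant is covered by the known-sites panel.
q_masked = empirical_quality(n_bases, n_true_errors, 0)

# Case 2: 5,000 bases sit on fixed differences or SNPs missing from the panel.
q_unmasked = empirical_quality(n_bases, n_true_errors, 5_000)

print(f"all variants masked:    Q{q_masked:.1f}")
print(f"unmasked variant bases: Q{q_unmasked:.1f}")
```

If this picture is roughly right, fixed differences between the sister species and the reference would matter as much as segregating SNPs, since both look like mismatches unless they are in the panel.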
Let me know what you think! Happy to elaborate if I've left important information out. Thanks for the help!
See answer here: Base recalibration in normal vs. tumor somatic variant calling in WXS data?
Heng Li said:
Interesting! Thank you for such a quick response!