Hi @slzr_
In short, no. What I mean is, it’s generally not recommended to use Mutect2 calls directly for classical population genetics analyses (like Fst, Tajima's D, etc.).
Analyses that use metrics like Fst and Tajima's D almost always assume germline variation rather than somatic variation; Mutect2, by contrast, is optimized for detecting somatic mutations in tumor–normal comparisons.
Why is Mutect2 inappropriate? Because of this difference, Mutect2 applies filters and heuristics that differ significantly from those of tools dedicated to germline calling. For instance, Mutect2 may aggressively filter out certain sites based on tumor/normal differences. This makes sense for somatic variant calling, but in a germline context you would not necessarily even have such paired samples, which makes it unclear what Mutect2 is meant to accomplish there.
Stated a bit differently, with Mutect2 you should NOT expect high-confidence genotypes at every site in every sample the way you would with germline pipelines, yet that is exactly what you want for downstream pop genetics studies.
So what would you use? For classical pop genetics you would likely run something like HaplotypeCaller followed by joint genotyping to obtain accurate germline calls, then calculate statistics like those above from these calls. That is the basis for robust population genetics analyses.
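To make the "calculate statistics from germline calls" step concrete, here is a minimal sketch of Tajima's D in plain Python. It assumes you have already reduced the joint-genotyped calls to one alternate-allele count per site, with every sampled chromosome genotyped at every site (no missing data), which is the completeness that a somatic caller will not give you. The function name `tajimas_d` and the input layout are illustrative, not part of any tool.

```python
from math import sqrt


def tajimas_d(allele_counts, n):
    """Tajima's D from per-site alternate-allele counts.

    allele_counts: one integer per site, the number of sampled chromosomes
                   carrying the alternate allele (0..n).
    n:             number of sampled chromosomes (2 x diploid individuals).
    Assumes complete genotypes at every site (hypothetical, simplified input).
    Returns None when D is undefined (no segregating sites or n < 4).
    """
    # Keep only segregating sites (neither fixed reference nor fixed alternate).
    seg = [k for k in allele_counts if 0 < k < n]
    s = len(seg)
    if n < 4 or s == 0:
        return None

    # pi: average number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    # Difference between pi and Watterson's estimator, scaled by its std. dev.
    return (pi - s / a1) / sqrt(variance)


# Toy inputs: 6 chromosomes, 5 sites.
print(tajimas_d([1, 1, 1, 1, 1], 6))  # only singletons: negative D
print(tajimas_d([3, 3, 3, 3, 3], 6))  # intermediate frequencies: positive D
```

In practice you would more likely use an established package for this than hand-rolled code; the sketch is only meant to show that the statistic depends on allele counts across all samples at every site.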
Additional problems suggested by your question
Additionally, please keep in mind that having only ~50 genes may limit statistical power for measures like Fst and Tajima’s D, which typically benefit from larger genomic regions. It is also possible, even likely, that depending on the identity of those genes, you would get results that don't generalize to the whole genome.
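As a rough illustration of why the number of sites matters, here is a sketch of Hudson's Fst estimator computed as a ratio of sums across sites (the form described by Bhatia et al. 2013). The estimate is built from per-site terms, so with only a small gene panel it rests on few terms and can be noisy. The function name `hudson_fst` and the per-population allele-count inputs are assumptions for illustration, and complete genotypes are assumed again.

```python
def hudson_fst(counts1, n1, counts2, n2):
    """Hudson's Fst as a ratio of sums over sites.

    counts1, counts2: alternate-allele counts per site in population 1 and 2.
    n1, n2:           number of sampled chromosomes in each population.
    Assumes the same sites in the same order and no missing genotypes
    (hypothetical, simplified input). Returns None if no site is informative.
    """
    num = 0.0
    den = 0.0
    for k1, k2 in zip(counts1, counts2):
        p1 = k1 / n1
        p2 = k2 / n2
        # Numerator: squared frequency difference, corrected for sampling noise.
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Denominator: probability that two alleles drawn from different
        # populations differ.
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else None


# Toy inputs: 10 chromosomes per population, 2 sites.
print(hudson_fst([10, 0], 10, [0, 10], 10))  # fixed differences
print(hudson_fst([5, 5], 10, [5, 5], 10))    # identical frequencies
```

Estimates near or below zero are expected when the two populations have the same allele frequencies; values approach 1 as differences become fixed.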
Thank you so much, you were really helpful! Do you think it is possible to do some kind of analysis to evaluate positive selection or mainly just with HaplotypeCaller?
What is the hypothesis here?