Population genetics with Mutect2 data
3 days ago
slzr_ • 0

Hey guys, I am currently evaluating nearly 50 genes in a group of samples, and the variant calling was performed using Mutect2. Is it possible to conduct population genetics analyses, such as Fst, Tajima's D, etc., with data obtained from Mutect2? I know that most analyses are more reliable when performed using HaplotypeCaller.

haplotypecaller mutect2 • 239 views
3 days ago
LauferVA 4.6k

Hi @slzr_

In short, no. What I mean is, it’s generally not recommended to use Mutect2 calls directly for classical population genetics analyses (like Fst, Tajima's D, etc.).

Statistics like Fst and Tajima's D almost always assume germline variation rather than somatic variation; Mutect2, by contrast, is optimized for detecting somatic mutations in tumor-normal comparisons.

Why is Mutect2 inappropriate? Because of this difference, Mutect2 applies filters and heuristics that differ significantly from those of dedicated germline callers. For instance, Mutect2 may aggressively filter out sites based on tumor/normal differences. That makes sense for somatic variant calling, but in a germline context you would not necessarily even have such sample pairs, so it is unclear what Mutect2 would be doing for you there. Put differently, with Mutect2 you should NOT expect high-confidence genotypes at every site in every sample, as you would get from a germline pipeline, yet that is exactly what downstream pop genetics studies need.
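One rough way to see the genotype-completeness problem is to check, per site, what fraction of samples actually has a called genotype. The sketch below is only an illustration: the function name `site_call_rates` and the toy genotype strings are made up, and it assumes you have already extracted VCF-style GT fields into plain Python lists.

```python
def site_call_rates(genotypes):
    """Return, per site, the fraction of samples with a fully called genotype.

    `genotypes` is a list of sites; each site is a list of VCF-style GT
    strings, one per sample (e.g. "0/1", "1|1", "./.").
    """
    rates = []
    for site in genotypes:
        called = sum(
            1
            for gt in site
            # A genotype counts as called only if no allele is missing (".").
            if "." not in gt.replace("|", "/").split("/")
        )
        rates.append(called / len(site))
    return rates


# Toy data (made up for illustration): 3 sites x 4 samples.
toy = [
    ["0/0", "0/1", "1/1", "0/0"],  # every sample genotyped
    ["0/1", "./.", "./.", "./."],  # genotype present in one sample only
    ["./.", "0/1", "./.", "0|1"],  # genotype present in half the samples
]
rates = site_call_rates(toy)
print(rates)
```

Sites with low call rates cannot contribute reliable allele frequencies, which is what Fst and Tajima's D are built on.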

So what would you use? For classical pop genetics you would likely run something like HaplotypeCaller followed by joint genotyping to obtain accurate germline calls, then calculate any statistics like those above from these calls. This is the basis for robust population genetics analyses.
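As a rough sketch of what "HaplotypeCaller followed by joint genotyping" can look like, the snippet below only builds the GATK4 command lines and prints them; it does not run anything. The helper name `germline_commands`, the sample names, and all file names are hypothetical, and the exact arguments should be checked against the GATK documentation for your version (for a gene panel you would probably also restrict calling to your target intervals).

```python
def germline_commands(reference, bams, cohort_prefix="cohort"):
    """Build GATK4 command lines for per-sample GVCF calling + joint genotyping.

    `bams` maps a sample name to its BAM path. Returns a list of commands,
    each command being a list of arguments. Nothing is executed here.
    """
    commands = []
    gvcfs = []
    for sample, bam in bams.items():
        gvcf = f"{sample}.g.vcf.gz"
        gvcfs.append(gvcf)
        # Step 1: call each sample separately in GVCF mode.
        commands.append(
            ["gatk", "HaplotypeCaller", "-R", reference,
             "-I", bam, "-O", gvcf, "-ERC", "GVCF"]
        )

    # Step 2: merge the per-sample GVCFs into one cohort GVCF.
    combined = f"{cohort_prefix}.g.vcf.gz"
    combine = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        combine += ["--variant", gvcf]
    combine += ["-O", combined]
    commands.append(combine)

    # Step 3: joint genotyping across all samples.
    commands.append(
        ["gatk", "GenotypeGVCFs", "-R", reference,
         "-V", combined, "-O", f"{cohort_prefix}.vcf.gz"]
    )
    return commands


# Hypothetical inputs, for illustration only.
commands = germline_commands("ref.fasta", {"s1": "s1.bam", "s2": "s2.bam"})
for cmd in commands:
    print(" ".join(cmd))
```

The point of the joint genotyping step is that every sample gets a genotype (or an explicit no-call) at every variant site in the cohort, which is what the downstream statistics need.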

Additional problems suggested by your question: please keep in mind that having only ~50 genes may limit statistical power for measures like Fst and Tajima's D, which typically benefit from larger genomic regions. It is also possible, even likely, that depending on the identity of those genes, the results will not generalize to the whole genome.
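To make the power point a bit more concrete, here is a minimal sketch of Tajima's D using the standard constants from Tajima (1989). It assumes complete germline data with no missing calls, coded as 0/1 haplotypes; the function name `tajimas_d` and the toy matrices are made up for illustration, and in practice you would use an established package instead.

```python
import math


def tajimas_d(haplotypes):
    """Tajima's D for a list of haplotypes (each a list of 0/1 alleles).

    Returns None when there are no segregating sites (D is undefined).
    """
    n = len(haplotypes)
    if n < 4:
        # The variance term is zero for n < 4, so D cannot be computed.
        raise ValueError("need at least 4 haplotypes")

    n_sites = len(haplotypes[0])
    counts = [sum(h[j] for h in haplotypes) for j in range(n_sites)]
    seg = [k for k in counts if 0 < k < n]  # segregating sites only
    s = len(seg)
    if s == 0:
        return None

    # Mean number of pairwise differences (pi), summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D = (pi - Watterson's theta) / estimated standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy data (made up): rows are haplotypes, columns are sites.
singletons = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
balanced = [[1, 1], [1, 0], [0, 1], [0, 0]]
print(tajimas_d(singletons), tajimas_d(balanced))
```

An excess of rare variants pushes D negative and an excess of intermediate-frequency variants pushes it positive. With only a handful of segregating sites per gene, the estimate is very noisy, which is part of why ~50 genes may not be enough.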


Thank you so much, you were really helpful! Do you think it is possible to do some kind of analysis to evaluate positive selection with this data, or would that also mainly work only with HaplotypeCaller calls?


What is the hypothesis here?
