Hello everyone,
I’m currently working on a pipeline for analyzing tumor-only WES data, which I understand is challenging and has few resources available. I’d greatly appreciate any feedback or suggestions on my current approach.
Here's what I’m doing so far:
Somatic Variant Calling: I use GATK’s Mutect2 to call somatic variants, leveraging the germline resource and a panel of normals (PoN). Since my samples are FFPE, I also collect the F1R2 files during this step.
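For reference, the calling step looks roughly like the sketch below. It only builds and prints the command line (nothing is executed); the file names and the helper `mutect2_tumor_only_cmd` are placeholders made up for illustration, not my actual paths.

```python
import shlex


def mutect2_tumor_only_cmd(ref, tumor_bam, germline_resource, pon, prefix):
    """Build a tumor-only Mutect2 command that also writes F1R2 counts.

    All paths are hypothetical placeholders.
    """
    return [
        "gatk", "Mutect2",
        "-R", ref,
        "-I", tumor_bam,
        "--germline-resource", germline_resource,  # population allele frequencies
        "--panel-of-normals", pon,                 # PoN of recurrent artifacts
        "--f1r2-tar-gz", f"{prefix}.f1r2.tar.gz",  # input for LearnReadOrientationModel
        "-O", f"{prefix}.unfiltered.vcf.gz",
    ]


cmd = mutect2_tumor_only_cmd(
    "ref.fasta", "tumor.bam", "af-only-gnomad.vcf.gz", "pon.vcf.gz", "sample1"
)
print(shlex.join(cmd))
```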
FFPE Artifacts & Contamination Handling:
- I run `LearnReadOrientationModel` to model FFPE artifacts.
- I use `GetPileupSummaries` and `CalculateContamination` to estimate contamination.
- Then, I run `FilterMutectCalls` with both `--contamination-table` and `--tumor-segmentation` to apply the appropriate filters.
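The FFPE and contamination steps above can be sketched as the following command sequence. Again this only prints the commands; the file names, the common-sites VCF, and the helper `ffpe_contamination_cmds` are placeholders. The sketch also passes the learned orientation model to `FilterMutectCalls` via `--ob-priors`, which I did not list above but which is how the GATK read-orientation workflow applies the model.

```python
import shlex


def ffpe_contamination_cmds(ref, tumor_bam, common_sites, prefix):
    """Return the four GATK commands for this stage as argument lists.

    File names are derived from a hypothetical sample prefix.
    """
    f1r2 = f"{prefix}.f1r2.tar.gz"
    ob_priors = f"{prefix}.read-orientation-model.tar.gz"
    pileups = f"{prefix}.pileups.table"
    contamination = f"{prefix}.contamination.table"
    segments = f"{prefix}.segments.table"
    return [
        # 1. Learn the read-orientation (FFPE/OxoG-type) artifact model.
        ["gatk", "LearnReadOrientationModel", "-I", f1r2, "-O", ob_priors],
        # 2. Summarize read support at common germline sites.
        ["gatk", "GetPileupSummaries", "-I", tumor_bam,
         "-V", common_sites, "-L", common_sites, "-O", pileups],
        # 3. Estimate cross-sample contamination and write tumor segments.
        ["gatk", "CalculateContamination", "-I", pileups,
         "--tumor-segmentation", segments, "-O", contamination],
        # 4. Filter the raw calls using all three outputs.
        ["gatk", "FilterMutectCalls", "-R", ref,
         "-V", f"{prefix}.unfiltered.vcf.gz",
         "--contamination-table", contamination,
         "--tumor-segmentation", segments,
         "--ob-priors", ob_priors,
         "-O", f"{prefix}.filtered.vcf.gz"],
    ]


cmds = ffpe_contamination_cmds(
    "ref.fasta", "tumor.bam", "common_biallelic.vcf.gz", "sample1"
)
for cmd in cmds:
    print(shlex.join(cmd))
```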
Variant Filtering:
- I use `SelectVariants` to retain only the variants that pass the filters.
- Next, I filter out variants that are common across several germline databases to reduce the likelihood of retaining germline polymorphisms.
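To make the germline-frequency filter concrete, here is a minimal sketch of the logic. It assumes each variant has already been annotated with population allele frequencies; the 1% cutoff, the database keys, and the variant IDs are illustrative assumptions, and treating "common" as "at or above the cutoff in any database" is just one possible reading of the step.

```python
def is_common_germline(pop_afs, max_af=0.01):
    """Return True if any database reports an allele frequency >= max_af.

    pop_afs maps a database name to an allele frequency, or to None when
    the variant is absent from that database.
    """
    return any(af is not None and af >= max_af for af in pop_afs.values())


# Made-up example records, not real variants.
variants = [
    {"id": "var_common", "pop_afs": {"gnomad": 0.12, "1000g": 0.10}},
    {"id": "var_rare", "pop_afs": {"gnomad": 0.00002, "1000g": None}},
    {"id": "var_unseen", "pop_afs": {}},
]

kept = [v["id"] for v in variants if not is_common_germline(v["pop_afs"])]
print(kept)
```

A stricter or looser cutoff changes how many rare germline variants slip through, which matters more than usual here because there is no matched normal to subtract them.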
Functional Annotation:
- Finally, I focus on functional filtering by retaining only variants that are confirmed in either COSMIC or OncoKB.
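The functional filter amounts to a set-membership check, sketched below. Keying variants by `(chrom, pos, ref, alt)` is a simplifying assumption (OncoKB in particular is usually matched at the gene and alteration level rather than by coordinate), and the keys shown are invented placeholders, not real database entries.

```python
def keep_known_somatic(variant_keys, cosmic_keys, oncokb_keys):
    """Keep variants whose key appears in COSMIC or OncoKB (set union).

    Input order is preserved.
    """
    known = set(cosmic_keys) | set(oncokb_keys)
    return [key for key in variant_keys if key in known]


# Invented placeholder keys for illustration only.
calls = [
    ("chrA", 1000, "A", "T"),
    ("chrB", 2000, "C", "G"),
    ("chrC", 3000, "G", "A"),
]
cosmic = {("chrA", 1000, "A", "T")}
oncokb = {("chrC", 3000, "G", "A")}

retained = keep_known_somatic(calls, cosmic, oncokb)
print(retained)
```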
Given that I only have tumor data without a matched normal, do you think this approach is robust and reliable for calling somatic variants? I'm particularly interested in any suggestions on refining the contamination estimation, the filtering strategy, or any best practices I might have missed. Also, I want to call CNVs with CNVkit: can I use the contamination estimate produced by the Mutect2 workflow (`CalculateContamination`) as the value for the `-m clonal --purity` step?
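For context, the CNVkit step I have in mind looks like the sketch below. CNVkit documents `--purity` as the estimated tumor cell fraction; the value `0.7` is a pure placeholder, and whether the Mutect2 contamination estimate is a valid source for it is exactly what I am unsure about. The file names and the helper `cnvkit_call_cmd` are hypothetical.

```python
import shlex


def cnvkit_call_cmd(cns, purity, out_cns):
    """Build a `cnvkit.py call` command using the clonal method.

    purity is the tumor cell fraction and must lie in (0, 1].
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError(f"purity must be in (0, 1], got {purity}")
    return [
        "cnvkit.py", "call", cns,
        "-m", "clonal",
        "--purity", str(purity),
        "-o", out_cns,
    ]


cnv_cmd = cnvkit_call_cmd("sample1.cns", 0.7, "sample1.call.cns")
print(shlex.join(cnv_cmd))
```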
Thanks in advance for any insights!