Hello everyone,
I'm seeking feedback on my pipeline for detecting somatic variants and rare germline mutations associated with a rare type of cancer. Unfortunately, I wasn't part of the project during the experimental design phase, so we only have whole exome sequencing data from the cancer samples and no paired normal samples.
I aim to produce a multisample VCF and analyze the data from there. Here's my plan:
- GATK Best Practices for Germline Mutations: I intend to follow the GATK best practices for germline mutation calling.
- Filtering Strategy: I'll apply the filtering methodology described in this paper (basically, using filters and public variant databases).
I've decided not to use Mutect2 for now because it doesn't support creating a multisample VCF (no GVCF output). While I know this approach won't yield purely somatic variants, I believe it's the best option given the circumstances.
What are your thoughts on this approach? Are there any improvements or alternative strategies you would suggest?
Thanks in advance for your insights!
I don't know much about the exact differences between multi-sample VCFs (GVCFs) and regular VCFs (i.e. whether the statistics or assumptions differ, and therefore the results), but if you want to look for somatic mutations, is there a reason why it wouldn't be possible to use Mutect2 and then use something like bcftools merge to create a multi-sample VCF file? (Though again, I don't know if there are major differences between a multi-sample VCF created this way vs directly from GATK.)
Hey, ty for your reply!
The reason this would be problematic is missing genotypes. If you merge regular VCFs, every genotype that should be 0/0 will be interpreted as missing (./.), because a per-sample VCF only contains records for the sites where that sample had a variant call. This results in ambiguous ./. sites: you can't tell whether the sample is homozygous reference there or simply had no usable data.
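To make the missing-genotype issue concrete, here is a minimal toy sketch in Python. It is not how bcftools or GATK work internally; the `merge_calls` function, the sample names, and the sites are all made up for illustration. It only assumes that each per-sample call set lists nothing but the sites where that sample carries a variant.

```python
from typing import Dict, List, Tuple

# A site is (chrom, pos, ref, alt); each per-sample dict maps site -> genotype.
Site = Tuple[str, int, str, str]


def merge_calls(
    per_sample: Dict[str, Dict[Site, str]], missing: str = "./."
) -> Dict[Site, List[str]]:
    """Naively merge per-sample call sets into one multi-sample table.

    A sample with no record at a site gets `missing`, because the input
    carries no information about whether it was 0/0 or just not covered.
    """
    samples = sorted(per_sample)
    sites = sorted({site for calls in per_sample.values() for site in calls})
    return {
        site: [per_sample[name].get(site, missing) for name in samples]
        for site in sites
    }


# Hypothetical tumor-only call sets: variant sites only, no reference records.
tumor_a = {("chr1", 1000, "A", "T"): "0/1"}
tumor_b = {("chr1", 2000, "G", "C"): "0/1"}

merged = merge_calls({"tumor_a": tumor_a, "tumor_b": tumor_b})
for site, genotypes in merged.items():
    print(*site, *genotypes, sep="\t")
```

Each sample ends up with ./. at the other sample's variant site. Passing `missing="0/0"` to the toy function mimics what I understand the `--missing-to-ref` option of bcftools merge to do: it assumes the reference genotype without checking that the sample actually had coverage there, so the ambiguity is hidden, not resolved.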