I have 15 tumors that were sequenced by whole-genome sequencing at 30x depth. I would like to identify somatic variants using Mutect2. Unfortunately, we don't have any normal samples, so I would like to build a Panel of Normals (PoN) to use with Mutect2 in tumor-only mode. I also plan on running Mutect2 with gnomAD to help filter germline variants.
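For reference, the tumor-only step described above roughly corresponds to the following GATK4 calls. This is a minimal Python sketch that only assembles the command lines and runs nothing. File names such as `ref.fasta`, `af-only-gnomad.vcf.gz` and `pon.vcf.gz`, the `tumor01.bam` naming scheme and the helper `tumor_only_commands` are all hypothetical placeholders. The full GATK best-practice workflow adds further steps (for example contamination estimation with `CalculateContamination`) that are left out here.

```python
import shlex


def tumor_only_commands(reference, tumor_bam, gnomad_vcf, pon_vcf, out_prefix):
    """Assemble (but do not run) GATK4 commands for one tumor-only Mutect2 run.

    All paths are placeholders supplied by the caller.
    """
    unfiltered = f"{out_prefix}.unfiltered.vcf.gz"
    filtered = f"{out_prefix}.filtered.vcf.gz"
    call = [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        # Population allele frequencies (e.g. gnomAD) used as a germline prior
        "--germline-resource", gnomad_vcf,
        # Sites seen recurrently in normals (artifacts and common germline sites)
        "--panel-of-normals", pon_vcf,
        "-O", unfiltered,
    ]
    # FilterMutectCalls labels each call as PASS or with the filters it failed
    filter_calls = [
        "gatk", "FilterMutectCalls",
        "-R", reference,
        "-V", unfiltered,
        "-O", filtered,
    ]
    return [call, filter_calls]


if __name__ == "__main__":
    # Hypothetical naming scheme for the 15 tumors: tumor01.bam ... tumor15.bam
    for i in range(1, 16):
        sample = f"tumor{i:02d}"
        for cmd in tumor_only_commands("ref.fasta", f"{sample}.bam",
                                       "af-only-gnomad.vcf.gz", "pon.vcf.gz", sample):
            print(shlex.join(cmd))
```

Printing the commands first makes it easy to inspect them before handing them to a scheduler or `subprocess.run`.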
Are there any resources that host publicly available WGS samples I could use to construct the Panel of Normals? Ideally I'm looking for healthy blood samples sequenced using TruSeq library prep on the Illumina NovaSeq platform.
I've checked out the 1000 Genomes Project, but it appears the sequencing technology they used doesn't match mine (probably because the project finished a long time ago). Are there newer resources with WGS samples whose technical properties are similar to those of my samples?
Furthermore, even if you aren't aware of samples with those specific properties, what resources do people use to create a WGS Panel of Normals when they don't have in-house samples?
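Whichever normals end up being used, the GATK4 PoN workflow itself has three steps: call each normal with Mutect2, consolidate the calls with GenomicsDBImport, then run CreateSomaticPanelOfNormals. The sketch below only assembles the commands and assumes the normals are BAMs aligned to the same reference build as the tumors; `pon_commands`, the interval list name and the output names are hypothetical.

```python
import shlex
from pathlib import Path


def pon_commands(reference, normal_bams, intervals, gnomad_vcf,
                 workspace="pon_db", pon_out="pon.vcf.gz"):
    """Assemble (but do not run) GATK4 commands to build a Panel of Normals."""
    commands = []
    normal_vcfs = []
    for bam in normal_bams:
        vcf = f"{Path(bam).stem}.pon_calls.vcf.gz"
        normal_vcfs.append(vcf)
        # Step 1: call each normal on its own. --max-mnp-distance 0 keeps
        # nearby SNVs from being merged into MNPs, which (per the GATK PoN
        # documentation) GenomicsDBImport does not support.
        commands.append(["gatk", "Mutect2", "-R", reference, "-I", bam,
                         "--max-mnp-distance", "0", "-O", vcf])
    # Step 2: consolidate the per-normal VCFs into a GenomicsDB workspace
    import_cmd = ["gatk", "GenomicsDBImport", "-R", reference, "-L", intervals,
                  "--genomicsdb-workspace-path", workspace]
    for vcf in normal_vcfs:
        import_cmd += ["-V", vcf]
    commands.append(import_cmd)
    # Step 3: combine the normals into the PoN VCF
    commands.append(["gatk", "CreateSomaticPanelOfNormals", "-R", reference,
                     "--germline-resource", gnomad_vcf,
                     "-V", f"gendb://{workspace}", "-O", pon_out])
    return commands


if __name__ == "__main__":
    normals = [f"normal{i:02d}.bam" for i in range(1, 4)]  # hypothetical inputs
    for cmd in pon_commands("ref.fasta", normals, "wgs_calling_regions.interval_list",
                            "af-only-gnomad.vcf.gz"):
        print(shlex.join(cmd))
```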
You could use ExAC or gnomAD as a stand-in for a Panel of Normals. There are also other files you could use, such as Mutect2-exome-panel.vcf for hg19 from this folder or the 1000g PoN file for hg38 from this folder.
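One practical caveat with these files: the exome panel is on hg19 and the 1000g PoN is on hg38, and a PoN or germline resource is only useful if it matches the build and contig naming of your reference. A rough Python sketch of such a check is below. It compares the `##contig` header lines of a VCF against the contig names and lengths in the reference `.fai` index. The function names are hypothetical, and VCFs whose contig lines lack `length=` can only be checked by name.

```python
import gzip
import itertools


def read_vcf_header(path):
    """Return the '##' meta lines of a (b)gzipped or plain VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return list(itertools.takewhile(lambda line: line.startswith("##"), fh))


def vcf_header_contigs(header_lines):
    """Map contig ID -> length (or None) from ##contig=<...> lines."""
    contigs = {}
    for line in header_lines:
        line = line.rstrip("\n")
        if line.startswith("##contig=<") and line.endswith(">"):
            body = line[len("##contig=<"):-1]
            # Simple split; assumes no commas inside quoted values
            fields = dict(item.split("=", 1) for item in body.split(",") if "=" in item)
            if "ID" in fields:
                length = fields.get("length")
                contigs[fields["ID"]] = int(length) if length else None
    return contigs


def fai_contigs(fai_lines):
    """Map contig name -> length from a samtools .fai index (first two columns)."""
    contigs = {}
    for line in fai_lines:
        if line.strip():
            name, length = line.split("\t")[:2]
            contigs[name] = int(length)
    return contigs


def build_problems(vcf_contigs, ref_contigs):
    """List contigs whose name or length disagrees with the reference."""
    problems = []
    for name, length in vcf_contigs.items():
        if name not in ref_contigs:
            problems.append(f"{name}: not in reference")
        elif length is not None and length != ref_contigs[name]:
            problems.append(f"{name}: length {length} != reference {ref_contigs[name]}")
    return problems
```

For example, a non-empty result from `build_problems(vcf_header_contigs(read_vcf_header("pon.vcf.gz")), fai_contigs(open("ref.fasta.fai")))` would suggest the PoN and reference differ in build (e.g. an hg19 chr1 length against an hg38 reference) or in naming (`1` vs `chr1`).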
It was my understanding that the PoN should consist of samples sequenced in a similar way to my own. Using the 1000g PoN could seem reasonable (although that assumes their prep kit and sequencer are similar to mine, which is probably not a bad assumption, though not an optimal one), but aren't exomes sequenced using rather different protocols than genomes? Is it common practice to use PoNs generated from exome samples for WGS variant calling, and vice versa?
I'm not really sure. Common sense dictates that platform- and protocol-matched normals should be used to form the PoN. However, there must be some middle ground between "it needs to be sequenced on the same kind of machine using the same protocol" and "it could have been sequenced anywhere". Perhaps there is an acceptable difference in depth, or acceptable tweaks to the parameters used when comparing mutation entries in the tumor samples with those in the PoN samples. I'm a beginner at this too, so I'd consult other experts on the forum.
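One rough way to judge how closely candidate normals match the tumors is to compare the read-group metadata in the BAM headers (as printed by `samtools view -H sample.bam`). The SAM specification defines `PL` (platform) and `PM` (platform model) tags on `@RG` lines, but both are optional and often missing or inconsistently filled in, so this is only a first-pass sanity check, not proof of matched protocols. The function names and the example headers below are hypothetical.

```python
def read_groups(sam_header_text):
    """Parse @RG lines of a SAM header into dicts of tag -> value."""
    groups = []
    for line in sam_header_text.splitlines():
        if line.startswith("@RG\t"):
            fields = line.split("\t")[1:]
            groups.append(dict(f.split(":", 1) for f in fields if ":" in f))
    return groups


def technical_profile(groups, keys=("PL", "PM")):
    """Distinct values of selected tags across read groups ('NA' if absent)."""
    return {k: sorted({g.get(k, "NA") for g in groups}) for k in keys}


def profile_differences(tumor_profile, normal_profile):
    """Tags whose value sets differ between a tumor and a candidate normal."""
    return {k: (tumor_profile[k], normal_profile.get(k))
            for k in tumor_profile if tumor_profile[k] != normal_profile.get(k)}


if __name__ == "__main__":
    # Made-up headers for illustration only
    tumor = "@HD\tVN:1.6\n@RG\tID:t1\tSM:tumor01\tPL:ILLUMINA\tPM:NovaSeq\n"
    normal = "@HD\tVN:1.6\n@RG\tID:n1\tSM:normal01\tPL:ILLUMINA\tPM:HiSeq\n"
    diff = profile_differences(technical_profile(read_groups(tumor)),
                               technical_profile(read_groups(normal)))
    print(diff)  # {'PM': (['NovaSeq'], ['HiSeq'])}
```

For public data, the project's own sample metadata (library kit, instrument) is usually a more reliable source than these tags.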
As an alternative, you could use VarScan2 and then remove variants found in gnomAD. It's not perfect, but it's better than a limited panel of normals.
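The "remove variants found in gnomAD" step can be sketched as a lookup keyed on (chrom, pos, ref, alt). This assumes the VarScan2 calls have been converted to VCF and that both call sets were normalised the same way (e.g. with `bcftools norm` to split multiallelic sites and left-align indels); otherwise identical variants may not match. `Variant`, `parse_gnomad_lines` and the `min_af` cutoff are hypothetical. Loading all of genome-wide gnomAD into memory is not realistic, so in practice one would first restrict gnomAD to the called sites (for example with `bcftools isec`). Also note that a strict "any gnomAD hit" filter can discard genuine somatic mutations that appear in gnomAD at very low frequency, which is one reason to consider an AF cutoff.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_gnomad_lines(lines):
    """Map Variant -> AF (float, or None if AF is missing) from VCF data lines."""
    af = {}
    for line in lines:
        if line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts, _qual, _filter, info = line.rstrip("\n").split("\t")[:8]
        info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        freqs = info_map.get("AF", "").split(",")
        # Multiallelic records: one AF value per ALT allele, in the same order
        for i, alt in enumerate(alts.split(",")):
            value = freqs[i] if i < len(freqs) else ""
            af[Variant(chrom, int(pos), ref, alt)] = float(value) if value not in ("", ".") else None
    return af


def remove_population_variants(calls, gnomad_af, min_af=0.0):
    """Keep calls absent from gnomAD, or present with AF <= min_af.

    Calls found in gnomAD without an AF value are removed (conservative choice).
    """
    kept = []
    for v in calls:
        if v not in gnomad_af:
            kept.append(v)
        elif gnomad_af[v] is not None and gnomad_af[v] <= min_af:
            kept.append(v)
    return kept
```

With the default `min_af=0.0`, any call seen in gnomAD with a non-zero AF is dropped; raising the cutoff keeps rare population variants, trading germline leakage for sensitivity.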
I forgot to mention that I plan to run Mutect2 with a PoN and gnomAD to filter germline variants. Would it be correct to say that VarScan2 does the same thing as Mutect2 with the gnomAD VCF, i.e. filters germline variants using gnomAD?
Not really; it is a completely different variant caller, which we use to call tumor variants in the absence of a matched normal :) But then, go ahead with your plan!