Publicly Available WGS Samples for Panel of Normals?
4.3 years ago
tjbencomo ▴ 60

I have 15 tumors that were sequenced using whole genome sequencing at 30x depth. I would like to identify somatic variants using Mutect2. Unfortunately we don't have any normal samples, so I would like to build a Panel of Normals (PoN) to use with Mutect2 in tumor-only mode. I also plan on running Mutect2 with gnomAD to help filter germline variants.

Are there any resources that host publicly available WGS samples I could use to construct the Panel of Normals? Ideally I'm looking for healthy blood samples sequenced using TruSeq library prep on the Illumina NovaSeq platform.

I've checked out the 1000 Genomes project, but it appears the sequencing technology they used doesn't match my own (probably due to how long ago the project finished). Are there newer resources that would have WGS samples with technical properties similar to those of my samples?

Furthermore, even if you aren't aware of samples with those specific properties, what resources do people use for WGS Panel of Normal creation if they don't have in house samples?

WGS sequencing snp • 4.6k views

You could use ExAC or gnomAD as a stand-in for a Panel of Normals. There are also other files you could use, such as the Mutect2-exome-panel.vcf for hg19 from this folder or the 1000g PoN file for hg38 from this folder.


It was my understanding that the PoN should consist of samples that were sequenced in a similar way to my own samples. I guess using the 1000g PoN could seem reasonable (although the assumption there is that their prep kit/sequencer is similar to mine - probably not a bad assumption, although not an optimal one), but aren't exomes sequenced using rather different protocols than genomes? Is it common practice to use PoNs generated from exome samples for WGS variant calling and vice versa?


I'm not really sure. Common sense dictates that platform/protocol matched normals should be the ones used to form the PoN. However, there must be some sort of middle ground between "it needs to be sequenced on the same kind of machine using the same protocol" and "it could have been sequenced anywhere". Maybe there is an acceptable difference in depth or acceptable tweaks in comparison parameters between mutation entries in tumor samples vs PoN samples. I'm a beginner at this too, so I'd consult other experts on the forum.


As an alternative, you may use VarScan2 and remove variants found in gnomAD. Not perfect, but better than some limited panel of normals.


I forgot to mention I plan to run Mutect2 with a PoN and gnomAD to filter for germline variants. Would it be correct to say VarScan2 does the same thing as Mutect2 with the gnomAD VCF - filter germline variants using gnomAD?


Not really, it is a totally different variant caller, which we use to call tumor variants in the absence of a matched normal :) But well, then stick to your plan!

4.3 years ago
tjbencomo ▴ 60

I ended up using the hg38 1000 Genomes PON from the Broad linked to by @RamRS. Although the PON was most likely generated from samples sequenced using different prep protocols, one of the GATK developers says on the GATK forum that the PON is still useful to account for mapping artifacts.

Because most errors caught by the panel of normals are mapping artifacts these are still useful despite changes in sequencing technology. "1000g_pon.hg38.vcf" is an hg38 panel of normals for both exomes and whole genomes generated from 1000 Genomes Project samples. Finally, "af-only-gnomad.hg38.vcf" is a copy of the gnomAD VCF stripped of all unnecessary INFO fields. It is used for the -germline-resource argument.
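For anyone landing here later, the setup described above can be sketched roughly as follows. This is only a minimal illustration, not the exact commands I ran: the helper names, the reference FASTA path, and the BAM/output file names are hypothetical placeholders, and the two resource file names are the ones quoted above. The `gatk Mutect2` options used (`-R`, `-I`, `--panel-of-normals`, `--germline-resource`, `-O`) and the follow-up `FilterMutectCalls` step are standard GATK4; note that GATK4 spells the long options with two dashes. The script only builds and prints the command lines, it does not run GATK.

```python
import shlex


def mutect2_tumor_only_cmd(reference, tumor_bam, pon_vcf, germline_vcf, out_vcf):
    """Build a GATK4 Mutect2 tumor-only command as an argument list.

    All paths are supplied by the caller; nothing is executed here.
    """
    return [
        "gatk", "Mutect2",
        "-R", reference,                      # reference FASTA (hg38 in this thread)
        "-I", tumor_bam,                      # tumor BAM; no matched normal is given
        "--panel-of-normals", pon_vcf,        # e.g. the 1000 Genomes PoN from the Broad
        "--germline-resource", germline_vcf,  # gnomAD allele frequencies
        "-O", out_vcf,
    ]


def filter_mutect_calls_cmd(reference, unfiltered_vcf, filtered_vcf):
    """Build the FilterMutectCalls command that applies Mutect2's filters."""
    return [
        "gatk", "FilterMutectCalls",
        "-R", reference,
        "-V", unfiltered_vcf,
        "-O", filtered_vcf,
    ]


def to_shell(cmd):
    """Render an argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    call = mutect2_tumor_only_cmd(
        reference="hg38.fasta",
        tumor_bam="tumor01.bam",
        pon_vcf="1000g_pon.hg38.vcf",
        germline_vcf="af-only-gnomad.hg38.vcf",
        out_vcf="tumor01.unfiltered.vcf.gz",
    )
    filt = filter_mutect_calls_cmd(
        "hg38.fasta", "tumor01.unfiltered.vcf.gz", "tumor01.filtered.vcf.gz"
    )
    print(to_shell(call))
    print(to_shell(filt))
```

Building the commands as lists keeps the quoting explicit and makes it easy to loop over all 15 tumors or hand the list to `subprocess.run` once the paths are real.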
