Question

Gold Standard for Human cancer exome sequencing


8.1 years ago

Jerome Lin ▴ 20

Hi all.

I am working on a matched tumor-normal somatic variant calling pipeline. My pipeline is as follows:

1. Alignment with bwa mem
2. Sorting and deduplication with samtools and Picard
3. Realignment with GATK
4. Base recalibration with GATK
5. Somatic mutation calling with MuTect / VarScan2 (for MuTect, I tried both 1.1.7 and 2 with default settings; for VarScan2, I filter reads at mapping quality 20 and use processSomatic to pick out high-confidence calls)

Here are my questions:

1. After filtering out variants in introns, UTRs, and ncRNAs, there is very little overlap between the MuTect and VarScan2 calls. The overlap between MuTect and MuTect2 is also very low. I am aware that current somatic mutation callers have a high false-positive rate, but is there a combination of parameter settings that filters out most of the noise? (I know MuTect2 calls indels while the older MuTect versions do not.)
2. I have tried to find a gold-standard reference for whole-exome sequencing, but all I have found so far are articles that use NA12878 and simulate tumor mutations on top of a normal sample. Is there a reference I can use to evaluate my pipeline?
3. COLO829 is another candidate reference. Since it is a whole-genome sequencing sample, would it be reasonable to use it as a reference standard by restricting it to the exonic intervals?
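To put a number on how little the call sets overlap, here is a minimal Python sketch. It assumes both callers' outputs are available as VCF-formatted text with a standard FILTER column, and it keys each call on (chrom, pos, ref, alt). The function names and the toy records are hypothetical and only for illustration; real indel calls would likely need normalization (left-alignment) before the keys are comparable.

```python
import io


def load_variant_keys(vcf_text, pass_only=True):
    """Collect (chrom, pos, ref, alt) keys from VCF-formatted text.

    Assumption: the first seven columns follow the VCF layout
    (CHROM, POS, ID, REF, ALT, QUAL, FILTER).
    """
    keys = set()
    for line in io.StringIO(vcf_text):
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alts, _qual, flt = fields[:7]
        if pass_only and flt not in ("PASS", "."):
            continue  # drop calls the caller itself flagged
        for alt in alts.split(","):  # split multi-allelic records
            keys.add((chrom, int(pos), ref, alt))
    return keys


def compare_callsets(a, b):
    """Summarize the overlap between two sets of variant keys."""
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


def _vcf(rows):
    """Build toy VCF text from tuples (illustration only)."""
    return "\n".join("\t".join(r) for r in rows) + "\n"


# Hypothetical toy records, not real caller output.
CALLS_A = _vcf([
    ("chr1", "100", ".", "A", "T", ".", "PASS", "."),
    ("chr1", "200", ".", "G", "C", ".", "PASS", "."),
    ("chr2", "300", ".", "C", "T", ".", "REJECT", "."),
])
CALLS_B = _vcf([
    ("chr1", "100", ".", "A", "T", ".", "PASS", "."),
    ("chr3", "400", ".", "G", "A", ".", "PASS", "."),
])

summary = compare_callsets(load_variant_keys(CALLS_A), load_variant_keys(CALLS_B))
print(summary)
```

Tracking the shared count and the Jaccard index while changing one filter at a time makes it easier to see which setting actually moves the call sets closer together.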

I am still a novice in WES. Any reply would be greatly appreciated.

Thanks.

WES Somatic MuTect VarScan Cancer • 3.1k views


score 4 · Accepted Answer · 2016-11-04


8.1 years ago

harold.smith.tarheel ★ 5.0k

The DREAM challenge consortium has generated synthetic cancer data sets for benchmarking (whole genome, but you could easily filter to WES after alignment). Brad Chapman et al. at Blue Collar Bioinformatics have validated many mutation-calling tools against these data (see here).
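One way to read "filter to WES" is to keep only the calls (or truth-set variants) that fall inside the exome capture targets. The sketch below is a rough illustration in Python, assuming the targets come as BED lines (0-based, half-open) and variant positions are 1-based as in VCF. The function names and the toy intervals are hypothetical; in practice a dedicated interval tool would do the same job.

```python
from bisect import bisect_right
from collections import defaultdict


def load_bed(bed_lines):
    """Parse BED lines into sorted, merged intervals per chromosome."""
    by_chrom = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blanks, comments, and BED header lines
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:  # overlapping or touching: extend
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_targets(targets, chrom, pos):
    """Return True if a 1-based VCF position lies inside a BED target."""
    intervals = targets.get(chrom, [])
    zero_based = pos - 1  # BED is 0-based, half-open
    # Index of the last interval whose start is <= zero_based.
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]


# Hypothetical toy targets, not a real capture kit.
targets = load_bed(["chr1\t99\t200", "chr1\t150\t300", "chr2\t0\t10"])
calls = [("chr1", 100), ("chr1", 301), ("chr2", 10), ("chr3", 5)]
exome_calls = [c for c in calls if in_targets(targets, *c)]
print(exome_calls)
```

Applying the same target filter to both the truth set and the pipeline's calls keeps the comparison restricted to regions the exome capture could have covered.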
