Gold Standard for Human cancer exome sequencing
8.1 years ago
Jerome Lin ▴ 20

Hi all.

I am working on a matched tumor-normal somatic variant calling pipeline. My pipeline is as follows:

  1. Alignment with bwa mem
  2. Sorting and duplicate removal with samtools and Picard
  3. Realignment with GATK
  4. Base recalibration with GATK
  5. Somatic mutation calling with MuTect / VarScan2 (for MuTect, I tried both 1.1.7 and 2 with default settings; for VarScan2, I filtered reads at a minimum mapping quality of 20 and used processSomatic to pick out high-confidence calls)

Here are my questions:

  1. After filtering out the variants in introns/UTRs/ncRNA, there is very little overlap between the MuTect and VarScan2 calls. The overlap between MuTect and MuTect2 is also very low. I am aware that current somatic mutation callers have a high false-positive rate, but is there a combination of parameter settings that filters out most of the noise? (I know MuTect2 calls indels while the older version does not.)

  2. I tried to find a gold-standard reference for whole-exome sequencing, but all I have found so far are articles that use NA12878 and simulate tumor mutations on top of the normal sample. Is there a reference dataset I can use to evaluate my pipeline?

  3. COLO829 is another candidate reference. Since it is a whole-genome sequencing sample, would it be appropriate to use it as a reference standard by restricting it to the exonic intervals?
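
To put a number on the low overlap described in question 1, one option is to key every call by (chromosome, position, ref, alt) and compare the resulting sets. The snippet below is only a minimal sketch: it assumes both callers were made to write plain-text VCF (VarScan2's native output is not VCF by default), that a FILTER value of `PASS` or `.` marks a kept call, and that alleles are already normalized the same way in both files. The function names `parse_vcf_lines` and `concordance` are made up for this illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# A call is identified by (chrom, 1-based pos, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str], pass_only: bool = True) -> Set[Variant]:
    """Collect variant keys from the lines of an uncompressed VCF.

    Assumption: FILTER of 'PASS' or '.' means the call is kept.
    Multi-allelic records are split into one key per ALT allele.
    """
    variants: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alts, _qual, flt = fields[:7]
        if pass_only and flt not in ("PASS", "."):
            continue
        for alt in alts.split(","):
            variants.add((chrom, int(pos), ref, alt))
    return variants


def concordance(a: Set[Variant], b: Set[Variant]) -> Dict[str, float]:
    """Summarize the overlap between two call sets."""
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: shared calls divided by all distinct calls.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Toy records, not real calls.
    mutect = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tT\t.\tPASS\t.",
        "chr1\t200\t.\tG\tC\t.\tREJECT\t.",
    ]
    varscan = [
        "chr1\t100\t.\tA\tT\t.\tPASS\t.",
        "chr3\t50\t.\tG\tA\t.\tPASS\t.",
    ]
    print(concordance(parse_vcf_lines(mutect), parse_vcf_lines(varscan)))
```

With real files, the lines could come from `open("calls.vcf")`; the file name here is a placeholder. Exact-match keys are strict, so indels that the two callers represent differently will be counted as disagreements unless both files are normalized first.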

I am still a novice in WES. Any reply would be greatly appreciated.

Thanks.

WES Somatic MuTect VarScan Cancer
8.1 years ago

The DREAM challenge consortium has generated synthetic cancer data sets for benchmarking (whole genome, but you could easily filter to WES after alignment). Brad Chapman et al. at Blue Collar Bioinformatics have validated a lot of mutation-calling tools against this data (see here).
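
As a rough illustration of the "filter to WES" idea, the sketch below restricts both a call set and a truth set to exome target intervals and then computes precision, recall and F1. It rests on several assumptions: the targets come from a BED file (0-based, half-open), variants are (chrom, 1-based pos, ref, alt) tuples, only the POS of each variant is tested against the targets (so an indel spanning a target boundary is judged by its first base), and a call counts as correct only on an exact match. The helper names `load_bed`, `in_targets` and `benchmark` are hypothetical and not part of any benchmarking tool.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Variant = Tuple[str, int, str, str]           # (chrom, 1-based pos, ref, alt)
Targets = Dict[str, List[Tuple[int, int]]]    # chrom -> sorted, merged BED intervals


def load_bed(lines: Iterable[str]) -> Targets:
    """Read BED lines and merge overlapping or touching intervals per chromosome."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))
    merged: Targets = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [ivs[0]]
        for s, e in ivs[1:]:
            if s <= out[-1][1]:  # overlaps or touches the previous interval
                out[-1] = (out[-1][0], max(out[-1][1], e))
            else:
                out.append((s, e))
        merged[chrom] = out
    return merged


def in_targets(targets: Targets, chrom: str, pos: int) -> bool:
    """True if the 1-based position falls inside a 0-based half-open interval."""
    ivs = targets.get(chrom, [])
    zero_based = pos - 1
    # Rightmost interval whose start is <= the position.
    i = bisect_right(ivs, (zero_based, float("inf"))) - 1
    return i >= 0 and ivs[i][0] <= zero_based < ivs[i][1]


def benchmark(calls: Set[Variant], truth: Set[Variant], targets: Targets) -> Dict[str, float]:
    """Precision/recall/F1 of calls against truth, both restricted to the targets."""
    calls_t = {v for v in calls if in_targets(targets, v[0], v[1])}
    truth_t = {v for v in truth if in_targets(targets, v[0], v[1])}
    tp = len(calls_t & truth_t)
    fp = len(calls_t - truth_t)
    fn = len(truth_t - calls_t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Restricting the truth set as well as the calls matters: otherwise every true variant outside the capture regions would be counted as a false negative and recall would look far worse than it is. The same approach would apply to COLO829 or any other whole-genome truth set.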
