What is a good threshold for calling variants in WES data?
9.4 years ago

Hi everyone

I'm currently analyzing targeted sequencing data from a panel of 60 genes in some rare cancer samples. The DNA was extracted from FFPE samples, and because FFPE DNA can be damaged, we have been discussing the minimum variant allele frequency (VAF) at which a variant should be called.

The default threshold is 1%, but we are thinking of using 5% instead. Would that be sensible?

On average, 99.6% of the targeted genes are covered, at a mean depth of around 600X.

We used VarScan2 to call SNPs and FreeBayes to call indels.

/Nicolai

sequencing variant-calling threshold • 3.1k views
9.4 years ago

FFPE samples vary widely in quality, depending on a number of factors. Some of this will need to be assessed heuristically: how many calls do you expect, and how many do you see? What is the mutation context of those calls?
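
One quick way to look at mutation context is to ask what share of calls are C>T or G>A, the change that cytosine deamination in FFPE DNA typically leaves behind, and to compare low-VAF calls with high-VAF ones. The helpers below are hypothetical (not part of VarScan2 or FreeBayes); they assume the calls have already been parsed into (ref, alt, VAF) tuples, and the example data is made up.

```python
# Each call is a hypothetical (ref_base, alt_base, vaf) tuple.
Call = tuple[str, str, float]

# Cytosine deamination in FFPE DNA typically appears as C>T, or as G>A when
# the damaged base sits on the opposite strand to the reference.
DEAMINATION_CHANGES = {("C", "T"), ("G", "A")}


def deamination_fraction(calls: list[Call]) -> float:
    """Fraction of SNV calls that are C>T or G>A (0.0 for an empty list)."""
    if not calls:
        return 0.0
    hits = sum(1 for ref, alt, _ in calls
               if (ref.upper(), alt.upper()) in DEAMINATION_CHANGES)
    return hits / len(calls)


def split_by_vaf(calls: list[Call], cutoff: float) -> tuple[list[Call], list[Call]]:
    """Split calls into (below cutoff, at or above cutoff) by VAF."""
    low = [c for c in calls if c[2] < cutoff]
    high = [c for c in calls if c[2] >= cutoff]
    return low, high


if __name__ == "__main__":
    # Made-up calls, for illustration only.
    calls = [
        ("C", "T", 0.012), ("G", "A", 0.015), ("C", "T", 0.020),
        ("A", "G", 0.030), ("G", "A", 0.018), ("C", "A", 0.250),
        ("T", "C", 0.410), ("G", "A", 0.330),
    ]
    low, high = split_by_vaf(calls, 0.05)
    print(f"below 5%: {len(low)} calls, {deamination_fraction(low):.0%} C>T/G>A")
    print(f"at/above 5%: {len(high)} calls, {deamination_fraction(high):.0%} C>T/G>A")
```

With these made-up calls, 4 of the 5 calls below 5% are C>T/G>A, versus 1 of the 3 above it. A skew like that at low VAF is a warning sign, but some genuine mutational processes also produce C>T changes, so this is a heuristic rather than proof.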

As a general rule, though, the error rate of Illumina data is somewhere around 0.5% (this depends heavily on library prep, the number of PCR amplification cycles, etc.). In FFPE data, I'd have a hard time believing calls at 1%, even with 600x data.
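
To see why 1% is hard to trust at this depth, here is a rough binomial model: treat each read at a site as independently carrying an error with probability 0.5% (the figure above), and ask how often errors alone supply enough alternate reads to reach a given VAF. This is a simplifying assumption: it ignores base-quality filtering, strand bias and FFPE damage, and it pessimistically assumes every error produces the same alternate base. The function names are hypothetical.

```python
import math


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed directly over the upper tail."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def chance_error_reaches_vaf(depth: int, error_rate: float, vaf: float) -> float:
    """Probability that sequencing errors alone give enough alt reads to reach `vaf`.

    Crude model (assumption): errors are independent across reads and every
    error produces the same alternate base, which is a worst case.
    """
    # Small epsilon guards against float rounding pushing e.g. 6.0 up to 7.
    min_alt_reads = math.ceil(vaf * depth - 1e-9)
    return prob_at_least(min_alt_reads, depth, error_rate)


if __name__ == "__main__":
    for vaf in (0.01, 0.05):
        p = chance_error_reaches_vaf(depth=600, error_rate=0.005, vaf=vaf)
        print(f"VAF {vaf:.0%} at 600x with 0.5% error: P = {p:.3g} per site")
```

Under these assumptions, at 600x errors alone reach 6 alternate reads (1%) at roughly 8% of sites, whereas reaching 30 reads (5%) is vanishingly unlikely. Across every position in a panel, that per-site rate adds up quickly. Real error profiles are far less uniform, so treat the numbers as intuition only.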

Another question might be: do you really care about mutations at 1%? For example, if you're searching for drivers of therapy resistance that might be clonally selected, the answer may be yes. If you're looking for treatments that will target all cells in the tumor (the founding clone), you'll be looking exclusively for much higher-VAF events.
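
To make the founding-clone point concrete, here is the usual back-of-the-envelope relationship between tumor purity, the fraction of tumor cells carrying a mutation (the cancer cell fraction, CCF) and the expected VAF. It assumes a heterozygous SNV in a diploid region with no copy-number change; the purity in the example is hypothetical.

```python
def expected_vaf(purity: float, ccf: float = 1.0) -> float:
    """Expected VAF of a heterozygous SNV in a diploid, copy-neutral region.

    purity: fraction of cells in the sample that are tumor cells.
    ccf:    cancer cell fraction, i.e. the share of tumor cells carrying the
            mutation (1.0 for a clonal, founding-clone event).
    Each carrying cell contributes one mutant allele out of two, hence the / 2.
    """
    if not (0.0 <= purity <= 1.0 and 0.0 <= ccf <= 1.0):
        raise ValueError("purity and ccf must both lie in [0, 1]")
    return purity * ccf / 2.0


if __name__ == "__main__":
    purity = 0.6  # hypothetical sample: 60% tumor cells
    print(f"clonal mutation:          expected VAF {expected_vaf(purity):.1%}")
    print(f"subclone in 10% of cells: expected VAF {expected_vaf(purity, 0.10):.1%}")
```

At 60% purity, a founding-clone mutation is expected around 30% VAF, well clear of either threshold, while a subclone present in 10% of tumor cells is expected around 3%, between the 1% and 5% cutoffs being discussed.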


Hi Chris

Thanks for the comments. These are exactly the kinds of considerations I'm weighing. Raising the calling threshold to 5% reduces our mutations from about 1,500 to around 500, which seems more manageable, but at what cost? We are trying to characterize the genomic landscape and want to reduce the chance that what we see is an artefact of the FFPE sample. So would raising the threshold from 1% to 5% help with this? The DNA was in good condition despite coming from FFPE, and it was sequenced on an Illumina HiSeq 2000.
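
One way to frame the cost of moving from 1% to 5%: if a heterozygous mutation in a diploid, copy-neutral region has expected VAF = purity × CCF / 2 (CCF being the fraction of tumor cells carrying it), then a VAF cutoff t mostly hides mutations carried by fewer than 2t / purity of the tumor cells. The sketch below assumes exactly that simple model and uses hypothetical purities; the function name is illustrative only.

```python
def min_detectable_ccf(purity: float, vaf_threshold: float) -> float:
    """Smallest cancer cell fraction whose *expected* VAF reaches vaf_threshold.

    Assumes a heterozygous SNV in a diploid, copy-neutral region, so
    expected VAF = purity * ccf / 2. A result above 1.0 means even a clonal
    mutation is expected to fall below the threshold.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    return 2.0 * vaf_threshold / purity


if __name__ == "__main__":
    for purity in (0.3, 0.6, 0.9):  # hypothetical tumor purities
        for threshold in (0.01, 0.05):
            ccf = min_detectable_ccf(purity, threshold)
            print(f"purity {purity:.0%}, threshold {threshold:.0%}: "
                  f"mutations in fewer than ~{ccf:.0%} of tumor cells are likely missed")
```

For example, at 60% purity a 5% cutoff sets the bar at roughly 17% of tumor cells, versus about 3% with a 1% cutoff. Because read counts are noisy, detection near the cutoff is closer to a coin flip than a hard boundary.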


Without knowing details about the characteristics of your pipeline, library prep, and about a thousand other variables, it's hard to say. Ideally, you would have some controls in place up front: sequence a known genome from a pure cell line, note the artifacts that you call, and then set appropriate statistical or heuristic thresholds.
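
As a rough sketch of that last step, assuming you have calls from a control sample whose true variants are known, you can measure precision (the share of calls that are real) at or above each candidate VAF threshold and pick the lowest threshold that meets a precision target. With a pure cell line, any low-VAF call that isn't in the truth set is by construction an artifact, which is exactly what you want to count. The `Call` class, the helper names, the 0.9 target and the example data are all hypothetical, and exact tuple matching ignores representation differences between callers, especially for indels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Call:
    """A variant called in a control sample (hypothetical representation)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    vaf: float

    def key(self) -> tuple:
        return (self.chrom, self.pos, self.ref, self.alt)


def precision_at(calls: list[Call], truth: set[tuple], threshold: float) -> Optional[float]:
    """Share of calls with VAF >= threshold that are in the truth set (None if none kept)."""
    kept = [c for c in calls if c.vaf >= threshold]
    if not kept:
        return None
    true_hits = sum(1 for c in kept if c.key() in truth)
    return true_hits / len(kept)


def lowest_threshold_meeting(calls: list[Call], truth: set[tuple],
                             thresholds: list[float], target: float) -> Optional[float]:
    """Smallest candidate threshold whose precision on the control reaches `target`."""
    for t in sorted(thresholds):
        p = precision_at(calls, truth, t)
        if p is not None and p >= target:
            return t
    return None


if __name__ == "__main__":
    # Made-up control data: two known true variants plus four artifacts.
    truth = {("chr1", 100, "C", "T"), ("chr1", 200, "A", "G")}
    calls = [
        Call("chr1", 100, "C", "T", 0.45),
        Call("chr1", 200, "A", "G", 0.12),
        Call("chr1", 300, "C", "T", 0.01),
        Call("chr1", 400, "G", "A", 0.02),
        Call("chr2", 500, "C", "T", 0.03),
        Call("chr2", 600, "T", "C", 0.08),
    ]
    for t in (0.01, 0.05, 0.10):
        print(f"threshold {t:.0%}: precision {precision_at(calls, truth, t):.2f}")
    print("lowest threshold with precision >= 0.9:",
          lowest_threshold_meeting(calls, truth, [0.01, 0.05, 0.10], 0.9))
```

On this toy data, precision rises from 0.33 at 1% to 0.67 at 5% and 1.00 at 10%, so 10% is the lowest threshold that meets the 0.9 target.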

I can offer this:

  1. We've done some of these sorts of experiments with 1000x genome coverage and a panel of 8 different variant callers, and concluded that all of them stink below 10% VAF. The false-positive rate is just enormous.

  2. If a paper came to me for review where you called a bunch of variants at 1%, I'd demand some serious proof that they weren't artifacts. No one is going to blink an eye if you miss true variants at 1%, as no one expects you to be able to call those. Reporting false-positive sites that may trigger incorrect conclusions or lead people down unfruitful research paths is a much greater sin.
