Question

How Many Is Too Many? Germline And Somatic Coding Indels And SNPs In Cancer Exome Capture.

6

13.6 years ago

Prateek ★ 1.0k

Would anyone care to share their experience with variant calling in cancer genomics, using tumor-normal pairs to separate somatic from germline variants, especially indels?

I have been getting an unbelievably high number of germline indels annotated as "coding" after running the GATK Somatic Indel Detector on a tumor-normal pair. Even after pretty strict coverage filters on both normal and tumor, we get ~20-30 somatic coding small indels (which I can believe) but about 600 coding germline indels, ~50% of them frameshifts!

These look pretty convincingly "germline" when you check the coverage in the "normal" sample (to confirm germline events). I know this cannot be right and am trying to investigate the reasons. Could it be:

Alignment issues
Contamination of the normal (less likely, as it is blood vs. paraffin-embedded tumor)
Annotation version issues (I have rechecked and eliminated this cause)

Any help is appreciated. Thanks.

Additional info:

In the normal, the percentage of reads supporting each called indel (consensus indel reads divided by total reads) is either ~40-50% or ~90-100%, with an average over all indels of 60%. The numbers are similar for the tumor, so these do seem to be true germline events.

cancer indel gatk variant variant • 7.0k views

ADD COMMENT • link updated 13.6 years ago by David Quigley 11k • written 13.6 years ago by Prateek ★ 1.0k

By any chance are a lot of these indels close to repetitive sequences?

ADD REPLY • link 13.6 years ago by Gww ★ 2.7k

@GWW - not really. There are a whole lot in the non-coding regions that are close to repetitive regions, but the ones I am talking about are smack in the middle of well-meaning exons. About 50% are small 3n (in-frame) indels and the other 50% are frameshifts.

ADD REPLY • link 13.6 years ago by Prateek ★ 1.0k

What do your quality metrics look like? If the calls don't have a high quality score and good coverage, they're probably junk. See how many you have left if you use SNP quality cutoffs of 50 or 75.

ADD REPLY • link 13.6 years ago by Docroberson ▴ 320
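
The quality-cutoff check suggested here can be sketched in a few lines of Python. This is only an illustration: it assumes the calls are in VCF format with a populated QUAL column (column 6), which may not be true for every caller's output, and the function name `count_passing` and the example records are made up.

```python
def count_passing(vcf_lines, cutoffs=(50, 75)):
    """Count VCF data records with QUAL at or above each cutoff.

    Assumes tab-separated VCF text; records with a missing QUAL (".")
    are counted in the total but never pass a cutoff.
    """
    counts = {c: 0 for c in cutoffs}
    total = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        total += 1
        qual = line.rstrip("\n").split("\t")[5]
        if qual == ".":
            continue
        q = float(qual)
        for c in cutoffs:
            if q >= c:
                counts[c] += 1
    return total, counts


# Made-up records for illustration only.
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tAT\t80\t.\t.",
    "chr1\t200\t.\tAT\tA\t60\t.\t.",
    "chr2\t300\t.\tG\tGC\t30\t.\t.",
    "chr2\t400\t.\tC\tCT\t.\t.\t.",
]
total, counts = count_passing(example_vcf)
print(total, counts)
```

To run it on a real file, pass an open file handle instead of the example list, for instance `count_passing(open("calls.vcf"))`, where the file name is a placeholder.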

And you ran an "indel realignment" step on both the tumor and normal BAMs?

ADD REPLY • link 13.6 years ago by Aaronquinlan 12k

@aaron - yes, both files were run through local indel realignment.

ADD REPLY • link 13.6 years ago by Prateek ★ 1.0k

score 2 · Answer 1 · 2011-08-25

13.6 years ago

Chris Miller 22k

Look at the frequency of reads supporting those SNV calls. If they're close to 50% in normal, then yeah, it's probably a germline event. If you have tumor contamination in the normal (common) and have calls at lower percentages (say, 10%), then you can be fairly confident that contamination is what you're seeing.

ADD COMMENT • link 13.6 years ago by Chris Miller 22k
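
As a rough illustration of the read-fraction reasoning in this answer, the sketch below bins a call by the fraction of reads in the normal that support it. The thresholds (20% and 75%) and the function name `classify_normal_fraction` are assumptions chosen for the example, not values from this thread; sensible cutoffs depend on coverage and on how much contamination you expect.

```python
def classify_normal_fraction(alt_reads, total_reads,
                             low_max=0.20, het_max=0.75):
    """Label a call by its supporting-read fraction in the normal sample.

    low_max and het_max are illustrative thresholds (assumptions).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    fraction = alt_reads / total_reads
    if fraction == 0:
        return "absent in normal"
    if fraction < low_max:
        return "low fraction: possible contamination or artifact"
    if fraction < het_max:
        return "likely heterozygous germline"
    return "likely homozygous germline"


# Made-up read counts: (supporting reads, total reads) in the normal.
for alt, total in [(5, 50), (22, 50), (48, 50), (0, 50)]:
    print(alt, total, classify_normal_fraction(alt, total))
```

With low coverage the fraction is noisy, so a call at 30% from 10 reads says much less than one at 30% from 200 reads.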

As a followup, have you looked for these particular germline variants in dbSNP? If they're common in the population, they're probably not particularly interesting either.

ADD REPLY • link 13.6 years ago by Chris Miller 22k

I generally just grab the appropriate dbSNP track from UCSC. http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp132.txt.gz

ADD REPLY • link 13.6 years ago by Chris Miller 22k

In the normal, the percentage of reads supporting each called indel (consensus indel reads divided by total reads) is either ~40-50% or ~90-100%, with an average over all indels of 60%. The numbers are similar for the tumor, so these do seem to be true germline events.

ADD REPLY • link 13.6 years ago by Prateek ★ 1.0k

@Chris - I eyeballed a couple and did find some of our indels in proximity to dbSNP entries (within ~10-20 bp), although not with exactly the same alleles. I am planning to search the entire set against dbSNP. Do you know of a tool that can already do that? Otherwise I'll download the entire set from Ensembl, using frameshift / complex indels as the filter.

ADD REPLY • link 13.6 years ago by Prateek ★ 1.0k
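
One way to search a whole call set against the UCSC dbSNP track is a small proximity lookup, sketched below. It assumes the tab-separated layout of the UCSC snp table, with chrom, chromStart, name and class in columns 2, 3, 5 and 12; check this against the table's schema before relying on it. The function names, the 20 bp window and the example rows (including the rs-style IDs) are made up for illustration.

```python
import bisect
from collections import defaultdict


def build_index(rows):
    """Index dbSNP rows by chromosome, sorted by 0-based start position."""
    index = defaultdict(list)
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated rows
        chrom, start = fields[1], int(fields[2])
        name, var_class = fields[4], fields[11]
        index[chrom].append((start, name, var_class))
    for chrom in index:
        index[chrom].sort()
    return index


def nearby(index, chrom, vcf_pos, window=20):
    """Return dbSNP entries starting within `window` bp of a 1-based position."""
    entries = index.get(chrom, [])
    pos0 = vcf_pos - 1  # UCSC starts are 0-based; VCF positions are 1-based
    lo = bisect.bisect_left(entries, (pos0 - window,))
    hi = bisect.bisect_left(entries, (pos0 + window + 1,))
    return entries[lo:hi]


# Made-up rows in the assumed column layout; the IDs are not real rs numbers.
example_rows = [
    "585\tchr1\t1000\t1001\trs_example1\t0\t+\tA\tA\tA/G\tgenomic\tsingle",
    "585\tchr1\t1015\t1017\trs_example2\t0\t+\tAT\tAT\t-/AT\tgenomic\tdeletion",
    "585\tchr1\t5000\t5000\trs_example3\t0\t+\t-\t-\t-/C\tgenomic\tinsertion",
]
index = build_index(example_rows)
hits = nearby(index, "chr1", 1010)
print(hits)
```

For the real track, the rows could come from `gzip.open("snp132.txt.gz", "rt")`. The full table is large, so filtering rows to the capture regions or to indel classes before indexing would keep memory use down. A nearby hit only shows proximity; matching alleles still has to be checked separately.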

@Chris - you were right. A lot of them are from dbSNP. However, I still need to find out how and why so many of them can be tolerated in a single individual!

ADD REPLY • link 13.6 years ago by Prateek ★ 1.0k

score 1 · Answer 2 · 2011-08-25

13.6 years ago

David Quigley 11k

Perhaps the samples were accidentally swapped at some point, and your "normal" is really the tumor DNA and vice versa. These things can happen. You're going to have to do follow-up validation of interesting candidates in your own samples anyway.

ADD COMMENT • link 13.6 years ago by David Quigley 11k