How Many Is Too Many? Germline And Somatic Coding Indels And SNPs In Cancer Exome Capture
13.3 years ago
Prateek

Would anyone care to share their experience with variant calling in cancer genomics using tumor-normal pairs to separate somatic from germline variants, especially indels?

I have been getting an unbelievably high number of "coding" germline indels after running the GATK SomaticIndelDetector on a tumor-normal pair. Even after fairly strict coverage filters on both the normal and the tumor, we get ~20-30 somatic coding small indels (which I can digest) but about 600 coding germline indels, ~50% of them frameshifts!

These look convincingly germline when you check the read support for them in the normal sample. I know this many cannot be real, and I am trying to track down the cause. Could it be:

  1. Alignment issues
  2. Contamination of the normal (less likely, as the normal is blood and the tumor is paraffin-embedded)
  3. Annotation version issues (I have rechecked and ruled this out)

Any help is appreciated. Thanks!

Additional info:

The fraction of consensus reads carrying the called indel, out of all reads in the normal, is either ~40-50% or ~90-100%, averaging 60% across all indels. The tumor shows similar numbers, so these do look like true germline events.
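The allele-fraction reasoning above can be made concrete with a short Python sketch. The read counts below are made up, and the genotype cutoffs (>= 0.85 for homozygous, 0.30 to 0.70 for heterozygous) are illustrative assumptions, not values taken from GATK or from this thread.

```python
from statistics import mean


def allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the called indel."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def germline_genotype(af: float) -> str:
    """Rough genotype from allele fraction; the cutoffs are hypothetical."""
    if af >= 0.85:
        return "hom-alt"   # ~90-100% of reads carry the indel
    if 0.30 <= af <= 0.70:
        return "het"       # ~40-50% of reads carry the indel
    return "other"         # low or in-between fractions need a closer look


# Hypothetical (alt, total) read counts in the normal for three indels
normal_counts = [(22, 48), (45, 46), (19, 40)]
afs = [allele_fraction(alt, total) for alt, total in normal_counts]
print([germline_genotype(af) for af in afs])  # ['het', 'hom-alt', 'het']
print(round(mean(afs), 2))                    # 0.64
```

A mix of het-like and hom-like fractions averaging around 0.6, as described above, is what a set of real germline indels would be expected to produce.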

By any chance, are many of these indels close to repetitive sequences?

@GWW: Not really. There are a whole lot in non-coding regions that are close to repeats, but the ones I am talking about are smack in the middle of well-meaning exons. About 50% are small in-frame (3n) indels and the other 50% are frameshifts.
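The in-frame versus frameshift split comes from the net length change of each indel: a change that is a multiple of 3 keeps the reading frame. A minimal Python sketch, assuming VCF-style REF/ALT strings and an indel that lies entirely within a coding exon (splice-site or exon-boundary indels need a proper annotator):

```python
def indel_length(ref: str, alt: str) -> int:
    """Net length change of a VCF-style indel (positive = insertion)."""
    return len(alt) - len(ref)


def coding_effect(ref: str, alt: str) -> str:
    """Classify a coding indel as in-frame (3n) or frameshift.

    Assumes the whole indel sits inside one coding exon.
    """
    delta = indel_length(ref, alt)
    if delta == 0:
        return "not an indel"
    # Python's % returns a non-negative result, so deletions work too
    return "in-frame" if delta % 3 == 0 else "frameshift"


# Hypothetical REF/ALT pairs
calls = [("CAGT", "C"), ("A", "AT"), ("G", "GCAC"), ("TAA", "T")]
print([coding_effect(ref, alt) for ref, alt in calls])
# ['in-frame', 'frameshift', 'in-frame', 'frameshift']
```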

What do your quality metrics look like? If the calls lack a high quality score and good coverage, they're probably junk. See how many you have left if you use quality cutoffs of 50 or 75.
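To try the suggested cutoffs of 50 and 75, a minimal sketch that reads raw VCF lines is below. It assumes the calls are in VCF with a numeric QUAL column and a DP key in INFO; the depth floor `min_dp=10` is a hypothetical default, and the example records are invented.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO field into a dict; flags map to True."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item and item != ".":
            fields[item] = True
    return fields


def passes(line: str, min_qual: float = 50.0, min_dp: int = 10) -> bool:
    """True if a VCF data line meets the QUAL and depth thresholds."""
    cols = line.rstrip("\n").split("\t")
    qual = cols[5]
    if qual == ".":          # missing QUAL: treat as failing
        return False
    dp = int(parse_info(cols[7]).get("DP", 0))
    return float(qual) >= min_qual and dp >= min_dp


def count_surviving(lines, cutoffs=(50.0, 75.0)) -> dict:
    """Count data records that survive each QUAL cutoff."""
    records = [l for l in lines if l.strip() and not l.startswith("#")]
    return {c: sum(passes(r, min_qual=c) for r in records) for c in cutoffs}


example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tCA\tC\t80\tPASS\tDP=40",
    "chr1\t200\t.\tG\tGT\t60\tPASS\tDP=25",
    "chr2\t300\t.\tTAC\tT\t30\tPASS\tDP=12",
]
print(count_surviving(example))  # {50.0: 2, 75.0: 1}
```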

And you ran an "indel realignment" step on both the tumor and normal BAMs?

@aaron: Yes, both BAMs were run through local indel realignment.

13.3 years ago

Look at the fraction of reads supporting those calls. If it is close to 50% in the normal, it's probably a germline event. If the normal has tumor contamination (common) and you see calls at lower fractions (say, 10%), you can be fairly confident that contamination is what you're seeing.
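The rule of thumb in this answer can be written as a small decision function. The cutoffs (`germline_min=0.35`, `contamination_max=0.20`) are hypothetical and would need tuning to your coverage and tumor purity.

```python
def interpret_normal_af(alt_reads: int, total_reads: int,
                        germline_min: float = 0.35,
                        contamination_max: float = 0.20) -> str:
    """Interpret the alt-allele fraction of a call in the normal sample."""
    if total_reads == 0:
        return "no coverage"
    af = alt_reads / total_reads
    if af >= germline_min:
        return "likely germline"          # ~50% (het) or ~100% (hom)
    if af == 0:
        return "absent in normal"
    if af <= contamination_max:
        return "possible tumor contamination of normal"  # e.g. ~10%
    return "ambiguous"                    # between the two cutoffs


for counts in [(24, 50), (5, 50), (0, 50), (14, 50)]:
    print(counts, interpret_normal_af(*counts))
```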

As a follow-up: have you looked up these germline variants in dbSNP? If they're common in the population, they're probably not particularly interesting.

I generally just grab the appropriate dbSNP track from UCSC. http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp132.txt.gz
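For anyone taking the UCSC route, a sketch of loading indel entries from a `snpNNN.txt.gz` table is below. The column positions (chrom, chromStart, chromEnd, name, class at 0-based indices 1, 2, 3, 4, 11) and the class labels are assumptions based on the UCSC snp table layout; check them against the table's schema file before use. The rsIDs in the example are placeholders.

```python
import gzip
from collections import defaultdict

# Assumed 0-based column indices in a UCSC snpNNN.txt row (verify first)
CHROM, START, END, NAME, CLASS = 1, 2, 3, 4, 11
INDEL_CLASSES = {"insertion", "deletion", "in-del"}


def parse_dbsnp_rows(lines) -> dict:
    """Group indel-class rows as {chrom: sorted [(chromStart, chromEnd, name)]}."""
    by_chrom = defaultdict(list)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) > CLASS and cols[CLASS] in INDEL_CLASSES:
            by_chrom[cols[CHROM]].append(
                (int(cols[START]), int(cols[END]), cols[NAME]))
    for entries in by_chrom.values():
        entries.sort()
    return dict(by_chrom)


def load_dbsnp_indels(path: str) -> dict:
    """Read a gzipped UCSC snp table, e.g. snp132.txt.gz."""
    with gzip.open(path, "rt") as handle:
        return parse_dbsnp_rows(handle)


rows = [
    "1\tchr1\t5000\t5002\trs_example2\t0\t+\tCT\tCT\tCT/-\tgenomic\tdeletion",
    "1\tchr1\t1000\t1000\trs_example1\t0\t+\t-\t-\t-/A\tgenomic\tinsertion",
    "1\tchr1\t3000\t3001\trs_example3\t0\t+\tA\tA\tA/G\tgenomic\tsingle",
]
print(parse_dbsnp_rows(rows))
```

Note that UCSC `chromStart` is 0-based, whereas VCF `POS` is 1-based, so positions need converting before comparison.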

@Chris: I eyeballed a couple and found that some of our indels lie close to dbSNP indels (within ~10-20 bp), although not with exactly the same alleles. I am planning to search the entire set against dbSNP. Do you know a tool that already does that? Otherwise I'll download the whole set from Ensembl, using frameshift/complex indels as the filter.
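Interval tools such as bedtools (its `window` subcommand) handle this kind of proximity search at scale. For a quick check in Python, a sketch of a windowed lookup is below. It assumes dbSNP entries are grouped per chromosome as sorted `(start, end, name)` tuples, and that call positions and dbSNP starts use the same coordinate convention (UCSC tables are 0-based, VCF POS is 1-based, so convert one side first). The 20 bp default mirrors the 10-20 bp distances mentioned above; the entries in the example are invented.

```python
from bisect import bisect_left


def nearby_dbsnp(chrom: str, pos: int, dbsnp: dict, window: int = 20) -> list:
    """Return dbSNP entries whose start lies within `window` bp of `pos`.

    `dbsnp` maps chrom -> list of (start, end, name), sorted by start.
    """
    entries = dbsnp.get(chrom, [])
    lo, hi = pos - window, pos + window
    # (lo,) sorts before any (lo, end, name), so this finds the first start >= lo
    i = bisect_left(entries, (lo,))
    hits = []
    while i < len(entries) and entries[i][0] <= hi:
        hits.append(entries[i])
        i += 1
    return hits


dbsnp = {"chr1": [(100, 101, "a"), (130, 130, "b"), (500, 503, "c")]}
print(nearby_dbsnp("chr1", 115, dbsnp))  # [(100, 101, 'a'), (130, 130, 'b')]
print(nearby_dbsnp("chr1", 300, dbsnp))  # []
```

Proximity hits like these still need an allele-level comparison to tell an exact dbSNP match from a nearby, different event.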

@Chris: You were right, a lot of them are in dbSNP. However, I still need to find out how and why so many of them can be tolerated in a single individual!

13.3 years ago

Perhaps the samples were accidentally swapped at some point, and your "normal" is really the tumor DNA and vice versa. These things can happen. You're going to have to do follow-up validation of interesting candidates in your own samples anyway.
