Expected SNP Allele Frequencies In Tumor Samples
12.6 years ago
bw. ▴ 260

I am analyzing exome-seq data from multiple solid tumor samples. The samples all originate from a single human primary tumor. This original sample was passaged in mice and expanded to create multiple parallel samples that were exposed to several different drug conditions between passages 4 and 6 before being sequenced separately. After calling SNPs on these samples using the UnifiedGenotyper, I plotted a histogram of the mutant allele frequencies across all the called SNPs for all of the samples. Here "allele frequency" is the number of reads supporting the ALT allele (summed across all samples) divided by the total number of reads covering that base (summed across all samples). The distribution I got looks like this:

http://www.freeimagehosting.net/ow51y

I'm hoping somebody can help me interpret this.

  • Do the peaks at 0.33/0.66, 0.4/0.6 and 0.2/0.8 indicate CNV and/or tumor heterogeneity with multiple subpopulations of cells?

  • Apart from those peaks, the allele frequency looks mostly normally distributed. Is this expected?

P.S. I ran bwa/Picard/GATK to call the SNPs, running UnifiedGenotyper on all samples simultaneously (-dcov 1000) followed by VQSR. For the VQSR step, I used hapmap3.3.b37.sites.vcf and 1000Gomni2.5.b37.sites.vcf as training data and -an QD, HaplotypeScore, MQRankSum, ReadPosRankSum, FS, MQ, DP. For the histogram I only used the ~38,000 SNPs that passed the VQSR filter and had more than 30 reads of coverage (out of a total of ~600,000 SNPs called). To compute the allele frequency for a given SNP in a given sample, I used the GATK-reported values in the AD field of the VCF genotype column to get the number of reads supporting the REF and ALT alleles.
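For reference, the pooled calculation described above could be sketched roughly as follows. This is only an illustration, not the exact script used: the function name `pooled_alt_fraction` is hypothetical, it assumes a tab-separated multi-sample VCF record with an AD entry in the FORMAT column, it takes the total coverage as REF + ALT reads from AD (rather than DP), it counts only the first ALT allele, and it applies the > 30 read threshold to the coverage summed over samples.

```python
def pooled_alt_fraction(vcf_line, min_depth=30):
    """Return ALT reads / (REF + ALT reads), summed over all samples.

    Returns None when the record has no AD field or when the pooled
    depth does not exceed min_depth (assumed threshold: > 30 reads).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")       # e.g. GT:AD:DP
    if "AD" not in format_keys:
        return None
    ad_index = format_keys.index("AD")

    ref_total = 0
    alt_total = 0
    for sample in fields[9:]:                # one column per sample
        parts = sample.split(":")
        if len(parts) <= ad_index:
            continue                         # truncated genotype such as "./."
        depths = parts[ad_index].split(",")
        if len(depths) < 2 or "." in depths:
            continue                         # missing allele depths
        ref_total += int(depths[0])
        alt_total += int(depths[1])          # first ALT allele only

    depth = ref_total + alt_total
    if depth <= min_depth:
        return None
    return alt_total / depth


# Made-up record with two samples, for illustration only.
record = "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:10,10:20\t0/1:15,5:20"
print(pooled_alt_fraction(record))  # 15 ALT reads / 40 total reads = 0.375
```

Summing over samples this way weights each sample by its coverage, so a deeply sequenced sample influences the pooled value more than a shallow one.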

snp gatk cancer • 6.0k views

Interesting plot. So you're sure that all the variants represented in this plot have > 30X coverage?


Thanks, I went back and double-checked, and yes, these are all SNPs with > 30x coverage. Also, raising the coverage threshold to 100x excludes ~9,000 more SNPs but doesn't really change the shape of the distribution: http://www.freeimagehosting.net/anjxr


Is this a drug resistance study? Do you plan to calculate any somatic changes after drug exposure?

12.6 years ago

For "healthy" individuals the allele frequency distribution should look more like a beta distribution. This article has a nice allele frequency distribution.

http://www.annualreviews.org/doi/pdf/10.1146/annurev.genom.7.080505.115806

I would be concerned about a large amount of heterogeneity in your cancer samples.

I would also consider trying other variant callers designed specifically for cancer.
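To illustrate how copy-number changes alone could produce peaks like the ones in the question, here is a minimal sketch under strong simplifying assumptions: a single clone, no contaminating normal human cells, no mapping or capture bias, and a SNP present on 1 to C-1 of the C copies of a locus. The function name `expected_alt_fractions` is hypothetical. Under those assumptions the expected ALT read fraction is simply (ALT copies) / (total copies).

```python
from fractions import Fraction


def expected_alt_fractions(total_copies):
    """Expected ALT read fractions for a SNP carried on 1..C-1 of C copies.

    Assumes a pure, clonal sample with no allele-specific bias.
    """
    return [Fraction(m, total_copies) for m in range(1, total_copies)]


for copies in (2, 3, 5):
    values = ", ".join(f"{float(f):.2f}" for f in expected_alt_fractions(copies))
    print(f"{copies} copies: {values}")
# 2 copies: 0.50
# 3 copies: 0.33, 0.67
# 5 copies: 0.20, 0.40, 0.60, 0.80
```

So three copies would give peaks near 0.33/0.67 and five copies near 0.2/0.4/0.6/0.8. A mixture of subclones could produce similar values, so this does not distinguish copy-number change from heterogeneity on its own.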


+1 for the nice answer and a link to the article.


Thank you for the interesting article. Sorry, I didn't mention a key detail: the samples are all derived from one human primary tumor which was passaged in mice over several passages. Reads that aligned to mouse with a higher score than to human were not excluded from SNP calling. Which makes this graph all the more surprising, right? Any recommendations of other tools would be appreciated.


VarScan: http://genome.cshlp.org/content/early/2012/02/02/gr.129684.111. Since it sounds like you are doing pooled sequencing, I would also check out SNVer: http://snver.sourceforge.net/


Thanks a lot. I will check it out. The samples were not pooled (I just added a longer description to my original post), but the number of reads supporting the ALT allele was summed across all samples, as was the total coverage.
