I am analyzing exome-seq data from multiple solid tumor samples. The samples all originate from a single human primary tumor. This original sample was passaged in mice and expanded to create multiple parallel samples, which were exposed to several different drug conditions between passages 4 and 6 and then sequenced separately. After calling SNPs on these samples using the UnifiedGenotyper, I plotted a histogram of the mutant allele frequencies across all the called SNPs for all of the samples. Here "allele frequency" is the number of reads supporting the ALT allele (summed across all samples) divided by the total number of reads covering that base (summed across all samples). The distribution I got looks like this:
http://www.freeimagehosting.net/ow51y
I'm hoping somebody can help me interpret this.
Do the peaks at 0.33/0.66, 0.4/0.6, and 0.2/0.8 indicate CNV and/or tumor heterogeneity with multiple subpopulations of cells?
It looks like the allele frequency is mostly normally distributed. Is this expected?
P.S. I ran bwa/Picard/GATK to call the SNPs: UnifiedGenotyper on all samples simultaneously (-dcov 1000), followed by VQSR. For the VQSR step, I used hapmap3.3.b37.sites.vcf and 1000Gomni2.5.b37.sites.vcf as training data, with -an QD, HaplotypeScore, MQRankSum, ReadPosRankSum, FS, MQ, DP. For the histogram I only used the ~38,000 SNPs that passed the VQSR filter and had > 30 reads of coverage (out of a total of ~600,000 SNPs called). To compute the allele frequency for a given SNP in a given sample, I used the GATK-reported values in the AD field of the VCF genotype column to get the number of reads supporting the REF and ALT alleles.
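For anyone who wants to reproduce the calculation, here is a minimal Python sketch of the pooled allele frequency described above. The function name `pooled_alt_fraction` is made up for illustration and this is not the exact script I ran. It assumes a standard tab-delimited multi-sample VCF record, skips multi-allelic sites, keeps only records whose FILTER column is PASS, and applies the > 30 read threshold to the pooled REF+ALT count from AD (the threshold could just as well be applied per sample).

```python
def pooled_alt_fraction(vcf_line, min_depth=30):
    """Return ALT reads / (REF + ALT reads) summed over all samples.

    Returns None when the record is filtered out. Assumptions (not from
    the original pipeline): biallelic sites only, FILTER must be PASS,
    and the depth cutoff applies to the pooled AD total.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:          # need at least one sample column
        return None
    if fields[6] != "PASS":       # FILTER column
        return None
    if "," in fields[4]:          # more than one ALT allele
        return None

    format_keys = fields[8].split(":")
    if "AD" not in format_keys:
        return None
    ad_index = format_keys.index("AD")

    ref_total = 0
    alt_total = 0
    for sample in fields[9:]:
        values = sample.split(":")
        if ad_index >= len(values):   # e.g. a "./." no-call with no AD
            continue
        counts = values[ad_index].split(",")
        if len(counts) != 2 or not all(c.isdigit() for c in counts):
            continue                  # missing or malformed AD
        ref_total += int(counts[0])
        alt_total += int(counts[1])

    depth = ref_total + alt_total
    if depth <= min_depth:
        return None
    return alt_total / depth
```

Calling this on every non-header line of the VCF and dropping the `None` results gives the list of values to histogram.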
Interesting plot. So you're sure that all the variants represented in this plot have > 30X coverage?
Thanks. I went back and double-checked, and yes, the plot only includes SNPs with > 30x coverage. Also, raising the coverage threshold all the way to 100x excludes ~9,000 more SNPs but doesn't really change the shape of the distribution: http://www.freeimagehosting.net/anjxr
Is this a drug-resistance study? Do you plan to calculate any somatic changes after drug exposure?