I use GATK to call variants on exome sequencing data from human tumor samples, and have been using it for a few months now. In the VQSR step, I use Mills_and_1000G_gold_standard.indels.hg19.vcf and dbsnp_137.hg19.vcf to filter out common SNPs/indels. I additionally use GATK's SelectVariants walker to keep only the variant sites. At the end of the GATK run, I still have about 2,000 SNPs for my samples.
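For concreteness, here is a minimal stand-alone sketch (plain Python, no external libraries) of the PASS-only filtering step, i.e. keeping just the records that survived VQSR, which is in the same spirit as running SelectVariants with filtered sites excluded. The file names are placeholders:

```python
#!/usr/bin/env python
# Minimal sketch: keep only records whose FILTER column is PASS (or "."),
# i.e. the variants that survived VQSR. File names are placeholders.

def pass_only(in_vcf, out_vcf):
    with open(in_vcf) as fin, open(out_vcf, "w") as fout:
        for line in fin:
            if line.startswith("#"):        # keep all header lines
                fout.write(line)
                continue
            fields = line.rstrip("\n").split("\t")
            if fields[6] in ("PASS", "."):  # column 7 is FILTER
                fout.write(line)

if __name__ == "__main__":
    pass_only("tumor.vqsr.vcf", "tumor.pass.vcf")
```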
This many mutations is not workable for the biologists doing the wet-lab experiments, so I am always asked to narrow down the list of variants. I have used the PolyPhen-2 score as a guide for filtering, but the choice of a PolyPhen-2 cutoff is arbitrary: I use a minimum of 0.6, and it is hard to justify why I did not choose a different threshold. I want to make this filtration step more objective without losing the genuine, meaningful variants.
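One small thing that can make a cutoff easier to defend is to look at how the surviving variant count behaves across a range of thresholds, rather than committing to a single value blindly. A minimal sketch, assuming the PolyPhen-2 scores sit in a tab-delimited annotation table with a "Polyphen2_score" column (the column and file names are my assumptions, not a standard):

```python
#!/usr/bin/env python
# Minimal sketch: count how many variants survive each PolyPhen-2 cutoff,
# so the threshold choice can be examined instead of fixed at 0.6.
# Assumes a tab-delimited annotation table with a "Polyphen2_score" column
# (column name and file name are assumptions).
import csv

def survivors(annot_tsv, cutoff):
    n = 0
    with open(annot_tsv) as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            score = row.get("Polyphen2_score", "")
            if score not in ("", ".") and float(score) >= cutoff:
                n += 1
    return n

if __name__ == "__main__":
    for cutoff in (0.2, 0.4, 0.6, 0.8, 0.9):
        print(cutoff, survivors("tumor.annotated.tsv", cutoff))
```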
I've heard that people use the dbSNP 130 VCF and the NHLBI Exome Sequencing Project data (http://evs.gs.washington.edu/EVS/#tabs-7) to filter their VCF results. It looks to me as though people try to filter out as many previously identified variants as possible, just to keep the variants uniquely identified in their own samples. I am a little concerned about this practice: unique variants may not tell the whole picture of what is going on in the tumor samples. So I would like to discuss with you what the best practice is for filtering a VCF for meaningful research.
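If the worry is throwing away known-but-relevant variants, one middle ground is to filter on population allele frequency rather than on mere presence in dbSNP/EVS, so that known-but-rare variants survive. A rough sketch, assuming the population resource is a VCF carrying an AF INFO field (the EVS downloads encode frequencies differently, so the parsing below is an assumption to adapt):

```python
#!/usr/bin/env python
# Sketch of frequency-aware filtering: drop a variant only if the population
# resource reports it as common (AF above a cutoff), rather than dropping
# every previously seen variant. Assumes the resource VCF has an AF INFO
# field; matching is a naive exact match on (chrom, pos, ref, alt) with no
# allele normalization.

def common_sites(resource_vcf, max_af=0.01):
    common = set()
    with open(resource_vcf) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            chrom, pos, _id, ref, alt, _qual, _filt, info = line.split("\t")[:8]
            for field in info.split(";"):
                if field.startswith("AF="):
                    # multi-allelic AF values are comma-separated
                    if max(float(x) for x in field[3:].split(",")) > max_af:
                        common.add((chrom, pos, ref, alt))
    return common

def drop_common(sample_vcf, out_vcf, common):
    with open(sample_vcf) as fin, open(out_vcf, "w") as fout:
        for line in fin:
            if line.startswith("#"):
                fout.write(line)
                continue
            chrom, pos, _id, ref, alt = line.split("\t")[:5]
            if (chrom, pos, ref, alt) not in common:
                fout.write(line)

if __name__ == "__main__":
    drop_common("tumor.pass.vcf", "tumor.rare.vcf",
                common_sites("population.sites.vcf", max_af=0.01))
```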
Do you also have paired normal samples from the same subjects? If so, you could filter out germline variants and keep only the somatic variants in your tumor samples.
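A dedicated somatic caller such as MuTect, which works from the aligned tumor and normal reads together, is the proper way to do this. But as a naive illustration of the principle, here is a sketch that subtracts the sites called in the normal VCF from the tumor VCF (file names are placeholders):

```python
#!/usr/bin/env python
# Naive sketch only: subtract variants seen in the matched normal from the
# tumor VCF. A real somatic caller compares the aligned reads of both
# samples; this set subtraction just illustrates the idea.

def site_keys(vcf_path):
    keys = set()
    with open(vcf_path) as handle:
        for line in handle:
            if not line.startswith("#"):
                chrom, pos, _id, ref, alt = line.split("\t")[:5]
                keys.add((chrom, pos, ref, alt))
    return keys

def tumor_only(tumor_vcf, normal_vcf, out_vcf):
    germline = site_keys(normal_vcf)
    with open(tumor_vcf) as fin, open(out_vcf, "w") as fout:
        for line in fin:
            if line.startswith("#"):
                fout.write(line)
            else:
                chrom, pos, _id, ref, alt = line.split("\t")[:5]
                if (chrom, pos, ref, alt) not in germline:
                    fout.write(line)

if __name__ == "__main__":
    tumor_only("tumor.vcf", "normal.vcf", "tumor.somatic_candidates.vcf")
```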