I think SIFT refers to some kind of measure for SNPs, and while reading the ANNOVAR paper I came across the following sentence:
Finally, Annovar can filter specific variants such as SNPs with >1% frequency in the 1000 Genomes Projects, or non-synonymous SNPs with SIFT scores > 0.05.
I have two questions about this sentence.
A frequency of 1% seems like a fairly low allele frequency to me. Does it really help to filter out irrelevant SNP variants? I don't think so.
The SIFT score threshold in the sentence above is 0.05. What does SIFT mean, and how does a threshold of 0.05 affect which variants are filtered?
SIFT (Sorting Intolerant From Tolerant) and PolyPhen are the two most commonly used algorithms for predicting whether a SNP has a (generally negative) effect on protein structure. Because the genetic code is redundant, many SNPs never translate into any change in the protein. There are far more of these than you would expect by chance, because variations which affect protein sequence are usually under negative selection pressure. So SIFT/PolyPhen can be used to weed out a lot of irrelevant stuff from a very large list of candidate variations.
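To make the 0.05 cutoff from the question concrete, here is a minimal sketch. It assumes the usual SIFT convention (scores run from 0 to 1, and a substitution is predicted deleterious when the score is at most 0.05, tolerated above that). The function names and SNP labels are made up for illustration; this is not ANNOVAR's own code.

```python
# Usual SIFT convention: score <= 0.05 is predicted deleterious,
# score > 0.05 is predicted tolerated.
SIFT_CUTOFF = 0.05


def sift_call(score):
    """Turn a SIFT score in [0, 1] into a prediction label."""
    return "deleterious" if score <= SIFT_CUTOFF else "tolerated"


def drop_tolerated(snps):
    """Keep only SNPs predicted deleterious.

    snps: list of (label, sift_score) tuples; the labels are hypothetical.
    """
    return [(label, score) for label, score in snps if score <= SIFT_CUTOFF]


if __name__ == "__main__":
    snps = [("snp_a", 0.01), ("snp_b", 0.30), ("snp_c", 0.05)]
    for label, score in snps:
        print(label, score, sift_call(score))
    print(drop_tolerated(snps))
```

So filtering out non-synonymous SNPs with SIFT scores > 0.05 removes the ones predicted to be tolerated and leaves the predicted-deleterious ones for follow-up.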
If I'm not mistaken (and it's been a long time since I used either, so treat this as a rough summary), SIFT scores an amino acid substitution by how well it is tolerated at that position across aligned homologous sequences, i.e. by conservation, while PolyPhen combines conservation with structural features of the protein. Both work on amino acid substitutions, so there is often a lot of overlap in their calls. As far as I know, neither tool scores premature stop variants and other nonsense variants; those are picked up by the consequence annotation instead.
Again, it's been a long time since I used either, so I might have gotten that the wrong way around. But I can tell you this: I spent 3 years studying single-basepair exon variants in consanguineous families with a known phenotype, and very, very rarely did SIFT or PolyPhen pick the correct variant from a list. I wouldn't say they are junk, they're not, but variants which cause transcription factor non-specificity, splicing variants, RNAPol destabilisation, etc. are completely ignored. Do not rely on SIFT and PolyPhen for anything other than ordering a candidate list for follow-up analysis :)
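As a rough sketch of what "ordering a candidate list" could look like: the snippet below sorts candidates by their worst prediction and keeps unscored variants (e.g. splice variants) in a separate list rather than throwing them away. It assumes SIFT scores in [0, 1] where lower is worse and PolyPhen-2 scores in [0, 1] where higher is worse; the `Candidate` class, the way the two scores are combined, and the variant names are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    name: str                  # hypothetical variant label
    sift: Optional[float]      # SIFT score, lower = less tolerated; None if not scored
    polyphen: Optional[float]  # PolyPhen-2 score, higher = more damaging; None if not scored


def badness(c: Candidate) -> float:
    """Put both scores on a 'higher = worse' scale and take the worse one."""
    sift_badness = 1.0 - c.sift if c.sift is not None else 0.0
    polyphen_badness = c.polyphen if c.polyphen is not None else 0.0
    return max(sift_badness, polyphen_badness)


def rank_candidates(candidates: List[Candidate]) -> Tuple[List[Candidate], List[Candidate]]:
    """Return (scored candidates, worst first) and (unscored candidates).

    Unscored variants are kept, not dropped: the predictors simply have
    nothing to say about them.
    """
    scored = [c for c in candidates if c.sift is not None or c.polyphen is not None]
    unscored = [c for c in candidates if c.sift is None and c.polyphen is None]
    scored.sort(key=badness, reverse=True)
    return scored, unscored


if __name__ == "__main__":
    candidates = [
        Candidate("missense_A", sift=0.40, polyphen=0.10),
        Candidate("missense_B", sift=0.01, polyphen=0.95),
        Candidate("splice_C", sift=None, polyphen=None),
    ]
    scored, unscored = rank_candidates(candidates)
    print([c.name for c in scored])
    print([c.name for c in unscored])
```

The point of returning two lists is that the unscored group still needs to be looked at by other means.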
Would you use VEP, as well as looking at the genomic location (e.g. splice sites), to predict the functional effects that are overlooked by PolyPhen and SIFT?
Yes I would, since it runs PolyPhen/SIFT on your input and also gives other useful hints, as you rightly say. These days, I'd point everyone to VEP :)
To answer your first question, 1% is the standard cutoff used to distinguish "common" from "rare" variants. Depending on your study, you might want to change that. For example, in a GWAS for a common trait, you might be interested only in variants that are above a certain frequency in the population, whereas if you're looking at rare Mendelian traits you might only want very low frequency variants. You may also want to narrow this down to a specific population, e.g. for a GWAS in African Americans, you would be interested in variants common in African populations. Steve mentioned the VEP, which allows you to filter variants by frequency: you choose your own frequency threshold, the direction (> or <), and a population of interest.
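A minimal sketch of that kind of frequency filter, with the threshold, direction and population as parameters. The input layout (a dict of per-population allele frequencies per variant) and the function name are made up for illustration; "AFR" and "EUR" are just example population labels. Treating a variant that is missing from the reference panel as frequency 0 is an assumption, not something VEP or ANNOVAR is guaranteed to do.

```python
import operator


def filter_by_frequency(variants, population, threshold=0.01, keep="rare"):
    """Keep variants that are rare (< threshold) or common (>= threshold).

    variants: list of dicts with hypothetical keys 'id' and 'af', where
              'af' maps a population label to an allele frequency.
    """
    if keep not in ("rare", "common"):
        raise ValueError("keep must be 'rare' or 'common'")
    compare = operator.lt if keep == "rare" else operator.ge

    kept = []
    for variant in variants:
        # Assumption: a variant absent from the panel counts as frequency 0.
        af = variant["af"].get(population, 0.0)
        if compare(af, threshold):
            kept.append(variant)
    return kept


if __name__ == "__main__":
    variants = [
        {"id": "v1", "af": {"AFR": 0.20, "EUR": 0.001}},
        {"id": "v2", "af": {"AFR": 0.002, "EUR": 0.15}},
        {"id": "v3", "af": {}},
    ]
    # Rare Mendelian disease setting: keep variants below 1% in the chosen population.
    print([v["id"] for v in filter_by_frequency(variants, "AFR", keep="rare")])
    # Common-trait GWAS setting: keep variants at or above 1%.
    print([v["id"] for v in filter_by_frequency(variants, "AFR", keep="common")])
```

Note how the same variant can be common in one population and rare in another, which is why the population choice matters.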
An update: VEP is not in the same tier as SIFT or PolyPhen2. The latter are predictors, and VEP is an annotator. It uses SIFT and PolyPhen2 (not combines, uses) to annotate a nucleotide change in a protein-coding context with the predicted effect of the resulting amino acid change. Condel (CONsensus DELeteriousness score) combines multiple tools, and something like PredictSNP adds its own score on top of a bunch of tools it runs in the background.
If you're looking for the effect of SNPs on protein function, you can probably use CRAVAT. It provides a VEST pathogenicity score that quantifies a variant's functional impact. Refer to this post for more details about the tool.