On the PolyPhen-2 webpage
http://genetics.bwh.harvard.edu/pph2/bgi.shtml
There are two options for the classifier model:
HumDiv HumVar
Could anyone explain what these are? Is there any documentation for this?
Thanks, forum.
ftp://genetics.bwh.harvard.edu/pph2/training/README
PolyPhen-2 v2.2.2 training sets statistics (2011_12):
HumDiv: 5564 deleterious + 7539 neutral mutations from the same set of 978 human proteins.
HumVar: 22196 deleterious + 21119 neutral mutations in 9679 human proteins, no restriction on deleterious and neutral mutations coming from same proteins.
HumDiv is Mendelian disease variants vs. divergence from close mammalian homologs of human proteins (>=95% sequence identity).
HumVar is all human variants associated with some disease (except cancer mutations) or loss of activity/function vs. common (minor allele frequency >1%) human polymorphisms with no reported association with a disease or other effect.
These are already explained here
PolyPhen-2 predicts the functional significance of an allele replacement from its individual features by Naïve Bayes classifier trained using supervised machine-learning.
Two pairs of datasets were used to train and test PolyPhen-2 prediction models. The first pair, HumDiv, was compiled from all damaging alleles with known effects on the molecular function causing human Mendelian diseases, present in the UniProtKB database, together with differences between human proteins and their closely related mammalian homologs, assumed to be non-damaging. The second pair, HumVar, consisted of all human disease-causing mutations from UniProtKB, together with common human nsSNPs (MAF>1%) without annotated involvement in disease, which were treated as non-damaging.
The user can choose between HumDiv- and HumVar-trained PolyPhen-2 models. Diagnostics of Mendelian diseases requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. Thus, HumVar-trained model should be used for this task. In contrast, HumDiv-trained model should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging.
For a mutation, PolyPhen-2 calculates Naïve Bayes posterior probability that this mutation is damaging and reports estimates of false positive rate (FPR, the chance that the mutation is classified as damaging when it is in fact non-damaging) and true positive rate (TPR, the chance that the mutation is classified as damaging when it is indeed damaging). A mutation is also appraised qualitatively, as benign, possibly damaging, or probably damaging based on pairs of false positive rate (FPR) thresholds, optimized separately for each model (e.g., HumDiv and HumVar).
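To make the FPR and TPR definitions above concrete, here is a minimal Python sketch. It is not PolyPhen-2 code: the function name `rates_at_threshold`, the score cutoff, and the toy data are all assumptions made up for illustration. It only shows how the two rates follow from counting labeled mutations on either side of a score cutoff.

```python
def rates_at_threshold(scores, is_damaging, threshold):
    """Return (FPR, TPR) when every score >= threshold is called damaging.

    Hypothetical helper for illustration only, not part of PolyPhen-2.
    """
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, is_damaging):
        called_damaging = score >= threshold
        if called_damaging and truth:
            tp += 1  # damaging, correctly called damaging
        elif called_damaging and not truth:
            fp += 1  # non-damaging, wrongly called damaging
        elif truth:
            fn += 1  # damaging, missed
        else:
            tn += 1  # non-damaging, correctly left alone
    fpr = fp / (fp + tn)  # share of non-damaging mutations called damaging
    tpr = tp / (tp + fn)  # share of damaging mutations called damaging
    return fpr, tpr


# Made-up scores and labels, purely to exercise the function.
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
toy_labels = [True, True, True, False, False, False]
fpr, tpr = rates_at_threshold(toy_scores, toy_labels, threshold=0.5)
```

With this toy data, one of the three non-damaging mutations scores above the cutoff (FPR = 1/3) and two of the three damaging ones do (TPR = 2/3).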
Current version 2.1.0 of PolyPhen-2 uses 5% / 10% FPR for the HumDiv model and 10% / 20% FPR for the HumVar model as the thresholds for this ternary classification. Mutations with their posterior probability scores associated with estimated false positive rates at or below the first (lower) FPR value are predicted to be probably damaging (more confident prediction). Mutations with the posterior probabilities associated with false positive rates at or below the second (higher) FPR value are predicted to be possibly damaging (less confident prediction). Mutations with estimated false positive rates above the second (higher) FPR value are classified as benign.
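The ternary rule quoted above can be sketched in a few lines of Python. This is only an illustration of the thresholds as described for version 2.1.0, not PolyPhen-2's own implementation; the names `FPR_THRESHOLDS` and `classify` are hypothetical, and the function assumes you already have the estimated FPR that the chosen model reports for the mutation.

```python
# FPR threshold pairs as quoted for PolyPhen-2 v2.1.0:
# (upper bound for "probably damaging", upper bound for "possibly damaging").
FPR_THRESHOLDS = {
    "HumDiv": (0.05, 0.10),
    "HumVar": (0.10, 0.20),
}


def classify(estimated_fpr: float, model: str) -> str:
    """Map an estimated FPR to the qualitative call for the given model.

    Hypothetical helper for illustration only.
    """
    lower, higher = FPR_THRESHOLDS[model]
    if estimated_fpr <= lower:
        return "probably damaging"  # more confident prediction
    if estimated_fpr <= higher:
        return "possibly damaging"  # less confident prediction
    return "benign"


print(classify(0.08, "HumDiv"))  # possibly damaging
print(classify(0.25, "HumVar"))  # benign
```

Note that each model estimates its own FPR for a given mutation from its own training data, so the two calls are not comparing the same number against two cutoffs.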
You can also find more information here.
And in case of discrepancies, say "possibly damaging" in HumDiv and "benign" in HumVar, which prediction is more reliable as an indication of disease association? I mean, a variant cannot be damaging for a Mendelian disease yet not associated with any other... or is there something I am missing?
Thanks a lot in advance