Hi,
I am trying to find a way to select/prioritize missense variants based on an array of pathogenicity prediction scores generated by SIFT, CADD, REVEL, PolyPhen, MPC, and M_CAP. Right now, I have converted these scores into dichotomous variables (based on each tool's recommended cut-off) and summed them to generate a combined score (ranging from 0 to 6).
I then select only those missense variants that at least 50% of the above-mentioned tools predict to be deleterious/damaging.
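For concreteness, here is a minimal sketch of that vote-counting scheme in Python. The cut-off values, the `CUTOFFS` table, the function names and the example variants are all illustrative assumptions, not recommendations; each tool's own recommended cut-off should be substituted.

```python
# Illustrative cut-offs only (assumptions); replace with each tool's recommended value.
# "ge": a score >= cut-off counts as damaging.
# "lt": a score <  cut-off counts as damaging (SIFT scores run the other way).
CUTOFFS = {
    "SIFT": (0.05, "lt"),
    "CADD": (20.0, "ge"),
    "REVEL": (0.5, "ge"),
    "PolyPhen": (0.85, "ge"),
    "MPC": (2.0, "ge"),
    "M_CAP": (0.025, "ge"),
}


def damaging_votes(scores):
    """Count how many tools call the variant damaging (combined score 0 to 6)."""
    votes = 0
    for tool, (cutoff, direction) in CUTOFFS.items():
        value = scores.get(tool)
        if value is None:
            continue  # a missing score contributes no vote
        if direction == "ge":
            votes += int(value >= cutoff)
        else:
            votes += int(value < cutoff)
    return votes


def select_variants(variants, min_votes=3):
    """Return the IDs of variants flagged by at least `min_votes` tools."""
    return [vid for vid, scores in variants.items()
            if damaging_votes(scores) >= min_votes]


# Hypothetical example variants with made-up scores.
example_variants = {
    "var1": {"SIFT": 0.01, "CADD": 25.0, "REVEL": 0.7,
             "PolyPhen": 0.95, "MPC": 2.5, "M_CAP": 0.1},
    "var2": {"SIFT": 0.40, "CADD": 12.0, "REVEL": 0.2,
             "PolyPhen": 0.10, "MPC": 0.5, "M_CAP": 0.01},
}
selected = select_variants(example_variants, min_votes=3)
```

Note that a variant with missing scores can never reach the full 6 votes in this sketch, so how missing values are handled is itself a choice that affects the selection.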
Can someone suggest other possible ways/approaches to combine these scores (which can be helpful in prioritizing the missense variants)?
Thanks in advance
Apurba
Thanks for your reply. I am aware that, in addition to CADD, other scores such as REVEL and MPC are themselves built by combining multiple other scores and features. But my question was how one can use them together to select/prioritize missense variants in one's own data set. For example, for our dataset I currently select a missense variant if at least 3 of the 6 prediction tools flag it as damaging, but that seems subjective; someone else may prefer a cut-off of 4 out of 6.
You can use 3 of 6, or 4 of 6, or 5 of 6, or 1 of 6, and still be right. It is a question of how accurate your threshold is. To assess its accuracy (precision and recall), you can't avoid a proper statistical approach with positive and negative examples.
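As a rough sketch of what that evaluation could look like: given a set of variants with known labels (positive = known pathogenic, negative = known benign), you can compute precision and recall for every possible "k of 6" threshold and compare them. The function name and the toy vote counts and labels below are made up for illustration; real labels would have to come from a curated truth set of your choosing.

```python
def precision_recall(vote_counts, labels, min_votes):
    """Precision and recall of the rule 'damaging if votes >= min_votes'.

    vote_counts: number of tools (0 to 6) calling each variant damaging
    labels:      True for known pathogenic, False for known benign
    """
    tp = fp = fn = 0
    for votes, is_pathogenic in zip(vote_counts, labels):
        predicted = votes >= min_votes
        if predicted and is_pathogenic:
            tp += 1
        elif predicted and not is_pathogenic:
            fp += 1
        elif not predicted and is_pathogenic:
            fn += 1
    # Undefined ratios (no predicted or no true positives) are reported as NaN.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Toy labelled data (fabricated for illustration only).
toy_votes = [6, 5, 4, 3, 2, 1, 0, 3]
toy_labels = [True, True, True, False, True, False, False, False]

for k in range(1, 7):
    p, r = precision_recall(toy_votes, toy_labels, k)
    print(f"{k} of 6: precision={p:.2f} recall={r:.2f}")
```

The threshold is then chosen from the trade-off you see on your labelled examples (for instance, the smallest k that reaches the precision you need) rather than picked by preference.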