Question

Need suggestions for combining missense variant pathogenicity prediction scores generated by SIFT, CADD, REVEL, PolyPhen, MPC, etc. to prioritize/select variants

3.6 years ago

Apurba ▴ 10

Hi,

I am looking for a way to select/prioritize missense variants based on an array of pathogenicity prediction scores generated by SIFT, CADD, REVEL, PolyPhen, MPC, and M_CAP. So far, I have transformed each score into a dichotomous variable (damaging or not, based on the tool's recommended cut-off) and summed them to obtain a combined score ranging from 0 to 6.

I then select only those missense variants that at least 50% of the above-mentioned tools (i.e. at least 3 of 6) predict to be deleterious/damaging.
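A minimal sketch of this voting scheme, for reference. The cutoff values below are illustrative assumptions, not values taken from this post, and should be replaced with the cutoffs recommended for the tool versions in use. The names `CUTOFFS`, `damaging_votes` and `is_prioritized` are hypothetical.

```python
# Illustrative cutoffs only (assumptions, not taken from the post);
# replace them with the cutoffs recommended for your tool versions.
# Each entry: tool -> (cutoff, direction).
# "le" means score <= cutoff counts as damaging, "ge" means score >= cutoff.
CUTOFFS = {
    "SIFT": (0.05, "le"),      # lower SIFT scores are more damaging
    "PolyPhen": (0.85, "ge"),
    "CADD": (20.0, "ge"),
    "REVEL": (0.5, "ge"),
    "MPC": (2.0, "ge"),
    "M_CAP": (0.025, "ge"),
}


def damaging_votes(scores):
    """Count how many tools call the variant damaging; missing scores are skipped."""
    votes = 0
    for tool, (cutoff, direction) in CUTOFFS.items():
        value = scores.get(tool)
        if value is None:
            continue
        if direction == "le":
            votes += int(value <= cutoff)
        else:
            votes += int(value >= cutoff)
    return votes


def is_prioritized(scores, min_fraction=0.5):
    """True if at least min_fraction of the tools that gave a score vote damaging."""
    n_scored = sum(1 for tool in CUTOFFS if scores.get(tool) is not None)
    return n_scored > 0 and damaging_votes(scores) >= min_fraction * n_scored
```

When all six scores are present, `min_fraction=0.5` is the same as requiring at least 3 of 6 votes. Counting only the tools that returned a score is a design choice of this sketch; variants with missing scores could instead be treated as non-damaging for those tools.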

Can someone suggest other possible ways/approaches to combine these scores (which can be helpful in prioritizing the missense variants)?

Thanks in advance

Apurba

Missense PredictionTools • 1.5k views

ADD COMMENT • link updated 3.4 years ago by Ram 44k • written 3.6 years ago by Apurba ▴ 10

score 1 · Answer 1 · 2021-05-12

3.6 years ago

German.M.Demidov ★ 2.9k

Hi,

There is a lot of work in this direction.

In brief, you need a large set of truly pathogenic variants (ClinVar etc.) and truly neutral ones (1000GP and other population databases). You then use these six scores as features and train some sort of classifier.
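As a rough illustration of the classifier idea, here is a small logistic regression written in plain Python. It is a sketch under assumptions: the function names are hypothetical, the scores are assumed to be rescaled to a similar range with higher meaning more damaging, and in practice one would normally use an established machine-learning library and evaluate on held-out variants.

```python
import math


def predict_proba(x, w, b):
    """Predicted probability that a variant with feature vector x is pathogenic."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    X: list of feature vectors (one rescaled score per tool).
    y: list of labels, 1 = pathogenic, 0 = neutral.
    Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi  # gradient of the log-loss w.r.t. z
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient over the training set and take one step.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

The fitted weights show how much each score contributes, and `predict_proba` gives a continuous combined score that can be used for ranking variants instead of a fixed "k of 6" vote.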

CADD, if I am not mistaken, is already a combination of scores.

Or you may use tools that have already been developed (e.g. https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-020-00775-w ).

ADD COMMENT • link 3.6 years ago by German.M.Demidov ★ 2.9k

Thanks for your reply. I am aware that, like CADD, other scores such as REVEL and MPC were themselves built by combining multiple other scores and features. My question is how to use them together to select/prioritize missense variants in one's own dataset. For example, for our dataset I currently select a missense variant if at least 3 of the 6 prediction tools call it damaging, but that seems subjective; someone else may prefer a cut-off of 4 out of 6.

ADD REPLY • link 3.6 years ago by Apurba ▴ 10

You can use 3 of 6, or 4 of 6, or 5 of 6, or 1 of 6, and still be right. It is a question of how accurate your threshold is. To assess that accuracy (precision and recall), you can't avoid a proper statistical approach with positive and negative examples.
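A minimal sketch of that assessment, assuming you already have a damaging-vote count (0 to 6) for each variant in a labelled set of positive and negative examples. The function name and argument layout are hypothetical.

```python
def precision_recall_by_threshold(votes, labels, n_tools=6):
    """Precision and recall of the rule "at least k tools say damaging" for each k.

    votes: damaging-vote count per variant.
    labels: 1 = known pathogenic, 0 = known neutral, in the same order as votes.
    Returns {k: (precision, recall)}; a value is None when it is undefined.
    """
    results = {}
    for k in range(1, n_tools + 1):
        tp = sum(1 for v, y in zip(votes, labels) if v >= k and y == 1)
        fp = sum(1 for v, y in zip(votes, labels) if v >= k and y == 0)
        fn = sum(1 for v, y in zip(votes, labels) if v < k and y == 1)
        precision = tp / (tp + fp) if tp + fp else None
        recall = tp / (tp + fn) if tp + fn else None
        results[k] = (precision, recall)
    return results
```

Comparing the precision and recall obtained for each k makes the choice between 3 of 6 and 4 of 6 an explicit trade-off on labelled data rather than a subjective preference.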

ADD REPLY • link 3.6 years ago by German.M.Demidov ★ 2.9k