Hello everyone! I'm trying to perform pathogenicity prediction on a human sample, but I'm running into some problems. My input is a VCF file from GATK containing all variants (across all chromosomes), which I have split into one VCF file per chromosome so that I can analyze each chromosome in detail. I used the VEP web interface (including MutationAssessor, MetaLR and REVEL scores) to predict pathogenicity, and filtered the results to missense variants, since these were the only ones with critical scores. What I need now is an optimal threshold for each pathogenicity score, so that I can filter out the missense variants with no clinical relevance and then check the details of the remaining variants in the ClinVar archives. Is there a way to do that? Thanks in advance for your advice.
PS: Feel free to suggest improvements to my overall approach as well. I'm new to this field and don't have a strong background in statistics.
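For reference, the per-chromosome split described above can be sketched in plain Python. This is a minimal sketch, assuming an uncompressed VCF whose header lines all come before the records (as the VCF specification requires); the function names `split_vcf_by_chrom` and `write_per_chrom` and the `out_prefix.CHROM.vcf` naming scheme are hypothetical. Every output file gets a full copy of the header so that it remains a valid VCF on its own.

```python
from collections import defaultdict


def split_vcf_by_chrom(lines):
    """Group VCF records by CHROM, copying the full header into every group.

    `lines` is an iterable of text lines from an uncompressed VCF.
    Returns a dict mapping chromosome name -> list of lines (header first).
    """
    header = []
    records = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            # '##' meta-information lines and the '#CHROM' column header
            header.append(line)
        elif line.strip():
            chrom = line.split("\t", 1)[0]  # CHROM is the first column
            records[chrom].append(line)
    return {chrom: header + recs for chrom, recs in records.items()}


def write_per_chrom(path, out_prefix):
    """Write one VCF per chromosome, e.g. out_prefix.chr1.vcf (hypothetical naming)."""
    with open(path) as fh:
        groups = split_vcf_by_chrom(fh)
    for chrom, group in groups.items():
        with open(f"{out_prefix}.{chrom}.vcf", "w") as out:
            out.writelines(group)
```

This holds the whole file in memory, which is fine for small inputs but may be heavy for a WGS VCF; dedicated tools such as bcftools work on bgzipped, indexed files more efficiently.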
A quick comment: the threshold values will differ for each pathogenicity / functionality prediction tool. You may simply have to refer to the documentation for each. There may also be a separate output tag from VEP where these scores have already been categorised for you.
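As an illustration of applying a separate cut-off per tool, here is a minimal sketch that works on rows of VEP tabular output already loaded as dictionaries. The column names (`Consequence`, `REVEL`, `MetaLR_score`) and the values in `PLACEHOLDER_THRESHOLDS` are assumptions for illustration only, not recommendations: check the headers in your own output and take the real cut-offs from each tool's documentation. MutationAssessor can be added the same way once you know its column name and cut-off. `missense_variant` is the Sequence Ontology term VEP uses for missense consequences.

```python
import re

# Hypothetical column names: adjust to the headers in your own VEP output.
# Threshold values are PLACEHOLDERS; take real cut-offs from each tool's docs.
PLACEHOLDER_THRESHOLDS = {"REVEL": 0.5, "MetaLR_score": 0.5}


def parse_score(raw):
    """Return the highest numeric value in a VEP score field, or None.

    Assumes fields may be empty, '-' or hold several per-transcript values
    separated by ',' or '&'.
    """
    values = []
    for part in re.split(r"[,&]", raw or ""):
        try:
            values.append(float(part))
        except ValueError:
            continue  # skip '-', '.', '' and other non-numeric tokens
    return max(values) if values else None


def flag_missense(rows, thresholds):
    """Yield (row, tools_over_cutoff) for missense rows passing >= 1 cut-off."""
    for row in rows:
        if "missense_variant" not in row.get("Consequence", ""):
            continue  # only missense consequences are considered
        hits = []
        for column, cutoff in thresholds.items():
            score = parse_score(row.get(column))
            if score is not None and score >= cutoff:
                hits.append(column)
        if hits:
            yield row, hits
```

Here a variant is kept if it passes at least one tool's cut-off; requiring all tools to agree is stricter and will leave fewer variants to check in ClinVar.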
If you have a few weeks to spare, you could also check the scores of known pathogenic / functional mutations, and define thresholds based on these.
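One common way to turn such a set of known variants into a threshold (a standard choice, not something these tools themselves prescribe) is to pick the cut-off that maximizes Youden's J statistic, J = sensitivity + specificity - 1. The sketch below assumes you already have one tool's scores for variants you trust to be pathogenic and for variants you trust to be benign (for example, curated ClinVar entries); `youden_threshold` is a hypothetical helper name. Run it separately for each tool, since each has its own scale.

```python
def youden_threshold(pathogenic, benign):
    """Pick the cut-off that maximizes Youden's J = sensitivity + specificity - 1.

    A variant is called pathogenic when score >= cut-off, so only the
    observed scores need to be tried as candidate cut-offs.
    Returns (best_cutoff, best_j).
    """
    if not pathogenic or not benign:
        raise ValueError("need scores for both classes")
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(pathogenic) | set(benign)):
        # fraction of known pathogenic variants called pathogenic
        sensitivity = sum(s >= cutoff for s in pathogenic) / len(pathogenic)
        # fraction of known benign variants called benign
        specificity = sum(s < cutoff for s in benign) / len(benign)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

With `pathogenic = [0.9, 0.8, 0.7, 0.4]` and `benign = [0.1, 0.2, 0.3, 0.75]` (made-up numbers) it returns a cut-off of 0.4 with J = 0.75. With small reference sets the chosen cut-off can be unstable, so treat it as a rough guide.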
VEP also outputs its own impact category for each consequence (HIGH, MODERATE, LOW, or MODIFIER); please see: What is the VEP Impact column?
I recently did a WGS feature selection myself and kept the variants with HIGH or MODERATE impact.
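A minimal sketch of that HIGH / MODERATE filter on VEP's tab-delimited output, assuming `##` lines are metadata and the single `#` line is the column header. Depending on the output format you downloaded, IMPACT may be its own column or a `key=value` pair inside the `Extra` column, so the sketch checks both; treat that as an assumption about your file and compare it against your own output. The function names are hypothetical.

```python
import csv
import io

KEEP_IMPACTS = {"HIGH", "MODERATE"}


def read_vep_table(text):
    """Parse tab-delimited VEP output into a list of dicts.

    Assumes '##' lines are metadata and the first single-'#' line is the
    column header (e.g. '#Uploaded_variation ...').
    """
    lines = [l for l in text.splitlines() if not l.startswith("##")]
    if lines and lines[0].startswith("#"):
        lines[0] = lines[0][1:]  # drop the leading '#' from the header
    reader = csv.DictReader(io.StringIO("\n".join(lines)),
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def impact_of(row):
    """IMPACT from its own column, or from 'key=value;' pairs in 'Extra'."""
    if row.get("IMPACT"):
        return row["IMPACT"]
    for pair in (row.get("Extra") or "").split(";"):
        key, _, value = pair.partition("=")
        if key == "IMPACT":
            return value
    return None


def keep_high_moderate(rows):
    """Keep only rows whose IMPACT is HIGH or MODERATE."""
    return [row for row in rows if impact_of(row) in KEEP_IMPACTS]
```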
Kevin