Hello everyone. In a recent article published by Dong et al. in Human Molecular Genetics ("Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies"), the authors collected a large dataset of deleterious and neutral SNVs. The paper says "Training dataset is composed of 14191 deleterious mutations as true positive (TP) observations and 22001 neutral mutations as true negative (TN) observations, all based on the Uniprot database". That is essentially all they say about how the dataset was collected. I would highly appreciate any recommendations on how to query this kind of information in UniProt, as I have found no direct way of doing it. Thanks in advance.
Thank you for your answer. The main point of the question is how to pick out pathogenic variants, e.g. using the information from the "Pathology and Biotech" field.