How is a gold standard protein interaction defined, and how is such a gold standard dataset constructed?
3.1 years ago
bhodai • 0

To my knowledge, protein interactions identified in experiments are usually compared against a reference dataset that is considered the gold standard, and that dataset is generally assumed to contain true positive and true negative interactions. But how do we create such a gold standard dataset in the first place?

interaction protein • 1.5k views
3.1 years ago
Mensur Dlakic ★ 28k

As I think is done in all areas of science, gold standard datasets are created by careful and repeated wet lab experiments on a small scale. In this case, that means collecting literature data where various labs have demonstrated interactions in a low-throughput fashion using two-hybrid assays, pulldowns, co-immunoprecipitations, genetic interactions, TAP-tagging, fluorescence co-localization, etc. When a given interaction is confirmed by multiple labs, using several different methods, and possibly in different organisms, it gets gold standard status.
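To make the "multiple labs, several methods" criterion concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the record layout `(protein_a, protein_b, method, publication)`, the function name `select_gold_standard`, the use of distinct publications as a stand-in for independent labs, and the thresholds of two publications and two methods are all hypothetical choices, not a rule taken from any curated database.

```python
from collections import defaultdict


def select_gold_standard(evidence, min_publications=2, min_methods=2):
    """Keep pairs backed by enough distinct publications and distinct methods.

    `evidence` is an iterable of (protein_a, protein_b, method, publication)
    tuples. Thresholds are illustrative assumptions, not community standards.
    """
    support = defaultdict(lambda: {"pubs": set(), "methods": set()})
    for protein_a, protein_b, method, publication in evidence:
        # Treat interactions as undirected: (A, B) and (B, A) are the same pair.
        pair = tuple(sorted((protein_a, protein_b)))
        support[pair]["pubs"].add(publication)
        support[pair]["methods"].add(method)
    return {
        pair
        for pair, s in support.items()
        if len(s["pubs"]) >= min_publications
        and len(s["methods"]) >= min_methods
    }


# Hypothetical low-throughput evidence records collected from the literature.
evidence = [
    ("P1", "P2", "two-hybrid", "paper_A"),
    ("P2", "P1", "co-immunoprecipitation", "paper_B"),
    ("P1", "P3", "two-hybrid", "paper_A"),
    ("P3", "P4", "pulldown", "paper_C"),
    ("P3", "P4", "pulldown", "paper_D"),
]
gold = select_gold_standard(evidence)
```

In this toy input only the P1/P2 pair passes: P1/P3 has a single report, and P3/P4 was reported twice but with the same method. Where to set the thresholds is a curation decision, which is one reason to rely on published datasets rather than ad hoc cut-offs.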


Hi Dr. Dlakic, thank you for answering. I was thinking more about the process of rigorously defining the term 'gold standard dataset'. I am currently using machine learning algorithms to predict de novo interactions from PPI data, and in every subfield of machine learning there is a proper set of rules that is followed to arrive at a gold standard (e.g., in natural language processing, the British National Corpus is considered a gold standard, and its creators followed a protocol to build the dataset). Since the performance of a machine learning algorithm depends heavily on the quality of the data, I was trying to create the gold standard dataset for my project from scratch. That made me wonder what the usual procedure is for creating such datasets for PPI data. I have been looking for relevant literature and found this paper from 2010: https://www.researchgate.net/publication/220173162_From_Experimental_Approaches_to_Computational_Techniques_A_Review_on_the_Prediction_of_Protein-Protein_Interactions . But I am still not sure how the process works.


My answer is the same after reading your added explanation.

I suggest you find a dataset that has already been used in quality publications, and test it with your own methodology. You are not the first to predict PPIs, and it would be helpful to others if your method could be compared with theirs on an existing dataset. If you come up with your own gold standard, it will be debatable whether your performance is truly an improvement or simply a side effect of a biased dataset you created.
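As a rough illustration of what testing against an existing dataset can look like, the sketch below scores a set of predicted pairs against a published set of positive interactions. The function names `normalize` and `precision_recall` and the example pairs are hypothetical. It also assumes that any predicted pair absent from the reference positives counts as a false positive, which can understate precision when the reference set is incomplete.

```python
def normalize(pairs):
    """Represent undirected pairs canonically so (A, B) equals (B, A)."""
    return {tuple(sorted(pair)) for pair in pairs}


def precision_recall(predicted, gold_positives):
    """Return (precision, recall) of predicted pairs against reference positives."""
    predicted = normalize(predicted)
    gold_positives = normalize(gold_positives)
    true_positives = len(predicted & gold_positives)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold_positives) if gold_positives else 0.0
    return precision, recall


# Hypothetical predictions and reference positives.
predicted = [("P2", "P1"), ("P1", "P3"), ("P5", "P6")]
gold_positives = [("P1", "P2"), ("P3", "P4")]
precision, recall = precision_recall(predicted, gold_positives)
```

Reporting these numbers on a dataset that other methods have already used makes the comparison with those methods direct.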


So you are suggesting that manually curated gold standard datasets do not all follow a set of rules, and that it's a process of trial and error?


I am suggesting no such thing. Please read carefully what I wrote, and also read through the papers that describe the creation of gold standard datasets.

https://pubmed.ncbi.nlm.nih.gov/14564010/
