Hi everyone!
Briefly: I am working with gene regulatory network (GRN) inference tools on human cell cycle data. I would like to find a gold standard dataset to evaluate my method against. Could someone tell me how to obtain one?
What was done so far: I downloaded all human cell-cycle-related pathways from KEGG, extracted every interaction, and parsed them into one table.
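For anyone who wants to reproduce the table-building step, here is a minimal sketch of how interactions can be pulled out of a pathway file. It assumes the pathways were saved in KEGG's KGML (XML) format, which may not match every download route. The function name `kgml_interactions` and the embedded `EXAMPLE_KGML` fragment are made up for illustration and are not real pathway content.

```python
import xml.etree.ElementTree as ET

# Toy KGML fragment for illustration only (not copied from a real pathway).
EXAMPLE_KGML = """<pathway name="path:hsa04110" org="hsa" number="04110">
  <entry id="1" name="hsa:983" type="gene"/>
  <entry id="2" name="hsa:891 hsa:9133" type="gene"/>
  <entry id="3" name="cpd:C00001" type="compound"/>
  <relation entry1="1" entry2="2" type="PPrel">
    <subtype name="activation" value="--&gt;"/>
  </relation>
  <relation entry1="1" entry2="3" type="PCrel"/>
</pathway>"""


def kgml_interactions(kgml_text):
    """Return (source, target, relation_type, subtypes) rows from KGML text."""
    root = ET.fromstring(kgml_text)

    # Map each gene entry id to its gene identifiers; one entry may list several.
    genes = {
        entry.get("id"): entry.get("name", "").split()
        for entry in root.iter("entry")
        if entry.get("type") == "gene"
    }

    rows = []
    for rel in root.iter("relation"):
        subtypes = ",".join(s.get("name", "") for s in rel.findall("subtype"))
        # Expand multi-gene entries into one row per gene pair;
        # relations touching non-gene entries are skipped.
        for src in genes.get(rel.get("entry1"), []):
            for dst in genes.get(rel.get("entry2"), []):
                rows.append((src, dst, rel.get("type"), subtypes))
    return rows


rows = kgml_interactions(EXAMPLE_KGML)
for row in rows:
    print("\t".join(row))
```

Note that the gene identifiers in KGML entries have the form `hsa:<number>` rather than gene symbols, so they need to be mapped to whatever identifiers the inferred network uses before any comparison.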
The hurdle: Using this KEGG-derived dataset, the F-scores for the state-of-the-art tools are 0. I then pulled all "S" phase and "G2M" phase marker genes from the Seurat package and checked how many of them are included in my "gold standard" data. The answer is 13%. (As a bit of relief, the dataset I used for GRN inference contains 46% of the Seurat cell cycle genes to begin with.)
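To make the two numbers above concrete, here is a minimal sketch of this kind of evaluation. It assumes edges are directed (regulator, target) pairs and that the prediction, the gold standard, and the marker list all use the same gene identifiers. The helper names and the toy gene pairs are illustrative only, not real results.

```python
def edge_scores(predicted, gold):
    """Precision, recall and F1 of predicted edges against gold-standard edges."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if true_pos else 0.0
    return precision, recall, f1


def coverage(markers, genes):
    """Fraction of marker genes that also appear in `genes`."""
    markers = set(markers)
    return len(markers & set(genes)) / len(markers) if markers else 0.0


def genes_in(edges):
    """All genes that occur as regulator or target in an edge list."""
    return {gene for edge in edges for gene in edge}


# Toy data, purely illustrative.
gold = {("CDK1", "CCNB1"), ("E2F1", "CCNE1"), ("TP53", "CDKN1A")}
predicted = {("CDK1", "CCNB1"), ("E2F1", "MYC"), ("MYC", "CCNE1"), ("TP53", "CDKN1A")}
markers = ["CCNB1", "CCNE1", "MKI67", "TOP2A"]

precision, recall, f1 = edge_scores(predicted, gold)
marker_coverage = coverage(markers, genes_in(gold))
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
print(f"marker coverage of gold standard: {marker_coverage:.0%}")
```

An F-score of exactly 0 means no predicted edge matches any gold-standard edge, so it is worth checking identifier formats and edge direction before concluding that the predictions themselves are wrong.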
Another hurdle: My dataset has 3 batches, and there seems to be little agreement between the networks inferred from different batches, even when the same tool is used. Is this normal?
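One way to put a number on "little agreement" is the Jaccard index of the edge sets, computed for each pair of batches. The sketch below assumes each network has already been reduced to a set of (regulator, target) edges, for example by keeping the top-ranked edges from each tool. The batch names and edges are placeholders.

```python
from itertools import combinations


def jaccard(edges_a, edges_b):
    """Jaccard index of two edge sets: |intersection| / |union|."""
    edges_a, edges_b = set(edges_a), set(edges_b)
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0


def pairwise_jaccard(networks):
    """Jaccard index for every pair of networks, keyed by the pair of names."""
    return {
        (name_a, name_b): jaccard(networks[name_a], networks[name_b])
        for name_a, name_b in combinations(sorted(networks), 2)
    }


# Placeholder networks, purely illustrative.
networks = {
    "batch1": {("A", "B"), ("B", "C"), ("C", "D")},
    "batch2": {("A", "B"), ("B", "C"), ("D", "E")},
    "batch3": {("A", "B"), ("E", "F")},
}

agreement = pairwise_jaccard(networks)
for pair, score in agreement.items():
    print(pair, round(score, 2))
```

The value depends strongly on how many edges are kept per network, so the same cutoff should be used for every batch when comparing.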
The question(s): How could I obtain a gold standard dataset to compare my inferred networks against? Is it normal that not all Seurat cell cycle genes are in a supposedly gold standard dataset? Is it normal for the inferred networks to be highly inconsistent between batches? Does anyone have experience with this?
Thank you for any help!