Hi,
My dictionary says that "benchmark" means "a level of quality that can be used as a standard when comparing other things". So a benchmark method is a standard method that can be used when you want to compare other methods.
As far as I have seen, some packages such as "edgeR", "DESeq", "EBSeq", etc. are benchmark methods for detecting DE genes. I would like to know: when you don't have access to simulated data and you want to compare two methods, can you assume that "edgeR" or any other benchmark method is the standard and use it to decide which of your methods is better? I mean, can we compare two models with "edgeR" and choose the one closest to edgeR as the better model?
I would also like to know how you decide which model is better when you have a real dataset. By the percentage of genes found by each method that overlap with those found by edgeR? I know another thing we can do is check how many of the head genes (index genes) or housekeeping genes are found by each model. What else can you recommend, please?
Thank you.
Thanks a lot for your comprehensive explanation. My concern is that I have to compare two statistical models for my thesis, and since I don't have the housekeeping genes, I can't decide which model is working better. That's why I suggested using a benchmark method as a standard to compare the other two models. If I could simulate data, it would be easy to compare the models, but I don't know how to simulate data.
If I understand you correctly, you said I can find the DE genes common to two or more benchmark methods, then use those genes as the index genes and check how many of them are detected by each of my models, right? That sounds good.
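To make that check concrete, here is a minimal sketch in Python. It assumes each benchmark method and each of your models has already been run and reduced to a set of significant gene IDs; the function names and the gene IDs below are hypothetical. Keep in mind that a consensus of tools is only a proxy for the truth, not a verified gold standard.

```python
def consensus_genes(benchmark_calls):
    """Return the genes reported by every benchmark method.

    benchmark_calls: list of sets of gene IDs, one set per benchmark method.
    """
    if not benchmark_calls:
        raise ValueError("need at least one benchmark gene set")
    return set.intersection(*(set(genes) for genes in benchmark_calls))


def recovery_rate(model_genes, index_genes):
    """Fraction of the index genes that a model also reports as DE."""
    index = set(index_genes)
    if not index:
        raise ValueError("index gene set is empty")
    return len(index & set(model_genes)) / len(index)


# Hypothetical gene lists, for illustration only.
benchmarks = [
    {"g1", "g2", "g3", "g4", "g5"},
    {"g1", "g2", "g3", "g4", "g6"},
]
index = consensus_genes(benchmarks)      # genes shared by both benchmarks
model_a = {"g1", "g2", "g3", "g9"}
model_b = {"g1", "g8"}
rate_a = recovery_rate(model_a, index)
rate_b = recovery_rate(model_b, index)
```

A higher recovery rate only says that a model agrees more with the benchmark methods. It says nothing about genes that the benchmarks themselves miss.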
This is an idea, but it is quite elaborate and perhaps does not pay off more than other approaches. What I would do is use DESeq2, edgeR and limma (just three examples, use what you think is best!), trying to use the same parameters in each so as not to alter the outcome (that is, same alpha threshold, same null hypothesis, and so on). The intersection of the three programs' output is most likely to be a set of true positives!
A second level of positives is represented by the intersections between pairs of programs. For example, DESeq2 and edgeR may identify a set of common genes that limma does not identify; these are still confirmed by two programs, so you could keep them (if the adjusted p-value is sufficiently low).
A third level is composed of the genes that only one program identified. Be careful here: they might be the most significant ones! Some programs tend to overnormalize and others to undernormalize, and this can work out differently with different data. This means that a set of genes that only (e.g.) limma identifies might still contain true positives. If their adjusted p-values are very low, you might as well keep some of them, even if you can't trust them as much as the genes confirmed by several programs.
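The three levels above can be sketched in a few lines of Python. This is only an illustration: it assumes the significant gene IDs from each program have already been exported (all filtered at the same adjusted p-value cutoff), and the function name and gene IDs are made up.

```python
from collections import Counter


def tier_genes(calls):
    """Group genes by how many programs reported them as DE.

    calls: dict mapping a program name to the set of gene IDs it reported.
    Returns a dict mapping support count -> sorted list of gene IDs.
    """
    # Count, for every gene, the number of programs that reported it.
    support = Counter(gene for genes in calls.values() for gene in set(genes))
    tiers = {}
    for gene, n_programs in support.items():
        tiers.setdefault(n_programs, []).append(gene)
    return {n: sorted(genes) for n, genes in tiers.items()}


# Hypothetical gene lists, for illustration only.
calls = {
    "DESeq2": {"geneA", "geneB", "geneC"},
    "edgeR": {"geneA", "geneB", "geneD"},
    "limma": {"geneA", "geneE"},
}
tiers = tier_genes(calls)
# tiers[3]: found by all three programs (first level)
# tiers[2]: found by exactly two programs (second level)
# tiers[1]: found by a single program (third level)
```

For the third level you would still want to look at the adjusted p-values before keeping anything, which this sketch does not do.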
At the end, perhaps sieve your list of identified differentially expressed genes by converting the counts to TPM and checking whether the values you get look plausible. http://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/
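For reference, a minimal sketch of the count-to-TPM conversion for one sample, using the usual definition (reads per kilobase for each gene, then scaled so that the sample sums to one million). The function name and the numbers are hypothetical, and the gene lengths are assumed to be in base pairs.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts for one sample to TPM.

    counts: read counts per gene.
    lengths_bp: gene (or effective transcript) lengths in base pairs,
                in the same order as counts.
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Reads per kilobase: normalize each count by gene length.
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Scale so that the values of the sample sum to one million.
    return [rate / total * 1e6 for rate in rates]


# Hypothetical counts and lengths, for illustration only.
tpm = counts_to_tpm([10, 20, 30], [1000, 2000, 3000])
```

In this toy example the three genes have the same read density, so they end up with the same TPM even though their raw counts differ.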
That's a very good idea. I really like it and I'll try it. Thank you very much.