Hello,
I am working on a method that extracts a small subset of differentially expressed genes (DEGs). With it I get about 100 genes, compared with about 3000 when I don't use any subsetting method. I then compare the enrichment results for both gene lists using Enrichr, DAVID, and other similar web-based enrichment analysis tools.
My question is: how can I show that my subsetting method extracts more important information than using no subsetting at all? I noticed that the enrichment results from the subset show cancer-related terms in the first five or ten rows, while the ordinary approach without subsetting produces a lot of results whose relevance to cancer development I'm not sure about.
Any suggestions on how to test and demonstrate that my subsetting method is better? Thank you.
What is your definition of better? What do you consider more important information? Once you've defined these, just turn them into numbers and compare the numbers.
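One way to turn "relevant terms appear near the top" into numbers is sketched below. This is only an illustration: the term names and both ranked lists are made up, and the helper names precision_at_k and first_relevant_rank are hypothetical. It assumes the set of cancer-relevant terms is fixed before looking at either result, otherwise the comparison is circular.

```python
def precision_at_k(ranked_terms, relevant_terms, k):
    """Fraction of the top-k enriched terms that belong to the relevant set."""
    top_k = ranked_terms[:k]
    if not top_k:
        return 0.0
    return sum(term in relevant_terms for term in top_k) / len(top_k)


def first_relevant_rank(ranked_terms, relevant_terms):
    """1-based rank of the first relevant term, or None if none is present."""
    for rank, term in enumerate(ranked_terms, start=1):
        if term in relevant_terms:
            return rank
    return None


# Hypothetical data: terms ordered by adjusted p-value, most significant first.
relevant = {"p53 signaling", "cell cycle checkpoint", "apoptosis"}
full_deg_terms = ["translation", "ribosome biogenesis", "RNA splicing",
                  "p53 signaling", "apoptosis"]
subset_terms = ["p53 signaling", "apoptosis", "cell cycle checkpoint",
                "translation"]

for name, terms in [("all DEGs", full_deg_terms), ("subset", subset_terms)]:
    print(name,
          "precision@3 =", precision_at_k(terms, relevant, 3),
          "first relevant rank =", first_relevant_rank(terms, relevant))
```

The same two numbers can be computed for each tool (Enrichr, DAVID, ...) and for several values of k, so the comparison does not hinge on one cutoff.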
I'd be more worried about subtle effects. For example, let's say your subsetting ends up recovering genes with more annotations. This has two effects. First, the subset will be enriched in cancer genes because cancer genes, being the most studied, tend to have the most annotations. Second, standard enrichment tests based on the hypergeometric distribution are biased towards genes with more annotations.
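To make this concrete, here is a minimal sketch of the one-sided hypergeometric test that over-representation tools are typically built on, followed by a quick check of whether the subset is skewed towards heavily annotated genes. The gene names and annotation counts are placeholders, not real data, and hypergeom_pvalue is a hypothetical helper written with the standard library only.

```python
from math import comb
from statistics import median


def hypergeom_pvalue(overlap, list_size, term_size, background_size):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from background_size genes, of which term_size carry the term."""
    upper = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, x) * comb(background_size - term_size, list_size - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical annotation counts per gene (placeholders, not real values).
annotation_counts = {
    "GENE_A": 900, "GENE_B": 700, "GENE_C": 12,
    "GENE_D": 8, "GENE_E": 5, "GENE_F": 3,
}
all_degs = list(annotation_counts)
subset = ["GENE_A", "GENE_B", "GENE_C"]

median_all = median(annotation_counts[g] for g in all_degs)
median_subset = median(annotation_counts[g] for g in subset)
print("median annotations, all DEGs:", median_all)
print("median annotations, subset:  ", median_subset)

# Example call: 3 of 3 list genes hit a term of size 3 in a background of 10.
print("p-value:", hypergeom_pvalue(3, 3, 3, 10))
```

If the subset's median annotation count is much higher than that of the full DEG list, the cleaner-looking enrichment may partly reflect how well studied the selected genes are.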
Well, the definition of better is subjective. For example, from what I have done so far, I get 80 significant enrichment results if I don't subset the DEGs. Of these 80, the top rows are mostly general cell functions, and the enrichment results relevant to cancer are in the middle. But if I subset the genes, the top rows are related to cancer development. So I want to say that my subsetting method makes enrichment analysis simpler and makes the relevant output easier to obtain.
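One possible sanity check for a claim like this is to ask whether a random subset of the same size, drawn from the full DEG list, would overlap a cancer gene set just as well. The sketch below is an assumption about how such a check could look, not an established procedure from this thread; empirical_pvalue, the gene identifiers, and the gene set are all hypothetical.

```python
import random


def empirical_pvalue(subset, all_degs, term_genes, n_draws=1000, seed=0):
    """Share of random same-size draws from all_degs whose overlap with
    term_genes is at least as large as the overlap of the real subset."""
    rng = random.Random(seed)
    term_genes = set(term_genes)
    observed = len(term_genes.intersection(subset))
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(all_degs, len(subset))
        if len(term_genes.intersection(draw)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)


# Hypothetical identifiers: 3000 DEGs, a 100-gene subset, a 60-gene term.
all_degs = [f"g{i}" for i in range(3000)]
subset = all_degs[:100]
cancer_term = all_degs[:30] + all_degs[1000:1030]

print("empirical p-value:", empirical_pvalue(subset, all_degs, cancer_term))
```

A small value suggests the subset captures the gene set better than chance selection of 100 DEGs would. It does not rule out the annotation bias mentioned above, so the two checks complement each other.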