Hi, I have obtained KEGG annotations for my genes and mapped them to pathways using KEGG Mapper (http://www.genome.jp/kegg/mapper.html). Since I lack the skills to run a proper gene set enrichment analysis, I have thought of a simpler approach, but I am not sure whether it is correct.
I simply count how many genes map to each gene set (pathway) in both the background (whole transcriptome assembly) and the sample (DEGs). I also count the total number of annotated genes in both the background and the sample.
Then I run a hypergeometric test for each pathway individually, using an online tool (https://www.geneprof.org/GeneProf/tools/hypergeometric.jsp).
Is this the correct way? Is there a simpler way to do this? If I am wrong, please point me in the right direction.
What you're doing is not entirely clear to me. If you've got a contingency table, the standard way to test for enrichment is using Fisher's exact test.
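To make the Fisher's exact test suggestion concrete, here is a minimal sketch in plain Python (standard library only). It assumes the table is laid out with rows DEG / not DEG and columns in pathway / not in pathway, so that no gene is counted in two cells. The function name and the example counts are hypothetical and only for illustration. Note that the one-sided Fisher's exact test computed here is the same calculation as the upper-tail hypergeometric test, so the two approaches should agree.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (rows: DEG / not DEG, columns: in pathway / not in pathway).
    Returns P(top-left cell >= a) with all row and column totals held fixed.
    """
    row1 = a + b            # number of DEGs
    col1 = a + c            # number of genes in the pathway
    total = a + b + c + d   # all genes in the chosen universe
    denom = comb(total, row1)
    tail = 0
    # Sum the hypergeometric probabilities of tables at least as extreme.
    for x in range(a, min(row1, col1) + 1):
        tail += comb(col1, x) * comb(total - col1, row1 - x)
    return tail / denom


# Hypothetical counts, for illustration only:
# 30 DEGs, 8 of them in the pathway; 970 other genes, 42 of them in the pathway.
p_value = fisher_exact_greater(8, 22, 42, 928)
print(f"one-sided p-value: {p_value:.3g}")
```

If SciPy is available, scipy.stats.fisher_exact(table, alternative="greater") should give the same one-sided p-value. Because one test is run per pathway, the resulting p-values would normally need a multiple-testing adjustment before being interpreted.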
I agree with @Jean-Karim Heriche. Fisher's exact test is usually the recommended option. I have had very good results with the enrichKEGG function of the R/Bioconductor package clusterProfiler.
Thanks for the reply. Will clusterProfiler work for non-model species? Can I use my own whole assembly as the background?
For simplicity, let's say I have 10,000 assembled genes, of which 2,000 were annotated to at least one pathway (the background). I have 1,000 DEGs, of which 180 were annotated to at least one pathway. For 'pathway A', 100 genes were found in the background and 10 in the DEGs.
So my contingency table should be something like A (10, 180-10, 100, 2000-100) or B (10, 1000-10, 100, 10000-100). I am not sure whether to include the genes that are not annotated to any pathway.
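The two options can be compared numerically with a short sketch (plain Python, standard library only). It uses the counts from this post and the hypergeometric upper tail, which is equivalent to a one-sided Fisher's exact test. One caveat, stated as an assumption about the intended layout: a Fisher table needs non-overlapping rows (DEG vs. not DEG), so the second row would be 100-10 and 2000-180-(100-10) for option A rather than the whole-background counts. The function and variable names are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric p-value, P(X >= k).

    N: genes in the universe, K: universe genes in the pathway,
    n: DEGs in the universe,  k: DEGs in the pathway.
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Option A: universe = genes annotated to at least one pathway.
expected_a = 180 * 100 / 2000      # DEGs expected in pathway A by chance
p_a = enrichment_p(10, 180, 100, 2000)

# Option B: universe = all assembled genes, annotated or not.
expected_b = 1000 * 100 / 10000
p_b = enrichment_p(10, 1000, 100, 10000)

print(f"A: expected {expected_a:.1f}, observed 10, p = {p_a:.3f}")
print(f"B: expected {expected_b:.1f}, observed 10, p = {p_b:.3f}")
```

With these numbers the observed count of 10 is close to what chance predicts under either option (9 or 10), so pathway A would not look enriched in either case. In general the two universes can give quite different p-values. A common convention is option A, on the reasoning that a gene without any pathway annotation could never be counted in a pathway, but that is a choice to make explicitly rather than a fixed rule.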
I'm looking for the answer to a similar question. Were you able to use your own assembly as background in clusterProfiler?
How did you solve your problem?