16 days ago
Haad
Hi,
I am wondering how to interpret WebGestalt's ORA report. I ran ORA on only the significant genes from the full list (significance criteria: abs(logFC) > 1 and adjusted p-value < 0.05).
I want to know what the terms in this screenshot mean:
I already know that enrichment ratio in this case is 14 / 0.61962 = 22.594. My questions are:
- What is an analyte set and how is it different from a gene set?
- What is the difference between "mapped input" and "gene set"?
- Is the expected value just the number of genes we expect to see in each analyte set?
- What is "overlap"? In other words, back to my second question, what is "mapped input"?
Any insight would be greatly appreciated.
I am also wondering what this highlighted sentence means in the WebGestalt job summary for ORA:
EDIT: How is the expected value calculated for ORA here?
I think it means that of the 116 IDs you supplied, 52 are in pathway_KEGG. Check out the number of entries in "Protein digestion and absorption" - that might be 103 in size.
EDIT: hsa04974 has 105 genes right now, so the database version that WebGestalt uses probably has 103. I know (because I used to work in the Zhang lab) that WebGestalt operates (or at least used to operate) on manually downloaded database files, which could explain the discrepancy.
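On the question of how the expected value is calculated: here is a rough sketch, assuming WebGestalt uses the usual hypergeometric expectation for ORA (mapped input size × gene set size / reference size). The function names are made up for illustration, and the reference size is not given anywhere in this thread; the 8644 below is only back-solved from the reported 0.61962 under that assumption, so treat it as a placeholder and check the actual number in your job summary.

```python
from math import comb


def expected_overlap(mapped_input_size, gene_set_size, reference_size):
    """Expected number of input genes in the gene set under random sampling."""
    return mapped_input_size * gene_set_size / reference_size


def enrichment_ratio(overlap, expected):
    """Observed overlap divided by the expected overlap."""
    return overlap / expected


def hypergeom_upper_tail(overlap, mapped_input_size, gene_set_size, reference_size):
    """P(X >= overlap) for a hypergeometric draw of the mapped input from the reference."""
    total = comb(reference_size, mapped_input_size)
    upper = min(mapped_input_size, gene_set_size)
    favourable = sum(
        comb(gene_set_size, k)
        * comb(reference_size - gene_set_size, mapped_input_size - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total


# Numbers from the thread: 52 mapped input genes, 103 genes in the set, 14 overlapping.
# ASSUMPTION: reference_size is a placeholder inferred from the reported expected value.
reference_size = 8644
expected = expected_overlap(52, 103, reference_size)
ratio = enrichment_ratio(14, expected)
p_value = hypergeom_upper_tail(14, 52, 103, reference_size)

print(f"expected = {expected:.5f}")
print(f"enrichment ratio = {ratio:.3f}")
print(f"raw p-value = {p_value:.3g}")
```

Note that the expectation depends on the mapped input size (52), not on the 116 IDs originally supplied.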
See my screenshot with the Venn diagram above. The 14 overlapping genes are shared between the gene set for protein digestion and absorption (103 genes) and the "mapped input" (52 genes, out of the 116 genes I supplied). My question is: why is the size of the mapped input not 116?
Are you sure that all 116 genes were found in KEGG?
Ah, I see what you mean. Of the 116 genes I supplied, only 52 were found in KEGG. I supplied Ensembl gene IDs. I wonder why the number of genes found in KEGG is so low. Does this mean the ORA results are less accurate than they would be if all 116 genes had been found in KEGG?
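To make the terms concrete, the relationship between the supplied list, the "mapped input", and the "overlap" can be sketched with plain sets. This is only an illustration of the idea discussed above, not WebGestalt's actual code: the helper name and the toy IDs are hypothetical, and `annotated_ids` stands in for every gene that appears in at least one pathway of the chosen database.

```python
def map_and_overlap(input_ids, annotated_ids, gene_set):
    """Split the supplied IDs into mapped/unmapped and intersect with one gene set."""
    input_ids = set(input_ids)
    # "Mapped input": supplied genes that the database annotates at all.
    mapped_input = input_ids & set(annotated_ids)
    # Supplied genes with no annotation drop out of the analysis.
    unmapped = input_ids - mapped_input
    # "Overlap": mapped input genes that belong to this particular gene set.
    overlap = mapped_input & set(gene_set)
    return mapped_input, unmapped, overlap


# Toy example with made-up IDs (5 supplied, 3 of them annotated).
input_ids = ["g1", "g2", "g3", "g4", "g5"]
annotated_ids = {"g1", "g2", "g3", "g8", "g9"}
gene_set = {"g2", "g3", "g9"}

mapped, unmapped, overlap = map_and_overlap(input_ids, annotated_ids, gene_set)
print("mapped input:", sorted(mapped))
print("unmapped:", sorted(unmapped))
print("overlap:", sorted(overlap))
```

In the thread's terms, the supplied list has 116 IDs, the mapped input has 52, and the overlap with the 103-gene set has 14.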