Hi,
I am doing a functional analysis of a differential expression analysis on a non-model organism. I have one experimental group and one control group with four replicates each.
The number of differentially expressed genes I get is around 2000 (FC > |1| and adjusted p-value < 0.05) and they have about 3500 GO terms associated with them.
When I do an enricher GO over-representation analysis, I use the following code:
enricher_10 <- enricher(diff_exp_genes_names,
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
universe = NULL,
minGSSize = 10,
qvalueCutoff = 0.2,
gson = NULL,
TERM2GENE=go_data,
TERM2NAME=NA)
The diff_exp_genes_names is a vector of custom gene IDs:
chr [1:2021] "Pr_t00014577-TA" "Pr_t00019548-TA" "Pr_t00018659-TA" ..
and my TERM2GENE file has this general format:
GOs query_name
<char> <char>
1: GO:0005275 Pr_t00014577-TA
2: GO:0005222 Pr_t00018659-TA
3: GO:0005323 Pr_t00018659-TA
Here I get no statistically significant GO terms after adjustment for multiple testing, but many are statistically significant before correction, so the actual analysis is carried out.
When I do GSEA with this code:
GSEA_results_fgsea_minGSSize_10 <- GSEA(gene_list_final,
exponent = 1,
minGSSize = 10,
maxGSSize = 500,
eps = 1e-10,
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
gson = NULL,
TERM2GENE=go_data,
TERM2NAME = NA,
verbose = TRUE,
seed = FALSE,
by = "fgsea")
I get plenty of statistically significant results.
The gene_list_final looks like this:
Named num [1:12510] 7.35 6.99 6.7 6.66 6.6 ...
- attr(*, "names")= chr [1:12510] "Pr_t00018699-TA" "Pr_t00007428-TA" "Pr_t00004980-TA" "Pr_t00004978-TA" ...
Question 1: Have I done the GO over-representation analysis incorrect?
Question 2: Does the GO over-representation analysis not produce any statistically significant results because the number of GO terms are too many?
Question 3: Should I use a more stringent cut-off for FC and adjusted p value to get a list of differentially expressed genes that are easier to analyze and handle? How do you analyze such large lists of genes (circa 2000 genes)?