Question

ORA (over-representation analysis): different packages give different adjusted p-values and q-values


20 months ago

camillab. ▴ 160

Hi!

Apologies if this is a stupid question! I think I am doing something wrong, but I do not understand what. I would like to run ORA on a bulk RNA-seq dataset, so I tried both clusterProfiler and genekitr. I get the same terms, but the adjusted p-values and q-values differ: with clusterProfiler none of the terms has an adjusted p-value or q-value <= 0.01, whereas with genekitr a few do. Why is that? Am I doing something wrong in my code?

For clusterProfiler:

library(clusterProfiler)
library(org.Hs.eg.db)

# we want the log2 fold change
original_gene_list <- d$log2FC # on the unfiltered dataset

# name the vector
names(original_gene_list) <- d$ENSEMBL

# omit any NA values
gene_list <- na.omit(original_gene_list)

# sort the list in decreasing order (required by clusterProfiler's GSEA functions;
# enrichGO only uses the names, as the universe)
gene_list <- sort(gene_list, decreasing = TRUE)

# Extract significant results (p_value <= 0.05)
sig_genes_df <- subset(d, p_value <= 0.05)

# From significant results, we want to filter on log2 fold change
genes <- sig_genes_df$log2FC

# Name the vector
names(genes) <- sig_genes_df$ENSEMBL

# omit NA values
genes <- na.omit(genes)

# filter on absolute log2 fold change (|log2FC| > 1.5) and keep the Ensembl IDs
genes <- names(genes)[abs(genes) > 1.5]

go_enrich <- enrichGO(gene = genes,
                      universe = names(gene_list),
                      OrgDb = org.Hs.eg.db,
                      keyType = "ENSEMBL",
                      readable = TRUE,
                      ont = "BP",
                      pvalueCutoff = 0.05,
                      qvalueCutoff = 0.01)

And for genekitr I used this code (section 1.7):

library(genekitr)

# 1st step: get input IDs
id <- dpg6$Associated.Gene.Name # DEGs

# 2nd step: get gene set
gs2 <- geneset::getGO(org = "human", ont = "bp") # biological process

# analysis
ego2 <- genORA(id,
               geneset = gs2,
               universe = names(d$ENSEMBL), # background, i.e. the unfiltered dataset
               p_cutoff = 0.05,
               q_cutoff = 0.01)

What am I doing wrong?

Thank you very much for your help!

Camilla

Tags: r, p-value, ORA, q-value

Written 20 months ago by camillab.; updated 19 months ago by love-bioinfo.


20 months ago

Istvan Albert 101k

Don't worry about it: the p-values and q-values in these tools are mostly "make-believe".

The problems with ORA analyses are so profound and fundamental that the p-values are almost meaningless.

Think of them as educated guesses and opinions.


score 1 · Accepted Answer · 2023-03-30


20 months ago

chaco001 ▴ 40

This could be due to a few different things.

It isn't completely clear from your example whether id and genes are the same list, which they would need to be to expect the same results.
Similarly, it seems like the universes given are slightly different, which affects the hypergeometric test.
It could be that the GO-BP databases are different versions.
Finally, the genekitr docs (while a bit confusingly written) also show that the two approaches yield different results. I'm not sure I'm parsing their explanation fully, but it seems to be due to a slight difference in the genes used for the test. https://www.genekitr.fun/ora-analysis-1.html#ora-tools-comparsion
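To make the universe point concrete, here is a minimal base-R sketch of the one-sided hypergeometric test that ORA tools typically run for each term. The counts are hypothetical, and the sketch holds the term size fixed while only the universe size changes; in real data a different background usually also changes how many annotated genes each term contains. The last lines check one thing worth ruling out in the genekitr call in the question: `names()` on an unnamed data-frame column returns `NULL`, so `universe = names(d$ENSEMBL)` may not pass the intended background at all. How genORA handles a `NULL` universe is an assumption to verify in its documentation.

```r
# Minimal sketch of a one-sided hypergeometric (over-representation) test.
# All counts below are hypothetical.
#   k: DE genes annotated to the term
#   n: DE genes in total
#   M: universe genes annotated to the term
#   N: universe size
ora_p <- function(k, n, M, N) {
  # P(X >= k) where X ~ Hypergeometric(M annotated, N - M not annotated, n drawn)
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# Same overlap and term size, two different universe sizes
p_small_universe <- ora_p(k = 10, n = 200, M = 100, N = 15000)
p_large_universe <- ora_p(k = 10, n = 200, M = 100, N = 20000)
print(c(small = p_small_universe, large = p_large_universe))

# Sanity check on a background vector taken from a data-frame column (toy data)
d <- data.frame(ENSEMBL = c("ENSG00000141510", "ENSG00000012048"),
                log2FC = c(2.1, -1.7))
print(names(d$ENSEMBL))  # NULL: data-frame columns carry no element names
print(length(d$ENSEMBL)) # the IDs themselves are the values of d$ENSEMBL
```

With the larger universe the overlap expected by chance is smaller (200 × 100 / 20000 = 1 versus about 1.33 for 15000), so the same 10 overlapping genes give a smaller p-value.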

Unrelated: although some of my clients ask for ORA, I strongly prefer GSEA because it doesn't require choosing thresholds. Good luck!



Thank you! The universe and the genes are the same; I just used different variable names because the scripts were written at different times. However, I re-ran both scripts using the same genes/names and the results are the same as before (different p- and q-values). How do I choose which method to use? I don't want to choose genekitr just because it gives me more statistically significant terms that match my hypothesis if it is not the right approach!

Reply written 20 months ago by camillab. ▴ 160


Hi, I'm the author of genekitr. Thanks for your feedback. Regarding your question: both enrichGO and genORA are based on the enricher function for the statistical calculations. As @chaco001 said, the main difference lies in the term annotations used as input (which, of course, are not limited to GO). clusterProfiler mainly uses the OrgDb approach; for example, enrichGO uses org.Hs.eg.db to obtain the gene sets, while genekitr integrates Panther db (v17.0) and OrgDb.
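As a simplified illustration of why the annotation source matters even when the statistics are identical: the multiple-testing adjustment is applied across all tested terms, so a source that yields more candidate terms can push the same raw p-value above a cutoff. The sketch assumes Benjamini-Hochberg adjustment (the default `pAdjustMethod` in enrichGO) and uses made-up p-values; in real runs the gene sets themselves also differ between sources, so the raw p-values change too.

```r
# Hypothetical raw p-values for three terms found by both tools
shared <- c(termA = 0.0004, termB = 0.002, termC = 0.01)

# Hypothetical raw p-values for all other tested terms:
# annotation source 1 yields 50 other terms, source 2 yields 500
other_1 <- seq(0.05, 1, length.out = 50)
other_2 <- seq(0.05, 1, length.out = 500)

# Benjamini-Hochberg adjustment over every tested term, then keep the shared three
adj_1 <- p.adjust(c(shared, other_1), method = "BH")[seq_along(shared)]
adj_2 <- p.adjust(c(shared, other_2), method = "BH")[seq_along(shared)]

print(rbind(raw = shared, source_1 = adj_1, source_2 = adj_2))
```

In this toy setup termA passes a 0.05 cutoff after adjustment with source 1 but not with source 2, even though its raw p-value is the same in both.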