Question

Optimal fold-change threshold for DE genes in Over-Representation Analysis?

1

16 months ago

Manuel Sokolov Ravasqueira ▴ 110

Regarding over-representation analysis (ORA) after identifying differentially expressed genes: I have a set of 1550 genes resulting from RNA-Seq. Given this number, I decided to do ORA instead of GSEA.

Having chosen ORA, I have to pass enrichGO a gene list filtered by fold change:

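library(clusterProfiler)
library(org.Hs.eg.db)

# gene_list: named numeric vector of fold changes; names are gene symbols
genes <- names(gene_list[abs(gene_list) > 2])

go_enrich <- enrichGO(gene = genes,
                  universe = names(gene_list),  # background must be gene IDs, not the fold-change values
                  OrgDb = org.Hs.eg.db,
                  keyType = "SYMBOL",
                  ont = "ALL",
                  pAdjustMethod = "fdr",
                  pvalueCutoff = 0.01,
                  qvalueCutoff = 0.05,
                  readable = TRUE)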

In this situation the threshold is > 2; however, some researchers use a value of 1. What is the optimal choice for accurate results? What are the best practices for deciding this number?

Best Regards

ClusterProfiler ORA EnrichGo DGE • 762 views

ADD COMMENT • link updated 16 months ago by LauferVA 4.5k • written 16 months ago by Manuel Sokolov Ravasqueira ▴ 110

3

Personally, since we are looking for a statistical enrichment, I tend to use no LFC filter, or only a very mild one (e.g. 0.2-0.5), if the purpose of the gene list is enrichment analysis. I would use a different filter if the DE genes themselves were the final product of the analysis.
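A minimal sketch of what this could look like in R, assuming the DE results are available as a data frame with `gene`, `log2FoldChange` and `padj` columns. These column names follow DESeq2 conventions and, like the toy values and the helper `select_ora_genes`, are hypothetical rather than taken from the question. Genes are selected on adjusted p-value, with the LFC cutoff as an optional, mild extra filter:

```r
# Hypothetical DE results table; column names and values are assumptions
# for illustration only.
res <- data.frame(
  gene           = c("TP53", "MYC", "EGFR", "GAPDH", "ACTB", "BRCA1"),
  log2FoldChange = c(2.4, -0.3, 0.6, 0.05, -1.2, 0.25),
  padj           = c(0.001, 0.02, 0.004, 0.80, 0.03, 0.20),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep significant genes, optionally requiring a
# minimum absolute log2 fold change (0 means no LFC filter at all).
select_ora_genes <- function(res, padj_cutoff = 0.05, lfc_cutoff = 0) {
  keep <- !is.na(res$padj) &
    res$padj < padj_cutoff &
    abs(res$log2FoldChange) >= lfc_cutoff
  res$gene[keep]
}

genes_no_lfc   <- select_ora_genes(res)                    # significance only
genes_mild_lfc <- select_ora_genes(res, lfc_cutoff = 0.5)  # mild LFC filter
universe       <- res$gene                                 # every gene that was tested
```

Either gene vector could then be passed as `gene` to enrichGO, with `universe` as the background.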

ADD REPLY • link 16 months ago by i.sudbery 20k

1

this is a really great point that i didn't think about the first time i read through this.

i agree - in this case - at least at first - you'd be well advised not to filter. you can bring one in later on if for some reason it seems it could help

ADD REPLY • link 16 months ago by LauferVA 4.5k

1

there is no correct answer to this question, apart from what is meaningful to the experimenter. generally, if there are results published by others on the same phenotype, you could try a variety of values between 1 and 2, and see what recapitulates results you trust published by others...
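as a rough sketch of that kind of threshold sweep in R: the toy `gene_list`, the `toy_enrich` stand-in for an enrichment call, the `trusted_terms` set and the helper `sweep_thresholds` are all hypothetical, and the toy lookup is not a statistical test. the only point is the loop that compares each threshold against terms you already trust.

```r
# Toy named vector of fold changes (hypothetical values).
gene_list <- c(A = 2.5, B = -1.8, C = 1.2, D = -1.05, E = 0.4, F = 2.1)

# Stand-in for a real enrichment call: a hypothetical gene-to-term lookup
# that simply returns the terms annotated to the selected genes.
toy_terms <- list(A = "term1", B = "term1", C = "term2",
                  D = "term3", E = "term3", F = "term1")
toy_enrich <- function(genes) {
  unique(unlist(toy_terms[genes], use.names = FALSE))
}

# Terms reported in publications you trust (hypothetical).
trusted_terms <- c("term1", "term2")

# For each threshold, select genes, run the enrichment function and count
# how many trusted terms are recovered.
sweep_thresholds <- function(gene_list, thresholds, enrich_fun, trusted) {
  do.call(rbind, lapply(thresholds, function(t) {
    genes <- names(gene_list)[abs(gene_list) > t]
    terms <- enrich_fun(genes)
    data.frame(threshold = t,
               n_genes = length(genes),
               n_terms = length(terms),
               n_trusted_recovered = length(intersect(terms, trusted)))
  }))
}

sweep <- sweep_thresholds(gene_list, c(1, 1.5, 2), toy_enrich, trusted_terms)
print(sweep)
```

with real data, `enrich_fun` could wrap the enrichGO call from the question and return `as.data.frame(go_enrich)$ID`.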

one could also say there is no correct answer, only what maximizes statistical power; but without knowing more about the experiment, it is not possible to say what would maximize statistical power either.

ADD REPLY • link 16 months ago by LauferVA 4.5k