Optimal fold-change threshold for DE genes in Over-Representation Analysis?
16 months ago

I have a question about over-representation analysis (ORA) after identifying differentially expressed genes (DGE). I have a set of 1550 genes resulting from RNA-Seq. Given this number, I decided to do ORA instead of GSEA.

Having chosen ORA, I have to pass enrichGO a gene list filtered by fold change:

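library(clusterProfiler)
library(org.Hs.eg.db)

# gene_list: named numeric vector of fold changes; the names are gene symbols
genes <- names(gene_list[abs(gene_list) > 2])

go_enrich <- enrichGO(gene = genes,
                  universe = names(gene_list),  # background must be gene IDs, not the numeric vector
                  OrgDb = org.Hs.eg.db,
                  keyType = "SYMBOL",
                  ont = "ALL",
                  pAdjustMethod = "fdr",
                  pvalueCutoff = 0.01,
                  qvalueCutoff = 0.05,
                  readable = TRUE)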

Here the threshold is an absolute fold change > 2, but some researchers use a value of 1. Which choice gives the most accurate results? What are the best practices for deciding on this number?

Best Regards

ClusterProfiler ORA EnrichGo DGE

Personally, since we are looking for a statistical enrichment, I tend to use no LFC filter, or only a very mild one (e.g. 0.2-0.5), if the purpose of the gene list is enrichment analysis. I'd use a different filter if the DE genes themselves were the final product of the analysis.
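To see how much the cutoff changes the input to the enrichment step, here is a minimal base-R sketch. It assumes `gene_list` is a named numeric vector of log fold changes, as in the question; the values below are simulated, the gene names are made up, and `select_genes` is a hypothetical helper, not a clusterProfiler function.

```r
# Toy stand-in for gene_list: a named numeric vector of log fold changes.
# Values are simulated and gene names are hypothetical.
set.seed(1)
gene_list <- setNames(rnorm(1550, mean = 0, sd = 1.5),
                      paste0("gene", seq_len(1550)))

# Hypothetical helper: names of genes whose |LFC| exceeds a cutoff.
select_genes <- function(lfc, cutoff) {
  names(lfc)[abs(lfc) > cutoff]
}

# Count how many genes would be passed to the enrichment at each cutoff.
cutoffs <- c(0, 0.5, 1, 2)
n_passing <- vapply(cutoffs,
                    function(k) length(select_genes(gene_list, k)),
                    integer(1))
names(n_passing) <- cutoffs
print(n_passing)
```

On real data, running the same count first shows how aggressively a cutoff of 1 or 2 shrinks the list compared with a mild one.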


This is a really great point that I didn't think about the first time I read through this.

I agree: in this case, at least at first, you'd be well advised not to filter. You can bring a filter in later on if for some reason it seems it could help.


There is no correct answer to this question, apart from what is meaningful to the experimenter. Generally, if others have published results on the same phenotype, you could try a variety of values between 1 and 2 and see which one recapitulates the published results you trust...
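A rough sketch of such a threshold sweep in base R is below. It uses the one-sided hypergeometric test, which is the test underlying ORA, rather than calling enrichGO. Everything in it is an assumption for illustration: the log fold changes are simulated, `trusted_set` is a made-up gene set standing in for a published result you trust, and `ora_pvalue` is a hypothetical helper.

```r
# Simulated log fold changes for a toy universe (hypothetical gene names).
set.seed(42)
universe <- paste0("gene", seq_len(1550))
lfc <- setNames(rnorm(length(universe), sd = 1.5), universe)

# Made-up "trusted" gene set; its LFCs are shifted so it is truly enriched.
trusted_set <- universe[1:60]
lfc[trusted_set] <- lfc[trusted_set] + 1.5

# Hypothetical helper: one-sided hypergeometric p-value for over-representation.
ora_pvalue <- function(selected, gene_set, universe) {
  k <- length(intersect(selected, gene_set))  # selected genes in the set
  M <- length(intersect(gene_set, universe))  # set size within the universe
  N <- length(universe)                       # universe size
  n <- length(selected)                       # number of selected genes
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# Sweep |LFC| cutoffs between 1 and 2 and record the enrichment p-value.
cutoffs <- c(1, 1.25, 1.5, 1.75, 2)
results <- data.frame(
  cutoff = cutoffs,
  n_selected = vapply(cutoffs, function(k) sum(abs(lfc) > k), integer(1)),
  p_value = vapply(cutoffs, function(k) {
    ora_pvalue(names(lfc)[abs(lfc) > k], trusted_set, universe)
  }, numeric(1))
)
print(results)
```

With real data you would replace the toy set with the terms or gene sets from the published work and check at which cutoffs they still come out as enriched.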

One could also say that there is no correct answer, only whatever maximizes statistical power; but without knowing more about this case, it isn't possible to say what would maximize power either.
