4.1 years ago
karlaarz
Hello,
I have the following human gene list:
CCT3 ABCA4 HEG1 PPARGC1B ADD1 CEP85 SLC1A4 DUSP10 PLAGL2 UBE2G2 NTRK2 PPIP5K1 DDB1 PRPH2 OAZ2 PEA15 ICMT KDM4A NCOA6 ZNF609 AKAP1 SYNE3 CAMSAP1 POLE4 ZDHHC5 ANGEL1 KCNJ14 NDUFA8 SIPA1L2 BTD CCT7 ANO2
I ran the enrichment analysis with clusterProfiler:
gobp <- enrichGO(glist, OrgDb = org.Hs.eg.db, ont = "BP", pAdjustMethod = "fdr", keyType = "SYMBOL", pvalueCutoff = 0.05)
head(gobp)
ID Description GeneRatio BgRatio pvalue p.adjust qvalue geneID Count
GO:1903405 GO:1903405 protein localization to nuclear body 2/36 10/18670 0.0001611010 0.02351193 0.0197561 CCT3/CCT7 2
GO:1904851 GO:1904851 positive regulation of establishment of protein localization to telomere 2/36 10/18670 0.0001611010 0.02351193 0.0197561 CCT3/CCT7 2
GO:1904867 GO:1904867 protein localization to Cajal body 2/36 10/18670 0.0001611010 0.02351193 0.0197561 CCT3/CCT7 2
GO:0060249 GO:0060249 anatomical structure homeostasis 6/36 439/18670 0.0001748712 0.02351193 0.0197561 ABCA4/CCT3/CCT7/POLE4/ADD1/PPARGC1B 6
GO:0070203 GO:0070203 regulation of establishment of protein localization to telomere 2/36 11/18670 0.0001966624 0.02351193 0.0197561 CCT3/CCT7 2
GO:0070202 GO:0070202 regulation of establishment of protein localization to chromosome 2/36 12/18670 0.0002357086 0.02351193 0.0197561 CCT3/CCT7 2
However, if I run the same analysis with PANTHER or DAVID, I don't get any statistically significant results: all FDR values are equal to 1. I know that clusterProfiler calculates FDR values following the Storey, 2002 paper, but I wouldn't expect to see such a big difference.
1) Why does clusterProfiler show different FDR values from Panther and David?
Thanks!
I don't think the tools you listed use the same statistical test. I believe DAVID uses a modified Fisher's exact test, PANTHER uses a binomial test, and clusterProfiler uses the hypergeometric test. Each tool would therefore give different p-values, and so different FDR-corrected p-values as well.
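To get a feel for how much the choice of test matters, here is a minimal base R sketch that applies each of the tests mentioned above to the counts from the GO:1903405 row in the question (2 of 36 list genes, 10 of 18670 background genes). The "EASE-style" table is my assumption about how DAVID's modification works (one hit removed before running Fisher's exact test), and the binomial call is only a stand-in for whatever PANTHER computes internally, so treat the numbers as illustrative rather than as a reproduction of either tool.

```r
# Counts taken from the GO:1903405 row of the question's output.
k <- 2      # list genes annotated to the term
n <- 36     # list genes with any annotation (GeneRatio denominator)
M <- 10     # background genes annotated to the term
N <- 18670  # background ("universe") size (BgRatio denominator)

# Hypergeometric upper tail, P(X >= k).
p_hyper <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# One-sided Fisher's exact test on the equivalent 2x2 table.
tab <- matrix(c(k, n - k, M - k, N - M - n + k), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# EASE-style variant (assumption): drop one hit from the list before testing.
tab_ease <- matrix(c(k - 1, n - k, M - k, N - M - n + k), nrow = 2)
p_ease <- fisher.test(tab_ease, alternative = "greater")$p.value

# Binomial test with the background annotation rate as the null probability.
p_binom <- binom.test(k, n, p = M / N, alternative = "greater")$p.value

print(c(hypergeometric = p_hyper, fisher = p_fisher,
        ease = p_ease, binomial = p_binom))
```

The hypergeometric and one-sided Fisher values are the same number and should match the pvalue column in the question. The binomial value stays in the same range, while the EASE-style value comes out much larger, because removing one hit is a big change when a term has only two hits.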
There are other factors to consider too, such as the versions of the GO ontology database used, and the "universe" of genes used in the statistical calculations (the total number of genes that are considered from the genome).
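The effect of the universe size and of the number of terms tested can be sketched the same way. The background size 18670 and the raw p-value come from the question. The other universe sizes and the numbers of tested terms are hypothetical values picked for illustration, and the adjustment assumes the p-value is the smallest one in the whole set of tests. In base R, `method = "fdr"` in `p.adjust` is an alias for `"BH"` (Benjamini-Hochberg).

```r
k <- 2; n <- 36; M <- 10

# Same hit counts, different background sizes (only 18670 is from the question).
universe <- c(5000, 18670, 25000)
p_by_universe <- sapply(universe, function(N) {
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
})
names(p_by_universe) <- universe
print(p_by_universe)

# Same raw p-value, different numbers of GO terms tested (hypothetical counts).
p_raw <- 0.0001611010
terms_tested <- c(100, 1000, 10000)
p_adj <- sapply(terms_tested, function(m) p.adjust(p_raw, method = "BH", n = m))
print(p_adj)
```

A smaller universe gives a larger p-value for the same counts, and the adjusted value grows with the number of terms tested until it is capped at 1. So a tool that tests many more terms, or uses a different background, can push every adjusted value up to 1 even when the raw p-values are similar.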