I'm running a GO enrichment analysis with clusterProfiler. I set minGSSize = 10 (which seems to be the default anyway) to restrict the gene set size, but I still get enriched terms with fewer than 10 genes.
I can filter them out manually, but I'm concerned about whether that is the proper way to handle this, and whether my statistics (p-values etc.) are wrong because the filter failed. I have more cases, but I've posted one of them with example code below.
library(clusterProfiler)
library(org.At.tair.db)
# sorry for this annoying input list, but my original input list is longer than this.
GOI <- c("AT1G02920", "AT1G02930", "AT1G03620", "AT1G03760", "AT1G03880", "AT1G05675", "AT1G11610", "AT1G12790", "AT1G13520", "AT1G13990", "AT1G14540", "AT1G14950", "AT1G15125", "AT1G15670", "AT1G15920", "AT1G16030", "AT1G19250", "AT1G21110", "AT1G21120", "AT1G24100", "AT1G24330", "AT1G26250", "AT1G26380", "AT1G26390", "AT1G26410", "AT1G26420", "AT1G27565", "AT1G44130", "AT1G47540", "AT1G49000", "AT1G51920", "AT1G53950", "AT1G56240", "AT1G56250", "AT1G61970", "AT1G62130", "AT1G63860", "AT1G64400", "AT1G65486", "AT1G65845", "AT1G66500", "AT1G66700", "AT1G66920", "AT1G67270", "AT1G68230", "AT1G68862", "AT1G69280", "AT1G69920", "AT1G69930", "AT1G70140", "AT1G72060", "AT1G72900", "AT1G74360", "AT1G74590", "AT1G75000", "AT1G75335", "AT1G75830", "AT1G80840", "AT2G02010", "AT2G02930", "AT2G07698", "AT2G07719", "AT2G15220", "AT2G17040", "AT2G18370", "AT2G18660", "AT2G19910", "AT2G24600", "AT2G25470", "AT2G27389", "AT2G28450", "AT2G29330", "AT2G32190", "AT2G32830", "AT2G33580", "AT2G35980", "AT2G36950", "AT2G38860", "AT2G38870", "AT2G39210", "AT2G39350", "AT2G39400", "AT2G41010", "AT2G41280", "AT2G43000", "AT2G44370", "AT2G46430", "AT2G46650", "AT3G09405", "AT3G15340", "AT3G15518", "AT3G15590", "AT3G16020", "AT3G16030", "AT3G16530", "AT3G19470", "AT3G22800", "AT3G23150", "AT3G23250", "AT3G23570", "AT3G25900", "AT3G26170", "AT3G26210", "AT3G26830", "AT3G27870", "AT3G29100", "AT3G45420", "AT3G51450", "AT3G53160", "AT3G54150", "AT3G58930", "AT4G01010", "AT4G02280", "AT4G02520")
enrichGO_test <-
  enrichGO(GOI,
           maxGSSize = 500,
           minGSSize = 10,
           OrgDb = org.At.tair.db,
           ont = 'BP',
           keyType = 'TAIR',
           pvalueCutoff = 0.01,
           pAdjustMethod = 'BH',
           qvalueCutoff = 0.01
  )
# plot image view
dotplot(enrichGO_test)
# table view
View(enrichGO_test@result)
enrichGO_test@result[enrichGO_test@result$ID == 'GO:0071456',]
In the dotplot(enrichGO_test) output above, the term GO:0071456 ('cellular response to hypoxia') is plotted, even though its GeneRatio is '4/88', i.e. it contains only 4 of my genes.
Versions:
R version 3.6.1 (2019-07-05)
clusterProfiler version 3.12.0
RStudio 1.2.1335
Bioconductor version 3.9 (BiocManager 1.30.8)
Thank you for correcting me. What I described above seems to be 'k' then (the number of my input genes annotated to the term), and what I actually wanted in the original question seems to be the gsfilter() function, as in the link below. https://github.com/YuLab-SMU/clusterProfiler/issues/46
So, I can filter my result as follows.
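A minimal base-R sketch of this kind of filter, applied to a table shaped like enrichGO_test@result. The toy data frame, all of its values, and the names toy_result, ratio_numerator, and filter_by_count are made up for illustration. Only the column names (ID, GeneRatio, BgRatio, Count) follow clusterProfiler's output. It is a stand-in for the gsfilter() idea from the linked issue, not its implementation.

```r
# Toy stand-in for enrichGO_test@result; every value here is invented.
toy_result <- data.frame(
  ID        = c("term_A", "term_B", "term_C"),
  GeneRatio = c("4/88", "12/88", "10/88"),
  BgRatio   = c("250/20000", "300/20000", "40/20000"),
  Count     = c(4L, 12L, 10L),
  stringsAsFactors = FALSE
)

# Numerator of a "k/n"-style ratio string, as an integer.
ratio_numerator <- function(x) as.integer(sub("/.*$", "", x))

# Hypothetical helper (not part of clusterProfiler): keep terms with at least
# `min_count` input genes, i.e. the k in GeneRatio = k/n.
filter_by_count <- function(res, min_count = 10) {
  res[res$Count >= min_count, , drop = FALSE]
}

filtered <- filter_by_count(toy_result, min_count = 10)
print(filtered$ID)

# Assumption: the size that minGSSize/maxGSSize act on corresponds to the
# BgRatio numerator (genes annotated to the term in the background), which
# can be much larger than Count.
toy_result$GSSize <- ratio_numerator(toy_result$BgRatio)
print(toy_result[, c("ID", "Count", "GSSize")])
```

Subsetting the table this way only drops rows; the p-values and adjusted p-values of the remaining terms stay exactly as enrichGO() computed them.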
Looks great! Thanks again.