Hi,
I have a list of genes like the one below and am interested in the pathways these genes are involved in. The genes are in a data frame called eg:
gene_id gene_name
ENSG00000128274 A4GALT
ENSG00000250420 AACSP1
ENSG00000114771 AADAC
ENSG00000197953 AADACL2
ENSG00000261846 AADACL2
ENSG00000188984 AADACL3
ENSG00000240602 AADACP1
ENSG00000109576 AADAT
ENSG00000129673 AANAT
ENSG00000090861 AARS
ENSG00000124608 AARS2
ENSG00000008311 AASS
ENSG00000154263 ABCA10
ENSG00000179869 ABCA13
ENSG00000238098 ABCA17P
ENSG00000154262 ABCA6
ENSG00000141338 ABCA8
ENSG00000154258 ABCA9
ENSG00000231749 ABCA9-AS1
ENSG00000073734 ABCB11
ENSG00000276582 ABCB11
ENSG00000004846 ABCB5
ENSG00000103222 ABCC1
ENSG00000278183 ABCC1
ENSG00000124574 ABCC10
ENSG00000114770 ABCC5
ENSG00000091262 ABCC6
ENSG00000275331 ABCC6
ENSG00000256340 ABCC6P1
ENSG00000069431 ABCC9
ENSG00000173208 ABCD2
ENSG00000119688 ABCD4
ENSG00000204574 ABCF1
ENSG00000206490 ABCF1
ENSG00000225989 ABCF1
ENSG00000231129 ABCF1
ENSG00000232169 ABCF1
ENSG00000236149 ABCF1
ENSG00000236342 ABCF1
ENSG00000033050 ABCF2
ENSG00000143921 ABCG8
ENSG00000131969 ABHD12B
ENSG00000248487 ABHD14A
I used `enrichGO` from the clusterProfiler R package for pathway analysis:
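```r
library(clusterProfiler)
library(org.Hs.eg.db)

# eg is the gene data frame shown above (Ensembl IDs in eg$gene_id)
eg_go <- enrichGO(gene          = eg$gene_id,
                  keyType       = "ENSEMBL",
                  OrgDb         = org.Hs.eg.db,
                  ont           = "ALL",
                  pAdjustMethod = "BH",
                  pvalueCutoff  = 0.05,
                  readable      = TRUE)
```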
| ID | ONTOLOGY | Description | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | geneID | Count |
|---|---|---|---|---|---|---|---|---|---|
| GO:0099133 | BP | ATP hydrolysis coupled anion transmembrane transport | 7/32 | 13/20505 | 1.899710e-17 | 5.547153e-15 | 3.399481e-15 | ABCC1/ABCC1/ABCC10/ABCC5/ABCC6/ABCC6/ABCC9 | 7 |
| GO:0006869 | BP | lipid transport | 11/32 | 384/20505 | 7.814823e-12 | 1.140964e-09 | 6.992210e-10 | ABCA10/ABCA13/ABCA6/ABCA8/ABCA9/ABCB11/ABCB11/ABCC1/ABCC1/ABCD2/ABCG8 | 11 |
| GO:0010876 | BP | lipid localization | 11/32 | 427/20505 | 2.446467e-11 | 2.381228e-09 | 1.459296e-09 | ABCA10/ABCA13/ABCA6/ABCA8/ABCA9/ABCB11/ABCB11/ABCC1/ABCC1/ABCD2/ABCG8 | 11 |
| GO:0042908 | BP | xenobiotic transport | 4/32 | 13/20505 | 3.457419e-09 | 2.523916e-07 | 1.546740e-07 | ABCA8/ABCB5/ABCC1/ABCC1 | 4 |
| GO:0015698 | BP | inorganic anion transport | 7/32 | 191/20505 | 1.506352e-08 | 8.797093e-07 | 5.391153e-07 | ABCC1/ABCC1/ABCC10/ABCC5/ABCC6/ABCC6/ABCC9 | 7 |
| GO:0006855 | BP | drug transmembrane transport | 5/32 | 91/20505 | 2.822040e-07 | 1.373393e-05 | 8.416610e-06 | ABCA8/ABCB5/ABCC1/ABCC1/ABCG8 | 5 |
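To see where the pvalue column comes from, here is a small sketch that recomputes the raw p-value of the top term (GO:0099133) from its GeneRatio (7/32) and BgRatio (13/20505). It assumes, as I understand the underlying code, that enrichGO scores each term with a one-sided hypergeometric (over-representation) test; the variable names k, n, M and N are just illustrative labels, not clusterProfiler arguments.

```r
# Recompute the raw p-value of GO:0099133 from the ratios in the table.
# Assumption: enrichGO uses a one-sided hypergeometric test per GO term.
k <- 7      # input genes annotated to the term      (GeneRatio numerator)
n <- 32     # input genes with any GO annotation     (GeneRatio denominator)
M <- 13     # background genes annotated to the term (BgRatio numerator)
N <- 20505  # background genes overall               (BgRatio denominator)

# P(X >= k) for X ~ Hypergeometric(M, N - M, n):
# upper tail, hence q = k - 1 and lower.tail = FALSE
p_top <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# Compare with the pvalue column of the table (1.899710e-17)
print(p_top)
```

This is only the per-term test; the p.adjust and qvalue columns are then computed across all GO terms that were tested, so they cannot be reproduced from the six rows shown here alone.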
Here I selected the pathways based on the p-value, but I have seen many posts that use the q-value. Should I select significantly enriched pathways based on the p-value, the adjusted p-value, or the q-value? Which is the right one?
Thanks
Normally people are satisfied with an adjusted p < 0.05.
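As an illustration of why the adjusted p-value is stricter than the raw one, here is a minimal sketch of the Benjamini-Hochberg ("BH") adjustment that pAdjustMethod = "BH" refers to, using base R's p.adjust() next to a hand-written version of the same step-up rule. The p-values are made up for illustration; bh_manual is a hypothetical helper, not part of any package.

```r
# Hypothetical raw p-values for 10 tested terms (made up for illustration)
p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216)

# Built-in Benjamini-Hochberg adjustment
p_bh <- p.adjust(p, method = "BH")

# Hand-written version of the same rule (hypothetical helper):
#   adjusted p_(i) = min over j >= i of p_(j) * m / j, capped at 1
bh_manual <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)   # largest p-value first
  ranks <- m:1                       # ascending rank of each sorted p-value
  adj <- pmin(1, cummin(p[o] * m / ranks))
  adj[order(o)]                      # restore the original order
}

n_raw <- sum(p < 0.05)     # terms passing on the raw p-value: 5
n_adj <- sum(p_bh < 0.05)  # terms passing after BH adjustment: 2
print(c(raw = n_raw, adjusted = n_adj))
```

The adjustment depends on how many terms were tested (m), which is why the same raw p-value can pass or fail depending on the size of the test set.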
But I don't see any argument for the adjusted p-value in enrichGO; I can only see the pvalueCutoff and qvalueCutoff arguments. Which one should I use now?

The Q value can be regarded as the adjusted P value (although a Professor of Statistics would find a way to state that they are not the same).
In any case, all you have to do is subset the results data frame; there is no need for a cut-off option. Just do:
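A minimal sketch of that kind of subsetting, assuming you filter on the p.adjust column. To keep it runnable without clusterProfiler, it builds a small stand-in data frame from the six rows shown in the question; with a real result, as.data.frame(eg_go) would give the full table. Keep in mind that enrichGO has already dropped terms that failed its own pvalueCutoff/qvalueCutoff, so to filter freely you would typically rerun it with lenient cutoffs (for example pvalueCutoff = 1, qvalueCutoff = 1) first.

```r
# Stand-in for the enrichGO results table (rows copied from the question).
# With a real enrichResult object, as.data.frame(eg_go) plays this role.
res <- data.frame(
  ID = c("GO:0099133", "GO:0006869", "GO:0010876",
         "GO:0042908", "GO:0015698", "GO:0006855"),
  Description = c("ATP hydrolysis coupled anion transmembrane transport",
                  "lipid transport", "lipid localization",
                  "xenobiotic transport", "inorganic anion transport",
                  "drug transmembrane transport"),
  p.adjust = c(5.547153e-15, 1.140964e-09, 2.381228e-09,
               2.523916e-07, 8.797093e-07, 1.373393e-05)
)

# Keep terms whose BH-adjusted p-value passes the chosen threshold
sig <- res[res$p.adjust < 0.05, ]
strict <- res[res$p.adjust < 1e-8, ]

print(nrow(sig))     # 6: every term shown passes 0.05
print(nrow(strict))  # 3: only the top three pass 1e-8
```

Swapping p.adjust for qvalue in the filter works the same way if you prefer to threshold on the q-value.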
Oh yes, thank you. I'm also a bit confused by this tutorial about gProfileR: [https://hbctraining.github.io/Training-modules/DGE-functional-analysis/lessons/gProfileR_REVIGO.html]. In it they used a p-value cutoff, and the results contain only a pvalue column. Could you please tell me whether this is right or wrong?