Hello Comunity, Using DeSeq2, I have obtained the list of genes which are differentially expressed across my samples. This analysis finished with 2 files: (1) ent_gene (or gene_1) and (2) ent_uni (or uni_1). The first one corresponds with the list of genes diff. expressed, and the second one is with the full list of genes (removing N/A). My transcriptome and gene information is from NCBI. I am trying to use ClusterProfiler. To start, my data are not coming from esemble and the species that I am studying is the horse. I read here that you need to generate your own OrgDb using
>AnnotationHub.library(AnnotationHub)
>hub = AnnotationHub()
>query(hub, c("equus caballus", "orgdb"))
>eqCab = hub[["AH94124"]]
>eqCab
Output:
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Equus caballus
| SPECIES: Equus caballus
| CENTRALID: GID
| Taxonomy ID: 9796
| Db type: OrgDb
| Supporting package: AnnotationDbi
Then I have tried to run enrichGo:
ego = enrichGO(gene = ent_gene,
OrgDb = eqCab,
keyType = 'SYMBOL',
ont = "BP",
universe = ent_uni)
My gene ID corresponds with the SYMBOLs. My FIRST issue comes here, my output only has 8 Description biological processes. I know that some of the genes sigf. expressed have got other biological functions that is not represented here. I am wondering if the issue is coming from my script, but my current knowledge does not allow me to see where.
ID Description GeneRatio BgRatio pvalue
GO:0007399 GO:0007399 nervous system development 10/20 347/3657 4.046320e-06
GO:0007605 GO:0007605 sensory perception of sound 3/20 17/3657 9.063503e-05
GO:0048699 GO:0048699 generation of neurons 7/20 217/3657 9.382188e-05
GO:0050954 GO:0050954 sensory perception of mechanical stimulus 3/20 18/3657 1.083826e-04
GO:0007600 GO:0007600 sensory perception 4/20 50/3657 1.275744e-04
GO:0022008 GO:0022008 neurogenesis 7/20 235/3657 1.557383e-04
GO:0030182 GO:0030182 neuron differentiation 6/20 189/3657 3.712655e-04
GO:1901888 GO:1901888 regulation of cell junction assembly 3/20 32/3657 6.273457e-04
p.adjust qvalue geneID Count
GO:0007399 0.002561321 0.002427792 APOD/ATP2B2/COL2A1/EFNB2/GPM6B/LAMC2/LRRN3/MME/NTRK2/POU3F4 10
GO:0007605 0.016150923 0.015308932 ATP2B2/COL2A1/POU3F4 3
GO:0048699 0.016150923 0.015308932 APOD/ATP2B2/EFNB2/GPM6B/LAMC2/MME/POU3F4 7
GO:0050954 0.016150923 0.015308932 ATP2B2/COL2A1/POU3F4 3
GO:0007600 0.016150923 0.015308932 ATP2B2/COL2A1/MME/POU3F4 4
GO:0022008 0.016430390 0.015573830 APOD/ATP2B2/EFNB2/GPM6B/LAMC2/MME/POU3F4 7
GO:0030182 0.033573006 0.031822754 APOD/ATP2B2/EFNB2/GPM6B/LAMC2/POU3F4 6
GO:1901888 0.049638726 0.047050925 APOD/GPM6B/LRRN3 3
I must say that I got 117 genes diff. expressed. If I do not misunderstand this, the third column is telling how many genes of my pull of genes participate in that biological process. I do not understand how is possible that I have got x/20 when I had 117. Why is enrichGO removing the rest of my genes from the analysis??
Then, I have tried to run enrichkegg to get the list of genes participating in the same biological process. I run the previous command, changing the sub and it gave me the next error:
ekg = enrichKEGG(gene = ent_gene,
organism = 'ecb',
keyType ='ncbi-geneid',
universe = ent_uni)
Ouput: --> No gene can be mapped.... --> Expected input gene ID: 100147313,100057998,100072967,100056690,100065115,100055111 --> return NULL...
That makes sense because I select ncbi-geneid like KeyType and I have been using symbols. So I generated my gene and universe file with ncbi-geneid. NCBI gives you the ncbi-geneid with prex: "GENEID: 10478648", so I had to remove it from my vector using (str_remove_all).
I tried to run enrichkegg with my new vector of genes:
library(KEGGREST)
ekg = enrichKEGG(gene = gene_1,
organism = 'ecb',
keyType ='ncbi-geneid',
universe = uni_1)
However, my output is completely empty:
> ekg
#
# over-representation test
#
#...@organism ecb
#...@ontology KEGG
#...@keytype ncbi-geneid
#...@gene chr [1:112] "100034030" "100034058" "100034081" "100050122" "100050231" "100050257" "100050640" ...
#...pvalues adjusted by 'BH' with cutoff <0.05
#...0 enriched terms found
#...Citation
Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
clusterProfiler: an R package for comparing biological themes among
gene clusters. OMICS: A Journal of Integrative Biology
2012, 16(5):284-287
> summary(ekg)
[1] ID Description GeneRatio BgRatio pvalue p.adjust qvalue geneID
[9] Count
<0 rows> (or 0-length row.names)
Warning message:
In summary(ekg) :
summary method to convert the object to data.frame is deprecated, please use as.data.frame instead.
I am sure that I am doing something wrong, but I do not know what is it. Could please anybody help me?
Thanks
Did you solve this problem? I am having the same problem. However, It is working for some of my list of genes and not working for others with the same data parameters.