Output:

Question

GO enrichment and KEGG output empty

1

Entering edit mode

3.2 years ago

estpallag ▴ 10

Hello Comunity, Using DeSeq2, I have obtained the list of genes which are differentially expressed across my samples. This analysis finished with 2 files: (1) ent_gene (or gene_1) and (2) ent_uni (or uni_1). The first one corresponds with the list of genes diff. expressed, and the second one is with the full list of genes (removing N/A). My transcriptome and gene information is from NCBI. I am trying to use ClusterProfiler. To start, my data are not coming from esemble and the species that I am studying is the horse. I read here that you need to generate your own OrgDb using

>AnnotationHub.library(AnnotationHub)
>hub = AnnotationHub()
>query(hub, c("equus caballus", "orgdb"))
>eqCab = hub[["AH94124"]]
>eqCab

Output:

OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Equus caballus
| SPECIES: Equus caballus
| CENTRALID: GID
| Taxonomy ID: 9796
| Db type: OrgDb
| Supporting package: AnnotationDbi

Then I have tried to run enrichGo:

ego = enrichGO(gene = ent_gene,
               OrgDb = eqCab,
               keyType = 'SYMBOL',
               ont = "BP",
               universe = ent_uni)

My gene ID corresponds with the SYMBOLs. My FIRST issue comes here, my output only has 8 Description biological processes. I know that some of the genes sigf. expressed have got other biological functions that is not represented here. I am wondering if the issue is coming from my script, but my current knowledge does not allow me to see where.

                ID                               Description GeneRatio  BgRatio       pvalue
GO:0007399 GO:0007399                nervous system development     10/20 347/3657 4.046320e-06
GO:0007605 GO:0007605               sensory perception of sound      3/20  17/3657 9.063503e-05
GO:0048699 GO:0048699                     generation of neurons      7/20 217/3657 9.382188e-05
GO:0050954 GO:0050954 sensory perception of mechanical stimulus      3/20  18/3657 1.083826e-04
GO:0007600 GO:0007600                        sensory perception      4/20  50/3657 1.275744e-04
GO:0022008 GO:0022008                              neurogenesis      7/20 235/3657 1.557383e-04
GO:0030182 GO:0030182                    neuron differentiation      6/20 189/3657 3.712655e-04
GO:1901888 GO:1901888      regulation of cell junction assembly      3/20  32/3657 6.273457e-04
              p.adjust      qvalue                                                      geneID Count
GO:0007399 0.002561321 0.002427792 APOD/ATP2B2/COL2A1/EFNB2/GPM6B/LAMC2/LRRN3/MME/NTRK2/POU3F4    10
GO:0007605 0.016150923 0.015308932                                        ATP2B2/COL2A1/POU3F4     3
GO:0048699 0.016150923 0.015308932                    APOD/ATP2B2/EFNB2/GPM6B/LAMC2/MME/POU3F4     7
GO:0050954 0.016150923 0.015308932                                        ATP2B2/COL2A1/POU3F4     3
GO:0007600 0.016150923 0.015308932                                    ATP2B2/COL2A1/MME/POU3F4     4
GO:0022008 0.016430390 0.015573830                    APOD/ATP2B2/EFNB2/GPM6B/LAMC2/MME/POU3F4     7
GO:0030182 0.033573006 0.031822754                        APOD/ATP2B2/EFNB2/GPM6B/LAMC2/POU3F4     6
GO:1901888 0.049638726 0.047050925                                            APOD/GPM6B/LRRN3     3

I must say that I got 117 genes diff. expressed. If I do not misunderstand this, the third column is telling how many genes of my pull of genes participate in that biological process. I do not understand how is possible that I have got x/20 when I had 117. Why is enrichGO removing the rest of my genes from the analysis??

Then, I have tried to run enrichkegg to get the list of genes participating in the same biological process. I run the previous command, changing the sub and it gave me the next error:

ekg = enrichKEGG(gene = ent_gene,
        organism = 'ecb',
        keyType ='ncbi-geneid',
        universe = ent_uni)

Ouput: --> No gene can be mapped.... --> Expected input gene ID: 100147313,100057998,100072967,100056690,100065115,100055111 --> return NULL...

That makes sense because I select ncbi-geneid like KeyType and I have been using symbols. So I generated my gene and universe file with ncbi-geneid. NCBI gives you the ncbi-geneid with prex: "GENEID: 10478648", so I had to remove it from my vector using (str_remove_all).

I tried to run enrichkegg with my new vector of genes:

library(KEGGREST)
ekg = enrichKEGG(gene = gene_1,
        organism = 'ecb',
        keyType ='ncbi-geneid',
        universe = uni_1)

However, my output is completely empty:

> ekg
#
# over-representation test
#
#...@organism    ecb 
#...@ontology    KEGG 
#...@keytype     ncbi-geneid 
#...@gene    chr [1:112] "100034030" "100034058" "100034081" "100050122" "100050231" "100050257" "100050640" ...
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...0 enriched terms found
#...Citation
  Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
  clusterProfiler: an R package for comparing biological themes among
  gene clusters. OMICS: A Journal of Integrative Biology
  2012, 16(5):284-287 

> summary(ekg)
[1] ID          Description GeneRatio   BgRatio     pvalue      p.adjust    qvalue      geneID     
[9] Count      
<0 rows> (or 0-length row.names)
Warning message:
In summary(ekg) :
  summary method to convert the object to data.frame is deprecated, please use as.data.frame instead.

I am sure that I am doing something wrong, but I do not know what is it. Could please anybody help me?

Thanks

enrichment GO KEGG clusterprofiler NCBI • 1.1k views

ADD COMMENT • link updated 15 months ago by cardon.chc • 0 • written 3.2 years ago by estpallag ▴ 10

0

Entering edit mode

Did you solve this problem? I am having the same problem. However, It is working for some of my list of genes and not working for others with the same data parameters.

ADD REPLY • link 15 months ago by cardon.chc • 0