GO enrichment and KEGG output empty
0
1
Entering edit mode
3.2 years ago
estpallag ▴ 10

Hello Comunity, Using DeSeq2, I have obtained the list of genes which are differentially expressed across my samples. This analysis finished with 2 files: (1) ent_gene (or gene_1) and (2) ent_uni (or uni_1). The first one corresponds with the list of genes diff. expressed, and the second one is with the full list of genes (removing N/A). My transcriptome and gene information is from NCBI. I am trying to use ClusterProfiler. To start, my data are not coming from esemble and the species that I am studying is the horse. I read here that you need to generate your own OrgDb using

>AnnotationHub.library(AnnotationHub)
>hub = AnnotationHub()
>query(hub, c("equus caballus", "orgdb"))
>eqCab = hub[["AH94124"]]
>eqCab

Output:

OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Equus caballus
| SPECIES: Equus caballus
| CENTRALID: GID
| Taxonomy ID: 9796
| Db type: OrgDb
| Supporting package: AnnotationDbi

Then I have tried to run enrichGo:

ego = enrichGO(gene = ent_gene,
               OrgDb = eqCab,
               keyType = 'SYMBOL',
               ont = "BP",
               universe = ent_uni)

My gene ID corresponds with the SYMBOLs. My FIRST issue comes here, my output only has 8 Description biological processes. I know that some of the genes sigf. expressed have got other biological functions that is not represented here. I am wondering if the issue is coming from my script, but my current knowledge does not allow me to see where.

                ID                               Description GeneRatio  BgRatio       pvalue
GO:0007399 GO:0007399                nervous system development     10/20 347/3657 4.046320e-06
GO:0007605 GO:0007605               sensory perception of sound      3/20  17/3657 9.063503e-05
GO:0048699 GO:0048699                     generation of neurons      7/20 217/3657 9.382188e-05
GO:0050954 GO:0050954 sensory perception of mechanical stimulus      3/20  18/3657 1.083826e-04
GO:0007600 GO:0007600                        sensory perception      4/20  50/3657 1.275744e-04
GO:0022008 GO:0022008                              neurogenesis      7/20 235/3657 1.557383e-04
GO:0030182 GO:0030182                    neuron differentiation      6/20 189/3657 3.712655e-04
GO:1901888 GO:1901888      regulation of cell junction assembly      3/20  32/3657 6.273457e-04
              p.adjust      qvalue                                                      geneID Count
GO:0007399 0.002561321 0.002427792 APOD/ATP2B2/COL2A1/EFNB2/GPM6B/LAMC2/LRRN3/MME/NTRK2/POU3F4    10
GO:0007605 0.016150923 0.015308932                                        ATP2B2/COL2A1/POU3F4     3
GO:0048699 0.016150923 0.015308932                    APOD/ATP2B2/EFNB2/GPM6B/LAMC2/MME/POU3F4     7
GO:0050954 0.016150923 0.015308932                                        ATP2B2/COL2A1/POU3F4     3
GO:0007600 0.016150923 0.015308932                                    ATP2B2/COL2A1/MME/POU3F4     4
GO:0022008 0.016430390 0.015573830                    APOD/ATP2B2/EFNB2/GPM6B/LAMC2/MME/POU3F4     7
GO:0030182 0.033573006 0.031822754                        APOD/ATP2B2/EFNB2/GPM6B/LAMC2/POU3F4     6
GO:1901888 0.049638726 0.047050925                                            APOD/GPM6B/LRRN3     3

I must say that I got 117 genes diff. expressed. If I do not misunderstand this, the third column is telling how many genes of my pull of genes participate in that biological process. I do not understand how is possible that I have got x/20 when I had 117. Why is enrichGO removing the rest of my genes from the analysis??

Then, I have tried to run enrichkegg to get the list of genes participating in the same biological process. I run the previous command, changing the sub and it gave me the next error:

ekg = enrichKEGG(gene = ent_gene,
        organism = 'ecb',
        keyType ='ncbi-geneid',
        universe = ent_uni)

Ouput: --> No gene can be mapped.... --> Expected input gene ID: 100147313,100057998,100072967,100056690,100065115,100055111 --> return NULL...

That makes sense because I select ncbi-geneid like KeyType and I have been using symbols. So I generated my gene and universe file with ncbi-geneid. NCBI gives you the ncbi-geneid with prex: "GENEID: 10478648", so I had to remove it from my vector using (str_remove_all).

I tried to run enrichkegg with my new vector of genes:

library(KEGGREST)
ekg = enrichKEGG(gene = gene_1,
        organism = 'ecb',
        keyType ='ncbi-geneid',
        universe = uni_1)

However, my output is completely empty:

> ekg
#
# over-representation test
#
#...@organism    ecb 
#...@ontology    KEGG 
#...@keytype     ncbi-geneid 
#...@gene    chr [1:112] "100034030" "100034058" "100034081" "100050122" "100050231" "100050257" "100050640" ...
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...0 enriched terms found
#...Citation
  Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
  clusterProfiler: an R package for comparing biological themes among
  gene clusters. OMICS: A Journal of Integrative Biology
  2012, 16(5):284-287 

> summary(ekg)
[1] ID          Description GeneRatio   BgRatio     pvalue      p.adjust    qvalue      geneID     
[9] Count      
<0 rows> (or 0-length row.names)
Warning message:
In summary(ekg) :
  summary method to convert the object to data.frame is deprecated, please use as.data.frame instead.

I am sure that I am doing something wrong, but I do not know what is it. Could please anybody help me?

Thanks

enrichment GO KEGG clusterprofiler NCBI • 1.1k views
ADD COMMENT
0
Entering edit mode

Did you solve this problem? I am having the same problem. However, It is working for some of my list of genes and not working for others with the same data parameters.

ADD REPLY

Login before adding your answer.

Traffic: 2554 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6