How To Measure Gene Enrichement (Go, Kegg) Using R And Ensembl Ids ?
7
7
Entering edit mode
13.4 years ago

Hi,

What tool do you advice me to use (R package ?) to measure gene set enrichment in GO/KEGG BUT having a list of Ensembl IDS not Entrez IDs

Most of the tools listed are using Entrez Ids and I would like also to know the reason to that ?

Regards

Rad

enrichment pathway identifiers • 19k views
ADD COMMENT
2
Entering edit mode

Is there any particular reason to use only Ensembl IDs ? I think you if you can use an ID converter to map between Ensembl <-> Entrez you may get more tools to do this type of analysis.

ADD REPLY
0
Entering edit mode

Sometimes mapping Ensembl IDs to Entrez IDs let's say using Biomart makes you loose some entries since this is not a 1-1 relationship, some Ensembl IDs do not have necessary Entrez equivalents

ADD REPLY
9
Entering edit mode
12.8 years ago

It looks like things have changed in the meantime: Now it is possible to directly work with Ensembl Genes. Note the annot = annFUN.org, mapping="org.Mm.eg.db":

source("http://www.bioconductor.org/biocLite.R")
biocLite(c("biomaRt", "topGO", "org.Mm.eg.db"))
library(biomaRt)
library(org.Mm.eg.db)
library(topGO)    

all_genes <- rep(c(0,1,1), each=10)

names(all_genes) <- c("ENSMUSG00000031167", "ENSMUSG00000022443", "ENSMUSG00000055762", "ENSMUSG00000031328", "ENSMUSG00000026728", "ENSMUSG00000029722", "ENSMUSG00000059208", "ENSMUSG00000017404", "ENSMUSG00000003970", "ENSMUSG00000019210", "ENSMUSG00000024966", "ENSMUSG00000024197", "ENSMUSG00000030847", "ENSMUSG00000004264", "ENSMUSG00000041959", "ENSMUSG00000032643", "ENSMUSG00000042520", "ENSMUSG00000028484", "ENSMUSG00000022186", "ENSMUSG00000027593", "ENSMUSG00000019795", "ENSMUSG00000021134", "ENSMUSG00000026956", "ENSMUSG00000022982", "ENSMUSG00000027162", "ENSMUSG00000024668", "ENSMUSG00000000168", "ENSMUSG00000022234", "ENSMUSG00000024991", "ENSMUSG00000018697")

GOdata <- new("topGOdata", ontology = "BP", allGenes = all_genes, geneSel = function(p) p < 1e-2, description = "Test", annot = annFUN.org, mapping="org.Mm.eg.db", ID="Ensembl")

resultFisher <- runTest(GOdata, algorithm = "classic", statistic = "fisher")
GenTable(GOdata, classicFisher = resultFisher, topNodes = 10)
ADD COMMENT
6
Entering edit mode
13.4 years ago

Try using the biomaRt Bioconductor package to map your Ensembl IDs to Entrez gene. Then you can apply the GOStats package or topGO.

ADD COMMENT
0
Entering edit mode

Nice, I have the same comment to Khader, please read it.

ADD REPLY
3
Entering edit mode
13.4 years ago

GO terms are actually mapped to Ensembl IDs on the transcript level. You can get the associated mappings through BioMart by going in with the Ensembl IDs. Mappings are done through the UniProt ID. Ensembl does not map to KEGG, but Reactome is a pathway database that crosslinks to KEGG, and can be queried using Ensembl IDs. http://www.reactome.org/ReactomeGWT/entrypoint.html They also have a Mart: http://www.reactome.org/cgi-bin/mart

I hope this helps.

ADD COMMENT
3
Entering edit mode
13.1 years ago
Duff ▴ 670

To get the EntrezGene IDs from the bioconductor annotation packages you can do:

library(org.Hs.eg.db)
ENSEMBL <- c('ENSG00000206454', 'ENSG00000111704', 'ENSG00000115232') # example IDs
egIDs <- unlist(mget(ENSEMBL, org.Hs.egENSEMBL2EG, ifnotfound = NA))

Personally I've had some problems with 'unlist' where more than one ENSEMBL ID links to a single Entrez ID or vice versa as illutrated in this example with ENSG00000111704. This maps to two ENTREZ ids and because R cannot handle duplicate names these ENTREZ ids in the results get named ENSG000001117041 and ENSG000001117042. This makes is harder to map back from ENTREZ to ENSEMBL.

Now I prefer:

egIDs <- stack(mget(ENSEMBL, org.Hs.egENSEMBL2EG, ifnotfound = NA))

Or you could use SQL directly on the org.Hs.eg.db database (these packages are SQLite databases; see the AnnotationDBI documentation).

dbCon <- org.Hs.eg_dbconn() # set up the database connection
sql <- 'SELECT * FROM ensembl, genes WHERE genes._id = ensembl._id;' # write your little bit of SQL
result <- dbGetQuery(dbCon, sql) # query the database

ENTREZ <- result[which(result[,2] %in% ENSEMBL), c(2,4)] # subset to get your results

Once you have the mapping you can use the ENTREZ centric tools although as others point out you may lose a few genes in the process. How this will affect the results is unknown though.

Best

duff

ADD COMMENT
2
Entering edit mode
13.4 years ago

Warning: anecdotal opinion not supported by hard data.

Why are many tools targeted at Entrez Ids? I personally think Ensembl a very rich environment and a great toolset, but I spend 99% of my time in the NCBI universe, where entrez IDs are the common denominator. Most academic software packages grow out of what someone needed to write for themselves. My SuperUltraGO package would likely use Entrez IDs because I would write against the toolset I personally use. I would need a particular reason to make it eat Ensembl identifiers as well, even though that would be a good idea in theory.

(There is no SuperUltraGO package. Yet. I think.)

ADD COMMENT
1
Entering edit mode

+1. I like Ensembl too for its rational way of organising genes, transcripts and proteins.

ADD REPLY
1
Entering edit mode

To answer the question with regard to R/Bioconductor, that is historical, largely. The annotation packages are NCBI-centric and it is the organism-specific annotation packages that allow such tasks as GO or KEGG enrichment. It would be possible (and we have talked about doing it) to make annotation packages that are Ensembl or even UCSC centric, but we haven't gone there yet. That said, it IS possible for a user to build an annotation package using AnnotationDbi that would use Ensembl IDs as the keys.

ADD REPLY
1
Entering edit mode
13.4 years ago
Tommy C ▴ 10

I recommend the Romer or Roast functions for Gene Set Analysis. They fit nicely in with Limma workflows and avoid problems of establishing a suitable cut-off for DE genes.

vb

Tom

ADD COMMENT
1
Entering edit mode
13.1 years ago
Mushal ▴ 10

Hi y you can modify CORNA, http://corna.sourceforge.net/tutorial.html

ADD COMMENT

Login before adding your answer.

Traffic: 1642 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6