Hello,
I have a list of mouse ENSMUSG000000* IDs. Where can I download the complete mapping from mouse ENSMUSG IDs to mouse gene names?
Thanks in advance for great help!
Best,
Yue
You can use the biomaRt library for that. In the getBM() function, the filters parameter defines your query input. You can filter on different kinds of data, including Ensembl IDs (the list you already have). The attributes parameter lets you choose which information you want to retrieve, such as gene name, gene description, and gene biotype.
For more information, the biomaRt users guide is available on the Bioconductor website.
I gave a similar answer for human, here: Answer: Translating gene names to entrez id's
genes <- c('ENSMUSG00000031201', 'ENSMUSG00000017146',
'ENSMUSG00000041147', 'ENSMUSG00000034218', 'ENSMUSG00000059552')
org.Mm.eg.db
require(org.Mm.eg.db)
mapIds(
org.Mm.eg.db,
keys = genes,
column = 'SYMBOL',
keytype = 'ENSEMBL')
ENSMUSG00000031201 ENSMUSG00000017146 ENSMUSG00000041147 ENSMUSG00000034218
           "Brcc3"            "Brca1"            "Brca2"              "Atm"
ENSMUSG00000059552
           "Trp53"
select(
  org.Mm.eg.db,
  keys = genes,
  columns = c('SYMBOL', 'ENTREZID', 'ENSEMBL'),
  keytype = 'ENSEMBL')
             ENSEMBL SYMBOL ENTREZID
1 ENSMUSG00000031201  Brcc3   210766
2 ENSMUSG00000017146  Brca1    12189
3 ENSMUSG00000041147  Brca2    12190
4 ENSMUSG00000034218    Atm    11920
5 ENSMUSG00000059552  Trp53    22059
biomaRt
require(biomaRt)
ensembl <- useMart('ensembl', dataset = 'mmusculus_gene_ensembl')
annot <- getBM(
attributes = c(
'mgi_symbol',
'external_gene_name',
'ensembl_gene_id',
'entrezgene_id',
'gene_biotype'),
filters = 'ensembl_gene_id',
values = genes,
mart = ensembl)
annot <- merge(
x = as.data.frame(genes),
y = annot,
by.y = 'ensembl_gene_id',
all.x = TRUE,
by.x = 'genes')
annot
               genes mgi_symbol external_gene_name entrezgene_id   gene_biotype
1 ENSMUSG00000017146      Brca1              Brca1         12189 protein_coding
2 ENSMUSG00000031201      Brcc3              Brcc3        210766 protein_coding
3 ENSMUSG00000034218        Atm                Atm         11920 protein_coding
4 ENSMUSG00000041147      Brca2              Brca2         12190 protein_coding
5 ENSMUSG00000059552      Trp53              Trp53         22059 protein_coding
Kevin
Hi Kevin,
Thanks for posting multiple solutions. After implementing them and consulting a Bioconductor support thread on a similar topic, I want to note that for this use case biomaRt should be preferred because, to quote:
If you use an OrgDb package to map gene symbols to Ensembl transcript IDs, what you are really asking for is Gene symbol -> NCBI Gene ID -> Ensembl Transcript ID.
This can introduce downstream issues, including duplicate or missing gene symbols. Apparently biomaRt or any of the EnsDb packages avoids this intermediate translation.
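For anyone who wants to try the EnsDb route mentioned here, a minimal sketch follows. It assumes the Bioconductor annotation package EnsDb.Mmusculus.v79 is installed (that particular package and Ensembl release are my choice for illustration, not something named in this thread), and it reuses the five IDs from Kevin's answer. The keytype for Ensembl gene IDs in EnsDb objects is 'GENEID'.

```r
# Same five mouse Ensembl gene IDs as in the answer above
genes <- c('ENSMUSG00000031201', 'ENSMUSG00000017146',
  'ENSMUSG00000041147', 'ENSMUSG00000034218', 'ENSMUSG00000059552')

# Assumption: EnsDb.Mmusculus.v79 is installed; any EnsDb package for mouse
# should work the same way. The lookup is skipped if the package is missing.
if (requireNamespace('EnsDb.Mmusculus.v79', quietly = TRUE)) {
  library(EnsDb.Mmusculus.v79)
  # Map Ensembl gene IDs straight to symbols using Ensembl's own annotation
  symbols <- AnnotationDbi::mapIds(
    EnsDb.Mmusculus.v79,
    keys = genes,
    column = 'SYMBOL',
    keytype = 'GENEID')
  print(symbols)
} else {
  message('EnsDb.Mmusculus.v79 is not installed; skipping the lookup.')
}
```

Symbols come from whichever Ensembl release the EnsDb package was built on, so they may differ slightly from a live biomaRt query.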
Best of luck
Dear Kevin, as always a great and comprehensive answer! Just one quick question based on your response above: if a similar annotation task is performed to map human Ensembl IDs to gene symbols, ideally using hg19 as the reference genome, how should the code above be modified? Or is no modification necessary, for example when using the first approach with the org.Hs.eg.db database?
Thank you in advance,
Efstathios
Dear Kevin, thank you very much for the biomaRt solution. One final important question: in your opinion, when duplicated gene symbols and/or NA values are retrieved, should one simply keep the first annotated gene symbol for each entry, and additionally remove any Ensembl ID that maps to NA in the gene symbol column?
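To make the filtering described in this question concrete, here is a small base R sketch. It only shows the mechanics of "keep the first symbol per Ensembl ID and drop IDs with no symbol" and is not a recommendation on whether that is the right policy for a given analysis. The table and its IDs are made-up toy data, and the column names mimic the getBM() output above.

```r
# Toy annotation table (made-up IDs and symbols, for illustration only)
annot <- data.frame(
  ensembl_gene_id = c('ENSMUSG_A', 'ENSMUSG_A', 'ENSMUSG_B', 'ENSMUSG_C', 'ENSMUSG_D'),
  mgi_symbol = c('Sym1', 'Sym1b', NA, '', 'Sym1'),
  stringsAsFactors = FALSE)

# Empty strings can stand in for missing symbols; treat them as NA
annot$mgi_symbol[annot$mgi_symbol %in% ''] <- NA

# Remove Ensembl IDs that have no gene symbol
annot_clean <- annot[!is.na(annot$mgi_symbol), ]

# Keep only the first row for each Ensembl ID
annot_clean <- annot_clean[!duplicated(annot_clean$ensembl_gene_id), ]

# Report symbols that are still shared by more than one Ensembl ID
shared <- unique(annot_clean$mgi_symbol[duplicated(annot_clean$mgi_symbol)])

print(annot_clean)
print(shared)
```

Note that deduplicating on the Ensembl ID does not remove the opposite case, where two different Ensembl IDs share one symbol; the `shared` vector flags those so they can be handled separately.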
Hello,
You can get the mouse ENSMUSG IDs and gene names directly from this website:
http://www.informatics.jax.org/downloads/reports/MGI_Gene_Model_Coord.rpt
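A rough sketch of loading such a report into R is below. I am assuming the file is tab-delimited with a header row, that the relevant header fields contain the text "marker symbol" and "Ensembl gene id", and that missing IDs appear as the string "null"; please check these against the actual file before relying on it. read_mgi_map() is a hypothetical helper name, and the demo runs on a tiny made-up file rather than the real download.

```r
# Hypothetical helper: read an MGI-style report and return ID-to-symbol pairs
read_mgi_map <- function(path) {
  rpt <- read.delim(path, header = TRUE, sep = '\t', quote = '',
    check.names = FALSE, stringsAsFactors = FALSE)
  # Assumption: columns are located by header text, not by fixed position
  symbol_col <- grep('marker symbol', names(rpt), ignore.case = TRUE)[1]
  ensembl_col <- grep('Ensembl gene id', names(rpt), ignore.case = TRUE)[1]
  if (is.na(symbol_col) || is.na(ensembl_col)) {
    stop('Expected columns not found; check the report header.')
  }
  map <- data.frame(
    ensembl_gene_id = rpt[[ensembl_col]],
    symbol = rpt[[symbol_col]],
    stringsAsFactors = FALSE)
  # Assumption: rows without an Ensembl ID are blank, NA, or the string "null"
  keep <- !is.na(map$ensembl_gene_id) &
    !(map$ensembl_gene_id %in% c('', 'null'))
  map[keep, ]
}

# Demo on a made-up three-row file with the assumed header wording
toy_file <- tempfile(fileext = '.rpt')
writeLines(c(
  '1. MGI accession id\t3. marker symbol\t11. Ensembl gene id',
  'MGI:TOY1\tToyA\tENSMUSG_TOY1',
  'MGI:TOY2\tToyB\tnull',
  'MGI:TOY3\tToyC\tENSMUSG_TOY3'), toy_file)
toy_map <- read_mgi_map(toy_file)
print(toy_map)

# For the real report, pass the URL above (or a downloaded copy) instead:
# mgi_map <- read_mgi_map('http://www.informatics.jax.org/downloads/reports/MGI_Gene_Model_Coord.rpt')
```

Once loaded, the list of IDs can be annotated with merge(), in the same way as the getBM() result in Kevin's answer.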
Hello gabrielafg,
Thank you so much for your great help!
Thank you again!
Best,
Yue