Converting ENSMUSG IDs to gene names

3.7 years ago

yueli7 ▴ 250

Hello,

I have a list of mouse ENSMUSG000000* IDs. Where can I download the complete mapping of mouse ENSMUSG IDs to mouse gene names?

Thanks in advance for your help!

Best,

Yue

ensembl gene • 16k views

ADD COMMENT • link updated 2.0 years ago by Kevin Blighe 88k • written 3.7 years ago by yueli7 ▴ 250

3.7 years ago

Kevin Blighe 88k

I gave a similar answer for human here: Translating gene names to entrez id's

genes <- c('ENSMUSG00000031201', 'ENSMUSG00000017146',
  'ENSMUSG00000041147', 'ENSMUSG00000034218', 'ENSMUSG00000059552')

1, via `org.Mm.eg.db`

library(org.Mm.eg.db)
mapIds(
  org.Mm.eg.db,
  keys = genes,
  column = 'SYMBOL',
  keytype = 'ENSEMBL')

ENSMUSG00000031201 ENSMUSG00000017146 ENSMUSG00000041147 ENSMUSG00000034218 
           "Brcc3"            "Brca1"            "Brca2"              "Atm" 
ENSMUSG00000059552 
           "Trp53"

select(
  org.Mm.eg.db,
  keys = genes,
  columns = c('SYMBOL', 'ENTREZID', 'ENSEMBL'),
  keytype = 'ENSEMBL')

             ENSEMBL SYMBOL ENTREZID
1 ENSMUSG00000031201  Brcc3   210766
2 ENSMUSG00000017146  Brca1    12189
3 ENSMUSG00000041147  Brca2    12190
4 ENSMUSG00000034218    Atm    11920
5 ENSMUSG00000059552  Trp53    22059

2, via `biomaRt`

library(biomaRt)
ensembl <- useMart('ensembl', dataset = 'mmusculus_gene_ensembl')

annot <- getBM(
  attributes = c(
    'mgi_symbol',
    'external_gene_name',
    'ensembl_gene_id',
    'entrezgene_id',
    'gene_biotype'),
  filters = 'ensembl_gene_id',
  values = genes,
  mart = ensembl)

annot <- merge(
  x = as.data.frame(genes),
  y =  annot,
  by.y = 'ensembl_gene_id',
  all.x = TRUE,
  by.x = 'genes')

annot
               genes mgi_symbol external_gene_name entrezgene_id   gene_biotype
1 ENSMUSG00000017146      Brca1              Brca1         12189 protein_coding
2 ENSMUSG00000031201      Brcc3              Brcc3        210766 protein_coding
3 ENSMUSG00000034218        Atm                Atm         11920 protein_coding
4 ENSMUSG00000041147      Brca2              Brca2         12190 protein_coding
5 ENSMUSG00000059552      Trp53              Trp53         22059 protein_coding

Kevin

ADD COMMENT • link 3.7 years ago by Kevin Blighe 88k

Hi Kevin, thanks for posting multiple solutions. After implementing them and consulting a Bioconductor support thread on a similar topic, I want to note that biomaRt should be preferred for this use case because, to quote:

If you use an OrgDb package to map gene symbols to Ensembl transcript IDs, what you are really asking for is Gene symbol -> NCBI Gene ID -> Ensembl Transcript ID.

This may introduce downstream issues such as duplicated or missing gene symbols. Apparently, biomaRt and the EnsDb packages avoid this intermediate translation.
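For reference, the EnsDb route mentioned above could look roughly like the sketch below. It assumes the Bioconductor annotation package `EnsDb.Mmusculus.v79` is installed (it is based on an older Ensembl release, so newer gene IDs may be absent); the helper name `map_with_ensdb` is made up for illustration.

```r
# Hypothetical helper: look up gene symbols directly in an Ensembl-based
# EnsDb annotation object, without going through NCBI Gene IDs.
map_with_ensdb <- function(gene_ids, edb) {
  AnnotationDbi::select(
    edb,
    keys = gene_ids,
    keytype = "GENEID",               # Ensembl gene IDs in EnsDb packages
    columns = c("GENEID", "SYMBOL"))
}

# Example call (assumes the annotation package is installed):
# library(EnsDb.Mmusculus.v79)
# map_with_ensdb(c("ENSMUSG00000017146", "ENSMUSG00000041147"),
#                EnsDb.Mmusculus.v79)
```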

Best of luck

ADD REPLY • link 2.0 years ago by cpriestl ▴ 10

Thank you for the information.

ADD REPLY • link 2.0 years ago by Kevin Blighe 88k

Dear Kevin, as always, a great and comprehensive answer! One quick question based on your response above: if I wanted to annotate human Ensembl IDs with gene symbols, but using hg19 as the reference genome, how would the code above need to be modified? Or is no modification needed, for example when using approach 1 with the org.Hs.eg.db database?

Thank you in advance,

Efstathios

ADD REPLY • link 3.7 years ago by svlachavas ▴ 790

Hey Efstathios, not sure about org.Hs.eg.db; however, for biomaRt, one can do:

mart <- useMart(
  biomart = 'ENSEMBL_MART_ENSEMBL', 
  host    = 'grch37.ensembl.org',
  path    = '/biomart/martservice',
  dataset = 'hsapiens_gene_ensembl')
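As a possible alternative, reasonably recent biomaRt versions also accept a `GRCh` argument in `useEnsembl()`, which points to the GRCh37 (hg19) archive without spelling out the host. A minimal sketch, wrapped in a hypothetical helper `connect_grch37` so that nothing is contacted until it is called:

```r
# Hypothetical helper: connect to the GRCh37 (hg19) Ensembl BioMart.
# Requires the biomaRt package and an internet connection when called.
connect_grch37 <- function() {
  biomaRt::useEnsembl(
    biomart = "genes",
    dataset = "hsapiens_gene_ensembl",
    GRCh = 37)
}

# Example call; 'human_ids' is a placeholder for your own vector of ENSG IDs:
# mart <- connect_grch37()
# biomaRt::getBM(
#   attributes = c("ensembl_gene_id", "hgnc_symbol"),
#   filters = "ensembl_gene_id",
#   values = human_ids,
#   mart = mart)
```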

ADD REPLY • link 3.7 years ago by Kevin Blighe 88k

Dear Kevin, thank you very much for the biomaRt solution. One final important question: in your opinion, when duplicated gene symbols and/or NAs are retrieved, should one simply keep the first annotated gene symbol for each entry, and additionally remove any Ensembl ID whose gene symbol is NA?

ADD REPLY • link 3.7 years ago by svlachavas ▴ 790

should one simply keep the first annotated gene symbol for each entry, and additionally remove any Ensembl ID whose gene symbol is NA?

There is no standard way to address these points, so you will have to set your own rule(s) and apply them consistently.
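As one illustration of such a rule (not a recommendation), the sketch below drops IDs without a symbol and keeps the first row per Ensembl ID, using base R only. The table is made-up toy data shaped like the `getBM()` result above; the symbol `Brca1-alt` and the ID `ENSMUSG00000000000` are invented placeholders.

```r
# Toy annotation table with one duplicated Ensembl ID and two IDs
# that lack a symbol (made-up rows for illustration only).
annot <- data.frame(
  ensembl_gene_id = c("ENSMUSG00000017146", "ENSMUSG00000017146",
                      "ENSMUSG00000031201", "ENSMUSG00000000000",
                      "ENSMUSG00000034218"),
  mgi_symbol = c("Brca1", "Brca1-alt", "Brcc3", NA, ""),
  stringsAsFactors = FALSE)

# Rule 1: treat empty strings as missing, then drop IDs without a symbol.
annot$mgi_symbol[annot$mgi_symbol %in% ""] <- NA
annot_clean <- annot[!is.na(annot$mgi_symbol), ]

# Rule 2: keep only the first row for each Ensembl ID.
annot_clean <- annot_clean[!duplicated(annot_clean$ensembl_gene_id), ]

annot_clean
```

Whatever rule is chosen, it is worth recording how many rows each step removes so the filtering can be reported.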

ADD REPLY • link 3.7 years ago by Kevin Blighe 88k

3.7 years ago

yueli7 ▴ 250

Hello,

This is the website from which you can directly get the mouse ENSMUSG IDs and gene names:

http://www.informatics.jax.org/downloads/reports/MGI_Gene_Model_Coord.rpt

ADD COMMENT • link 3.7 years ago by yueli7 ▴ 250

score 4 · Accepted Answer · 2021-03-24

3.7 years ago

gabrielafg ▴ 60

You can use the biomaRt library for that.

In the getBM() function, the `filters` parameter defines the type of your query input. Several input types are supported, including Ensembl gene IDs (the list you already have). The `attributes` parameter lets you choose which information to retrieve, such as gene name, gene description, and gene biotype.

For more information, the biomaRt users guide is available on the Bioconductor website.
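To find out which filter and attribute names a dataset offers, biomaRt provides `listFilters()` and `listAttributes()`. A minimal sketch, assuming a reasonably recent biomaRt and an internet connection; the helper name `find_biomart_fields` is made up for illustration:

```r
# Hypothetical helper: search the available filter and attribute names
# of an Ensembl BioMart dataset with a regular expression.
find_biomart_fields <- function(pattern, dataset = "mmusculus_gene_ensembl") {
  mart <- biomaRt::useEnsembl(biomart = "genes", dataset = dataset)
  filters <- biomaRt::listFilters(mart)        # data frame with a 'name' column
  attributes <- biomaRt::listAttributes(mart)  # data frame with a 'name' column
  list(
    filters = filters[grepl(pattern, filters$name), ],
    attributes = attributes[grepl(pattern, attributes$name), ])
}

# Example call (contacts the Ensembl BioMart server):
# find_biomart_fields("ensembl_gene_id|symbol|biotype")
```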

ADD COMMENT • link 3.7 years ago by gabrielafg ▴ 60

Hello gabrielafg,

Thank you so much for your great help!

Thank you again!

Best,

Yue

ADD REPLY • link 3.7 years ago by yueli7 ▴ 250
