Question

Convert to ENTREZGENE

12 months ago

san96 ▴ 180

Hello everyone, I have a list of differentially expressed corn genes, identified by their gene IDs, and I would like to know if there is a package that can convert these IDs to Entrez Gene IDs (ENTREZGENE), something like this:

initial                                    converted
Zm00001eb000370             103630483
Zm00001eb000450             100285831

I have my list something like this:

colnames(diff_genes)[1] <- "genes"
diff_genes <- diff_genes[, c("genes", "log2FoldChange")]
head(diff_genes)
# A tibble: 6 × 2
  genes           log2FoldChange
  <chr>                    <dbl>
1 Zm00001eb000370           2.58
2 Zm00001eb000450           1.12
3 Zm00001eb000790           1.25
4 Zm00001eb000850           8.84
5 Zm00001eb000900           1.59
6 Zm00001eb001080           2.78

maize ENTREZ • 676 views

Ram · Accepted Answer · 2024-03-29

Hello,

There is an annotation table for corn / maize (Zea mays) at Ensembl Plants, accessible via biomaRt.

What I would do is first pull a complete annotation table from Ensembl, which can actually be quicker than doing specific lookups:

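library(biomaRt)

# Connect to the Zea mays gene dataset on Ensembl Plants
mart <- useMart('plants_mart', 'zmays_eg_gene',
  host = 'https://plants.ensembl.org')

# Retrieve the full table: Ensembl gene ID, Entrez Gene ID, and biotype
annot <- getBM(
  attributes = c('ensembl_gene_id', 'entrezgene_id', 'gene_biotype'),
  mart = mart)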

head(annot)
  ensembl_gene_id entrezgene_id    gene_biotype
1 Zm00001eb442760            NA misc_non_coding
2 Zm00001eb393960            NA misc_non_coding
3 Zm00001eb113450            NA misc_non_coding
4 Zm00001eb437000            NA misc_non_coding
5 Zm00001eb441340            NA misc_non_coding
6 Zm00001eb437720            NA misc_non_coding

head(annot[!is.na(annot$entrezgene_id),])
     ensembl_gene_id entrezgene_id   gene_biotype
4542 Zm00001eb321680     100502366 protein_coding
4543 Zm00001eb323640     100501883 protein_coding
4545 Zm00001eb080260     100381510 protein_coding
4547 Zm00001eb281360     100275601 protein_coding
4551 Zm00001eb155150        542087 protein_coding
4552 Zm00001eb144800     100277214 protein_coding
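
Since many rows carry no Entrez ID, and a gene can appear on more than one row if it maps to several Entrez IDs, it may help to summarise the table before using it for lookups. Below is a minimal sketch of that check. It uses a small made-up `annot_toy` data frame in place of the real `annot`, so the gene names and IDs in it are illustrative only:

```r
# Toy stand-in for the table returned by getBM(); values are illustrative only
annot_toy <- data.frame(
  ensembl_gene_id = c("geneA", "geneB", "geneB", "geneC"),
  entrezgene_id   = c(NA, 101L, 102L, 103L),
  gene_biotype    = c("misc_non_coding", "protein_coding",
                      "protein_coding", "protein_coding"),
  stringsAsFactors = FALSE
)

# Keep only rows that actually carry an Entrez ID
mapped <- annot_toy[!is.na(annot_toy$entrezgene_id), ]

# Count Entrez IDs per gene; counts above 1 indicate one-to-many mappings
ids_per_gene <- table(mapped$ensembl_gene_id)
multi_mapped <- names(ids_per_gene)[ids_per_gene > 1]

nrow(mapped)   # number of rows with an Entrez ID
multi_mapped   # genes listed with more than one Entrez ID
```

Running the same steps on the real `annot` should show how much of the table is usable and whether any gene will produce more than one row in a later merge.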

Then, you can do a simple lookup locally like this:

```r
lookup <- data.frame(genes = c('Zm00001eb000370', 'Zm00001eb000450'))
merge(
  x = lookup,
  y = annot,
  by.x = 'genes',
  by.y = 'ensembl_gene_id',
  all.x = TRUE)
#>             genes entrezgene_id   gene_biotype
#> 1 Zm00001eb000370     103630483 protein_coding
#> 2 Zm00001eb000450     100285831 protein_coding
```

Using your own diff_genes variable, this could be run as:

merge(
  x = as.data.frame(diff_genes),
  y =  annot,
  by.y = 'ensembl_gene_id',
  all.x = TRUE,
  by.x = 'genes')
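
Because `all.x = TRUE` keeps genes that have no match, and a gene with several Entrez IDs yields several rows, the merged table may need a quick check before downstream use. The sketch below uses made-up data: `diff_toy` and `annot_toy` are hypothetical stand-ins for `diff_genes` and `annot`. Keeping only the first row per gene is just one possible policy, not a recommendation:

```r
# Toy stand-ins for diff_genes and annot; values are illustrative only
diff_toy <- data.frame(
  genes          = c("geneA", "geneB", "geneC"),
  log2FoldChange = c(2.58, 1.12, 1.25),
  stringsAsFactors = FALSE
)
annot_toy <- data.frame(
  ensembl_gene_id = c("geneB", "geneB", "geneC"),
  entrezgene_id   = c(101L, 102L, 103L),
  stringsAsFactors = FALSE
)

# Left join: every input gene is kept, matched or not
merged <- merge(diff_toy, annot_toy,
                by.x = "genes", by.y = "ensembl_gene_id", all.x = TRUE)

# Input genes for which no Entrez ID was found
unmapped <- unique(merged$genes[is.na(merged$entrezgene_id)])

# Reduce to one row per gene by keeping the first row seen for each
one_per_gene <- merged[!duplicated(merged$genes), ]
```

Here `merged` has more rows than `diff_toy` because `geneB` maps twice, while `unmapped` lists the genes that would be lost in any analysis that requires an Entrez ID.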

You can list further attributes that you may want to retrieve from Ensembl with listAttributes(mart).
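
The table returned by listAttributes(mart) is an ordinary data frame, so it can be filtered by keyword. The sketch below assumes the usual `name` and `description` columns. `find_attributes` is a hypothetical helper, and `attr_toy` (with made-up descriptions) stands in for the real table so the example runs offline:

```r
# Hypothetical helper: keep attributes whose name or description matches a pattern
find_attributes <- function(attr_table, pattern) {
  hit <- grepl(pattern, attr_table$name, ignore.case = TRUE) |
    grepl(pattern, attr_table$description, ignore.case = TRUE)
  attr_table[hit, , drop = FALSE]
}

# Toy stand-in for listAttributes(mart); descriptions are made up
attr_toy <- data.frame(
  name        = c("ensembl_gene_id", "entrezgene_id", "gene_biotype"),
  description = c("Gene stable ID", "Entrez gene ID", "Gene type"),
  stringsAsFactors = FALSE
)

find_attributes(attr_toy, "entrez")

# With a live connection, the equivalent call would be:
# find_attributes(listAttributes(mart), "entrez")
```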

Kevin