Convert to ENTREZGENE
8 months ago
sansan96

Hello everyone, I have a list of differentially expressed maize (corn) genes identified by their gene IDs (e.g. Zm00001eb000370), and I would like to know whether there is a package that can convert these IDs to Entrez Gene IDs, something like this:

```
gene ID (input)    Entrez Gene ID (converted)
Zm00001eb000370    103630483
Zm00001eb000450    100285831
```

My list looks like this:

```r
colnames(diff_genes)[1] <- "genes"
diff_genes <- diff_genes[, c("genes", "log2FoldChange")]
head(diff_genes)
```
```
# A tibble: 6 × 2
  genes           log2FoldChange
  <chr>                    <dbl>
1 Zm00001eb000370           2.58
2 Zm00001eb000450           1.12
3 Zm00001eb000790           1.25
4 Zm00001eb000850           8.84
5 Zm00001eb000900           1.59
6 Zm00001eb001080           2.78
```
8 months ago

Hello,

Ensembl Plants provides an annotation table for maize (Zea mays) that is accessible via biomaRt.

What I would do is first pull the complete annotation table from Ensembl, which can actually be quicker than doing many specific lookups:

```r
library(biomaRt)

# Connect to the maize gene dataset on Ensembl Plants
mart <- useMart('plants_mart', 'zmays_eg_gene',
  host = 'https://plants.ensembl.org')

# Download Ensembl gene ID, Entrez ID and biotype for all maize genes
annot <- getBM(
  attributes = c('ensembl_gene_id', 'entrezgene_id', 'gene_biotype'),
  mart = mart)

head(annot)
```
```
  ensembl_gene_id entrezgene_id    gene_biotype
1 Zm00001eb442760            NA misc_non_coding
2 Zm00001eb393960            NA misc_non_coding
3 Zm00001eb113450            NA misc_non_coding
4 Zm00001eb437000            NA misc_non_coding
5 Zm00001eb441340            NA misc_non_coding
6 Zm00001eb437720            NA misc_non_coding
```
```r
head(annot[!is.na(annot$entrezgene_id), ])
```
```
     ensembl_gene_id entrezgene_id   gene_biotype
4542 Zm00001eb321680     100502366 protein_coding
4543 Zm00001eb323640     100501883 protein_coding
4545 Zm00001eb080260     100381510 protein_coding
4547 Zm00001eb281360     100275601 protein_coding
4551 Zm00001eb155150        542087 protein_coding
4552 Zm00001eb144800     100277214 protein_coding
```
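Because the full-table query can take a while, it may be worth caching the result on disk so that later sessions skip the download. A minimal base-R sketch, assuming a writable local path; `cached_table()` and the file names are hypothetical helpers, not part of biomaRt:

```r
# Hypothetical helper: return the cached table if the file exists, otherwise
# build it with `fetch` (e.g. a function wrapping getBM()) and save it to disk
cached_table <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  tbl <- fetch()
  saveRDS(tbl, path)
  tbl
}

# Usage with biomaRt (not run here; needs network access):
# annot <- cached_table("zmays_annot.rds", function() getBM(
#   attributes = c('ensembl_gene_id', 'entrezgene_id', 'gene_biotype'),
#   mart = mart))

# Offline demonstration: a toy fetch function that counts how often it runs
calls <- 0
toy_fetch <- function() {
  calls <<- calls + 1
  data.frame(ensembl_gene_id = "Zm00001eb000370", entrezgene_id = 103630483)
}
cache_path <- file.path(tempdir(), "toy_annot.rds")
unlink(cache_path)                          # start from an empty cache
first  <- cached_table(cache_path, toy_fetch)  # runs toy_fetch and saves
second <- cached_table(cache_path, toy_fetch)  # reads the saved file instead
```

The second call reads the `.rds` file, so `toy_fetch()` runs only once.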

Then, you can do a simple lookup locally like this:

```r
lookup <- data.frame(genes = c('Zm00001eb000370', 'Zm00001eb000450'))
merge(
  x = lookup,
  y = annot,
  by.x = 'genes',
  by.y = 'ensembl_gene_id',
  all.x = TRUE)
```
```
            genes entrezgene_id   gene_biotype
1 Zm00001eb000370     103630483 protein_coding
2 Zm00001eb000450     100285831 protein_coding
```
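Note that merge() sorts its result by the key column and returns one row per matching pair, so an Ensembl ID linked to several Entrez IDs appears more than once. If you would rather have exactly one row per input gene in the original order, base R `match()` is an alternative; be aware it keeps only the first Entrez ID it finds for each gene. A sketch on a toy table, where `toy_gene` and its two Entrez IDs are hypothetical, added only to show a one-to-many case:

```r
# Toy annotation table: the first two rows are the mappings shown above;
# "toy_gene" and its two Entrez IDs (111, 222) are hypothetical
annot_toy <- data.frame(
  ensembl_gene_id = c("Zm00001eb000370", "Zm00001eb000450", "toy_gene", "toy_gene"),
  entrezgene_id   = c(103630483, 100285831, 111, 222)
)

# Query genes in an unsorted order, including one that is not in the table
query <- data.frame(genes = c("toy_gene", "Zm00001eb000450",
                              "not_in_table", "Zm00001eb000370"))

# merge() returns one row per matching pair, sorted by the key:
# 5 rows here, because toy_gene matches twice
merged <- merge(query, annot_toy, by.x = "genes", by.y = "ensembl_gene_id",
                all.x = TRUE)

# match() gives the index of the FIRST matching row (NA if none), so the
# result keeps exactly one row per query gene, in the original order
idx <- match(query$genes, annot_toy$ensembl_gene_id)
query$entrezgene_id <- annot_toy$entrezgene_id[idx]
query
```

Which behaviour is preferable depends on the downstream use: merge() preserves all mappings, match() gives a clean one-to-one table.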

Using your own diff_genes variable, this could be run as:

```r
merge(
  x = as.data.frame(diff_genes),
  y = annot,
  by.x = 'genes',
  by.y = 'ensembl_gene_id',
  all.x = TRUE)
```
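Since not every maize gene has an Entrez ID, it can help to check right after the merge how many input genes actually received one, and whether any gene was duplicated by a one-to-many mapping. A base-R sketch; `merged_toy` is a small made-up stand-in for the merged table, not real data:

```r
# Made-up stand-in for the merged table: g2 maps to two Entrez IDs,
# g3 has no Entrez ID
merged_toy <- data.frame(
  genes          = c("g1", "g2", "g2", "g3", "g4"),
  log2FoldChange = c(2.6, 1.1, 1.1, 1.3, 8.8),
  entrezgene_id  = c(101, 202, 203, NA, 404)
)

# Distinct input genes, and distinct genes with at least one Entrez ID
n_input  <- length(unique(merged_toy$genes))
n_mapped <- length(unique(merged_toy$genes[!is.na(merged_toy$entrezgene_id)]))

# Genes that appear in more than one row after the merge
dup_genes <- unique(merged_toy$genes[duplicated(merged_toy$genes)])

cat(sprintf("%d of %d genes mapped to an Entrez ID (%.0f%%)\n",
            n_mapped, n_input, 100 * n_mapped / n_input))
cat("Genes with more than one Entrez ID:", dup_genes, "\n")
```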

To see which other attributes can be retrieved from Ensembl, run `listAttributes(mart)`.

Kevin


Kevin, thank you very much for your valuable help. I am using this output for a KEGG analysis, but I recover very few genes (6) out of the more than 1000 that I input. So I was wondering if you could guide me on how to do a GO enrichment analysis. I was trying with clusterProfiler, but there is no maize annotation package to pass as `OrgDb` (the `org.XXX.eg.db` packages). Could you guide me?

I will greatly appreciate your help.

```r
library(clusterProfiler)

# Using the table of differentially expressed genes:
list_diff <- merge(
  x = as.data.frame(diff_genes),
  y = annot,
  by.x = 'genes',
  by.y = 'ensembl_gene_id',
  all.x = TRUE)

# Keep only genes that mapped to an Entrez ID.
# Note: wrapping this in head() would keep only the first 6 rows.
list_diff_final <- list_diff[!is.na(list_diff$entrezgene_id), ]
head(list_diff_final)
#write.csv(list_diff_final, "final.csv")

# Extract the genes and the expression values (fold change) from list_diff_final
genes <- as.character(list_diff_final$entrezgene_id)
fold_change <- list_diff_final$log2FoldChange

# Assign the gene IDs as names of the expression values
names(fold_change) <- genes

# Final result (first entries)
head(fold_change)
```
```
103630483 100285831 107403162 107548113 100286177 100384769 
 2.580437  1.118575  8.837193  1.594464  1.999971  1.407178 
```
```r
#write.csv(fold_change, "punto.csv")

# KEGG enrichment using the mapped Entrez IDs
KEGG_genes <- enrichKEGG(gene = genes, organism = "zma", pvalueCutoff = 0.05)

# Generate the plot
dotplot(KEGG_genes)
```
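enrichKEGG() only uses the gene IDs, so the named `fold_change` vector is not needed there. clusterProfiler's GSEA-style functions (e.g. `gseKEGG()`), however, expect a named numeric vector sorted in decreasing order with one value per ID. A base-R sketch of preparing such a vector; the data are made up, and the rule for resolving duplicate IDs (keep the largest absolute log2 fold change) is an assumption, not something the thread specifies:

```r
# Made-up stand-in for the mapped table; Entrez ID 202 appears twice
mapped <- data.frame(
  entrezgene_id  = c(101, 202, 202, 404),
  log2FoldChange = c(2.6, 1.1, -0.4, 8.8)
)

# Assumed rule: keep one value per Entrez ID, the one with the
# largest absolute fold change
mapped <- mapped[order(-abs(mapped$log2FoldChange)), ]
mapped <- mapped[!duplicated(mapped$entrezgene_id), ]

# Named numeric vector (names = Entrez IDs), sorted in decreasing order
gene_list <- setNames(mapped$log2FoldChange, as.character(mapped$entrezgene_id))
gene_list <- sort(gene_list, decreasing = TRUE)
gene_list
```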

This script worked for GO enrichment in Arabidopsis, but I have not been able to adapt it for maize:

```r
library(clusterProfiler)
library(org.At.tair.db)

ora_analysis_bp <- enrichGO(
    gene = diff_arabidopsis_genes_annotated$entrezgene_id,
    universe = all_arabidopsis_genes_annotated$entrezgene_id,
    OrgDb = org.At.tair.db,
    keyType = "ENTREZID",
    ont = "BP",
    pAdjustMethod = "BH",
    qvalueCutoff = 0.05,
    readable = TRUE,
    pool = FALSE
)

ora_analysis_bp_simplified <- clusterProfiler::simplify(ora_analysis_bp)
```
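Without an OrgDb for maize, one possible route (an assumption, not tested in this thread) is clusterProfiler's `enricher()`, which accepts a user-supplied `TERM2GENE` data frame with term IDs in the first column and gene IDs in the second, e.g. `enricher(gene = genes, TERM2GENE = term2gene_maize)`. Here `term2gene_maize` is a hypothetical table that could perhaps be pulled from the same mart by requesting a `go_id` attribute together with `entrezgene_id` (check the exact attribute name with `listAttributes(mart)`). The per-term test behind this kind of over-representation analysis is a one-sided hypergeometric test, sketched below in base R on a made-up gene-to-GO table so the calculation is visible:

```r
# Made-up TERM2GENE-style table: one row per (GO term, gene) pair
term2gene <- data.frame(
  term = c(rep("GO:A", 4), rep("GO:B", 3), rep("GO:C", 2)),
  gene = paste0("g", 1:9)
)

# Hypothetical DE genes; the universe here is every gene in the table
de_genes <- c("g1", "g2", "g3", "g5")
universe <- unique(term2gene$gene)
q <- length(intersect(de_genes, universe))   # DE genes in the universe (draws)

gene_sets <- split(term2gene$gene, term2gene$term)
ora <- do.call(rbind, lapply(names(gene_sets), function(term) {
  term_genes <- intersect(gene_sets[[term]], universe)
  k <- length(intersect(de_genes, term_genes))  # DE genes annotated to the term
  m <- length(term_genes)                       # universe genes annotated to the term
  n <- length(universe) - m                     # universe genes not annotated to it
  # One-sided test: P(X >= k) for X ~ Hypergeometric(m, n, q)
  data.frame(term = term, overlap = k, term_size = m,
             p_value = phyper(k - 1, m, n, q, lower.tail = FALSE))
}))

# Benjamini-Hochberg adjustment across terms, as with pAdjustMethod = "BH"
ora$p_adjust <- p.adjust(ora$p_value, method = "BH")
ora
```

In this toy case GO:A has 3 of its 4 genes among the 4 DE genes, giving p = 21/126 ≈ 0.167; real analyses need far larger gene sets before anything reaches significance.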