Hi all, I have gene expression data with Ensembl IDs (ENSG00000XXXXXXX). I tried three different packages to convert them to Entrez IDs (bitr, biomaRt, AnnotationDbi), but I consistently get no match for about 5-6% of the genes. I would like to do GO and GSEA analyses, but most GO and GSEA tools require gene symbols or Entrez IDs. This problem has been bugging me for a while. How should I handle it? I work with mouse genes.
Here is an example of what I am doing:
library(org.Mm.eg.db)  # also attaches AnnotationDbi, which provides mapIds()
MyTargetList$entrez <- mapIds(org.Mm.eg.db,
                              keys = rownames(IP_toptreatRT3),
                              column = "ENTREZID",
                              keytype = "ENSEMBL",
                              multiVals = "first")
Or with biomaRt:
library(biomaRt)
ensembl <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
genemap <- getBM(attributes = c("ensembl_gene_id", "entrezgene"),
                 mart = ensembl)
and then I use the match function to populate the column.
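A minimal sketch of that matching step, using a toy lookup table in place of the real getBM() result so it runs without a network connection. The Ensembl IDs and the single Entrez ID are taken from the head(genemap) output in this post; the `ensembl` column name and the ID "ENSMUSG_NOT_IN_MAP" are made up for illustration.

```r
# Toy stand-in for the getBM() result (values copied from head(genemap)).
genemap <- data.frame(
  ensembl_gene_id = c("ENSMUSG00000064336", "ENSMUSG00000064337",
                      "ENSMUSG00000064341"),
  entrezgene = c(NA, NA, 17716),
  stringsAsFactors = FALSE
)

# Hypothetical results table; "ENSMUSG_NOT_IN_MAP" is a made-up ID
# standing for a gene that is absent from the lookup table.
MyTargetList <- data.frame(
  ensembl = c("ENSMUSG00000064341", "ENSMUSG00000064336",
              "ENSMUSG_NOT_IN_MAP"),
  stringsAsFactors = FALSE
)

# match() returns the position of the FIRST hit in genemap (NA if none),
# so an Ensembl ID listed more than once would keep only its first Entrez ID.
idx <- match(MyTargetList$ensembl, genemap$ensembl_gene_id)
MyTargetList$entrez <- genemap$entrezgene[idx]

# Fraction of genes that ended up without an Entrez ID.
unmapped_fraction <- mean(is.na(MyTargetList$entrez))
print(MyTargetList)
print(unmapped_fraction)
```

Note that an NA can arise in two ways here: the Ensembl ID is in the table but has no Entrez ID, or the Ensembl ID is not in the table at all.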
But there seem to be gaps in the databases:
> head(genemap)
     ensembl_gene_id entrezgene
1 ENSMUSG00000064336         NA
2 ENSMUSG00000064337         NA
3 ENSMUSG00000064338         NA
4 ENSMUSG00000064339         NA
5 ENSMUSG00000064340         NA
6 ENSMUSG00000064341      17716
Cheers, Lech
There's no way to map all Ensembl IDs to Entrez Gene IDs; the latter is a much smaller dataset than the former.
It would probably help if you added a few examples for which you can't find a match, so that we can replicate your issue.
In addition, showing the code you used with one of those packages could allow us to spot a mistake.
Hi, thanks for the quick reply. I have added the examples. Since I get most of the records, I assume the rest are simply missing from the database (see the NAs after retrieving the biomaRt annotations).
Most of the NAs are mitochondrial genes.
All of them are non-protein-coding.
Not all, it seems:
.................
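One way to settle whether the unmapped genes are all non-protein-coding would be to tabulate them by biotype. The sketch below assumes the annotation table also carries a `gene_biotype` column (for example, requested as an extra attribute in the getBM() call). The helper name `count_unmapped_by_biotype` and the toy rows are hypothetical and are not real annotation data.

```r
# Count genes lacking an Entrez ID, grouped by biotype.
# `annot` must have the columns `entrezgene` and `gene_biotype`.
count_unmapped_by_biotype <- function(annot) {
  unmapped <- annot[is.na(annot$entrezgene), , drop = FALSE]
  sort(table(unmapped$gene_biotype), decreasing = TRUE)
}

# Made-up rows for illustration only; IDs, Entrez values and biotypes
# here do not describe real genes.
toy_annot <- data.frame(
  ensembl_gene_id = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  entrezgene = c(NA, NA, 101, NA, 102),
  gene_biotype = c("Mt_tRNA", "Mt_tRNA", "protein_coding",
                   "protein_coding", "lincRNA"),
  stringsAsFactors = FALSE
)

counts <- count_unmapped_by_biotype(toy_annot)
print(counts)
```

If "protein_coding" shows up in the result for the real table, then some of the unmapped genes are protein-coding after all.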