Error while converting Gene ID to Ensembl IDs
3.0 years ago
kai_bio ▴ 50

I have a data frame of DEGs with gene IDs in a Gene_ID column.

[Image: DEGs list]

I am trying to convert the gene IDs (gene symbols) into Ensembl IDs. I have tried the following methods:

library("AnnotationDbi")
library("org.Hs.eg.db")
res3$ensid <- mapIds(org.Hs.eg.db,
                     keys = res3$Gene_ID,
                     column = "ENSEMBL",
                     keytype = "SYMBOL",
                     multiVals = "first")

The above code converted most of the gene IDs but returned NA for some of them. Can someone please shed some light on why this happens?

I also tried the biomaRt package:

library("biomaRt")
listMarts()
ensembl <- useMart("ensembl")
datasets <- listDatasets(ensembl)
ensembl = useDataset("hsapiens_gene_ensembl", mart = ensembl)
options(max.print = 1000000)
res3$ensid <- getBM(attributes = c('external_gene_name','ensembl_gene_id'), filters = 'external_gene_name',
               values = res3$Gene_ID, mart = ensembl, uniqueRows = FALSE)

but it gives the following error:

Error in `$<-.data.frame`(`*tmp*`, ensid, value = list(external_gene_name = c("KRT23",  : 
  replacement has 16202 rows, data has 17281

which suggests that my data has more rows of gene IDs than the query returned. Can someone please guide me? Thank you!

ensembl R biomart RNA-Seq • 2.5k views

Incomplete mappings between different annotation systems are to be expected. Each annotation system has its own rules about what to annotate, i.e., what to include. Gene symbols are especially difficult. You could look at some of the genes that do not map and investigate further: they may be using an alias rather than the official gene symbol, or they may correspond to an obscure predicted gene or non-coding RNA that has not been validated.
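As a starting point for that investigation, here is a minimal sketch that pulls out the symbols that failed to map. The data frame below is a toy stand-in for res3 after the mapIds() call, and NOT_A_SYMBOL is a made-up placeholder, not a real gene.

```r
# Toy stand-in for the DEG table after mapIds(); values are for illustration only.
res3 <- data.frame(
  Gene_ID = c("TP53", "NOT_A_SYMBOL", "BRCA1"),  # NOT_A_SYMBOL is hypothetical
  ensid   = c("ENSG00000141510", NA, "ENSG00000012048"),
  stringsAsFactors = FALSE
)

# Symbols for which no Ensembl ID was found
unmapped <- res3$Gene_ID[is.na(res3$ensid)]
print(unmapped)

# Fraction of input symbols without a mapping
frac_unmapped <- mean(is.na(res3$ensid))
print(frac_unmapped)
```

If some of the unmapped symbols turn out to be aliases, org.Hs.eg.db also offers an "ALIAS" keytype for mapIds(), which may recover part of them.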


Thanks for the explanation


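As Kevin said, there won't always be a perfect mapping between different ID types. The error occurs because getBM() returns a data frame containing only the symbols it could match, so it cannot be assigned directly as a column of res3. One option is to join the two datasets, which fills in NA for the symbols that are missing.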

library("dplyr")

ids <- getBM(
  attributes = c('external_gene_name','ensembl_gene_id'), filters = 'external_gene_name',
  values = res3$Gene_ID, mart = ensembl, uniqueRows = FALSE)

res3 <- left_join(res3, ids, by=c("Gene_ID"="external_gene_name"))
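One thing to watch for: if a symbol maps to more than one Ensembl ID, left_join() will add extra rows to res3. If you need exactly one row per input gene, a base R alternative is match(), which keeps only the first hit per symbol (similar in spirit to multiVals = "first"). A minimal sketch with toy data standing in for res3 and the getBM() result; NOT_A_SYMBOL is a made-up placeholder:

```r
# Toy stand-in for the DEG table
res3 <- data.frame(
  Gene_ID = c("TP53", "NOT_A_SYMBOL", "BRCA1"),  # NOT_A_SYMBOL is hypothetical
  stringsAsFactors = FALSE
)

# Toy stand-in for the getBM() result: fewer rows, in a different order
ids <- data.frame(
  external_gene_name = c("BRCA1", "TP53"),
  ensembl_gene_id    = c("ENSG00000012048", "ENSG00000141510"),
  stringsAsFactors = FALSE
)

# For each input symbol, position of its first match in ids (NA if absent)
idx <- match(res3$Gene_ID, ids$external_gene_name)

# Indexing with NA yields NA, so unmatched symbols get NA automatically
res3$ensid <- ids$ensembl_gene_id[idx]
print(res3)
```

The row count of res3 is unchanged, which avoids the "replacement has ... rows" error from the original attempt.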

Thank you! It's working, and I was able to join the Ensembl IDs to my gene IDs.


As others stated, not all gene symbols will have an ID in another database. Sometimes this is due to changes in gene annotations, for instance if your gene symbols come from an older genome assembly while org.Hs.eg.db, which is updated regularly, uses newer ones (an issue I have encountered before).

One additional option for pulling more information about your genes, such as possible alias symbols, is to query the HGNC REST API: https://www.genenames.org/help/rest/
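A rough sketch of what such a query could look like from R. This assumes the httr and jsonlite packages are installed and that the API is served from rest.genenames.org with a fetch/<field>/<value> endpoint returning JSON; hgnc_url() and hgnc_fetch() are helper names I made up, so please check the endpoint and the available fields (including those for alias and previous symbols) against the help page above before relying on it.

```r
# Hypothetical helper: build a fetch URL for one field/value pair.
# The base URL and the "fetch/<field>/<value>" layout are assumptions
# to verify against the HGNC REST help page.
hgnc_url <- function(value, field = "symbol") {
  paste0("https://rest.genenames.org/fetch/", field, "/",
         utils::URLencode(value, reserved = TRUE))
}

# Hypothetical helper: run the query and return the parsed JSON response.
# Requires the httr and jsonlite packages and network access.
hgnc_fetch <- function(value, field = "symbol") {
  resp <- httr::GET(hgnc_url(value, field), httr::accept_json())
  httr::stop_for_status(resp)
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Building the URL does not need network access
example_url <- hgnc_url("TP53")
print(example_url)

# Example call (not run here because it needs network access):
# rec <- hgnc_fetch("TP53")
# str(rec)
```

Inspecting the returned record with str() should show which fields (aliases, previous symbols, cross-references) are available for your unmapped genes.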
