Missing genes in converting "ensembl_gene_id_version" to "ensembl_gene_id" in biomaRt package
2.5 years ago
Maryam • 0

Hi, I am using the biomaRt package to convert "ensembl_gene_id_version" to "ensembl_gene_id", but unfortunately some genes go missing in the conversion. The empty values in the remaining rows occur because there is no mapping between the Ensembl ID and the HGNC name for those genes.

How can I solve this problem?

Thanks in advance

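library(biomaRt)
library(stringr)

mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

# Versioned IDs (e.g. "ENSG00000000003.15") are the row names of the DEG table
gene_ids_version <- rownames(diff)

# Attempt 1: query with the versioned IDs
diffensembl <- getBM(attributes = c('ensembl_gene_id_version',
                                    'hgnc_symbol'),
                     filters = 'ensembl_gene_id_version',
                     values = gene_ids_version,
                     mart = mart)

# Attempt 2: strip the version suffix, then query with the unversioned IDs.
# The dot is escaped so that only a literal ".<digits>" suffix is removed;
# an unescaped "." matches any character and can truncate IDs without a version.
gene_ids <- str_replace(gene_ids_version,
                        pattern = "\\.[0-9]+$",
                        replacement = "")

diffensembl <- getBM(attributes = c('ensembl_gene_id',
                                    'hgnc_symbol'),
                     filters = 'ensembl_gene_id',
                     values = gene_ids,
                     mart = mart)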
genes biomaRt R • 1.8k views

This is a frequently asked question, so you are not the only one with this concern. You should not expect a complete mapping between two different annotation systems such as Ensembl and HGNC, because each system uses different criteria to annotate molecules. In your case, you might want to look at some of the genes that do not map to see what the reason could be. For some long non-coding RNAs, for example, it is common that they are not currently annotated by HGNC, so there is no gene symbol.
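To look at the genes that do not map, one option is to pull out the rows whose symbol is empty. The sketch below uses a small toy data frame in place of a real `getBM()` result; the third ID is hypothetical and only there to show an unmapped row, and it assumes that a missing symbol shows up as an empty string or `NA`.

```r
# Toy stand-in for a getBM() result (illustrative values, not real query output).
mapping <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000005", "ENSG00000999999"),
  hgnc_symbol     = c("TSPAN6", "TNMD", ""),
  stringsAsFactors = FALSE
)

# Assumption: an absent HGNC symbol appears as "" or NA in the result.
has_no_symbol <- is.na(mapping$hgnc_symbol) | mapping$hgnc_symbol == ""

# Ensembl IDs that came back without a gene symbol
no_symbol <- mapping$ensembl_gene_id[has_no_symbol]
print(no_symbol)
```

The resulting IDs can then be checked by hand, for instance by looking at their biotype in Ensembl.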


Thanks a lot. And what about the Ensembl IDs? How can I convert versioned gene IDs (e.g. "ENSG00000000003.15") to plain Ensembl IDs (e.g. "ENSG00000000003") without losing any? Some of them are missing.


In this case I would say there might be something wrong with the versioned Ensembl gene IDs in rownames(diff). What is nrow(diff)? For how many versioned IDs is no Ensembl gene ID returned? Some real data would help to explore this further, though.
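A quick way to answer these questions is to compare the IDs that were sent with the IDs that came back. The following is a minimal sketch with toy vectors: `returned_ids` stands in for the `ensembl_gene_id_version` column of the `getBM()` result, and the second and third IDs are illustrative rather than taken from the real table.

```r
# Versioned IDs sent to biomaRt (toy values standing in for rownames(diff))
gene_ids_version <- c("ENSG00000000003.15",
                      "ENSG00000000005.6",
                      "ENSG00000999999.1")

# Hypothetical stand-in for the ensembl_gene_id_version column that getBM() returned
returned_ids <- c("ENSG00000000003.15", "ENSG00000000005.6")

# IDs that were queried but are absent from the result
unmatched <- setdiff(gene_ids_version, returned_ids)

length(gene_ids_version)  # number of IDs queried
length(unmatched)         # number of IDs with no match
print(unmatched)
```

Inspecting `unmatched` shows exactly which IDs were dropped, which is more informative than the row counts alone.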


Yes, you are right. There might be something wrong with my code.

rownames(diff) holds the gene_id column of my DEG table, which was produced with the limma package.

                                        logFC                  AveExpr              ....
             ENSG00000000003.15       -0.015110709        6.019250818
             ...

No Ensembl gene ID is returned for 32 of the versioned IDs: my DEG table has 25,273 genes, but only 25,241 are converted. What I actually want is to work with the SNAI1 gene. SNAI1 is in the DEG table, but it is lost in the conversion.
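One plausible cause, offered as an assumption rather than a diagnosis, is that the 'ensembl_gene_id_version' filter only matches when the version suffix is identical to the one in the Ensembl release being queried, so IDs from an older or newer annotation are dropped. A way to avoid losing rows is to strip the version locally and then left-join the biomaRt result onto the table. The sketch below uses toy data; `deg` and `mapping` are hypothetical stand-ins for the limma table and the `getBM()` output, and only the first row's ID comes from this thread.

```r
# Toy DEG table with versioned IDs as row names (illustrative values)
deg <- data.frame(
  logFC = c(-0.015, 1.2, 0.4),
  row.names = c("ENSG00000000003.15", "ENSG00000000005.6", "ENSG00000999999.1")
)

# Strip the version suffix locally; every row keeps an ID, no query needed
deg$ensembl_gene_id <- sub("\\.[0-9]+$", "", rownames(deg))

# Hypothetical stand-in for getBM(filters = 'ensembl_gene_id', ...)
mapping <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000005"),
  hgnc_symbol     = c("TSPAN6", "TNMD"),
  stringsAsFactors = FALSE
)

# Left join: all.x = TRUE keeps DEG rows even when no symbol was returned
annotated <- merge(deg, mapping, by = "ensembl_gene_id", all.x = TRUE)
print(annotated)
```

With this approach the row for a gene such as SNAI1 stays in the table even if the lookup fails, and its symbol is simply `NA` until the cause is found.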
