biomaRt getBM returns NA for some Entrez gene IDs
5.7 years ago

I'm trying to get Entrez IDs for a list of gene names using biomaRt.

In the example below I use two Mus musculus genes, Igha and Mlc1.

My version of biomaRt is biomaRt_2.38.0

library("biomaRt")
mart <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
getBM(attributes=c("ensembl_gene_id","mgi_symbol","entrezgene"), values = c("Igha", "Mlc1"), bmHeader = T, filters = "mgi_symbol", mart = mart)
#      Gene stable ID MGI symbol NCBI gene ID
#1 ENSMUSG00000095079       Igha           NA
#2 ENSMUSG00000035805       Mlc1       170790
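To see which symbols BioMart left without an Entrez ID (and therefore need a fallback), the result can be filtered for NA. This sketch rebuilds the two rows printed above as a plain data frame so it runs offline. It assumes the attribute names are used as column headers (what getBM returns with bmHeader = FALSE), and res is just an illustrative variable name.

```r
# Offline stand-in for the getBM() result shown above, with the attribute
# names as column headers (as getBM returns them when bmHeader = FALSE)
res <- data.frame(
  ensembl_gene_id = c("ENSMUSG00000095079", "ENSMUSG00000035805"),
  mgi_symbol      = c("Igha", "Mlc1"),
  entrezgene      = c(NA, 170790L),
  stringsAsFactors = FALSE
)

# Symbols that came back without an NCBI (Entrez) gene ID
missing_symbols <- res$mgi_symbol[is.na(res$entrezgene)]
print(missing_symbols)  # [1] "Igha"
```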

I know that some Ensembl gene IDs have no Entrez gene ID (see the thread "ENSEMBL IDs 2 Entrez Gene IDs - what to do if no match?").

But in this case, the Ensembl gene page does show an Entrez gene ID:

http://www.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000095079;r=12:113254830-113260236

It is listed there as RefSeq Gene ID 238447.

Maybe Emily_Ensembl will have an answer :)

R biomaRt

5.7 years ago
Emily 24k

The ID listed on the gene page is not a proper external reference, so it does not come through BioMart. It comes from a different pipeline that is not reliable, and we're getting rid of it.

So... no enrichment analysis for immunoglobulin genes in Mus musculus? I use enrichPathway, which needs Entrez gene IDs.

Use Entrez Direct (EDirect) to query NCBI Gene directly:

$ esearch -db gene -query "Mlc1 [gene] and 10090 [taxID]" | efetch 

1. Mlc1
Official Symbol: Mlc1 and Name: megalencephalic leukoencephalopathy with subcortical cysts 1 homolog (human) [Mus musculus (house mouse)]
Other Aliases: AW048630, BB074274, Kiaa0027-hp, LVM, MLC, VL, WKL1
Other Designations: membrane protein MLC1
Chromosome: 15; Location: 15 E3
Annotation: Chromosome 15 NC_000081.6 (88955884..88982691, complement)
ID: **170790**

$ esearch -db gene -query "Igha [gene] and 10090 [taxID]" | efetch 

1. Igha
Official Symbol: Igha and Name: immunoglobulin heavy constant alpha [Mus musculus (house mouse)]
Other Aliases: IgA, Igh-2
Other Designations: immunoglobulin heavy chain 2 (serum IgA)
Chromosome: 12; Location: 12 62.09 cM
Annotation: Chromosome 12 NC_000078.6 (113256204..113260236, complement)
ID: **238447**
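If the EDirect pipeline is run from the shell and its text output saved, the Gene ID can be pulled from each record's "ID:" line. A minimal R sketch, assuming the plain efetch text output (without the ** emphasis used in this post); parse_gene_ids() is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: extract NCBI Gene IDs from the text printed by
# `esearch -db gene ... | efetch`, which has one "ID: <number>" line per record
parse_gene_ids <- function(lines) {
  id_lines <- grep("^ID: [0-9]+", trimws(lines), value = TRUE)
  as.integer(sub("^ID: ([0-9]+).*$", "\\1", id_lines))
}

# Example input copied from the Igha record above
efetch_out <- c(
  "1. Igha",
  "Official Symbol: Igha and Name: immunoglobulin heavy constant alpha [Mus musculus (house mouse)]",
  "Chromosome: 12; Location: 12 62.09 cM",
  "ID: 238447"
)
parse_gene_ids(efetch_out)
```

In practice efetch_out could come from readLines() on a file that the shell pipeline was redirected to.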

5.7 years ago

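Building on genomax's comment, here is my workaround: for every gene that BioMart returns without an Entrez ID, I query NCBI Gene with esearch (from the reutils package).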

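library(biomaRt)
library(reutils)  # esearch() and uid() used below come from reutils

# gene_list (mouse gene symbols) and mart are defined as in the question;
# newer Ensembl releases name the attribute "entrezgene_id" instead of "entrezgene"
biomart_genes <- getBM(attributes = c("external_gene_name", "entrezgene"), filters = "mgi_symbol", values = gene_list, bmHeader = TRUE, mart = mart)
colnames(biomart_genes) <- c("mgi_symbol", "entrez_gene_id")

# For every symbol BioMart returned without an Entrez ID, search NCBI Gene (10090 = mouse taxID)
for (i in seq_len(nrow(biomart_genes))) {
    if (is.na(biomart_genes[i, "entrez_gene_id"])) {
        term <- paste0(biomart_genes[i, "mgi_symbol"], " [gene] and 10090 [taxID]")
        hit <- uid(esearch(term, db = "gene", rettype = "uilist", retmode = "xml", retmax = 1))
        # Only fill in the ID if NCBI actually returned one
        if (length(hit) > 0) biomart_genes[i, "entrez_gene_id"] <- as.integer(hit[1])
    }
}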
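The same fallback can be written with the NCBI lookup passed in as a function, which keeps the logic testable offline and queries only the rows that are actually missing. In this sketch fill_missing_entrez() and its lookup argument are hypothetical names; a stub returning the Igha ID quoted earlier in the thread stands in for NCBI.

```r
# Hypothetical helper: fill NA Entrez IDs using a caller-supplied lookup
# function that maps one gene symbol to an ID (or NA / nothing if not found)
fill_missing_entrez <- function(df, lookup,
                                symbol_col = "mgi_symbol",
                                id_col = "entrez_gene_id") {
  missing <- which(is.na(df[[id_col]]))
  if (length(missing) > 0) {
    # First ID returned per symbol; an empty result becomes NA_integer_
    found <- vapply(df[[symbol_col]][missing],
                    function(s) as.integer(lookup(s))[1],
                    integer(1), USE.NAMES = FALSE)
    df[[id_col]][missing] <- found
  }
  df
}

# Offline stub standing in for an NCBI Gene search
stub_lookup <- function(symbol) {
  known <- c(Igha = 238447L)
  if (symbol %in% names(known)) known[[symbol]] else NA_integer_
}

genes <- data.frame(mgi_symbol = c("Igha", "Mlc1"),
                    entrez_gene_id = c(NA, 170790L),
                    stringsAsFactors = FALSE)
filled <- fill_missing_entrez(genes, stub_lookup)
print(filled)
```

For real data, lookup could wrap the reutils call from the loop above, e.g. function(s) uid(esearch(paste0(s, " [gene] and 10090 [taxID]"), db = "gene", retmax = 1)). Restricting the search to missing rows keeps the number of NCBI requests small.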