Mysterious genes in my Biomart results (genes that were not part of the original query)
5.6 years ago
adam.faranda ▴ 110

I'm using the R package 'biomaRt' to retrieve Ensembl IDs and descriptions for a list of gene symbols (e.g. "Sf1", "Rhox7a", etc.). My query consists of 41203 symbols; BioMart returns a result set with 30774 records corresponding to the gene symbols recognized by Ensembl. The 30774 records returned included four genes that were not part of the original query.

My first thought was that the four 'mystery' genes were synonyms for something in my original query. I've since verified that none of the synonyms of these genes are in my query.

I am querying the mouse dataset and using the attribute 'external_gene_name' as my filter column. Code used to query biomaRt:

library(biomaRt)

# 'gq': character vector of unique gene symbols submitted as the BioMart query
# ('dg' is the data frame holding my original gene list)
gq <- unique(dg$gene)

# attributes used for query
attr <- c("ensembl_gene_id", "external_gene_name", "description",
          "ensembl_gene_id_version", "chromosome_name",
          "gene_biotype")

# Query submission
mart <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
result <- getBM(mart = mart,
                attributes = attr,
                filters = "external_gene_name",
                values = gq)

The mystery genes are:

setdiff(result$external_gene_name, gq)
[1] "Trdd2"   "Trdv4"   "Trdd1"   "SPATA24"

Here, "gq" is the vector of gene symbols submitted to Ensembl. None of the above genes, nor any of their synonyms (at least the synonyms recognized by Ensembl), are in my original query. If anyone is willing to help me troubleshoot, I would be happy to send them the gene list I'm querying with.

biomart • R package ('biomaRt')

That's very strange. Could you please send the list to helpdesk [at] ensembl.org and my colleagues and I will take a look at it.

5.6 years ago
Mike Smith ★ 2.1k

"SPATA24" doesn't look like a normal MGI symbol since it's all in caps, so I wouldn't be surprised if your query contains "Spata24" and the all-caps version is retrieved too. Is it possible your gene list includes capitalised versions of 'Trdd1' etc.? I don't think BioMart is case sensitive, so it will still retrieve results for them, e.g.:

getBM(mart = mart,
      attributes = c("ensembl_gene_id", "external_gene_name"),
      filters = "external_gene_name",
      values = "TRDV4"
)
     ensembl_gene_id external_gene_name
1 ENSMUSG00000076867              Trdv4
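One way to check this hypothesis across the whole result, rather than gene by gene, is to compare the unexpected names against the query with case ignored. Below is a minimal base-R sketch; `explain_unexpected()` is a hypothetical helper (not part of biomaRt), and the toy vectors only stand in for `result$external_gene_name` and `gq`. It assumes, as the example above suggests, that the `external_gene_name` filter matches case-insensitively.

```r
# Hypothetical helper (not part of biomaRt): for each returned gene name that
# is not an exact, case-sensitive match to a query term, list the query terms
# that match it once case is ignored.
explain_unexpected <- function(returned, query) {
  unexpected <- setdiff(returned, query)
  # toupper() puts both sides on the same footing before comparing
  matches <- lapply(unexpected, function(g) query[toupper(query) == toupper(g)])
  names(matches) <- unexpected
  matches
}

# Toy vectors standing in for result$external_gene_name and gq
toy_returned <- c("Sf1", "Trdv4", "Spata24", "SPATA24")
toy_query <- c("Sf1", "TRDV4", "Spata24", "Rhox7a")

explain_unexpected(toy_returned, toy_query)
# returns list(Trdv4 = "TRDV4", SPATA24 = "Spata24")
```

On the real data this would be `explain_unexpected(result$external_gene_name, gq)`; an empty entry for a gene would mean capitalization does not explain that one.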

If that's not it, my advice would be to break your query down into smaller chunks and submit them independently, to try to narrow down where the unexpected entries are being introduced. I'm happy to help identify whether it's a problem in biomaRt; my email address is on the biomaRt landing page (https://bioconductor.org/packages/biomaRt/).
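A rough sketch of that chunking approach in base R follows. `unexpected_by_chunk()` and `fake_query()` are hypothetical: the latter is an offline stand-in for `getBM()` so the sketch runs without a connection, and the commented-out call shows roughly how a live query could be plugged in (the chunk size of 1000 is an arbitrary choice). For each chunk it reports returned names that were not among that chunk's own inputs, which points to the chunk, and eventually the symbol, introducing the extras.

```r
# Hypothetical helper: split the query values into chunks, run `query_fn` on
# each chunk separately, and report, per chunk, any returned names that were
# not among that chunk's own inputs.
unexpected_by_chunk <- function(values, chunk_size, query_fn,
                                name_col = "external_gene_name") {
  # ceiling(i / chunk_size) labels elements 1, 1, ..., 2, 2, ... for split()
  chunk_id <- ceiling(seq_along(values) / chunk_size)
  chunks <- split(values, chunk_id)
  lapply(chunks, function(chunk) {
    res <- query_fn(chunk)
    setdiff(res[[name_col]], chunk)
  })
}

# Hypothetical stand-in for getBM(): a case-insensitive lookup in a toy table
toy_reference <- data.frame(
  ensembl_gene_id = c("G1", "G2", "G3"),
  external_gene_name = c("Sf1", "Trdv4", "Rhox7a"),
  stringsAsFactors = FALSE
)
fake_query <- function(v) {
  toy_reference[toupper(toy_reference$external_gene_name) %in% toupper(v), ]
}

unexpected_by_chunk(c("Sf1", "Rhox7a", "TRDV4", "Xyz"), chunk_size = 2,
                    query_fn = fake_query)
# chunk "1": character(0); chunk "2": "Trdv4"

# With a live mart, something like:
# unexpected_by_chunk(gq, 1000, function(v)
#   getBM(mart = mart, attributes = attr,
#         filters = "external_gene_name", values = v))
```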

5.6 years ago
adam.faranda ▴ 110

Thank you both for your prompt responses. Mike's answer was correct -- this appears to have been an issue with capitalization.
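If case-insensitive matching is the cause, one simple way to clean up the result is to keep only rows whose returned name exactly matches a submitted symbol. The sketch below assumes that is the desired behaviour; `keep_exact_matches()` and the toy data frame (with placeholder IDs) are hypothetical.

```r
# Hypothetical helper: keep only rows whose returned gene name exactly
# (case-sensitively) matches one of the submitted symbols. Rows whose name
# differs from every query term, e.g. only in case, are dropped.
keep_exact_matches <- function(result, query, name_col = "external_gene_name") {
  result[result[[name_col]] %in% query, , drop = FALSE]
}

# Toy result with placeholder IDs, standing in for the real getBM() output
toy_result <- data.frame(
  ensembl_gene_id = c("G1", "G2", "G3"),
  external_gene_name = c("Sf1", "Spata24", "SPATA24"),
  stringsAsFactors = FALSE
)
toy_query <- c("Sf1", "Spata24")

keep_exact_matches(toy_result, toy_query)
```

Applied to the real data this would be `keep_exact_matches(result, gq)`. Note that it also drops hits for any query symbol written with non-standard capitalization (e.g. "TRDV4" returning "Trdv4"), so it may be worth checking those symbols first.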
