I'm using the R package 'biomaRt' to retrieve Ensembl IDs and descriptions for a list of gene symbols (e.g. "Sf1", "Rhox7a", etc.). My query consists of 41203 symbols; BioMart returns a result set of 30774 records corresponding to the gene symbols recognized by Ensembl. These 30774 records include four genes that were not part of the original query.
My first thought was that the four 'mystery' genes were synonyms for something in my original query. I've since verified that none of the synonyms of these genes are in my query.
I am querying the mouse data set, using the attribute 'external_gene_name' as my filter.
Code used to query biomaRt:
library(biomaRt)
# 'gq': unique gene symbols submitted as the biomaRt query
# ('dg' is my data frame; its 'gene' column holds the symbols)
gq <- unique(dg$gene)
# attributes to retrieve
attrs <- c("ensembl_gene_id", "external_gene_name", "description",
           "ensembl_gene_id_version", "chromosome_name",
           "gene_biotype")
# Query submission
mart <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
result <- getBM(mart = mart,
                attributes = attrs,
                filters = "external_gene_name",
                values = gq)
The mystery genes are:
setdiff(result$external_gene_name, gq)
[1] "Trdd2" "Trdv4" "Trdd1" "SPATA24"
Here "gq" is the list of genes submitted to Ensembl. None of the above genes, nor any of their synonyms (at least those recognized by Ensembl), are in my original query. If anyone is willing to help me troubleshoot, I would be happy to send them the gene list I'm querying with.
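One further check that may be worth running: setdiff() compares strings exactly, so it would not reveal a returned symbol that differs from a query entry only in letter case. This is only a guess. It assumes the filter might match symbols without regard to case, which has not been confirmed here. The sketch below uses toy stand-ins for 'gq' and the returned symbols; the differently-cased entries in the toy query are hypothetical and not taken from the real list.

```r
# Toy stand-ins for the real objects. The entries "Spata24" and "TRDD1" in
# 'gq' are hypothetical, chosen only to illustrate a case mismatch.
gq <- c("Sf1", "Rhox7a", "Spata24", "TRDD1")
returned <- c("Sf1", "Rhox7a", "SPATA24", "Trdd1", "Trdd2")

# Exact (case-sensitive) difference, as in the setdiff() call above
mystery <- setdiff(returned, gq)

# For each mystery symbol, find a query entry that matches once case is ignored;
# NA means no such entry exists
hit <- match(toupper(mystery), toupper(gq))
check <- data.frame(returned = mystery,
                    query_match = gq[hit],
                    stringsAsFactors = FALSE)
print(check)
```

With the real data, 'returned' would be result$external_gene_name. Any row with a non-NA 'query_match' would point to a query symbol that differs from the returned one only in case; rows with NA would still be unexplained.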
That's very strange. Could you please send the list to helpdesk [at] ensembl.org? My colleagues and I will take a look at it.