Why does my biomaRt query return inconsistent dataset lists?
5.8 years ago
adam.faranda ▴ 110

I have been using the biomaRt library to retrieve Ensembl gene IDs for mouse genes. This morning, I got an unusual error message when running a previously validated script:

mart <- useMart(biomart="ensembl", dataset="mmusculus_gene_ensembl")
Error in useDataset(mart = mart, dataset = dataset, verbose = verbose) : 
  The given dataset:  mmusculus_gene_ensembl , is not valid.  Correct dataset names can be obtained with the listDatasets function.

When I used the "listDatasets" function to check whether "mmusculus_gene_ensembl" is correct, I noticed that the query was returning a different number of results each time I ran it. Sometimes, "mmusculus_gene_ensembl" appears in this result set and other times it does not:

 > nrow(listDatasets(mart, verbose=T))
Attempting web service request:
http://www.ensembl.org:80/biomart/martservice?type=datasets&requestid=biomaRt&mart=ENSEMBL_MART_ENSEMBL
[1] 51
> nrow(listDatasets(mart, verbose=T))
Attempting web service request:
http://www.ensembl.org:80/biomart/martservice?type=datasets&requestid=biomaRt&mart=ENSEMBL_MART_ENSEMBL
[1] 116
> nrow(listDatasets(mart, verbose=T))
Attempting web service request:
http://www.ensembl.org:80/biomart/martservice?type=datasets&requestid=biomaRt&mart=ENSEMBL_MART_ENSEMBL
[1] 27

This behavior has been consistent all day. My R session info is below:

R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X Mavericks 10.9.5

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.30.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0           IRanges_2.8.2        XML_3.98-1.17        digest_0.6.18        bitops_1.0-6         DBI_1.0.0            stats4_3.3.3         RSQLite_2.1.1       
 [9] blob_1.1.1           S4Vectors_0.12.2     tools_3.3.3          bit64_0.9-7          Biobase_2.34.0       RCurl_1.95-4.11      bit_1.1-14           parallel_3.3.3      
[17] BiocGenerics_0.20.0  AnnotationDbi_1.36.2 memoise_1.1.0
ensembl R biomaRt • 1.4k views
5.8 years ago
Mike Smith ★ 2.1k

There was an issue in biomaRt that surfaced when Ensembl release 91 introduced datasets containing apostrophes, e.g. "Ma's Night Monkey", and it leads to the error you are seeing. See https://support.bioconductor.org/p/104025/#104043 or the answer "A: biomaRt mmusculus_gene_ensembl dataset" for more details.
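
To illustrate how a stray apostrophe can make rows disappear, here is a minimal sketch in base R. It assumes the dataset list is tab-delimited text parsed with R's default quote characters; whether biomaRt's internals did exactly this is an assumption, and the dataset names and descriptions below are made-up placeholders.

```r
# Toy tab-delimited "dataset list". The names and descriptions are
# hypothetical placeholders, not the real Ensembl response.
txt <- paste(
  "dataset_a\tFirst species genes",
  "dataset_b\tMa's night monkey genes",
  "dataset_c\tThird species genes",
  "dataset_d\tCoquerel's sifaka genes",
  sep = "\n"
)

# Default quoting: read.table() treats ' as a quote character, so the
# text between the two apostrophes (including line breaks) is read as
# part of a single field and the rows in between are merged.
default_quotes <- read.table(text = txt, sep = "\t",
                             stringsAsFactors = FALSE)

# Quoting disabled: every line is parsed as its own row.
no_quotes <- read.table(text = txt, sep = "\t", quote = "",
                        stringsAsFactors = FALSE)

nrow(default_quotes)  # fewer rows than the input has lines
nrow(no_quotes)       # 4
```

Under this assumption, which rows go missing depends on where the apostrophes fall in the returned text.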

You are currently using old versions of both R and biomaRt. I would suggest updating both; in particular, you will need biomaRt version 2.34.1 or newer to handle this correctly.
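
A quick way to check whether an installation is affected is to compare its biomaRt version against 2.34.1. This is only a sketch: `needs_update` is a hypothetical helper written for this example, not part of biomaRt, and the install command in the comment assumes a current Bioconductor setup that uses BiocManager.

```r
# Hypothetical helper: TRUE if the given version is older than the minimum.
needs_update <- function(installed, minimum = "2.34.1") {
  package_version(installed) < package_version(minimum)
}

# Version taken from the session info in the question.
needs_update("2.30.0")  # TRUE

# Check the locally installed copy, if there is one.
if (requireNamespace("biomaRt", quietly = TRUE)) {
  print(needs_update(as.character(packageVersion("biomaRt"))))
}

# If this reports TRUE, update R first, then biomaRt, e.g. with
# BiocManager::install("biomaRt") (assumes the BiocManager package is installed).
```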
