Recently, the getGEO function of the GEOquery package suddenly started throwing the following error:
gse <- getGEO("GSE106977", GSEMatrix = TRUE)
#https://ftp.ncbi.nlm.nih.gov/geo/series/GSE106nnn/GSE106977/matrix/
#OK
#Found 2 file(s)
#/geo/series/GSE106nnn/GSE106977/
#Error in download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s", :
#cannot open destfile 'C:\Users\Marti\AppData\Local\Temp\Rtmp8cEqMH//geo/series/GSE106nnn/GSE106977', reason 'No such file or directory'
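After some debugging, I found that the error comes from the hidden (non-exported) function getAndParseGSEMatrices. It treats the first entry of the FTP directory listing, which is the directory path /geo/series/GSE106nnn/GSE106977/ itself, as a file to download, so download.file() tries to write to a destination that does not exist.
To work around this, I wrote a patch that drops that first entry.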
First, copy this code into RStudio or a plain-text editor such as WordPad:
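getAndParseGSEMatrices <- function(GEO, destdir, AnnotGPL, getGPL = TRUE) {
  GEO <- toupper(GEO)
  # Build the FTP "stub" directory, e.g. GSE106977 -> GSE106nnn
  stub <- gsub("\\d{1,3}$", "nnn", GEO, perl = TRUE)
  gdsurl <- "https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/"
  b <- getDirListing(sprintf(gdsurl, stub, GEO))
  # The first entry of the listing is the directory path itself
  # (/geo/series/GSE106nnn/GSE106977/), not a matrix file, so drop it
  b <- b[-1]
  message(sprintf("Found %d file(s)", length(b)))
  ret <- list()
  # seq_along() avoids iterating over 1:0 when no files are found
  for (i in seq_along(b)) {
    message(b[i])
    destfile <- file.path(destdir, b[i])
    if (file.exists(destfile)) {
      message(sprintf("Using locally cached version: %s", destfile))
    } else {
      download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",
                            stub, GEO, b[i]),
                    destfile = destfile, mode = "wb",
                    method = getOption("download.file.method.GEOquery"))
    }
    # Parse each downloaded series matrix into an ExpressionSet
    ret[[b[i]]] <- parseGSEMatrix(destfile, destdir = destdir,
                                  AnnotGPL = AnnotGPL, getGPL = getGPL)$eset
  }
  return(ret)
}
# Let the patched function see GEOquery internals (getDirListing, parseGSEMatrix),
# then replace the original function inside the GEOquery namespace
environment(getAndParseGSEMatrices) <- asNamespace("GEOquery")
assignInNamespace("getAndParseGSEMatrices", getAndParseGSEMatrices, ns = "GEOquery")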
Save the file as GEOpatch.R in your working directory. Then, after loading the GEOquery library, source the saved file:
library(GEOquery)
source("GEOpatch.R")
Now it should get going...
gse <- getGEO("GSE106977", GSEMatrix = TRUE)
#https://ftp.ncbi.nlm.nih.gov/geo/series/GSE106nnn/GSE106977/matrix/
#OK
#Found 1 file(s)
#GSE106977_series_matrix.txt.gz
#trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE106nnn/GSE106977/matrix/GSE106977_series_matrix.txt.gz'
#Content type 'application/x-gzip' length 32707196 bytes (31.2 MB)
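The patch relies on b=b[-1], which assumes the first entry of the listing is always the directory path. A slightly more defensive variant, sketched here as an assumption and not as GEOquery's own code, keeps only entries that look like series-matrix files. The helper name filterMatrixFiles is hypothetical, and the example listing mirrors the two entries reported in the error output ("Found 2 file(s)").

```r
# Hypothetical helper, not part of GEOquery: keep only names ending in
# "_series_matrix.txt.gz" instead of blindly dropping the first entry.
filterMatrixFiles <- function(listing) {
  listing[grepl("_series_matrix\\.txt\\.gz$", listing)]
}

# Listing shaped like the one printed before the download error
b <- c("/geo/series/GSE106nnn/GSE106977/",
       "GSE106977_series_matrix.txt.gz")
print(filterMatrixFiles(b))
# [1] "GSE106977_series_matrix.txt.gz"
```

Inside the patched function, b <- filterMatrixFiles(b) could stand in for b=b[-1]; unlike the positional drop, it would not discard a real file if a later listing stopped including the directory path. This variant is untested against the live GEO FTP site.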
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GEOquery_2.40.0 Biobase_2.34.0 BiocGenerics_0.20.0
[4] BiocInstaller_1.24.0
loaded via a namespace (and not attached):
[1] httr_1.3.1 R6_2.2.2 tools_3.3.2 RCurl_1.95-4.10
[5] bitops_1.0-6 XML_3.98-1.9
You have not included the output of sessionInfo(), but I suspect that you are using an outdated version of R/Bioconductor. Updating to the most recent Bioconductor release and R version should fix the problem you are seeing. As an aside, rather than submitting a "tutorial" on how to patch an unknown version of GEOquery, the recommended approach is to report the problem to the author through the official bug-reporting channel. For GEOquery, that means opening a new issue on GitHub (which, I noticed, you did). This way, everyone can benefit from any required code changes. In addition, having users copy and paste code without version information, testing, or build checks sidesteps the significant testing that occurs during the GEOquery and Bioconductor build processes.
Tagging Sean Davis.
Thanks a lot, Martin. I was struggling a lot with this problem, and now I can finally get on with my work.
Upgrading R to the current release version (3.4.) and then installing the matching GEOquery version will fix the problem noted in the original post.
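Before reaching for a manual patch, it may help to check whether the running R is older than the release recommended here. A minimal sketch, assuming 3.4.0 as the threshold; it only reports and does not install anything:

```r
# Compare the running R version against the 3.4 release recommended above
needsUpgrade <- getRversion() < "3.4.0"
if (needsUpgrade) {
  message("R ", getRversion(), " is older than 3.4; upgrade R, then reinstall GEOquery")
} else {
  message("R ", getRversion(), " should be recent enough; check packageVersion(\"GEOquery\") too")
}
# On R 3.4.x with BiocInstaller (as in the session above), GEOquery could be
# reinstalled with: source("https://bioconductor.org/biocLite.R"); biocLite("GEOquery")
```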