Puzzling Error Message While Working Through A Bioconductor Tutorial On Microarrays
12.2 years ago
Mycroft34 ▴ 120

I recently tried some tutorials on microarray data analysis, using either the following link: http://bioinformatics.knowledgeblog.org/2011/06/20/analysing-microarray-data-in-bioconductor/ or the chapter on Bioconductor from "R in a Nutshell". After installing and loading the GEOquery package, I tried loading the data as indicated:

    library(GEOquery)
    getGEOSuppFiles("GSE20986")

and I was returned the following error message, in both cases:

    [1] "ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/series/GSE20986/"
    Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
      line 1 did not have 6 elements

This is only an example: the file used in "R in a Nutshell" is GSE2034, which produces the same error. As I understand the error message, it says that line 1 does not have the 6 elements expected for the data.frame. That is surprising for data retrieved from the NCBI server, so I think something else is at fault. Has anyone had this error, found what was wrong, and worked out how to get around it? Thanks in advance.

I use R 2.15.1 on Ubuntu 12.04, with Bioconductor 2.10; here is the result of sessionInfo():

    R version 2.15.1 (2012-06-22)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
     [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] GEOquery_2.23.5     Biobase_2.16.0      BiocGenerics_0.2.0
    [4] BiocInstaller_1.4.7

    loaded via a namespace (and not attached):
    [1] RCurl_1.91-1 tools_2.15.1 XML_3.9-4
r bioconductor • 6.0k views

Did you check that your libcurl supports ftp? http://www.omegahat.org/RCurl/FAQ.html


Thanks for your help; I followed what was indicated on this page:

curl -V
curl 7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap pop3 pop3s rtmp rtsp smtp smtps telnet tftp 
Features: GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP

It seems that curl supports FTP; is that different from libcurl?


Did you get a message that looked like:

[1] "ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/series/GSE20986/"
trying URL 'ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/series/GSE20986//GSE20986_RAW.tar'
ftp data connection made, file length 56360960 bytes
opened URL
==================================================
downloaded 53.8 Mb

Thanks for your help. No, I didn't. I see what you mean: the double // would be the source of the problem. But I only got the message shown above.

However, I remember seeing such a message in another circumstance; how did you solve this problem?

For the moment, and specifically for the web tutorial (http://bioinformatics.knowledgeblog.org/2011/06/20/analysing-microarray-data-in-bioconductor/), I got around the problem by downloading the files manually.

But the problem remains for the "R in a Nutshell" example.

12.2 years ago
brentp 24k

This works for me with the sessionInfo pasted below. Perhaps try setting LC_ALL=C as a test, since only our locales differ.

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GEOquery_2.23.5     Biobase_2.16.0      BiocGenerics_0.2.0 
[4] BiocInstaller_1.4.7

loaded via a namespace (and not attached):
[1] RCurl_1.91-1 XML_3.9-4    tools_2.15.1

EDIT:

Is your libcurl built with FTP support? http://www.omegahat.org/RCurl/FAQ.html


Thanks for your reply; I tried the LC_ALL=C setting, but the error remained.


Works for me too, with locale = en_AU.UTF-8. The error message is misleading; it just means that getGEOSuppFiles() could not access the remote file(s) for some reason. Possibly a transient network error.
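
To see how an unexpected response can surface as this particular message, here is a small sketch that needs no network access. It assumes (this is not confirmed anywhere in the thread) that the directory listing is parsed as a whitespace-separated table; the `listing_ok` and `listing_bad` strings and the `parse_listing()` helper are made up for illustration.

```r
# A well-formed, FTP-style listing line with six whitespace-separated fields
# (made-up content, shaped like the file mentioned in this thread).
listing_ok <- "-r--r--r-- 1 ftp anonymous 56360960 GSE20986_RAW.tar"

# Hypothetical bad response: an unexpected two-word first line
# (for example a proxy notice) in front of the real listing.
listing_bad <- paste("Proxy notice", listing_ok, sep = "\n")

# Parse a listing as a table; return the error message instead of stopping.
parse_listing <- function(txt) {
  tryCatch(
    read.table(text = txt, header = FALSE, stringsAsFactors = FALSE),
    error = function(e) conditionMessage(e)
  )
}

good <- parse_listing(listing_ok)   # a data frame with six columns
bad  <- parse_listing(listing_bad)  # a character string: the scan() error

print(dim(good))
print(bad)
```

The well-formed listing parses into a six-column data frame, while the one with an unexpected first line fails inside scan() with a complaint about the number of elements on line 1, even though the real problem is the content of the response.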


Thanks for this info. I also thought the network could be the problem: all internet traffic in our institute passes through a proxy. Could that be the cause of the error, and how could I get around it?


See the help for download.file.
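
A minimal sketch of one way to test the proxy route with download.file() directly, bypassing GEOquery. The URL pattern is the one printed in the messages earlier in this thread and may have changed since; the proxy address and the helper names `supp_url()`, `proxy_settings()` and `fetch_supp()` are hypothetical.

```r
# Build the URL of the supplementary archive, following the pattern
# printed in the messages earlier in this thread.
supp_url <- function(gse) {
  base <- "ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/series"
  sprintf("%s/%s/%s_RAW.tar", base, gse, gse)
}

# Report which proxy-related environment variables R can see
# (NA means the variable is not set).
proxy_settings <- function() {
  Sys.getenv(c("http_proxy", "ftp_proxy", "HTTP_PROXY", "FTP_PROXY"),
             unset = NA)
}

# Download the archive directly with download.file().
# `proxy` is a hypothetical address such as "http://proxy.example.org:3128".
fetch_supp <- function(gse, proxy = NULL, destdir = ".") {
  if (!is.null(proxy)) {
    Sys.setenv(http_proxy = proxy, ftp_proxy = proxy)
  }
  dest <- file.path(destdir, paste0(gse, "_RAW.tar"))
  download.file(supp_url(gse), destfile = dest, mode = "wb")
  dest
}

print(proxy_settings())
# Not run here because it needs network access:
# fetch_supp("GSE20986", proxy = "http://proxy.example.org:3128")
```

For some download methods the help page says the variables must be set before the first download in a session, so exporting them in the shell before starting R is the safer route.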


I read it and set the proxy using

export HTTP_PROXY (and the same for FTP_PROXY)

before running R. I also checked with Sys.getenv() that the proxy was set in R, and it was; but I still get the same error:

> getGEOSuppFiles('GSE20986')
[1] "ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/series/GSE20986/"
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
line 1 did not have 6 elements

I also updated the packages, since I remembered that RCurl had been updated a few days ago. A new version has been installed (with several warnings, but nothing else); I am now running RCurl version 1.95-0.1.

I also received this message:

    Setting options('download.file.method.GEOquery'='curl')

after loading GEOquery. Does it mean that the Linux curl is used instead of RCurl?


I don't know why, but I'm getting the same error message when trying to use biomaRt. I'm attaching my error here; it looks very similar to yours. Commands:

library("biomaRt")

ensembl = useMart("ensembl",dataset="hsapiensgeneensembl")

affyids=c("202763at","209310sat","207500at")

getBM(attributes=c('affyhgu133plus2', 'entrezgene'), filters = 'affyhgu133plus2', values = affyids, mart = ensembl)

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 4 did not have 2 elements

    R version 2.15.1 (2012-06-22)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
     [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
    [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] biomaRt_2.12.0      BiocInstaller_1.4.7

    loaded via a namespace (and not attached):
    [1] RCurl_1.95-0.1 tools_2.15.1   XML_3.95-0.1


Since changing the RCurl version did not solve my problem (with GEOquery), I tried your case. I suggest that you edit your reply, since the name of the attribute passed to getBM is incorrect: it should be

"affy_hg_u133_plus_2"

instead of "affyhgu133plus2". That makes reproducing your case a little difficult, since I had to look up the correct name.

The same applies to "hsapiensgeneensembl", which should be

"hsapiens_gene_ensembl".

Correction: to have the names displayed correctly in your message, you should format your R commands as code (by inserting 4 spaces in front of each line); otherwise, the underscores are removed (at least in the comments).

12.1 years ago

Finally, I solved the problem!

You need to download the previous version of RCurl http://cran.r-project.org/src/contrib/Archive/RCurl/RCurl_1.91-1.tar.gz

and install using the command:

install.packages("~/Downloads/RCurl_1.91-1.tar.gz", repos=NULL)

Thank you brentp for the insights!


Unfortunately, changing the RCurl version did not fix my own problem with the GEOquery package. Maybe another package needs to be rolled back to a previous version; does anyone have an idea which package it would be? (See sessionInfo in the original message.)

10.1 years ago
zhanxw ▴ 20

I manually download the files, and then manually import the data:

gset <- getGEO(filename="GSE10246_series_matrix.txt.gz", GSEMatrix = TRUE)
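
The two manual steps could be wrapped up roughly as follows. This is only a sketch: the FTP directory layout used in `series_matrix_url()` is an assumption that does not come from this thread, and both helper names are hypothetical. Only the final getGEO() call mirrors the line above.

```r
# Assumed layout of the GEO FTP site (not taken from this thread):
# series are grouped in directories such as GSE10nnn/GSE10246/matrix/.
series_matrix_url <- function(gse) {
  stub <- sub("[0-9]{1,3}$", "nnn", gse)  # e.g. GSE10246 -> GSE10nnn
  sprintf(
    "ftp://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s_series_matrix.txt.gz",
    stub, gse, gse
  )
}

# Download the series matrix once, then import it from the local file.
load_series_matrix <- function(gse, destdir = ".") {
  dest <- file.path(destdir, paste0(gse, "_series_matrix.txt.gz"))
  if (!file.exists(dest)) {
    download.file(series_matrix_url(gse), destfile = dest, mode = "wb")
  }
  GEOquery::getGEO(filename = dest, GSEMatrix = TRUE)
}

# Not run here (needs network access and the GEOquery package):
# gset <- load_series_matrix("GSE10246")
```

If the download step itself is blocked, the file can still be fetched in a browser and placed in `destdir`; the helper then skips the download and only does the import.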