Problems with ArrayExpress datasets downloads on Bioconductor
5.8 years ago
Davide Chicco ▴ 120

Hi all

I need to download three datasets from ArrayExpress: E-MTAB-5273, E-MTAB-5274, and E-MTAB-4451. I tried to do that in R with the ArrayExpress() function of the ArrayExpress package on Bioconductor, but it generates errors in all three cases. I am using R version 3.5.2 on Linux Ubuntu 18.

Here are the commands I used and the errors they generated.

First dataset -- E-MTAB-5273

I tried to download the first dataset through the following R commands:

# library installation and loading
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ArrayExpress", version = "3.8")

library("ArrayExpress")

rawset = ArrayExpress("E-MTAB-5273") # download

It generated the following error:

Unpacking data files
ArrayExpress: Reading pheno data from SDRF
Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names':
‘Burnham_sepsis_discovery_raw_237.txt’

Second dataset -- E-MTAB-5274

I tried to download the second dataset through the following R command:

rawset = ArrayExpress("E-MTAB-5274")

and it generated the following error:

Unpacking data files
ArrayExpress: Reading pheno data from SDRF
Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names':
‘Burnham_sepsis_validation_raw_108.txt’

Third dataset -- E-MTAB-4451

I tried to download the third dataset through the following R command:

rawset = ArrayExpress("E-MTAB-4451")

And I get the error:

Unpacking data files
ArrayExpress: Reading pheno data from SDRF
Error in which(sapply(seq_len(nrow(pData(ph))), function(i) all(pData(ph)[i,  :
  argument to 'which' is not logical
In addition: Warning message:
In readPhenoData(sdrf, path) :
  ArrayExpress: Cannot find 'Array Data File' column in SDRF. Object
might not be created correctly.

Questions: why am I getting these errors?

How can I overcome them?

My goal is to be able to read the file content and the raw data.

Thanks!

arrayExpress bioconductor • 3.9k views

Hi. I got the same error. Did you solve it?


Same problem. I would appreciate it if someone could give a solution. Thanks in advance.


Hi! I am trying to use the same datasets. Has anyone found a solution to this problem? Thanks


I will see whether I can reach the ArrayExpress maintainers at the Bioconductor Slack.

By the way, please add comments via ADD COMMENT and not the answer field unless you have an answer, thanks!

For technical Bioc-related questions you can always open a question at support.bioconductor.org using the name of the package as tag.

Edit: My guess is that the array type is simply not supported. I opened an issue here: https://github.com/ebi-gene-expression-group/bioconductor-ArrayExpress/issues/2 We will see what they say.

3.8 years ago
Hannes ▴ 60

Hi all, maybe this might be of some help.

I built two functions for this task using the getAE function from the ArrayExpress package. In my experience this causes problems less often, although they still happen from time to time.

Option 1:

If you only want to download the data and store it locally, feel free to use this small AEDownloadBulk function:

library(ArrayExpress)  # provides getAE()

AEDownloadBulk <- function(accession, type = "processed", out = getwd()) {
  # create the output directory <out>/ArrayExpress
  base <- file.path(out, "ArrayExpress")
  dir.create(base, showWarnings = FALSE, recursive = TRUE)
  # run getAE() once per accession number, each into its own subfolder
  for (i in accession) {
    dest <- file.path(base, i)
    dir.create(dest, showWarnings = FALSE)
    getAE(i, type = type, path = dest, extract = TRUE)
    # remove the zip archives once they have been extracted
    zip <- list.files(dest, pattern = "\\.zip$", full.names = TRUE)
    if (length(zip) > 0) file.remove(zip)
  }
}

It creates an ArrayExpress folder (in your current working directory unless out is specified) with one subfolder per accession number and stores the downloaded files there. It also removes the zip archives after extraction. To download the data from multiple entries, simply run:

accession = c("E-MTAB-9056","E-MTAB-9054")
AEDownloadBulk(accession)
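
Once the files are stored locally, the SDRF can be inspected by hand. The first two errors in the question complain about duplicate row names while reading the SDRF, so one thing worth checking is whether a column contains repeated values. The following is only a minimal sketch: find_duplicate_array_files is a hypothetical helper (not part of the ArrayExpress package), the column name "Array Data File" is taken from the warning message in the question, and the toy file contents are invented for illustration. Whether duplicates are really the cause for the datasets in question is an assumption that would need to be verified on the real files.

```r
# Hypothetical helper, not part of the ArrayExpress package.
# Reads an SDRF without assigning row names and returns the values of
# one column that occur more than once.
find_duplicate_array_files <- function(sdrf_path, column = "Array Data File") {
  # check.names = FALSE keeps the original headers (including spaces)
  sdrf <- read.delim(sdrf_path, check.names = FALSE, stringsAsFactors = FALSE)
  if (!column %in% colnames(sdrf)) {
    stop("Column '", column, "' not found in ", sdrf_path)
  }
  values <- sdrf[[column]]
  unique(values[duplicated(values)])
}

# Toy example with a made-up SDRF (sample and file names are invented)
toy <- tempfile(fileext = ".sdrf.txt")
writeLines(c("Source Name\tArray Data File",
             "s1\traw_1.txt",
             "s2\traw_2.txt",
             "s3\traw_1.txt"), toy)
find_duplicate_array_files(toy)
```

Because the file is read without row.names, the call does not fail on repeated values; it only reports them, so you can decide how to handle them before building any object from the table.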

Option 2:

If you wish to download and directly import the data into your R session you can use the AEDownload function below.

AEDownload <- function(accession, type = "processed", out = getwd(), import = TRUE) {
  if (length(accession) > 1) {
    stop("length(accession) > 1. Please provide only a single accession number.")
  }
  # create the output directory <out>/ArrayExpress/<accession>
  dest <- file.path(out, "ArrayExpress", accession)
  dir.create(dest, showWarnings = FALSE, recursive = TRUE)

  # download and extract the files, then remove the zip archives
  AE <- getAE(accession, type = type, path = dest, extract = TRUE)
  zip <- list.files(dest, pattern = "\\.zip$", full.names = TRUE)
  if (length(zip) > 0) file.remove(zip)

  # stop here if the data should only be downloaded
  if (!import) return(invisible(NULL))

  # import the data into an R list object
  res <- list()
  sdrf <- list.files(dest, pattern = "\\.sdrf\\.txt$", full.names = TRUE)
  idf <- list.files(dest, pattern = "\\.idf\\.txt$", full.names = TRUE)
  # row.names = 1 requires unique values in the first SDRF column
  res[["sdrf"]] <- read.delim(file = sdrf, row.names = 1)
  res[["idf"]] <- read.delim(file = idf)
  for (k in AE$processedFiles) {
    res[[k]] <- read.table(file = file.path(dest, k), row.names = 1)
  }
  res[["info"]] <- AE
  res
}

AEDownload creates a folder named after the provided accession number and stores the downloaded files there. It also removes any zip archives. If import = TRUE, the function returns an R list object containing the IDF, the SDRF, and the downloaded files. The function was designed to get the processed count matrices and annotation files into R quickly. For multiple accession IDs you have to write a loop:

accession = c("E-MTAB-9056","E-MTAB-9054")
for(i in accession) {
  x <- AEDownload(i)
  assign(i,x)
}
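
As an alternative to assign(), the results can be collected in a single named list, which also makes it easier to skip an accession that fails to download. This is only a sketch: collect_accessions is a hypothetical helper, and its download_fun argument is assumed to behave like the AEDownload function above (one accession in, one result out).

```r
# Hypothetical helper: gather one result per accession in a named list.
# `download_fun` is assumed to take a single accession and return its data.
collect_accessions <- function(accession, download_fun) {
  results <- lapply(accession, function(acc) {
    # tryCatch keeps the loop going if a single accession fails
    tryCatch(download_fun(acc), error = function(e) {
      warning("Download failed for ", acc, ": ", conditionMessage(e))
      NULL
    })
  })
  # name each list element after its accession number
  setNames(results, accession)
}

# Usage (needs network access and the AEDownload function above):
# datasets <- collect_accessions(c("E-MTAB-9056", "E-MTAB-9054"), AEDownload)
# datasets[["E-MTAB-9056"]]$sdrf
```

Failed accessions end up as NULL entries, so they can be spotted afterwards with sapply(datasets, is.null).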

These code snippets come from a larger teaching script on ArrayExpress that I wrote for my students; it is available on GitHub.
