R package ArrayExpress error
4.9 years ago
MatthewP ★ 1.4k

The dataset I need is _E-MTAB-1803_. First I downloaded all the files using wget. My code and the error:

> library(ArrayExpress, quietly=TRUE)
> dlExp <- getAE(accession="E-MTAB-1803", path=".", local=TRUE, sourcedir=".")
Unpacking data files
> str(dlExp)
List of 8
 $ path            : chr "."
 $ rawFiles        : chr [1:170] "FR_1_U133_2.CEL" "FR_100_U133_2.CEL" "FR_103_U133_2.CEL" "FR_106_U133_2.CEL" ...
 $ rawArchive      : chr [1:6] "E-MTAB-1803.raw.1.zip" "E-MTAB-1803.raw.2.zip" "E-MTAB-1803.raw.3.zip" "E-MTAB-1803.raw.4.zip" ...
 $ processedFiles  : chr [1:2] "Bladder-RI-rma.txt" "Bladder-RI-CGH-norm.txt"
 $ processedArchive: chr "E-MTAB-1803.processed.1.zip"
 $ sdrf            : chr "E-MTAB-1803.sdrf.txt"
 $ idf             : chr "E-MTAB-1803.idf.txt"
 $ adf             : chr [1:2] "A-AFFY-44.adf.txt" "A-GEOD-16070.adf.txt"
> rawset <- ae2bioc(dlExp)
ArrayExpress: Reading pheno data from SDRF
Error in .subset2(x, i, exact = exact) : subscript out of bounds

I don't know what value I can pass to the dataCols parameter of the function ae2bioc. The reference manual says:

by default, the columns are automatically selected according to the scanner type. If the scanner is unknown or if the user wants to use different columns than the default, the argument ’dataCols’ can be set. For two colour arrays it must be a list with the fields ’R’, ’G’, ’Rb’ and ’Gb’ giving the column names to be used for red and green foreground and background. For one colour arrays, it must be a character string with the column name to be used. These column names must correspond to existing column names of the expression files.


I checked all the files I downloaded and they are fine.

4.9 years ago

The problem is that this ArrayExpress accession stores data from two different array types:

  • A-AFFY-44 - Affymetrix GeneChip Human Genome U133 Plus 2.0 [HG-U133_Plus_2]
  • A-GEOD-16070 - CIT-CGH Homo sapiens BAC

So, the standard procedure with the ArrayExpress R package does not work. The specific error that you receive is thrown when reading the SDRF file, which contains entries for both arrays in the same file.
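
To confirm that the SDRF really mixes array types, one can tabulate its array design column. The following is a minimal sketch, not part of the ArrayExpress package: the helper name `count_array_designs` is hypothetical, and it assumes the SDRF is tab-delimited and uses the MAGE-TAB column name "Array Design REF". A toy SDRF is written to a temporary file so the example runs without the real data.

```r
# Count SDRF rows per array design.
# Assumption: tab-delimited SDRF with a MAGE-TAB "Array Design REF" column.
count_array_designs <- function(sdrf_path, design_col = "Array Design REF") {
  # check.names = FALSE keeps column names with spaces unchanged
  sdrf <- read.delim(sdrf_path, check.names = FALSE, stringsAsFactors = FALSE)
  if (!design_col %in% colnames(sdrf)) {
    stop("Column not found in SDRF: ", design_col)
  }
  table(sdrf[[design_col]])
}

# Toy SDRF (made-up rows) so the sketch is runnable on its own
toy <- data.frame(
  "Source Name" = c("s1", "s2", "s3"),
  "Array Design REF" = c("A-AFFY-44", "A-AFFY-44", "A-GEOD-16070"),
  check.names = FALSE
)
toy_path <- tempfile(fileext = ".sdrf.txt")
write.table(toy, toy_path, sep = "\t", quote = FALSE, row.names = FALSE)

design_counts <- count_array_designs(toy_path)
print(design_counts)
```

On the real data the call would be `count_array_designs("E-MTAB-1803.sdrf.txt")`; more than one name in the result indicates a mixed-platform SDRF.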

You can, however, 'trick' the ArrayExpress R package into thinking that there is just a single array type.

The first thing to do (outside R) is to edit the 'E-MTAB-1803.sdrf.txt' file so that it only includes lines for one array type, e.g., Affymetrix U133 (first 85 entries), and to store these in 'E-MTAB-1803.sdrf_AffyU133B.txt'.
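
The same filtering could also be done from within R instead of by hand. This is only a sketch under assumptions: the helper `subset_sdrf_by_design` is hypothetical, the SDRF is assumed to be tab-delimited with an "Array Design REF" column, and the Affymetrix rows are assumed to carry the value "A-AFFY-44" in that column. The demonstration uses a toy SDRF in a temporary directory.

```r
# Keep only the SDRF rows for one array design and write them to a new file.
# Assumption: tab-delimited SDRF with a MAGE-TAB "Array Design REF" column.
subset_sdrf_by_design <- function(sdrf_in, sdrf_out, design,
                                  design_col = "Array Design REF") {
  sdrf <- read.delim(sdrf_in, check.names = FALSE, stringsAsFactors = FALSE)
  stopifnot(design_col %in% colnames(sdrf))
  # %in% treats missing values as non-matches, so no NA rows are produced
  keep <- sdrf[sdrf[[design_col]] %in% design, , drop = FALSE]
  write.table(keep, sdrf_out, sep = "\t", quote = FALSE,
              row.names = FALSE, na = "")
  invisible(keep)
}

# Toy SDRF (made-up rows) so the sketch is runnable on its own
toy <- data.frame(
  "Source Name" = c("s1", "s2", "s3"),
  "Array Design REF" = c("A-AFFY-44", "A-GEOD-16070", "A-AFFY-44"),
  check.names = FALSE
)
toy_in <- tempfile(fileext = ".sdrf.txt")
toy_out <- tempfile(fileext = ".sdrf_subset.txt")
write.table(toy, toy_in, sep = "\t", quote = FALSE, row.names = FALSE)

affy_only <- subset_sdrf_by_design(toy_in, toy_out, design = "A-AFFY-44")
reread <- read.delim(toy_out, check.names = FALSE, stringsAsFactors = FALSE)
```

For this accession the call would be `subset_sdrf_by_design("E-MTAB-1803.sdrf.txt", "E-MTAB-1803.sdrf_AffyU133B.txt", "A-AFFY-44")`, provided the column name and value match what is in the file.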

Then:

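library(ArrayExpress, quietly = TRUE)
# Download the experiment files into the working directory
dlExp <- getAE(accession = 'E-MTAB-1803')
# Point to the edited SDRF that lists only the Affymetrix U133 samples
dlExp$sdrf <- "E-MTAB-1803.sdrf_AffyU133B.txt"
# Keep only the first 85 raw files, matching the 85 entries kept in the SDRF
dlExp$rawFiles <- dlExp$rawFiles[1:85]
# Build the Bioconductor object from the single-array-type subset
rawset <- ae2bioc(dlExp)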

ArrayExpress: Reading pheno data from SDRF
ArrayExpress: Reading data files
Platform design info loaded.
Reading in : /home/kblighe/Escritorio/FR_1_U133_2.CEL
Reading in : /home/kblighe/Escritorio/FR_100_U133_2.CEL
Reading in : /home/kblighe/Escritorio/FR_103_U133_2.CEL
...
Reading in : /home/kblighe/Escritorio/FR_99_U133_2.CEL
Read 32 items

rawset

ExpressionFeatureSet (storageMode: lockedEnvironment)
assayData: 1354896 features, 85 samples 
  element names: exprs 
protocolData
  rowNames: FR_1_U133_2.CEL FR_100_U133_2.CEL ... FR_99_U133_2.CEL (85
    total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: FR_1_U133_2.CEL FR_100_U133_2.CEL ... FR_99_U133_2.CEL (85
    total)
  varLabels: Source.Name Material.Type ...
    Factor.Value.MIBC.molecular.subtype. (55 total)
  varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.hg.u133.plus.2
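
Selecting raw files by position (`[1:85]`) works only if the file order matches the SDRF order. A more defensive variant is to keep the raw files that are actually named in the edited SDRF. The sketch below is an illustration, not part of the ArrayExpress package: `raw_files_in_sdrf` is a hypothetical helper, and it assumes the SDRF has a MAGE-TAB "Array Data File" column holding the CEL file names. It runs on toy data.

```r
# Return the raw files whose names appear in the SDRF.
# Assumption: tab-delimited SDRF with a MAGE-TAB "Array Data File" column.
raw_files_in_sdrf <- function(sdrf_path, raw_files,
                              file_col = "Array Data File") {
  sdrf <- read.delim(sdrf_path, check.names = FALSE, stringsAsFactors = FALSE)
  stopifnot(file_col %in% colnames(sdrf))
  # Compare on the bare file name in case raw_files contains directory paths
  raw_files[basename(raw_files) %in% sdrf[[file_col]]]
}

# Toy SDRF (made-up rows) listing two CEL files
toy <- data.frame(
  "Source Name" = c("s1", "s2"),
  "Array Data File" = c("FR_1_U133_2.CEL", "FR_100_U133_2.CEL"),
  check.names = FALSE
)
toy_path <- tempfile(fileext = ".sdrf.txt")
write.table(toy, toy_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Toy file list; "other_array_file.txt" is a made-up name not in the SDRF
all_raw <- c("FR_1_U133_2.CEL", "other_array_file.txt", "FR_100_U133_2.CEL")
selected <- raw_files_in_sdrf(toy_path, all_raw)
print(selected)
```

With the real objects this would replace the positional subset: `dlExp$rawFiles <- raw_files_in_sdrf(dlExp$sdrf, dlExp$rawFiles)`.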

Kevin
