to achieve this, but it crashes at random with strange errors. The benefit of the aroma package is that it is supposed to work within a bounded amount of memory. With this package not working, I'm at a bit of a loss as to how to proceed.
Am I crazy trying to normalise 508 arrays all at once? Or is this a trivial amount compared with large-scale studies? Any advice would be greatly appreciated!
In Bioconductor there are RMA and GCRMA functions that operate with much lower memory overheads than other functions. Have a look at justRMA(), justRMALite(), or their GCRMA equivalents such as justGCRMA().
If you can bear to step outside of R, there is also RMAExpress, which should scale well enough for you to get expression values out of a large number of arrays.
By the way, the link in your post doesn't lead to a GEO entry; I'd be interested to see which experiment has 508 arrays. Whether to normalise them all together is a question we could only answer if we knew the experimental design and which comparisons you want to make. You will undoubtedly want to assess potential batch effects in a dataset this large.
I had this very same problem last week (870 samples) and solved it by using justRMA(), after asking about the different options I had. Aroma would have required me to invest too much time in learning it for what I needed, so it was my third option.
You're not crazy. Henrik (the Aroma developer) is very actively developing this project, so first check whether a report for your error already exists, and if not, post a specific bug report to the mailing list. Aroma is a nice package, though it takes a number of steps to get everything set up correctly.
I'd be surprised if a decent 64-bit Linux box (e.g., 16+ GB RAM) couldn't run GCRMA on the whole lot of files at once.
You're not crazy: we routinely do normalization on several hundred arrays. However, we use RMA in the Bioconductor packages affy or simpleaffy. Also, we use a machine with 128 GB of memory :-) but I think you could get away with less than that; perhaps 32 GB minimum.
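As a rough back-of-the-envelope check on these RAM figures, the sketch below estimates the size of one dense probe-level intensity matrix stored as 8-byte doubles. The probe count (a 1050 x 1050 feature grid, roughly the size of a Gene 1.0 ST array) and the number of copies held at once are assumptions for illustration; actual peak usage depends on the implementation.

```python
def matrix_gib(n_probes: int, n_arrays: int, bytes_per_value: int = 8) -> float:
    """Size in GiB of one dense n_probes x n_arrays matrix (8-byte doubles by default)."""
    return n_probes * n_arrays * bytes_per_value / 2**30


# Assumption: ~1.1 million probe intensities per CEL file (a 1050 x 1050 grid).
N_PROBES = 1050 * 1050
N_ARRAYS = 508

one_copy = matrix_gib(N_PROBES, N_ARRAYS)
print(f"one copy of the probe matrix: {one_copy:.1f} GiB")

# Assumption: a multi-array method may hold several copies at once
# (e.g. raw, background-corrected and normalized intensities).
for copies in (2, 3, 4):
    print(f"{copies} copies: {copies * one_copy:.1f} GiB")
```

Under these assumptions one copy comes to roughly 4.2 GiB, so a few working copies plus interpreter overhead quickly reach the 16 to 32 GB range mentioned above.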
This page has some simulations to analyse RMA memory use.
Since you want to analyze Affymetrix Mouse Gene 1.0 ST Arrays, just use the Affymetrix Power Tools. These command-line tools can do normalization with less than 2 GB of RAM.
You could also have a look at the xps Bioconductor package.
"The package handles pre-processing, normalization, filtering and analysis of Affymetrix GeneChip expression arrays, including exon arrays (Exon 1.0 ST: core, extended, full probesets), gene arrays (Gene 1.0 ST) and plate arrays on computers with 1 GB RAM only. "
Use the "justRMA"-like functions from the affy package
Switch to a single-array normalization method, such as Affymetrix MAS5, normalize the files one by one, and then merge the resulting exprs(eset) matrices.
(Unrecommended) As partially pointed out in this paper, you may divide your dataset into subsets of more than 100 arrays each, normalize each subset with a multi-array method such as RMA or GCRMA (or PLIER or FARMS), and finally merge the output. The result won't be identical to a full 500-array batch normalization, but it will be close (differences of less than about 0.01 in normalized log expression per probeset). Use this only in combination with the next point, filtering out low-quality samples, as crazily behaving samples are the real issue when comparing groups normalized in separate runs.
You can always filter out low-quality samples. A fast method is the deleted-residuals approach (see here), which uses the KS test to check whether any sample in the dataset diverges significantly from the average expression behavior.
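A minimal sketch of a deleted-residuals style screen, assuming a simple list-of-lists layout (one list of log intensities per array): each array's residuals are taken against the per-probe mean of all other arrays, and a two-sample KS statistic measures how far that residual distribution sits from the pooled residuals. The cutoff (median + 3 MADs of the KS statistics) and the simulated data are illustrative choices, not the exact procedure from the linked reference.

```python
import bisect
import random
import statistics


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions, so the maximum gap occurs at a sample point.
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def deleted_residuals(arrays):
    """Residuals of each array from the per-probe mean of all *other* arrays."""
    n = len(arrays)
    totals = [sum(probe) for probe in zip(*arrays)]
    return [[x - (t - x) / (n - 1) for x, t in zip(a, totals)] for a in arrays]


def flag_divergent(arrays, cutoff=3.0):
    """Return (flagged indices, KS statistics); flagged arrays exceed median + cutoff * MAD."""
    resid = deleted_residuals(arrays)
    # Simplification: the pooled reference includes every array, the tested one too.
    pooled = [r for res in resid for r in res]
    d = [ks_statistic(res, pooled) for res in resid]
    med = statistics.median(d)
    mad = statistics.median(abs(x - med) for x in d)
    flagged = [i for i, x in enumerate(d) if x > med + cutoff * max(mad, 1e-12)]
    return flagged, d


# Hypothetical simulated data: 20 arrays x 1000 probes sharing true probe levels.
random.seed(1)
probe_levels = [random.gauss(7, 2) for _ in range(1000)]
arrays = [[m + random.gauss(0, 0.2) for m in probe_levels] for _ in range(20)]
arrays[7] = [m + random.gauss(0, 1.0) for m in probe_levels]  # one much noisier array

flagged, d = flag_divergent(arrays)
print("flagged arrays:", flagged)
print("largest KS statistic at array", max(range(len(d)), key=d.__getitem__))
```

With thousands of probes almost any difference is "significant", so ranking arrays by D and looking at the outliers tends to be more useful than a raw p-value.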
Good luck! :-) And no, you are not crazy, I had the same issue with 3700 microarrays once.
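The subset strategy from the (Unrecommended) point can be illustrated at the probe level with plain quantile normalization, which is the normalization step inside RMA. This is a toy sketch on simulated data, not RMA itself (no background correction or median-polish summarization); the subset size of 100 just mirrors the suggestion above, and the code only shows how to measure how far subset-wise results drift from a full-batch run.

```python
import random
import statistics


def quantile_normalize(arrays):
    """Quantile-normalize equal-length intensity vectors (one list per array).

    Every value is replaced by the mean, across arrays, of the values holding
    the same rank. Ties are broken by position rather than averaged.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [statistics.fmean(s[i] for s in sorted_arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=a.__getitem__)  # probe indices, lowest value first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def normalize_in_subsets(arrays, subset_size):
    """Normalize consecutive subsets independently, then concatenate the results."""
    merged = []
    for start in range(0, len(arrays), subset_size):
        merged.extend(quantile_normalize(arrays[start:start + subset_size]))
    return merged


# Hypothetical simulated log2 intensities: 300 arrays x 2000 probes,
# each array with its own offset to mimic technical variation.
random.seed(0)
arrays = []
for _ in range(300):
    offset = random.gauss(0, 0.5)
    arrays.append([random.gauss(7, 2) + offset for _ in range(2000)])

full = quantile_normalize(arrays)
split = normalize_in_subsets(arrays, subset_size=100)
diffs = [abs(x - y) for f, s in zip(full, split) for x, y in zip(f, s)]
print(f"median |full - subset| difference: {statistics.median(diffs):.4f}")
print(f"largest |full - subset| difference: {max(diffs):.4f}")
```

How large the gap gets depends on how similar the subsets are, which is why badly behaving samples should be removed before splitting.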
If you have access to a computer cluster, you can try the package "affyPara" (http://www.bioconductor.org/packages/2.8/bioc/html/affyPara.html). It distributes your data across different machines and supports many functions from the affy package, which should solve your memory problems and speed up your calculation. How far it scales depends on your cluster; I was able to normalize 12,000 arrays with rma.
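One way a cluster can keep quantile normalization exact while each node holds only part of the data: every worker sends back per-rank sums of its sorted arrays, a coordinator turns those into one reference distribution, and each worker maps its own arrays onto it. This illustrates the general idea under the assumption that all arrays share the same probes; it is not affyPara's actual code or API, and the "workers" here are just list chunks processed in a single Python process.

```python
import random


def partial_rank_sums(chunk):
    """Worker step 1: per-rank sums of the sorted arrays held by this worker."""
    sums = [0.0] * len(chunk[0])
    for a in chunk:
        for rank, value in enumerate(sorted(a)):
            sums[rank] += value
    return sums, len(chunk)


def apply_reference(chunk, reference):
    """Worker step 2: give each value the reference value at its rank."""
    normalized = []
    for a in chunk:
        order = sorted(range(len(a)), key=a.__getitem__)
        out = [0.0] * len(a)
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def distributed_quantile_normalize(chunks):
    """Coordinator: merge per-worker rank sums into one reference, then broadcast it."""
    partials = [partial_rank_sums(c) for c in chunks]  # would run on separate nodes
    n_arrays = sum(count for _, count in partials)
    reference = [sum(rank_sums) / n_arrays for rank_sums in zip(*(s for s, _ in partials))]
    return [arr for c in chunks for arr in apply_reference(c, reference)]


# Hypothetical data: 12 arrays x 500 probes spread over 3 "nodes" of 4 arrays each.
random.seed(2)
arrays = [[random.gauss(7, 2) for _ in range(500)] for _ in range(12)]
chunks = [arrays[i:i + 4] for i in range(0, len(arrays), 4)]

distributed = distributed_quantile_normalize(chunks)
single_node = distributed_quantile_normalize([arrays])
gap = max(abs(x - y) for a, b in zip(distributed, single_node) for x, y in zip(a, b))
print(f"max difference from a single-node run: {gap:.2e}")
```

Because only one probe-length vector per worker has to be exchanged, memory per node scales with the arrays that node holds rather than with the whole dataset, and unlike normalizing subsets separately, the printed gap should come down to floating-point rounding only.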
Feel lucky you're not looking to normalize this set :) http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE2109
To echo others, just get more RAM and/or explore the justXYZ() normalization functions.