While gathering data for a microarray meta-analysis, I came across something odd. I hope I'm just being stupid and missing something blindingly obvious.
The GSE I'm looking at (GSE29312) was created using the Illumina HumanHT-12 V3.0 expression BeadChip (GPL6947). Because I can't say with certainty what preprocessing and normalization steps were performed to arrive at the deposited values, I was planning to re-process them from the RAW data using the lumi Bioconductor package. However, the submitted RAW data file is a 6.2 MB file that contains only the GPL6947 bgx file: in other words, just the chip description.
Thinking I had missed something, I looked at other datasets on the same platform and found that none of the ones I checked contain the raw data. (I didn't perform an exhaustive search, but none of the 12 randomly sampled datasets did.) They all just have that same 6.2 MB description file.
So, what am I not seeing here? Are the raw data for Illumina BeadChips just never deposited? Because of size constraints, maybe? Did I just get very unlucky with the datasets I sampled? Are they stored somewhere that I overlooked?
It does look like only processed data were deposited for the samples in GSE29312.
I looked at some datasets on the Illumina BeadChip, and it seems that what you are looking for is the second file (GSE29312_non-normalized.txt.gz). If I am not mistaken, those are the intensities measured by the scanner (similar to Affymetrix CEL files).
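If you want a quick look at that file before loading it into lumi, here is a minimal Python sketch for reading it. The column layout is an assumption on my part, not something I checked against GSE29312: I assume a tab-delimited table whose first column is the probe ID, with the remaining columns being either a per-sample intensity or a column whose name contains "Detection" (the detection p-value). The function name `read_non_normalized` is just made up for illustration, so check the real header and adjust.

```python
import csv
import gzip


def read_non_normalized(path):
    """Read a gzipped, tab-delimited 'non-normalized' intensity table.

    ASSUMED layout (verify against the real file header):
      - first column: probe ID
      - other columns: per-sample intensities, plus columns whose
        name contains "Detection" (detection p-values), which are skipped.
    Returns {probe_id: {sample_column_name: intensity}}.
    """
    with gzip.open(path, "rt", newline="") as handle:
        # Drop blank lines and '#' comment lines before parsing.
        lines = [ln for ln in handle if ln.strip() and not ln.startswith("#")]

    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)

    # Keep every non-ID column that is not a detection p-value column.
    intensity_cols = [
        i for i, name in enumerate(header)
        if i > 0 and "detection" not in name.lower()
    ]

    intensities = {}
    for row in reader:
        probe_id = row[0]
        intensities[probe_id] = {header[i]: float(row[i]) for i in intensity_cols}
    return intensities
```

You would call it as `read_non_normalized("GSE29312_non-normalized.txt.gz")` after downloading the supplementary file. If the values come back as unlogged intensities with no sign of normalization across samples, that supports treating the file as the starting point for re-processing.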