Question

Do GEO Illumina HumanHT-12 V3.0 Series never include raw data?

1

8.7 years ago

Tim D ▴ 70

While gathering data for a microarray meta-analysis, I came across something odd. I hope I'm just being stupid and that I'm missing something blindingly obvious.

The GSE I'm looking at (GSE29312) was created using the Illumina HumanHT-12 V3.0 expression BeadChip (GPL6947). Because I cannot say with 100% certainty what preprocessing and normalization steps were performed to arrive at the deposited values, I was planning on re-processing them from the RAW data using the lumi Bioconductor package. However, the submitted RAW data file is a 6.2 MB file which contains only the GPL6947 BGX file: in other words, just the chip description.

Thinking I had missed something, I looked for other datasets on the same platform and found that none of them actually contain the raw data (I didn't perform an exhaustive search, but none of the 12 randomly sampled datasets contained raw data). They all just have that same 6.2 MB description file.

So, what am I not seeing here? Are the raw data for Illumina BeadChips just never deposited? Because of size constraints, maybe? Did I just get very unlucky in the datasets that I sampled? Are they stored somewhere that I overlooked?

GEO Microarray • 2.9k views

ADD COMMENT • link updated 4.7 years ago by Kevin Blighe 88k • written 8.7 years ago by Tim D ▴ 70

0

It does look like only processed data were deposited for these samples (GSE29312).

ADD REPLY • link 8.7 years ago by GenoMax 147k

0

I looked at some datasets on the Illumina BeadChip, and it seems that what you are looking for is the second file (GSE29312_non-normalized.txt.gz). If I am not mistaken, those are the intensities measured by the scanner (similar to Affymetrix CEL files).

ADD REPLY • link 8.7 years ago by minio.cz ▴ 10

score 0 · Answer 1 · 2020-02-27

A late answer, but:

For some Illumina studies, the raw IDAT files are available. These can be read into an EListRaw object with limma's read.idat(), which relies on the illuminaio package. For other studies, such as those where only a file of the form *_non-normalized.txt.gz is available, the file can be read into R using standard functions and then coerced to an EListRaw object manually.
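Before loading such a file in R, it can help to check how its columns are laid out, because submitters format these files differently. The following is only a rough sketch in Python (standard library only), not part of any package. It assumes the first column holds probe IDs and that detection p-value columns have "detection" in their header; the function name `read_non_normalized`, the probe IDs, and the demo values are all made up for illustration.

```python
import csv
import gzip
import io


def read_non_normalized(handle):
    """Split a tab-delimited intensity table into signal and detection columns.

    Assumption: column 0 holds probe IDs, and detection p-value columns
    contain "detection" in their header (case-insensitive). Real files may
    differ, so inspect the header first.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    det_idx = [i for i, name in enumerate(header)
               if i > 0 and "detection" in name.lower()]
    sig_idx = [i for i in range(1, len(header)) if i not in det_idx]

    probes, signal, detection = [], [], []
    for row in reader:
        if not row or not row[0]:
            continue  # skip blank lines
        probes.append(row[0])
        signal.append([float(row[i]) for i in sig_idx])
        detection.append([float(row[i]) for i in det_idx])

    return {
        "probes": probes,
        "samples": [header[i] for i in sig_idx],
        "signal": signal,
        "detection": detection,
    }


# Tiny synthetic table standing in for a real *_non-normalized.txt.gz file.
demo_text = (
    "ID_REF\tSample1\tDetection Pval\tSample2\tDetection Pval\n"
    "ILMN_0000001\t120.5\t0.01\t98.2\t0.03\n"
    "ILMN_0000002\t75.0\t0.40\t80.1\t0.35\n"
)
demo_gz = gzip.compress(demo_text.encode("utf-8"))

# gzip.open accepts a file object, so the same call works on a real path.
with gzip.open(io.BytesIO(demo_gz), mode="rt", newline="") as fh:
    table = read_non_normalized(fh)
```

The signal matrix and the detection p-values are the two pieces that would go into the E and other$Detection slots of an EListRaw object on the R side.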

The annotation BGX file is simply a compressed, tab-delimited file. It can be read in manually, too.
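As a rough illustration of what reading it manually can look like, here is a Python sketch (standard library only). It assumes the decompressed file is divided by bracketed section markers such as [Heading], [Probes] and [Controls], with a tab-delimited header row opening the probe and control sections. The helper names `read_bgx_sections` and `as_records`, and all demo values, are hypothetical, so check the layout of your own file first.

```python
import gzip
import io


def read_bgx_sections(handle):
    """Group the lines of a BGX-style file by their [Section] markers."""
    sections = {}
    current = None
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
        elif current is not None and line:
            # Lines before the first marker and blank lines are ignored.
            sections[current].append(line.split("\t"))
    return sections


def as_records(rows):
    """Turn a header row plus data rows into a list of dicts."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Synthetic content standing in for a real, gzip-compressed BGX file.
demo_text = (
    "Illumina, Inc.\n"
    "[Heading]\n"
    "Number of Probes\t2\n"
    "[Probes]\n"
    "Probe_Id\tArray_Address_Id\tSymbol\n"
    "ILMN_0000001\t10001\tGENE_A\n"
    "ILMN_0000002\t10002\tGENE_B\n"
    "[Controls]\n"
    "Probe_Id\tArray_Address_Id\tReporter_Group_Name\n"
    "ILMN_9999999\t20001\tnegative\n"
)
demo_gz = gzip.compress(demo_text.encode("utf-8"))

with gzip.open(io.BytesIO(demo_gz), mode="rt") as fh:
    sections = read_bgx_sections(fh)

probes = as_records(sections["Probes"])
controls = as_records(sections["Controls"])
```

Keeping the regular probes and the control probes separate matters later, because the negative controls are what neqc() uses for background correction.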

With your data as an EListRaw object, proceed with the advice in limma's manual (see section 17.3: https://bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf), which essentially involves normalisation via neqc() followed by further probe filtering.

Kevin