Question

Why does the same Illumina expression data show different values after normalization when obtained via GEOquery versus direct download?


7.4 years ago

BioMed ▴ 50

Dear all,

I have one question that needs your help. Suppose that I need to process GSE39340 data set.

M1. I used GEOquery to get the data and normalized it with the commands below:

    library(GEOquery)
    library(lumi)
    eset <- getGEO("GSE39340")
    lumi.N.Q <- lumiExpresso(eset$GSE39340_series_matrix.txt.gz, normalize.param = list(method='rsn'))
    write.exprs(lumi.N.Q, file = 'processedExampledata.txt')

M2. Instead of using GEOquery, I downloaded the non-normalized txt file directly from GEO (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE39340) and processed it with the same method, as shown below:

    example.lumi <- lumiR("GSE39340_non_normalized.txt")
    lumi.A.B <- lumiExpresso(example.lumi, normalize.param = list(method='rsn'))
    write.exprs(lumi.A.B, file = 'processedExampledata1.txt')

However, when I compare the output files, the expression values for the same probe and sample differ. For example, ILMN_1343295 in GSM966273 (aka E31) is 11.67 in processedExampledata but 11.80 in processedExampledata1. I don't know why.

Please let me know where I went wrong.

Thank you.

illumina preprocessing normalization • 2.7k views


score 1 · Answer 1 · 2017-06-26


7.4 years ago

andrew.j.skelton73 6.6k

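GEO generally holds both a normalised and a "raw" version of the data. By default, getGEO downloads the series matrix file, and I suspect that this matrix holds the submitter's pre-normalised values. In M1 you would then be normalising already-normalised data, whereas in M2 you start from the non-normalised file, and that's why you're seeing the differences.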

If you want to accurately reproduce the authors' normalisation strategy, it's often best to contact them directly. You can, however, look for clues, such as this note included in one sample's metadata:

The data were normalised using quantile normalisation with Illumina Genomestudio V2011.1 and gene expression module (1.9.0).
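As a minimal sketch of how you could check this yourself (not a definitive recipe): the code below assumes the processing note is stored in a phenotype column named `data_processing`, which is the usual name in GEO series matrices but may differ for other series. It prints the submitter's processing description, summarises the value range of the matrix that getGEO returns, and downloads the supplementary files, where the non-normalised table normally lives.

```r
library(GEOquery)
library(Biobase)

# getGEO() on a GSE accession returns a list of ExpressionSets,
# one per series matrix file
gse <- getGEO("GSE39340")
eset <- gse[[1]]

# Assumption: the submitter's processing note is in a column
# called 'data_processing'; check colnames(pData(eset)) if it is not
if ("data_processing" %in% colnames(pData(eset))) {
  print(unique(pData(eset)$data_processing))
}

# The value range is a quick hint at whether the matrix has already
# been transformed or normalised
print(summary(as.vector(exprs(eset))))

# Download the supplementary files into a folder named after the
# accession; the non-normalised table is usually among them
getGEOSuppFiles("GSE39340")
print(list.files("GSE39340"))
```

If the processing note mentions normalisation, start from the supplementary non-normalised file (as in M2) rather than from the series matrix.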



Thank you very much for pointing that out. So we should be careful when using getGEO to gather Illumina array data.

ADD REPLY • link 7.4 years ago by BioMed ▴ 50