Question

Normalization methods for microarray data

9.2 years ago

Shamim Sarhadi

Hi

If I want to merge several GSE series from GEO, which normalization method would you recommend: fRMA or SCAN?

I should mention that my datasets come from different platforms, and I'm looking for differentially expressed genes (DEGs).

Tags: R, Bioconductor, statistics

Ram · Answer 1 · 2015-09-24

Combining data is not an easy task, and it should only be undertaken if it provides a substantial benefit. Often, analysing one dataset as a pilot and a separate dataset as a follow-up is more beneficial than artificially combining them.

If you still want to merge datasets from GEO, you'll need to try to obtain the raw data. Raw data is preferable because you can apply your own normalisation technique and keep things consistent (the normalised data deposited in GEO may have been processed with different methods in different experiments). Alternatively, you can take the normalised expression sets as deposited and fit an additive model that includes a term for the dataset, to try to account for the between-dataset variation.
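A minimal sketch of the additive-model idea for a single gene, written in Python with NumPy (in R/Bioconductor a linear-model package such as limma would normally be used instead). The function name `fit_additive_model` and the 0/1 coding of sample type are illustrative assumptions. Each dataset gets its own offset, so the condition effect is estimated within datasets rather than across them:

```python
import numpy as np

def fit_additive_model(expr, condition, dataset):
    """Fit expr ~ condition + dataset for one gene by ordinary least squares.

    expr: log-scale expression values, one per sample.
    condition: 0/1 labels (0 = sample type A, 1 = sample type B); hypothetical coding.
    dataset: dataset labels per sample, e.g. "GSE1", "GSE2".
    Returns the condition coefficient (B vs A), adjusted for a per-dataset offset.
    """
    expr = np.asarray(expr, dtype=float)
    condition = np.asarray(condition, dtype=float)
    dataset = np.asarray(dataset)
    levels = sorted(set(dataset.tolist()))
    # One indicator column per dataset except the first, which is the reference level.
    dataset_cols = [(dataset == lvl).astype(float) for lvl in levels[1:]]
    design = np.column_stack([np.ones_like(expr), condition, *dataset_cols])
    # If condition and dataset are confounded, the design loses rank and the
    # condition effect cannot be separated from the dataset effect.
    if np.linalg.matrix_rank(design) < design.shape[1]:
        raise ValueError("condition is confounded with dataset; effect not estimable")
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    return coef[1]
```

With toy values built as 2 + 1.5 × condition + 3 × (dataset is GSE2), the function recovers the 1.5 condition effect despite the offset between datasets. If sample type A appears only in one dataset and type B only in the other, no additive model can separate the two effects, which is why the sketch raises an error in that case.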

Using additive models requires that you have the same sample types across experiments, so that the between-dataset variance can be estimated accurately. Additionally, you need some way to match probes across platforms; nuIDs are probably the best option, because they are derived from the probe's nucleotide sequence, so matching nuIDs means matching sequences.
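A simplified Python sketch of sequence-based probe matching, assuming you already have a probe ID to nucleotide sequence table for each platform (the function name and input format are hypothetical). It keys probes on the uppercase sequence itself rather than computing real nuIDs; a nuID is a compact, sequence-derived identifier that plays the same role:

```python
def match_probes_by_sequence(platform1, platform2):
    """Pair probe IDs from two platforms whose nucleotide sequences are identical.

    platform1, platform2: dict mapping probe ID -> probe nucleotide sequence.
    Returns a dict mapping each shared sequence -> (ID on platform 1, ID on platform 2).
    """
    def by_sequence(platform):
        index = {}
        for probe_id, seq in platform.items():
            # Normalise case and whitespace; if several probes share a sequence,
            # only the first probe ID encountered is kept (a simplification).
            index.setdefault(seq.strip().upper(), probe_id)
        return index

    idx1 = by_sequence(platform1)
    idx2 = by_sequence(platform2)
    shared = idx1.keys() & idx2.keys()
    return {seq: (idx1[seq], idx2[seq]) for seq in sorted(shared)}
```

Only probes whose sequences match exactly are paired; probes unique to one platform are dropped, which is usually the price of putting two platforms on a common footing.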

As for the normalisation method, that depends on the approach you want to take, and also on the platform each dataset comes from.

Ram · Answer 2 · 2015-09-25

There is no general normalization procedure that will make your data comparable across platforms. You'll need to determine whether the experimental designs of the various GSEs are compatible with the questions you want to ask. To be concrete, comparing sample type A in GSE1 with sample type B in GSE2 is probably not justifiable without great care and thought (though people have been known to try it). Comparing sample types A and B when both are represented on both platforms is much easier to justify, though the results are still difficult to interpret in many cases; there, using the single "best" dataset may be simpler and give meaningful results.
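Before merging, it can help to tabulate which sample types each GSE actually contains. A small Python sketch, assuming you have already extracted a (dataset, sample type) label for every sample; the function name and input format are hypothetical:

```python
from collections import defaultdict

def shared_sample_types(samples):
    """Return the sample types present in every dataset.

    samples: iterable of (dataset, sample_type) pairs, one per sample.
    """
    types_by_dataset = defaultdict(set)
    for dataset, sample_type in samples:
        types_by_dataset[dataset].add(sample_type)
    if not types_by_dataset:
        return set()
    # Intersect the per-dataset sets of sample types.
    return set.intersection(*types_by_dataset.values())
```

A comparison of A versus B falls into the easier-to-justify case only when both A and B appear in the returned set; if type A exists only in GSE1 and type B only in GSE2, the result is empty for that pair.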