How to normalize two microarray datasets coming from different platforms
6.9 years ago

Hi everyone,

I want to compare different microarray datasets from different platforms. Which is the best method and package to use?

Thank you very much for your help.

So you want to compare condition 1 on platform A with condition 2 on platform B? I'd say you can't, because the condition effect would be completely confounded with the platform effect.

Yes, the exact arrangement of the samples/conditions and the array types is important.

This question has already been discussed. Did you check ?

I saw some questions on the same topic, but many are from 3-4 years ago. As someone suggested, I checked the inSilicoDb and virtualArray packages, and I saw that they have been discontinued from the latest Bioconductor version (which I am using). Someone else suggested sva (I read the vignette, and it is not the best fit for my work), so I asked again to find out which methods and packages are the best ones available now.

This paper is quite recent and includes a review of cross-platform normalization methods; it may be useful to read.

Thank you, I read this article and it is quite interesting, but some of the packages used are no longer available.

6.9 years ago

NB - Major update to answer: December 16, 2019

----------------------

It would help to state the specific arrays that you have.

There are different ideas on how best to do this - a search of the World Wide Web reveals this. I would not look for a package but instead begin to think critically about how it could work and what needs to be done.

1, 'Z-score' merge

Firstly, if you are interested in processing each dataset independently, then take my approach here: A: How to integrate multiple data sets from microarray platform prior meta-analysis

In this way, each dataset is processed and normalised independently. Then, each dataset is filtered so that only genes present in all datasets are retained, followed by a transformation to Z-scores (again independently for each dataset). Once the datasets are on the 'standard score distribution' (Z-scale), you can conduct statistical analyses on the merged dataset, and it may be advisable to include ArrayVersion as a covariate in your models.

This approach is best suited to downstream correlation-based analyses, like network analysis.
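The Z-score merge described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the linked R workflow: each dataset is assumed to be a dict mapping a gene ID to a list of already-normalised (e.g. log2) values, one per sample, and Z-scores are assumed to be computed per gene across the samples of that dataset only. The dataset and gene names in the demo are made up.

```python
"""Hypothetical sketch of a 'Z-score' merge of independently normalised datasets.
Assumption: each dataset is {gene_id: [value per sample]}, and Z-scores are
computed per gene within each dataset (never across datasets)."""
from statistics import mean, stdev


def zscore(values):
    """Standardise one gene's values: (x - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        # A constant gene carries no information; map it to all zeros.
        return [0.0 for _ in values]
    return [(x - m) / s for x in values]


def zscore_merge(datasets):
    """Keep genes shared by all datasets, Z-score each dataset independently,
    then concatenate the samples. Also returns an ArrayVersion label per sample
    so that platform can be used as a covariate downstream."""
    common = set.intersection(*(set(d) for d in datasets.values()))
    merged = {gene: [] for gene in sorted(common)}
    array_version = []
    for platform, data in datasets.items():
        n_samples = len(next(iter(data.values())))
        array_version.extend([platform] * n_samples)
        for gene in merged:
            merged[gene].extend(zscore(data[gene]))
    return merged, array_version


if __name__ == "__main__":
    platform_a = {"TP53": [7.0, 8.0, 9.0], "MYC": [5.0, 5.5, 6.0], "ACTB": [11.0, 11.0, 12.0]}
    platform_b = {"TP53": [3.0, 4.0, 5.0, 6.0], "MYC": [9.0, 8.0, 7.0, 6.0], "EGFR": [2.0, 2.5, 3.0, 3.5]}
    merged, array_version = zscore_merge({"A": platform_a, "B": platform_b})
    print(sorted(merged))      # ['MYC', 'TP53']
    print(array_version)       # ['A', 'A', 'A', 'B', 'B', 'B', 'B']
    print(merged["TP53"][:3])  # [-1.0, 0.0, 1.0]
```

Returning the per-sample platform label alongside the merged values is deliberate: it keeps ArrayVersion available as a covariate for the models mentioned above.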

2, 'direct' merge

Attempt a 'direct' merge, as per the approach here: ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE6nnn/GSE6344/suppl/GSE6344_MethodsDetails.PDF

Here, datasets are again normalised independently and then genes are filtered so that they match. Then, a scaling factor is applied to one array so that the arrays can easily be merged together. There is further info here: Regarding Microarray Platforms
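A rough sketch of the 'direct' merge idea follows. I have not reproduced the exact procedure from the GSE6344 methods document. The choice of scaling factor here is my own assumption: the ratio of the median intensities over the matched genes, applied on the linear (non-log) scale. All function names are hypothetical.

```python
"""Hypothetical sketch of a 'direct' merge: after independent normalisation and
gene matching, one dataset is multiplied by a single global scaling factor.
Assumptions (not taken from the GSE6344 document): linear-scale intensities,
and a factor equal to the ratio of median intensities over matched genes."""
from statistics import median


def match_genes(ref, other):
    """Restrict both datasets ({gene: [values]}) to the genes they share."""
    common = sorted(set(ref) & set(other))
    return {g: ref[g] for g in common}, {g: other[g] for g in common}


def scaling_factor(ref, other):
    """Ratio of overall medians; multiplying `other` by it aligns the medians."""
    ref_all = [v for values in ref.values() for v in values]
    other_all = [v for values in other.values() for v in values]
    return median(ref_all) / median(other_all)


def direct_merge(ref, other):
    """Match genes, rescale `other` onto `ref`, then append its samples."""
    ref_m, other_m = match_genes(ref, other)
    factor = scaling_factor(ref_m, other_m)
    merged = {g: ref_m[g] + [v * factor for v in other_m[g]] for g in ref_m}
    return merged, factor
```

If the data are already log2-transformed, the equivalent adjustment would be to add the difference of medians instead of multiplying by a ratio.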

--------------------------------------

Further posts on this topic that I have made:

Thank you very much for your reply. I want to normalize each dataset independently, and afterwards I was trying to find a way to normalize between the datasets. I was thinking about Z-scores or one of the other methods I read about online, but I am a bit afraid that a suboptimal normalization could affect the biological signal, which might then be removed as a batch effect.

I think that, provided you include the array type as a covariate in your modelling / statistical tests, it should not be a major issue. The regression models should be able to adjust for the differences, because those differences will be consistent across all transcripts.
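To illustrate what including the array type as a covariate means, here is a minimal per-gene linear model, expression ~ intercept + condition + array, fitted by ordinary least squares. In practice you would use limma or `lm()` in R. This pure-Python version is only a sketch. It assumes condition and array are coded 0/1 and that they are not completely confounded: each array type must contain both conditions. Otherwise the design is singular and the condition effect cannot be estimated.

```python
"""Minimal sketch: per-gene OLS fit of expression ~ 1 + condition + array.
Assumes 0/1 coding and a design where condition and array are not confounded."""


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design: are condition and array confounded?")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_gene(expression, condition, array):
    """Return [intercept, condition_effect, array_effect] via the normal equations X'X b = X'y."""
    x = [[1.0, c, a] for c, a in zip(condition, array)]
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(x, expression)) for i in range(3)]
    return solve(xtx, xty)
```

The condition coefficient is then the platform-adjusted effect. If every sample of condition 2 came from one platform, the elimination would hit a zero pivot and the code would raise an error instead of returning a meaningless estimate.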

Also, if you're doing correlation-based analyses such as WGCNA or other network analyses, then you do not actually need to worry about adjustments across arrays. Getting the data to the Z-score stage (i.e., the same distribution) would be beneficial, though.
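One reason Z-scoring is harmless for correlation-based analyses: within a dataset, Pearson correlation between two genes is invariant to shifting and positively rescaling either gene. Standardising each gene therefore leaves its correlations unchanged. The short sketch below checks this on made-up values. It covers a single dataset only, not correlations computed across pooled platforms.

```python
"""Sketch: Pearson correlation between two genes is unchanged by per-gene
Z-scoring within a dataset (invariance to shift and positive scaling)."""
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def zscore(values):
    """Standardise values to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


if __name__ == "__main__":
    gene_a = [2.0, 4.0, 5.0, 7.0, 9.0]
    gene_b = [1.0, 3.0, 2.0, 6.0, 8.0]
    print(pearson(gene_a, gene_b))
    print(pearson(zscore(gene_a), zscore(gene_b)))  # same value, up to rounding
```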

Thank you for your help; this was my question too. But why do you recommend calculating Z-scores?

Z-scores represent 'standardised scores', and the Z distribution is often referred to as the 'standard normal distribution'. A Z-score of 1 is equivalent to 1 standard deviation above the mean, whereas, e.g., -2.5 is 2.5 standard deviations below the mean. |Z| = 1.96 corresponds to a two-sided p = 0.05.

So, bringing both datasets to a standard distribution allows for a more 'harmonious' merge; however, batch effects may still lurk, so batch should still be included as a covariate in the design formula.
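The Z-score arithmetic can be checked with a short sketch using only Python's standard library (`statistics.NormalDist` needs Python 3.8+). The toy values are chosen so that the two example scores from the explanation, 1 and -2.5, come out exactly. It also shows that a two-sided p of 0.05 corresponds to |Z| of about 1.96.

```python
"""Sketch of the Z-score arithmetic: standardisation and the Z <-> p relationship."""
from statistics import NormalDist, mean, stdev


def z_score(x, values):
    """Number of standard deviations that x lies above (+) or below (-) the mean of values."""
    return (x - mean(values)) / stdev(values)


def two_sided_p(z):
    """Two-sided p-value for a Z-score under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    values = [2.0, 4.0, 6.0]                      # mean 4, sample SD 2
    print(z_score(6.0, values))                   # 1.0
    print(z_score(-1.0, values))                  # -2.5
    print(round(two_sided_p(1.96), 3))            # 0.05
    print(round(NormalDist().inv_cdf(0.975), 2))  # 1.96
```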
