Hi everyone,
I want to compare microarray datasets from different platforms. Which is the best method and package to use?
Thank you very much for your help.
NB - Major update to answer: December 16, 2019
It would help to state the specific arrays that you have.
There are different ideas on how best to do this - a search of the World Wide Web reveals this. I would not look for a package but instead begin to think critically about how it could work and what needs to be done.
Firstly, if you are interested in processing each dataset independently, then take my approach here: A: How to integrate multiple data sets from microarray platform prior meta-analysis
In this way, each dataset is processed and normalised independently. The datasets are then filtered so that genes are matched across all of them, followed by a transformation to Z-scores (independently for each dataset). Once they are on the 'standard score distribution' (Z-scale), you can conduct statistical analyses on this merged dataset, and it is advisable to include ArrayVersion as a covariate in your models.
This is best for downstream analyses based on correlation analyses, like network analysis.
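As a rough illustration of the steps above (match genes, Z-score each dataset on its own, then merge), here is a minimal sketch in plain Python. The gene names, the toy values, and the function names are hypothetical, and I am assuming that each gene is standardised across the samples of its own dataset; adapt that if you standardise differently.

```python
from statistics import mean, stdev

def zscore_rows(dataset, genes):
    """Z-score each gene across the samples of ONE dataset (assumed scheme)."""
    scaled = {}
    for gene in genes:
        values = dataset[gene]
        mu, sd = mean(values), stdev(values)  # sd == 0 would need special handling
        scaled[gene] = [(v - mu) / sd for v in values]
    return scaled

def merge_on_shared_genes(datasets):
    """datasets: {array_name: {gene: [expression per sample]}}."""
    # Keep only the genes present on every platform.
    shared = sorted(set.intersection(*(set(d) for d in datasets.values())))
    merged = {gene: [] for gene in shared}
    array_version = []  # one label per sample, for use as a covariate later
    for name, data in datasets.items():
        scaled = zscore_rows(data, shared)
        for gene in shared:
            merged[gene].extend(scaled[gene])
        n_samples = len(next(iter(data.values())))
        array_version.extend([name] * n_samples)
    return merged, array_version

# Toy data on very different scales (hypothetical values).
platform_a = {"TP53": [7.1, 7.9, 8.4], "BRCA1": [5.0, 5.5, 6.3], "MYC": [9.0, 9.2, 9.9]}
platform_b = {"TP53": [110.0, 150.0, 190.0, 170.0],
              "BRCA1": [40.0, 55.0, 70.0, 65.0],
              "EGFR": [20.0, 25.0, 22.0, 30.0]}

merged, array_version = merge_on_shared_genes({"A": platform_a, "B": platform_b})
```

The `array_version` list is kept alongside the merged values so that the platform can still be modelled as a covariate.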
Secondly, you could attempt a 'direct' merge, as per the approach here: ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE6nnn/GSE6344/suppl/GSE6344_MethodsDetails.PDF
Here, datasets are again normalised independently and then genes are filtered so that they match. Then, a scaling factor is applied to one array so that the arrays can easily be merged together. There is further info here: Regarding Microarray Platforms
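To make the idea of a scaling factor concrete, here is a minimal sketch. Note the assumption: I am using the ratio of median expression over the shared genes as the scaling factor, which is only one plausible choice and not necessarily what the linked methods document does. Function names and values are hypothetical.

```python
from statistics import median

def scale_to_reference(reference, other):
    """Rescale `other` onto the scale of `reference` and merge on shared genes.

    Both inputs: {gene: [expression per sample]}.
    Scaling factor (an assumption): ratio of medians over the shared genes.
    """
    shared = sorted(set(reference) & set(other))
    ref_median = median(v for g in shared for v in reference[g])
    other_median = median(v for g in shared for v in other[g])
    factor = ref_median / other_median
    merged = {g: reference[g] + [v * factor for v in other[g]] for g in shared}
    return merged, factor

# Toy data (hypothetical values): the second array reads at half the intensity.
reference = {"TP53": [8.0, 10.0], "MYC": [12.0, 14.0]}
other = {"TP53": [4.0, 5.0], "MYC": [6.0, 7.0], "EGFR": [3.0, 3.5]}

merged_direct, factor = scale_to_reference(reference, other)
```

A single global factor like this only aligns the overall scale; it does not correct probe-specific differences between platforms.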
Thank you very much for your reply. I want to normalize each dataset independently, and after that I was trying to find a way to normalize between the datasets. I was thinking about Z-scores or one of the methods I read about on the internet, but I am a bit afraid that a suboptimal normalization could affect the biological signal, so that real differences are removed as batch effect.
I think that, provided you include the array type as a covariate in modelling / statistical tests, it should not be a major issue. The regression models should be capable of adjusting for the differences between arrays, because those differences will be consistent across all transcripts.
Also, if you're doing correlation-based analysis like WGCNA and other network analyses, then you do not actually need to worry about adjustments across array. Getting the data to the Z-score stage would be beneficial though (i.e., same distribution).
Z-scores are 'standardised scores', and the Z distribution is often referred to as the 'standard normal distribution'. A Z-score of 1 is 1 standard deviation above the mean, whereas a Z-score of -2.5 is 2.5 standard deviations below the mean. An absolute Z of about 1.96 corresponds to a two-tailed p of 0.05.
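If you want to check the correspondence between Z and p yourself, a small sketch using Python's standard library is below. The function names are my own, and the two-tailed convention is an assumption about which p-value is meant.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def two_tailed_p(z):
    """Two-tailed p-value for a Z-score under the standard normal."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))

def z_for_two_tailed_p(p):
    """Absolute Z-score whose two-tailed p-value equals p."""
    return STANDARD_NORMAL.inv_cdf(1 - p / 2)

p_at_196 = two_tailed_p(1.96)        # close to 0.05
z_at_005 = z_for_two_tailed_p(0.05)  # close to 1.96
```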
So, bringing both datasets to a standard distribution allows for a more 'harmonious' merge; however, batch effects may still lurk, so batch ought to still be included as a covariate in the design formula.
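To show what including batch as a covariate does, here is a minimal sketch for a single gene using ordinary least squares in plain Python. The data are made up and noise-free (true condition effect 2, true batch effect 3), and the `ols` helper is a hypothetical stand-in for whatever modelling function you actually use.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# 0 = control, 1 = case; batch 0 = array A, batch 1 = array B (unbalanced design).
condition = [0, 0, 1, 1, 0, 1, 1, 1]
batch     = [0, 0, 0, 0, 1, 1, 1, 1]
expression = [5 + 2 * c + 3 * b for c, b in zip(condition, batch)]

# Model without batch: expression ~ condition
naive = ols([[1, c] for c in condition], expression)
# Model with batch as covariate: expression ~ condition + batch
adjusted = ols([[1, c, b] for c, b in zip(condition, batch)], expression)
```

Here `naive[1]` overstates the condition effect because more cases sit on the second array, while `adjusted[1]` recovers the effect that was simulated. If condition and batch were perfectly confounded, no model could separate them.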
So you want to compare condition 1 on platform A with condition 2 on platform B? I'd say you can't.
Yes, the exact arrangement of the samples/conditions and the array types is important.
This question has already been discussed. Did you check?
I saw some questions on the same topic, but many are from 3-4 years ago. As someone suggested, I checked the insilicoDB and virtualarray packages and saw that they have been discontinued as of the latest Bioconductor version (which I am using). Someone else suggested sva (I read the vignette and it is not the best fit for my work), so I asked again to find out which methods and packages are the best ones available now.
This paper is quite new and includes a review of cross-platform normalization. It could be useful.
Thank you, I read this article and it is quite interesting, but some of the packages used are no longer available.