I am attempting an integrated analysis of microarray data. The datasets were each normalized differently: quantile normalization, z-score, and RMA. Is it still possible to integrate and analyze them with ComBat?
Yes, with ComBat and with other packages.
While the newer versions of tools such as ComBat and sva are aimed more at RNA-seq, the original versions of these and other packages were designed for microarray data and, if I recall correctly, still include functionality for it. Their documentation provides information on the best kind of input to provide (raw, normalized, etc.).
See, for instance,
Johnson WE, Li C, Rabinovic A (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8(1): 118-127.
Leek JT, Storey JD (2007) Capturing heterogeneity in gene expression studies by 'Surrogate Variable Analysis'. PLoS Genetics, 3: e161.
From the sva reference manual (here):
sva has functionality to estimate and remove artifacts from high dimensional data. The sva function can be used to estimate artifacts from microarray data. The svaseq function can be used to estimate artifacts from count-based RNA-sequencing (and other sequencing) data. The ComBat function can be used to remove known batch effects from microarray data. The fsva function can be used to remove batch effects for prediction problems.
I have also outlined a "DIY" approach that is a good starting point for many data types: Batch correction for Nanopore RNAseq.
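To make "removing a known batch effect" concrete, here is a minimal sketch in plain Python. It is not ComBat's empirical Bayes method and not necessarily the approach in the linked post; it only shifts each batch's mean for a single gene onto the overall mean. The function name and the toy values are made up for illustration.

```python
from statistics import mean

def center_by_batch(values, batches):
    """Location-only batch adjustment for ONE gene (hypothetical helper).

    values  : expression values for one gene, one per sample
    batches : batch label for each sample, same order as `values`
    Each batch is shifted so that its mean equals the overall mean.
    """
    grand_mean = mean(values)
    adjusted = [0.0] * len(values)
    for label in set(batches):
        idx = [i for i, b in enumerate(batches) if b == label]
        batch_mean = mean(values[i] for i in idx)
        for i in idx:
            # Remove the batch-specific offset, keep the overall level.
            adjusted[i] = values[i] - batch_mean + grand_mean
    return adjusted

# Toy data: batch B sits 4 units above batch A for this gene.
gene = [5.0, 6.0, 7.0, 9.0, 10.0, 11.0]
batch = ["A", "A", "A", "B", "B", "B"]
adjusted = center_by_batch(gene, batch)
print(adjusted)
```

Note that this naive centering also removes any real biological difference that happens to coincide with batch, which is one reason ComBat accepts a model matrix of covariates to protect.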
Hello, VAL! Thank you for your comment.
It is very informative, with helpful attachments. I thought that the state of each dataset (raw, normalized, etc.) had to be unified before using ComBat.
Now I am thinking that it might be manageable as long as at least the dataset used as the reference for the integration (the dataset that contains all conditions) is normalized.
Thank you for your wealth of knowledge and kind response.
Can you process them so that they are all normalized consistently, and THEN run ComBat?
VL
The state of the datasets varies:
raw signals, quantile normalization, z-scores, RMA, and so on.
I am now trying to figure out how to make them consistent.
But I thought ComBat would adjust the other datasets based on the reference dataset. If the reference dataset (the dataset that contains all the conditions) is normalized, will the other datasets be adjusted to match as well?
Hi yoshi,
ComBat will return batch-corrected values for each input, but it will not convert them to the same data type (I think; I have not checked in a while).
But at some point in the analysis, you will either need to harmonize them or perform all the analyses on different kinds of data.
I usually find it easier and better to harmonize as many of the datasets as possible to one data type as early as possible, to the extent the datasets allow.
So, in this case, I'd try to convert the data to the same type of values early on, to the degree possible, and then run the rest of the pipeline. That said, I am not aware of a specific need to do this before running ComBat...
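As an illustration of what "harmonizing to one data type" could look like for datasets that are still on an intensity scale (for example raw or RMA values), here is a rough sketch of quantile normalization in plain Python. The function name and the toy numbers are hypothetical, ties are not given any special treatment, and in practice an established implementation would be preferable.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples (hypothetical helper).

    samples : list of equal-length lists; samples[s][g] is the value
              of gene g in sample s.
    Returns new lists in which every sample has the same distribution.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples)
        for k in range(n_genes)
    ]
    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda g: s[g])
        out = [0.0] * n_genes
        for rank, g in enumerate(order):
            out[g] = reference[rank]
        normalized.append(out)
    return normalized

# Two toy samples on very different scales.
samples = [[2.0, 4.0, 6.0], [30.0, 10.0, 20.0]]
normalized = quantile_normalize(samples)
print(normalized)
```

After this step every sample shares the same set of values and only the gene ranks differ between samples. This makes sense for intensity-like data; it is not a way to turn z-scores back into intensities.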
Thanks for your response! Oh, really? I was mistaken.
Then, for example, it looks like it would be a good idea to unify all of them with quantile normalization at an early stage and then run ComBat.
The data I am analyzing come from a public database, and the uploads include RMA values, z-scores, and so on.
It seems difficult to convert z-scores to quantile-normalized values and so on, so integration still seems difficult.
Please double-check me on that using the sva/ComBat documentation.
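On the z-score point raised above, a small sketch of why the conversion is hard: z-scoring discards the original mean and spread, so two genes with quite different intensities can end up with the same z-scores. This assumes a plain per-gene z-score using the population standard deviation; the uploaded datasets may have been scaled differently, and the values below are invented.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize values to mean 0 and (population) SD 1."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

# Two hypothetical genes with different levels and spreads.
low_gene = [4.0, 5.0, 6.0]
high_gene = [10.0, 12.0, 14.0]

z_low = zscore(low_gene)
z_high = zscore(high_gene)

# The z-scores agree (up to floating-point rounding), so the original
# scale cannot be recovered from them alone.
print([round(z, 6) for z in z_low])
print([round(z, 6) for z in z_high])
```

Unless the original means and standard deviations are available, a z-scored dataset cannot be put back on the intensity scale that quantile normalization or RMA values live on.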