I am trying to correct my RNA expression data for 3 sources of unwanted variation. I wrote this post asking about my problem, and one of the users (ATpoint) recommended that I use removeBatchEffect from limma.
However, apart from this one, I cannot find any published information explaining how it works, in which cases it should be used, or how it compares with other methods. In general, I would like more information I can cite to justify why I chose this method and not another one.
In my case, I know that I need to do this to be able to adjust for 3 variables...
On the other hand, I would like to confirm whether it is possible to keep the biological differences in my data, such as age, sex, group and WBC count (several biological variables), untouched. Is there a way to preserve them? The only mention I keep finding is about preserving treatment information, as in the removeBatchEffect documentation:
"The design matrix is used to describe comparisons between the samples, for example treatment effects, that should not be removed. The function (in effect) fits a linear model to the data, including both batches and regular treatments, then removes the component due to the batch effects."
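The quoted mechanism can be sketched in a few lines. This is a minimal Python/NumPy illustration under stated assumptions, not limma's actual implementation: remove_batch_effect is a hypothetical name, y is assumed to hold log-scale expression values (removeBatchEffect expects log-expression values, not raw counts), and the unwanted variables are assumed to be supplied as numeric columns. It also shows how biological variables such as age, sex, group and WBC count would be protected: they go into the design matrix, so their effects are estimated jointly with the batch effects but never subtracted.

```python
import numpy as np


def remove_batch_effect(y, batch_cols, design):
    """Hypothetical sketch of the idea behind limma's removeBatchEffect (not its code).

    y          : genes x samples matrix of log-expression values
    batch_cols : samples x k matrix of unwanted variables (batch dummies,
                 technical covariates) whose effect should be removed
    design     : samples x p matrix of effects to keep (intercept, group,
                 age, sex, WBC count, ...)
    """
    y = np.asarray(y, dtype=float)
    batch = np.asarray(batch_cols, dtype=float).reshape(y.shape[1], -1)
    # Centre the unwanted columns so removing them leaves each gene's mean unchanged
    batch = batch - batch.mean(axis=0)
    # One joint least-squares fit per gene: y_g ~ design + batch
    X = np.hstack([np.asarray(design, dtype=float), batch])
    coef, *_ = np.linalg.lstsq(X, y.T, rcond=None)  # (p + k) x genes
    gamma = coef[X.shape[1] - batch.shape[1]:, :]   # batch coefficients only
    # Subtract only the fitted batch component; design effects and residuals stay
    return y - (batch @ gamma).T
```

Anything left out of the design that is correlated with the batch variables can be partly removed along with them, so every biological variable you want to keep belongs in the design. The limma documentation also notes that removeBatchEffect is not intended for use before linear modelling; for that purpose the batch variables should be included in the model itself.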
Note that my 2 main objectives are:
- Get adjusted counts for future analyses.
- Run linear mixed models (with the variancePartition package) to assess the contribution of several variables (such as age, diabetes, sex, hypertension, cholesterol, etc.) to the expression variation of each gene.
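variancePartition fits a linear mixed model per gene (its documentation recommends modelling categorical variables as random effects) and reports the fraction of expression variance attributable to each variable. As a rough intuition only, here is a simplified fixed-effects stand-in, not variancePartition's method or API: every variable is entered as an ordinary least-squares regressor, the variance of each variable's fitted contribution is computed per gene, and the results plus the residual variance are rescaled to sum to 1. The function name variance_fractions and its inputs are hypothetical.

```python
import numpy as np


def variance_fractions(y, terms):
    """Hypothetical fixed-effects analogue of per-gene variance fractions.

    y     : genes x samples matrix of log-expression values
    terms : dict mapping variable name -> samples x k numeric matrix
            (a single column for continuous/binary variables, dummy
            columns for multi-level factors)
    Returns a dict name -> per-gene fraction, including "Residuals".
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[1]
    names = list(terms)
    blocks = [np.asarray(terms[k], dtype=float).reshape(n, -1) for k in names]
    # Centre each block so the intercept absorbs the gene means
    blocks = [b - b.mean(axis=0) for b in blocks]
    X = np.hstack([np.ones((n, 1))] + blocks)
    coef, *_ = np.linalg.lstsq(X, y.T, rcond=None)  # (1 + sum k) x genes
    resid = y.T - X @ coef                          # samples x genes

    variances = {}
    start = 1
    for name, b in zip(names, blocks):
        k = b.shape[1]
        # Variance of this variable's fitted contribution, per gene
        variances[name] = (b @ coef[start:start + k, :]).var(axis=0)
        start += k
    variances["Residuals"] = resid.var(axis=0)

    # Rescale so each gene's fractions sum to 1
    total = sum(variances.values())
    return {name: v / total for name, v in variances.items()}
```

With correlated variables the per-variable variances do not add up to the total variance, which is why they are rescaled here; a mixed model treats categorical variables differently, so the numbers from variancePartition will not match this sketch exactly.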
Could anybody help me, please?
Thanks very much in advance
Thanks very much for your replies; I really appreciate all the information you provided.