Question

removeBatchEffect with or without log2 values


6.4 years ago

Sebastian Hesse ▴ 350

I am using limma's removeBatchEffect, and PVCA to check for batch influences before and after removal. I just noticed that if I don't use log2 values, the batch effect removal does not work, in the sense that PVCA still shows batch effects after removal. With log2 values, everything works fine.

Is this a general feature of removeBatchEffect?

It's a bit strange, as later on, e.g. for voom, non-log2 values are required (though, to be fair, not for the corrected data, as those won't go into the differential expression analysis).

Thanks for your help! Sebastian
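A toy illustration of why the scale matters. limma's documentation describes the input to removeBatchEffect as log-expression values: the function fits a linear model and subtracts the fitted batch term, i.e. it removes an additive shift. The sketch below is pure Python, not limma itself, and assumes a hypothetical multiplicative batch effect (batch B = 4× batch A) on one gene in a balanced 2 × 2 layout. In that layout the least-squares batch effect under ~ condition + batch is simply each batch mean minus the grand mean; the helper name `remove_batch_balanced` is made up for this example.

```python
import math

def remove_batch_balanced(values, batches):
    """Toy analogue of limma::removeBatchEffect for ONE gene (balanced design only).

    Assumes every batch holds the same mix of conditions, so the least-squares
    batch effect under ~ condition + batch is (batch mean - grand mean).
    Subtracting it moves every batch mean onto the grand mean.
    """
    grand = sum(values) / len(values)
    means = {}
    for b in set(batches):
        grp = [v for v, bb in zip(values, batches) if bb == b]
        means[b] = sum(grp) / len(grp)
    return [v - (means[b] - grand) for v, b in zip(values, batches)]

# Hypothetical counts: treatment doubles expression, batch B multiplies it by 4.
counts = [10, 20, 40, 80]        # ctrl-A, trt-A, ctrl-B, trt-B
batches = ["A", "A", "B", "B"]

raw_fix = remove_batch_balanced(counts, batches)
log_fix = remove_batch_balanced([math.log2(c) for c in counts], batches)

# Treatment difference within each batch after correction:
print("raw scale:", raw_fix[1] - raw_fix[0], raw_fix[3] - raw_fix[2])   # 10.0 vs 40.0
print("log2 scale:", log_fix[1] - log_fix[0], log_fix[3] - log_fix[2])  # ~1.0 vs ~1.0
```

On the log2 scale the multiplicative batch effect becomes an additive shift and is removed completely. On the raw scale the batch means are equalized, but the treatment difference is still 10 in batch A versus 40 in batch B, so batch structure remains in the "corrected" data. This is consistent with PVCA still detecting batch after correction of non-log values.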

removeBatchEffect limma r



6.4 years ago

Sebastian Hesse ▴ 350

As voom gave strange results that still showed heteroscedasticity, I have now solved it by using log2 values and limma's eBayes options: eBayes(fit2_object, trend = F, robust = T)
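For context on what those eBayes options act on: limma's empirical Bayes step shrinks each gene's residual variance s² (with d residual df) toward a prior variance s0² (with d0 prior df), giving s̃² = (d0·s0² + d·s²)/(d0 + d), and the moderated t-statistic divides the coefficient by its standard error based on s̃². The sketch below applies only those formulas in pure Python. It assumes d0 and s0² are already known, whereas eBayes estimates them from all genes. With trend = FALSE, as in the call above, s0² is a single constant; trend = TRUE would let it depend on average expression, and robust = TRUE robustifies the estimation against outlier gene variances. The numbers are hypothetical.

```python
def moderated_variances(s2, d, s0_sq, d0):
    """Shrink per-gene residual variances toward a constant prior variance.

    s2: per-gene residual variances; d: residual df of the linear model;
    s0_sq, d0: prior variance and prior df, ASSUMED known here (limma's eBayes
    estimates them from all genes; a constant s0_sq corresponds to trend=FALSE).
    """
    return [(d0 * s0_sq + d * v) / (d0 + d) for v in s2]

def moderated_t(coef, stdev_unscaled, s2_post):
    """Moderated t-statistic: coefficient divided by its posterior standard error."""
    return coef / (stdev_unscaled * s2_post ** 0.5)

# Hypothetical genes with a very small, a typical, and a very large variance.
post = moderated_variances([0.01, 0.25, 4.0], d=4, s0_sq=0.25, d0=4)
print(post)  # ≈ [0.13, 0.25, 2.125]: extremes pulled toward the prior 0.25

t = moderated_t(1.0, 0.5, post[1])
print(t)
```

Small variances are inflated and large ones deflated, which stabilizes the tests when there are few residual df per gene.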

Thank you all for your input!


score 2 · Accepted Answer · 2019-02-27

There are multiple tools for this, like SVA, ComBat, edgeR and limma functions, RUVSeq, etc.

PVCA combines PCA with variance component analysis, using a linear mixed model, to extract the factors that influence your variability. Steps:

1. Normalized data + PVCA = confounders of batch effects (assessing and identifying confounders).
2. Correct or adjust for the confounders extracted in step 1 using any standard batch-effect adjustment method, then run PVCA again on the normalized, adjusted, log-transformed data to check whether the confounders have been accounted for and whether the primary gene variability you now see is due to biological phenotypic variation (viewing post-batch adjustments on all the genes expressed in all samples).
3. For DE analysis, however, the batch-effect confounders identified in step 1 go into the model matrix as effects/covariates. These effects should be modeled on your count data in the linear model fit (i.e. use the confounders as covariates in the model design when fitting with limma on the counts data to perform DEA). The DE results should then be viewed with the log-transformed, corrected data from step 2 for visualizations like heatmaps, expression box plots, etc.
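The weighting idea behind PVCA can be sketched as follows. This is a simplified stand-in, not the PVCA implementation: it assumes the scores of the retained principal components are already computed, handles a single factor (batch), and replaces PVCA's mixed-model variance components with one-way ANOVA sums of squares. As in PVCA, each PC's batch share is weighted by that PC's variance (its eigenvalue). The function names and the toy scores are hypothetical.

```python
def batch_share(scores, batches):
    """Fraction of one PC's variance explained by batch (between-group SS / total SS)."""
    mean = sum(scores) / len(scores)
    ss_total = sum((s - mean) ** 2 for s in scores)
    ss_between = 0.0
    for b in set(batches):
        grp = [s for s, bb in zip(scores, batches) if bb == b]
        gm = sum(grp) / len(grp)
        ss_between += len(grp) * (gm - mean) ** 2
    return ss_between / ss_total

def weighted_batch_share(pcs, batches):
    """Eigenvalue-weighted average of per-PC batch shares over the retained PCs.

    The variance of a PC's scores is its eigenvalue, so the centered sum of
    squares of each PC's scores is used as its (unnormalized) weight.
    """
    weights = []
    for pc in pcs:
        m = sum(pc) / len(pc)
        weights.append(sum((s - m) ** 2 for s in pc))
    total = sum(weights)
    return sum(w / total * batch_share(pc, batches) for w, pc in zip(weights, pcs))

# Hypothetical scores for 4 samples on 2 retained PCs.
batches = ["A", "A", "B", "B"]
pcs = [
    [-2.0, -2.0, 2.0, 2.0],   # PC1 separates the batches completely
    [-1.0, 1.0, -1.0, 1.0],   # PC2 is unrelated to batch
]
share = weighted_batch_share(pcs, batches)
print(f"batch: {share:.2f}, residual: {1 - share:.2f}")
```

Running the same summary before and after adjustment (step 2) shows whether the batch share has dropped.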

My only two cents: batch effect removal is not the key; one needs to adjust for the batch effect rather than subtract it out. You are trying to understand what the confounders in your data are and how they interfere with it. You do that from the counts data, and whatever normalization that entails should be used while performing the effect analysis. Plots can later be made from log-transformed data. However, for downstream differential analysis you don't use log2-transformed, batch-corrected data. Instead, one uses the counts data, pulls out the effect information, and either adds it as covariates in the model design for differential expression or uses it for adjustments. I suggest you take a look at the link below to understand how this is done, what the underlying statistics and associated variance are, and how to avoid over-fitting.
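A minimal sketch of why adjusting beats removing before DE, in pure Python with hypothetical log-scale values for one gene in a balanced design (the helper names are made up; in limma this would be lmFit with a design such as model.matrix(~ condition + batch)). With batch in the design, the residual df is n - 3. Fitting ~ condition on pre-corrected data gives the same treatment coefficient and the same residual sum of squares, but divides it by n - 2, so the gene's variance is underestimated. This matches the limma documentation's advice that removeBatchEffect output is not intended for linear modelling and that batch factors should be included in the linear model instead.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_fit(y, x):
    """Least-squares fit of y on design rows x; returns (coefficients, RSS / residual df)."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * xi for b, xi in zip(beta, row))) ** 2 for row, yi in zip(x, y))
    return beta, rss / (len(y) - p)

# Hypothetical log2 expression of one gene: 2 batches x (2 ctrl + 2 trt).
y     = [5.1, 6.0, 4.9, 6.2, 7.0, 8.1, 7.2, 7.9]
trt   = [0, 1, 0, 1, 0, 1, 0, 1]
batch = [0, 0, 0, 0, 1, 1, 1, 1]

# (a) Adjust: batch as a covariate, design ~ 1 + trt + batch (residual df = 5).
beta_cov, s2_cov = ols_fit(y, [[1, t, b] for t, b in zip(trt, batch)])

# (b) Remove first: shift each batch to the grand mean, then fit ~ 1 + trt (df = 6).
grand = sum(y) / len(y)
bmean = {k: sum(v for v, b in zip(y, batch) if b == k) / batch.count(k) for k in (0, 1)}
y_fixed = [v - (bmean[b] - grand) for v, b in zip(y, batch)]
beta_pre, s2_pre = ols_fit(y_fixed, [[1, t] for t in trt])

print(beta_cov[1], beta_pre[1])  # same treatment effect, ~1.0 in both
print(s2_cov, s2_pre)            # ~0.02 vs ~0.0167: (b) overstates the df
```

The gap grows with the number of batch levels removed, since each one costs a residual df that approach (b) silently ignores.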

Removing continuous covariate effects in limma analysis

Good luck!

Edit 1: I realized the workflow I suggested earlier was slightly wrong, so I corrected it.