Modelling vs removing effects from confounding variables and their uses for downstream analyses in RNA-seq experiments
2.3 years ago
Jasmin ▴ 20

Hi,

I have read that when one encounters a confounding variable, the preferred way to account for its effect is to add the variable to the design formula rather than remove it with limma's removeBatchEffect function.
My questions on this topic are:

  • Should I remove the batch effect using limma's removeBatchEffect function if the confounding effect is large and inconsistent, as suggested here (https://support.bioconductor.org/p/125386/#125387)? For example, when a PCA plot and heatmap show that samples still cluster by the presence or absence of a confounding variable even though it has been added to the design formula (e.g. samples cluster by sex after sex has been added to the design formula).

  • Should one use the transformed counts after removal of a confounding effect, or the untransformed counts (adding the confounding variable only to the design formula), for heatmaps, PCA plots and DE analysis?

removebatcheffect sva heatmap plot limma pca • 1.1k views
2.3 years ago

Confounders should be modeled where that is possible, not removed. This particularly applies to DE analysis, or anything that relies on count-based statistics, like the Poisson or Negative Binomial distribution.

Where you are not modeling the data but just presenting it, as in heatmaps or PCA plots, modeling the confounder isn't possible, so it should be removed from the values you plot.

Where the confounder is known, you can use it directly in the design formula, or remove it using limma::removeBatchEffect. Where the confounder is unknown, you must first identify it using something like SVA.
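To make the "remove it for plots" step concrete, here is a minimal sketch for one gene with a known batch. It is plain Python for illustration only, not limma's implementation: the expression values are made up, `center_batches` and `group_mean` are hypothetical helpers, and simple per-batch mean centring only matches subtracting a fitted batch term when the design is balanced (each batch holds the same mix of conditions).

```python
from statistics import mean

# Hypothetical log2 expression of one gene in 8 samples (made-up numbers).
# Balanced design: each batch holds two control and two treated samples.
samples = [
    # (condition, batch, log2_expr)
    ("control", "A", 5.0), ("control", "A", 5.2),
    ("treated", "A", 7.0), ("treated", "A", 7.2),
    ("control", "B", 8.0), ("control", "B", 8.2),
    ("treated", "B", 10.0), ("treated", "B", 10.2),
]

def center_batches(rows):
    """Shift every batch so that its mean equals the grand mean."""
    grand = mean(v for _, _, v in rows)
    batch_means = {
        b: mean(v for _, bb, v in rows if bb == b)
        for b in {b for _, b, _ in rows}
    }
    return [(c, b, v - (batch_means[b] - grand)) for c, b, v in rows]

def group_mean(rows, key_index, key):
    """Mean expression of the rows whose label at key_index equals key."""
    return mean(r[2] for r in rows if r[key_index] == key)

adjusted = center_batches(samples)

# Gap between batch means, before and after adjustment.
batch_gap_before = group_mean(samples, 1, "B") - group_mean(samples, 1, "A")
batch_gap_after = group_mean(adjusted, 1, "B") - group_mean(adjusted, 1, "A")

# Gap between condition means, before and after adjustment.
condition_gap_before = (group_mean(samples, 0, "treated")
                        - group_mean(samples, 0, "control"))
condition_gap_after = (group_mean(adjusted, 0, "treated")
                       - group_mean(adjusted, 0, "control"))
```

In this toy case the batch gap disappears while the treated vs control gap is unchanged, which is what you want in a heatmap or PCA. The adjusted values are for display only; the DE test itself should still use the original counts with batch in the design formula.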

2.3 years ago
LauferVA 4.5k

False dichotomy (sort of: both concepts exist, sure, but in effect one subsumes the other).

In a general linear model, the explanatory value of the model is the ratio of the sum of squares accounted for by the model to the total sum of squares.

If you have something like Y = B0 + B1X1 + B2X2 + B3X3 + E

and you generate an effect size estimate (beta) for each term, nothing prevents you from subtracting out the fitted contributions of the remaining predictors (everything other than the explanatory variable(s) of interest).

Thus you can both model covariates and remove their effects. Controlling for a covariate means its contribution is estimated within the model, and that estimated contribution can simply be subtracted, including from the model's final explanatory value (Explained SS / Total SS).

But the general principle is that you never discard data; rather, you model it as a covariate. For any downstream application that might benefit from removal (e.g. a picture before and after controlling for Age), you can in effect remove the covariate anyway. But you do that by keeping the original data and doing each aspect of the analysis and visualization thoughtfully.
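The "estimate the betas, then subtract the covariate term" idea can be sketched in a few lines. This is only an illustration in plain Python under stated assumptions: the numbers are made up, `treatment` and `age` are hypothetical stand-ins for X1 and X2, and `solve` and `ols` are small helpers written for this example, not functions from limma or any other package.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical data: X1 = treatment (variable of interest), X2 = age (covariate).
treatment = [0, 0, 0, 1, 1, 1]
age = [30.0, 40.0, 50.0, 35.0, 45.0, 55.0]
y = [5.1, 6.0, 7.2, 7.4, 8.1, 9.3]  # made-up response values

# Model: Y = B0 + B1*X1 + B2*X2 + E
design = [[1.0, t, a] for t, a in zip(treatment, age)]
b0, b1, b2 = ols(design, y)

# Explanatory value of the model: Explained SS / Total SS.
fitted = [b0 + b1 * t + b2 * a for t, a in zip(treatment, age)]
mean_y = sum(y) / len(y)
explained_ss = sum((f - mean_y) ** 2 for f in fitted)
total_ss = sum((v - mean_y) ** 2 for v in y)
r_squared = explained_ss / total_ss

# "Remove" age for display: subtract its fitted term, centred on the mean age
# so the overall level of y is preserved. The original y is kept untouched.
mean_age = sum(age) / len(age)
y_adjusted = [v - b2 * (a - mean_age) for v, a in zip(y, age)]
```

Refitting the same model to `y_adjusted` gives the same treatment coefficient and an age coefficient of zero, so nothing about the effect of interest is lost, while the original `y` stays available for the actual analysis.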
