Update: Our manuscript has been published in Bioinformatics -- https://doi.org/10.1093/bioinformatics/btx635.
Abstract: We propose a data-adaptive, non-parametric, non-regression approach that first removes the biological signal to prepare the data for batch detection, and then applies a semi-NMF (semi-nonnegative matrix factorization) method to estimate the hidden batch factors associated with the samples. To isolate the batch signal, we use fusion penalties that shrink each individual expression profile toward the mean of its biological group in a non-parametric, data-adaptive manner. To ensure the stability of the estimated batch factors, we derive a consensus matrix by applying semi-NMF multiple times. Compared with existing approaches, ours has three major advantages:
- it estimates batch effects directly from the data;
- it makes no assumptions about the data's probability distribution (for example, it does not require the log transformation that svaseq does); and
- it does not assume that batch effects affect all genes to the same degree.
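The steps in the abstract can be sketched in plain Python (no third-party packages) as below. This is a simplified illustration under stated assumptions, not the DASC implementation: the fusion-penalty shrinkage is replaced by its strong-penalty limit under a squared-error fit, where every profile in a biological group fuses to the group mean and only the within-group residual remains; semi-NMF (X ≈ F Gᵀ with G ≥ 0) uses the multiplicative updates of Ding, Li and Jordan; each run assigns a sample to the argmax column of G; and the consensus matrix records how often two samples share a factor across random starts. The function names (`remove_group_means`, `semi_nmf`, `consensus_matrix`) and the toy data are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for a small k x k matrix)."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def pos(a):  # elementwise positive part (|a| + a) / 2
    return [[(abs(v) + v) / 2 for v in row] for row in a]

def neg(a):  # elementwise negative part (|a| - a) / 2, so a = pos(a) - neg(a)
    return [[(abs(v) - v) / 2 for v in row] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def remove_group_means(x, groups):
    """Hypothetical stand-in for the shrinkage step (its strong-penalty limit):
    subtract each gene's biological-group mean. Rows = genes, columns = samples."""
    out = []
    for row in x:
        means = {}
        for g in set(groups):
            vals = [v for v, lab in zip(row, groups) if lab == g]
            means[g] = sum(vals) / len(vals)
        out.append([v - means[lab] for v, lab in zip(row, groups)])
    return out

def semi_nmf(x, k, iters=200, seed=0, eps=1e-9):
    """Semi-NMF X ~ F G^T with G >= 0 (Ding-Li-Jordan updates); returns G (samples x k)."""
    rng = random.Random(seed)
    n = len(x[0])
    g = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    xt = transpose(x)
    for _ in range(iters):
        gtg = matmul(transpose(g), g)
        for i in range(k):
            gtg[i][i] += eps  # tiny ridge keeps G^T G invertible
        f = matmul(matmul(x, g), inverse(gtg))  # F = X G (G^T G)^-1
        xtf, ftf = matmul(xt, f), matmul(transpose(f), f)
        num = add(pos(xtf), matmul(g, neg(ftf)))
        den = add(neg(xtf), matmul(g, pos(ftf)))
        # G <- G * sqrt(num / den), elementwise; keeps G nonnegative
        g = [[gv * math.sqrt((nv + eps) / (dv + eps)) for gv, nv, dv in zip(gr, nr, dr)]
             for gr, nr, dr in zip(g, num, den)]
    return g

def consensus_matrix(x, k, runs=20):
    """Fraction of semi-NMF runs (different random starts) in which samples i and j
    get the same argmax factor, i.e. are placed in the same putative batch."""
    n = len(x[0])
    counts = [[0] * n for _ in range(n)]
    for seed in range(runs):
        labels = [max(range(k), key=row.__getitem__) for row in semi_nmf(x, k, seed=seed)]
        for i in range(n):
            for j in range(n):
                counts[i][j] += labels[i] == labels[j]
    return [[c / runs for c in row] for row in counts]

# Hypothetical toy data: 4 genes x 8 samples. Samples 0-3 are group "A", 4-7 group "B";
# samples 2, 3, 6, 7 carry a gene-specific batch shift, plus a small deterministic wobble.
groups = ["A"] * 4 + ["B"] * 4
batch_y = {2, 3, 6, 7}
x = [[base + (2.0 if s >= 4 else 0.0) + (shift if s in batch_y else 0.0)
      + 0.1 * ((gene * 7 + s * 3) % 5 - 2) for s in range(8)]
     for gene, (base, shift) in enumerate([(5, 3), (8, -2), (3, 4), (6, 1)])]
cm = consensus_matrix(remove_group_means(x, groups), k=2)
for row in cm:
    print(" ".join(f"{v:.2f}" for v in row))
```

Entries of `cm` near 1 mean two samples landed on the same factor in almost every run, which is what makes the consensus more stable than a single factorization. The hand-written matrix helpers exist only to keep the sketch dependency-free; a real analysis would rely on a linear-algebra library or on the DASC package itself.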
Tool: Bioconductor R package or GitHub source code
User Guide: How to use DASC
To obtain differentially expressed genes:
- Estimate the batch factors with DASC.
- Include the batch factors as covariates in your DESeq2 model.
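One way to carry estimated batches into DESeq2 is to turn them into a categorical column of the sample table. The sketch below starts from a consensus matrix of the kind described in the abstract, assigns batches with a simple hypothetical rule (connected components of sample pairs whose consensus exceeds 0.5, which is not necessarily how DASC labels batches), and writes a CSV with `condition` and `batch` columns. The function names, the threshold, and the toy matrix are assumptions for illustration.

```python
import csv
import io

def batches_from_consensus(cm, threshold=0.5):
    """Hypothetical rule: samples linked by consensus > threshold share a batch
    (connected components found by depth-first search)."""
    n = len(cm)
    labels = [None] * n
    next_label = 0
    for start in range(n):
        if labels[start] is not None:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] is None and cm[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

def sample_table(sample_ids, conditions, batch_labels):
    """CSV text with one row per sample: sample, condition, batch."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample", "condition", "batch"])
    for sid, cond, b in zip(sample_ids, conditions, batch_labels):
        writer.writerow([sid, cond, f"batch{b + 1}"])  # text labels, not bare integers
    return buf.getvalue()

# Hypothetical consensus matrix for four samples
cm = [[1.0, 0.9, 0.1, 0.0],
      [0.9, 1.0, 0.0, 0.1],
      [0.1, 0.0, 1.0, 0.8],
      [0.0, 0.1, 0.8, 1.0]]
labels = batches_from_consensus(cm)
print(sample_table(["s1", "s2", "s3", "s4"], ["ctrl", "trt", "ctrl", "trt"], labels))
```

Read into R as `coldata`, such a table can be passed to `DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = ~ batch + condition)`; the DESeq2 documentation recommends putting the variable of interest last in the design formula. Using labels such as `batch1` instead of bare integers keeps R from treating batch as a continuous covariate.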
The manuscript is in preparation and will be out soon, with full comparisons to existing methods and tools (and many more examples).