Question

Batch effect correction in DE analysis of single cell RNA-seq data and visualization

6.1 years ago

Poorya Parvizi ▴ 60

Hi Everyone,

In my experiment, I have cells (same cell type) with 5 different drug treatments. The cells were sequenced in 3 batches, and the goal is to find differentially expressed genes between treatments.

I use SCDE for the differential expression analysis. SCDE takes gene read counts as its preferred input, and it has an argument that accepts batch information to deal with the batch effect (I can see the batches in the PCA). In the end, I get the differentially expressed genes and their significance, but SCDE doesn't give the user access to the batch-corrected values (I guess this is also true of Seurat).
Since I can't get corrected values from SCDE, and in order to continue the analysis based on the DE genes, I used the limma removeBatchEffect function on my read counts (or TPM, or log2(TPM+1)). In the output, I get negative values (probably due to regressing out the batch), or all the zeros in a batch change to slightly larger or smaller values (e.g. 0 becomes 0.1234).
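
To illustrate why this kind of correction produces negative values and turns zeros into non-zero values, here is a minimal Python sketch on made-up numbers. It is a simplified stand-in for a linear batch correction on a single gene (shifting each batch by its mean offset), not limma's actual implementation; the function name `remove_batch_means` and the expression values are hypothetical.

```python
from statistics import mean

def remove_batch_means(values, batches):
    """Shift each batch so its mean matches the average of the batch means.

    Simplified illustration for one gene; NOT limma's implementation.
    """
    # Group the expression values by batch label.
    groups = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)
    batch_means = {b: mean(vs) for b, vs in groups.items()}
    # Reference level: unweighted average of the per-batch means.
    centre = mean(list(batch_means.values()))
    # Subtract each batch's offset from every cell in that batch.
    return [v - (batch_means[b] - centre) for v, b in zip(values, batches)]

# Hypothetical log2(TPM+1) values for one gene, two cells per batch.
expr = [0.0, 2.0, 0.0, 4.0, 0.0, 0.0]
batch = ["b1", "b1", "b2", "b2", "b3", "b3"]
corrected = remove_batch_means(expr, batch)
print(corrected)  # [0.0, 2.0, -1.0, 3.0, 1.0, 1.0]
```

The zeros in batch b2 become -1 and the zeros in b3 become 1, because every cell in a batch is shifted by the same offset regardless of its original value.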

What is the solution to get proper batch-effect-corrected values to continue the analysis with? Is what I'm doing fine? The batch effect correction methods in SCDE and limma are different; is that okay? What do people usually do?

Seurat Batch-effect SCDE RNA-Seq Limma • 3.3k views

updated 7 months ago by Ram 44k • written 6.1 years ago by Poorya Parvizi ▴ 60

score 2 · Answer 1 · 2018-10-31

Please use the sva package in R: https://bioconductor.org/packages/release/bioc/manuals/sva/man/sva.pdf

sva can help you identify surrogate variables in the data, and it also handles batch effect removal through ComBat.

sva is easy to use, and if you find no surrogate variables then just use betweenLaneNormalization from the EDASeq package with upper-quartile normalization.
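
As a rough idea of what upper-quartile scaling does, here is a minimal Python sketch on made-up counts. It assumes the quartile is taken over all genes in a sample and that the scaling factors are centred on their average; the function name `upper_quartile_scale` and the data are hypothetical, and EDASeq's exact quantile rule and rounding may differ.

```python
from statistics import mean, quantiles

def upper_quartile_scale(counts):
    """Scale each sample so that all samples share the same upper quartile.

    counts: dict mapping sample name -> list of gene counts (same gene order).
    Simplified illustration; NOT the EDASeq implementation.
    """
    # 75th percentile of each sample's counts.
    uq = {s: quantiles(c, n=4, method="inclusive")[2] for s, c in counts.items()}
    # Sparse single-cell data can have a zero upper quartile, which cannot be used.
    if any(q == 0 for q in uq.values()):
        raise ValueError("a sample has a zero upper quartile; cannot scale")
    avg = mean(list(uq.values()))
    # Rescale each sample so its upper quartile equals the average one.
    return {s: [x * avg / uq[s] for x in c] for s, c in counts.items()}

# Hypothetical counts: sample B was sequenced twice as deeply as sample A.
raw = {"A": [0, 10, 20, 30, 40], "B": [0, 20, 40, 60, 80]}
scaled = upper_quartile_scale(raw)
print(scaled["A"])
print(scaled["B"])
```

After scaling, both hypothetical samples have the same values, since they differed only by depth. Note the guard for a zero upper quartile, which is a real possibility with very sparse single-cell counts.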

Also, you can try RUVSeq [https://www.bioconductor.org/packages/devel/bioc/vignettes/RUVSeq/inst/doc/RUVSeq.pdf] with RUVg normalization, where you take the counts of the ~5000 least DE transcripts to normalize the whole count matrix. This normalization is very effective against batch effects and sequencing effects.

Cheers !!