Gene read count-level batch correction in scRNA-seq?
Jeff-Gui, 3.3 years ago

Hi, I'm working on integrating several scRNA-seq datasets. After trying Seurat v3 and Harmony, I realized that they output a dimension-reduction matrix rather than corrected read counts, which makes them unsuitable for some downstream analyses at the gene-expression level. Is there any software that can correct batch effects on the read counts themselves?

Tags: RNA-seq, NGS, single-cell-sequencing

What downstream analysis are you considering?

If the data are not corrected at the gene level, visualizations of the combined dataset (heatmaps, feature plots) will be less informative. Network analyses such as WGCNA also require an expression matrix as input.

FlMai, 3.3 years ago

Depending on which platform you use, you could try Scanorama. Its correct function returns a batch-corrected gene expression matrix, and there is also an external wrapper for Scanpy. For a comparison of different batch-correction methods, have a look at "A benchmark of batch-effect correction methods for single-cell RNA sequencing data" (Tran et al., Genome Biology, 2020).
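
To make the distinction in this thread concrete (an embedding-level output versus a gene-level one), here is a minimal Python sketch of the simplest possible gene-level adjustment: per-batch mean-centering of log1p-transformed counts. This is not Scanorama's algorithm (Scanorama uses a more elaborate nearest-neighbour matching across datasets) and not what Seurat or Harmony do; the function name center_by_batch and the toy data are hypothetical. It only illustrates that a gene-level correction returns one value per cell and gene, in the same shape as the input.

```python
import math
from collections import defaultdict


def center_by_batch(counts, batches):
    """Location-only batch adjustment on log1p-transformed counts (illustrative).

    counts: cells x genes matrix of raw counts (list of lists).
    batches: one batch label per cell.
    Returns a cells x genes matrix on the log1p scale in which each gene's
    per-batch mean has been shifted to that gene's overall mean.
    """
    logged = [[math.log1p(c) for c in cell] for cell in counts]
    n_cells, n_genes = len(logged), len(logged[0])

    # Overall mean of each gene across all cells.
    overall = [sum(cell[g] for cell in logged) / n_cells for g in range(n_genes)]

    # Group cells by batch, then compute each gene's mean within each batch.
    members = defaultdict(list)
    for cell, b in zip(logged, batches):
        members[b].append(cell)
    batch_mean = {
        b: [sum(cell[g] for cell in cells) / len(cells) for g in range(n_genes)]
        for b, cells in members.items()
    }

    # Remove the batch mean and add back the overall mean, gene by gene.
    return [
        [cell[g] - batch_mean[b][g] + overall[g] for g in range(n_genes)]
        for cell, b in zip(logged, batches)
    ]
```

The output is on the log scale rather than integer counts, and plain centering also removes real biology whenever batches differ in cell-type composition, which is one reason dedicated tools match similar cells across batches instead.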

Thanks for the pointer. Harmony does not output a count matrix, so it is not suitable for gene-level analysis. Seurat also seems to output a corrected matrix, but in my previous experience the "integrated" assay does not contain a count matrix. Does anyone know which slot it is stored in? This recent review discusses the topic in more detail: https://academic.oup.com/nar/article/49/7/e42/6125660

Hi, this is a very late comment, since you posted your question 21 months ago, but I had the same question as the original post. I tried the Seurat integration method for batch correction. The output matrix is stored in the 'integrated' assay of the Seurat object: object@assays$integrated@data. However, this matrix has only 2000 features (genes), which are the variable features selected for the integration step, so I believe it is not suitable for downstream DEG analysis. Have you found an answer to your original question? Please let me know if you did, because I am still searching for one, haha.
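
As a rough illustration of why the integrated matrix stops at a fixed number of genes: integration is run only on a selected set of highly variable features, and genes outside that set get no integrated values. The sketch below ranks genes by the variance of their log1p counts and keeps the top n. It is a simplified stand-in (Seurat's FindVariableFeatures defaults to a variance-stabilizing "vst" method), and top_variable_genes is a hypothetical helper.

```python
import math


def top_variable_genes(expr, gene_names, n_top):
    """Return the n_top gene names with the highest variance of log1p counts.

    expr: cells x genes matrix of raw counts (list of lists), at least 2 cells.
    Simplified stand-in for highly-variable-gene selection.
    """
    n_cells = len(expr)
    scored = []
    for g, name in enumerate(gene_names):
        vals = [math.log1p(cell[g]) for cell in expr]
        mean = sum(vals) / n_cells
        # Sample variance (n - 1 denominator) of this gene across cells.
        var = sum((v - mean) ** 2 for v in vals) / (n_cells - 1)
        scored.append((var, name))
    # Highest variance first; only these genes would be carried forward.
    scored.sort(key=lambda t: t[0], reverse=True)
    return [name for _, name in scored[:n_top]]
```

Any gene not returned here would simply be missing from an expression matrix built on the selected features, which is why such a matrix is a poor input for genome-wide DE.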

For most integration methods, the integrated values should only be used for clustering and dimension reduction. If you want to perform differential expression and have replicates, it should ideally be done at the pseudobulk level using the original UMI counts.
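
A minimal sketch of the pseudobulk step, assuming a cells × genes matrix of raw UMI counts plus per-cell sample and cell-type labels (the function name and toy data are hypothetical): counts are summed per (sample, cell type), so each sample contributes one profile per cell type.

```python
from collections import defaultdict


def pseudobulk(counts, gene_names, sample_ids, cell_types):
    """Sum raw UMI counts per (sample, cell type).

    counts: cells x genes raw UMI counts (integers, not integrated values).
    sample_ids, cell_types: one label per cell.
    Returns {(sample, cell_type): {gene: summed count}}.
    """
    totals = defaultdict(lambda: [0] * len(gene_names))
    for cell, sample, ctype in zip(counts, sample_ids, cell_types):
        acc = totals[(sample, ctype)]
        # Add this cell's counts to its (sample, cell type) profile.
        for g, c in enumerate(cell):
            acc[g] += c
    return {key: dict(zip(gene_names, vec)) for key, vec in totals.items()}
```

Each resulting profile can then be tested with a bulk RNA-seq method such as DESeq2 or edgeR, with samples acting as the replicates.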

Hi, I've been searching community threads, and the consensus seems to be that one should never use 'integrated' (batch-corrected) data for differential expression analysis.

But can I still use SCTransform + integration for the purpose of combining datasets? For example, I have 4 Seurat objects (4 replicates, which are also 4 batches), and I want to combine them into a single object. After SCTransform + integration, the final object has an 'RNA' assay, an 'SCT' assay, and an 'integrated' assay.

Then, for DE using a generalized linear model, can I use the matrix in @assays$SCT@counts? My GLM would be: gene expression ~ gRNA + nFeature_RNA + percent.mt + batch + sex
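
For reference, a formula like the one above expands into a design matrix in which each factor (gRNA, batch, sex) becomes 0/1 indicator columns against a reference level and numeric covariates enter as-is; batch is then modelled as a covariate rather than removed from the data beforehand. The sketch below builds such a matrix with treatment coding, similar in spirit to the default of R's model.matrix; design_matrix and the toy cells are hypothetical.

```python
def design_matrix(rows, categorical, numeric):
    """Build a treatment-coded design matrix: intercept + dummies + numerics.

    rows: list of dicts, one per cell, holding covariate values.
    categorical: factor column names; the first sorted level is the reference.
    numeric: numeric column names, included unchanged.
    Returns (column_names, matrix).
    """
    columns = ["(Intercept)"]
    levels = {}
    for var in categorical:
        lv = sorted({r[var] for r in rows})
        levels[var] = lv
        # The reference level gets no column; it is absorbed by the intercept.
        columns += [f"{var}{level}" for level in lv[1:]]
    columns += list(numeric)

    matrix = []
    for r in rows:
        x = [1.0]
        for var in categorical:
            x += [1.0 if r[var] == level else 0.0 for level in levels[var][1:]]
        x += [float(r[var]) for var in numeric]
        matrix.append(x)
    return columns, matrix
```

If two factors are fully confounded (for example, if each batch came from a single sex), their columns become linearly dependent and the model cannot separate those effects.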

Unfortunately, a pseudobulk approach is hard for me to use: I would have to subset the data by cell type (there will be dozens of cell types) and then, within each cell type, by gRNA expression. I have about 85 different CRISPR guides, so it gets complicated.
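
The nested subsetting described here can be expressed as a single group-by on the combined key (cell type, guide, replicate). A minimal sketch, with hypothetical names and a user-chosen min_cells threshold, that counts cells per group and lists the groups below that size; with about 85 guides, dozens of cell types and 4 replicates, a quick count like this shows how many pseudobulk groups would actually have enough cells.

```python
from collections import Counter


def count_groups(cell_types, guides, replicates, min_cells):
    """Count cells per (cell type, guide, replicate) and flag small groups.

    All three label lists have one entry per cell.
    Returns (counts, too_small): a Counter keyed by the combined tuple, and a
    sorted list of keys with fewer than min_cells cells.
    """
    counts = Counter(zip(cell_types, guides, replicates))
    too_small = sorted(key for key, n in counts.items() if n < min_cells)
    return counts, too_small
```

Flagged groups could be dropped before aggregation, and the remaining groups summed exactly as in an ordinary pseudobulk step.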

But can I still use SCTransform + integration for the purpose of combining datasets?

Yep! In case my original message was unclear: you can combine your datasets using SCTransform + integration. The usual Seurat integration workflow is SCTransform → integration → dimension reduction → clustering/annotation. Usually, after this point you go back to the original or log-normalized counts, depending on the downstream analysis.
