Batch correction for scRNAseq when wildtypes and mutants were sequenced in separate batches?
2.5 years ago
ahaan • 0

Hello! I have a question about batch correction for single cell RNA sequencing (scRNAseq) experiments. I inherited scRNAseq data for wildtype and mutant mouse embryos. I thought that all samples were sequenced together, but I just discovered that all wildtype embryos were sequenced in one sequencing run, and mutant embryos were sequenced in a separate, later run. Does anyone know if it is possible to check and correct for batch effects in this scenario? My main concerns are 1) if I don't correct, expression differences between WTs and mutants may be due to batch effects, and 2) if I do correct using standard methods, real differences in expression between WTs and mutants may be lost. Any insight would be appreciated!

Batch is perfectly correlated with condition in your case, so no: there is no way to separate a batch effect from the genotype effect. My intuition, though, is that the Illumina run itself won't contribute an appreciable batch effect.
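To show what "perfectly correlated" means for a model, here is a minimal base-R sketch using a hypothetical design of three wildtype and three mutant embryos (the labels and simulated values are made up, not the poster's data). Because every wildtype is in batch 1 and every mutant in batch 2, the genotype and batch columns of the design matrix are identical, so a linear model cannot estimate both effects:

```r
# Hypothetical design: 3 wildtype and 3 mutant embryos, where every
# wildtype was processed in batch b1 and every mutant in batch b2
genotype <- factor(c("WT", "WT", "WT", "Mut", "Mut", "Mut"), levels = c("WT", "Mut"))
batch    <- factor(c("b1", "b1", "b1", "b2", "b2", "b2"))

# Design matrix with an intercept, a genotype term and a batch term
X <- model.matrix(~ genotype + batch)

# Three columns, but "genotypeMut" and "batchb2" are identical,
# so the matrix only has rank 2
ncol(X)      # 3
qr(X)$rank   # 2

# Simulated expression for one gene with a true genotype effect of +2
set.seed(1)
expr <- 5 + 2 * (genotype == "Mut") + rnorm(6, sd = 0.1)
fit <- lm(expr ~ genotype + batch)
coef(fit)    # the "batchb2" coefficient is NA (aliased with genotype)
```

`lm()` reports `NA` for the batch term because it is aliased with genotype; the `genotypeMut` estimate is really genotype and batch combined.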

@OP, please define "batch". Were the samples prepared on different days, meaning the actual RNA extraction, cDNA synthesis and library prep, or were those steps done together and only the sequencing itself on the Illumina machine was run on different days? The former is a real source of batch effects; the latter, as mentioned, not really.

That said, there are two types of batch correction in (sc)RNA-seq: per-gene correction, where you directly modify the counts, and per-cell correction, which in the single-cell context is often called integration or anchoring. The two are very different, with different assumptions and aims, so please describe what your analysis goal is.
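Relating this to the second concern in the question, here is a minimal base-R sketch of why a per-gene correction cannot help when batch and condition coincide. It uses made-up values for one gene, and simple mean-centering per batch stands in for per-gene correction methods in general (it is not any specific package's implementation):

```r
# Hypothetical log-expression for one gene in 6 samples:
# mutants truly express about 2 units more than wildtypes
genotype <- factor(rep(c("WT", "Mut"), each = 3), levels = c("WT", "Mut"))
batch    <- factor(rep(c("b1", "b2"), each = 3))  # same split as genotype
set.seed(42)
expr <- 5 + 2 * (genotype == "Mut") + rnorm(6, sd = 0.1)

# Simplest per-gene batch correction: subtract each batch's mean and add back
# the grand mean (same as residuals of lm(expr ~ batch) plus the grand mean)
corrected <- expr - ave(expr, batch) + mean(expr)

# Genotype difference before and after correction
diff_before <- mean(expr[genotype == "Mut"]) - mean(expr[genotype == "WT"])
diff_after  <- mean(corrected[genotype == "Mut"]) - mean(corrected[genotype == "WT"])
round(c(before = diff_before, after = diff_after), 3)
# "after" is 0: removing the batch means also removed the genotype effect
```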

I used the SCTransform function (method glmGamPoi) to normalize and scale counts before and after integration/anchoring. Below is my relevant R code. Filtering was done on individual samples prior to merging. My main goals are to 1) identify cell clusters in which the proportion of cells differs significantly between WTs and mutants and 2) identify differentially expressed genes between WTs and mutants using pseudo-bulk RNAseq analysis.

```r
library(dplyr)
library(Seurat)
library(patchwork)
library(ggplot2)
library(Matrix)
library(sctransform)
library(glmGamPoi)

# Split the filtered object by sample (orig.ident), then log-normalize
# and find variable features within each sample
Embryos.list <- SplitObject(Embryos_Filt, split.by = "orig.ident")
Embryos.list <- lapply(X = Embryos.list, FUN = function(x) {
  x <- NormalizeData(x, verbose = FALSE)
  x <- FindVariableFeatures(x, selection.method = "vst", verbose = FALSE)
  x
})

# Select features for integration from the per-sample variable features
features <- SelectIntegrationFeatures(object.list = Embryos.list)

# Normalize each sample with SCTransform, then run PCA on the integration features
Embryos.list <- lapply(X = Embryos.list, FUN = function(x) {
  x <- SCTransform(x, method = "glmGamPoi", verbose = FALSE)
  x <- RunPCA(x, features = features, verbose = FALSE)
  x
})

# Find anchors with reciprocal PCA, using the first two samples as references
anchors <- FindIntegrationAnchors(object.list = Embryos.list, reference = c(1, 2),
                                  reduction = "rpca", dims = 1:50)
# Use the anchors to integrate the samples
Embryo.integrated <- IntegrateData(anchorset = anchors, dims = 1:50)

DefaultAssay(Embryo.integrated) <- "integrated"

# Re-run SCTransform after integration. SCTransform() reads the "RNA" assay by
# default, so this re-normalizes the raw counts into the "SCT" assay (and makes
# "SCT" the default assay); it does not rescale the "integrated" assay.
Embryo.integrated <- SCTransform(Embryo.integrated, method = "glmGamPoi", verbose = FALSE)
```
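For the two goals above, both analyses start by aggregating cells per embryo. Below is a minimal base-R sketch on a toy counts matrix; the gene, cell, sample and cluster names are hypothetical, and with a real Seurat object the raw counts, sample labels and cluster labels would come from the object and its metadata. The pseudo-bulk matrix would then go to a bulk DE tool such as edgeR or DESeq2 (not shown). Aggregating does not remove the confounding discussed in this thread, since all wildtype samples still share one batch:

```r
# Toy raw counts: 4 genes x 8 cells (hypothetical values)
set.seed(7)
counts <- matrix(rpois(32, lambda = 5), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:8)))

# Per-cell metadata: the embryo (sample) and cluster of each cell
sample_id <- c("WT1", "WT1", "WT2", "WT2", "Mut1", "Mut1", "Mut2", "Mut2")
cluster   <- c("A", "B", "A", "A", "B", "B", "A", "B")

# Pseudo-bulk: sum the counts of all cells from the same sample.
# rowsum() sums rows by group, so transpose (cells as rows), sum, transpose back.
pseudobulk <- t(rowsum(t(counts), group = sample_id))
pseudobulk   # genes x samples

# Cluster composition: fraction of each sample's cells falling in each cluster
composition <- prop.table(table(sample_id, cluster), margin = 1)
composition
```

Proportions are computed per embryo so that each embryo, not each cell, is a replicate in any downstream test.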

Oh yes, I should have clarified. The methods were the same, but every step (dissection, RNA extraction, library prep and sequencing) was performed at a different time for the two groups. Dissections and RNA extractions were done by the same lab members, and library prep and sequencing were performed at the same core facility using the same instruments (although the personnel probably varied).

So it's perfectly confounded, meaning integration/anchoring is the only thing you can do.

2.5 years ago
LauferVA 4.5k

This is called a problem of perfect separation: you won't truly know whether the differences you see are attributable to batch (whatever that means here) or to the biological/treatment differences. The way to solve this problem is to avoid it at the experimental design stage. It's exceedingly common, even in very reputable labs.

Thank you for the input; that's what I thought. We are currently designing additional follow-up experiments, and I am making sure this won't happen again.

One way to start getting at this is to find other data on the same cell types, under the same conditions, generated with the same technology, and so on. If you can find very analogous datasets and they look quite similar to your own, you might be OK to proceed.
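One rough way to check whether an external dataset "looks pretty similar" is to correlate pseudo-bulk expression profiles between your wildtypes and comparable external wildtypes. This is a minimal base-R sketch with simulated counts: `own_wt` and `external_wt` are hypothetical stand-ins for genes-by-samples pseudo-bulk counts over a shared gene set, and a Spearman correlation of mean log-CPM is just one possible similarity measure:

```r
# Simulated stand-ins for genes x samples pseudo-bulk counts (200 shared genes)
set.seed(3)
base_rate <- rexp(200, rate = 0.01)   # shared underlying expression per gene
simulate_counts <- function(n) sapply(seq_len(n), function(i) rpois(200, base_rate))
own_wt      <- simulate_counts(3)     # hypothetical: your wildtype samples
external_wt <- simulate_counts(3)     # hypothetical: external wildtype samples

# Log2 counts-per-million with a pseudocount of 1 (each column scaled by its total)
log_cpm <- function(m) log2(t(t(m) / colSums(m)) * 1e6 + 1)

# Average profile per dataset, then rank-correlate across genes
own_profile      <- rowMeans(log_cpm(own_wt))
external_profile <- rowMeans(log_cpm(external_wt))
similarity <- cor(own_profile, external_profile, method = "spearman")
similarity
```

A high value is reassuring but not proof; it is more informative when compared with how well replicates within a single dataset agree.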

Do you have reason to suspect that the conditions were very different (other than what you intended to change)?
