Hello! I have a question about batch correction for single cell RNA sequencing (scRNAseq) experiments. I inherited scRNAseq data for wildtype and mutant mouse embryos. I thought that all samples were sequenced together, but I just discovered that all wildtype embryos were sequenced in one sequencing run, and mutant embryos were sequenced in a separate, later run. Does anyone know if it is possible to check and correct for batch effects in this scenario? My main concerns are 1) if I don't correct, expression differences between WTs and mutants may be due to batch effects, and 2) if I do correct using standard methods, real differences in expression between WTs and mutants may be lost. Any insight would be appreciated!
Batch is perfectly correlated with condition in your case, so no. My intuition, though, is that the Illumina run by itself won't contribute an appreciable batch effect.
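To make "perfectly correlated" concrete, here is a minimal sketch that checks a sample sheet for this kind of confounding. The sample names, the `samples` list, and the helper functions are hypothetical and only illustrate the idea: if every batch contains a single condition, no model can separate the batch effect from the condition effect.

```python
from collections import defaultdict

# Hypothetical sample sheet: (sample_id, condition, batch).
# The IDs and run labels are made up for illustration.
samples = [
    ("wt_1", "WT", "run1"),
    ("wt_2", "WT", "run1"),
    ("mut_1", "mutant", "run2"),
    ("mut_2", "mutant", "run2"),
]

def conditions_per_batch(rows):
    """Map each batch to the set of conditions observed in it."""
    seen = defaultdict(set)
    for _sample, condition, batch in rows:
        seen[batch].add(condition)
    return dict(seen)

def is_confounded(rows):
    """True when every batch contains exactly one condition,
    so batch and condition cannot be told apart."""
    return all(len(conds) == 1 for conds in conditions_per_batch(rows).values())

print(conditions_per_batch(samples))
print("confounded:", is_confounded(samples))
```

With a design where each run contained both WT and mutant samples, `is_confounded` would return `False`, and batch could be included as a covariate.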
@OP, please define "batch". Were the samples prepared on different days, meaning the actual RNA extraction, cDNA synthesis, and library prep, or was all of that done together and only the sequencing itself on the Illumina machine done on different days? The former is a real source of batch effects; the latter, as mentioned, not really.
That said, there are two types of batch correction in (sc)RNA-seq: the per-gene correction, where you directly modify the counts, and the per-cell correction, which in the single-cell context is often called integration or anchoring. The two are very different, with different assumptions and aims, so please describe what the analysis goal is.
I used the SCTransform function (method glmGamPoi) to normalize and scale counts before and after integration/anchoring. Below is my relevant R code. Filtering was done on individual samples prior to merging. My main goals are to 1) identify cell clusters in which the proportion of cells differs significantly between WTs and mutants and 2) identify differentially expressed genes between WTs and mutants using pseudo-bulk RNAseq analysis.
Oh yes, I should have clarified. Methods were the same, but all steps - dissection, RNA extraction, library prep, and sequencing - were performed at different times. Dissections and RNA extractions were done by the same lab members and library prep and sequencing were performed at the same core facility and using the same instruments (personnel probably varied).
So it's perfectly confounded, meaning integration/anchoring is the only thing you can do.
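For the two goals stated above (cluster proportions and pseudo-bulk differential expression), the per-sample quantities can be sketched roughly as follows. This is not the OP's R code; the `cells` records, gene names, and cluster labels are hypothetical. It only shows the aggregation step: counting cells per cluster within each sample, and summing raw counts per sample and cluster. Note that it does nothing to remove the confounding itself.

```python
from collections import Counter, defaultdict

# Hypothetical per-cell records: (sample_id, cluster, {gene: raw_count}).
cells = [
    ("wt_1", "c0", {"GeneA": 3, "GeneB": 0}),
    ("wt_1", "c0", {"GeneA": 1, "GeneB": 2}),
    ("wt_1", "c1", {"GeneA": 0, "GeneB": 5}),
    ("mut_1", "c0", {"GeneA": 4, "GeneB": 1}),
    ("mut_1", "c1", {"GeneA": 2, "GeneB": 2}),
    ("mut_1", "c1", {"GeneA": 0, "GeneB": 1}),
]

def cluster_proportions(records):
    """Fraction of each sample's cells that fall in each cluster."""
    per_sample = defaultdict(Counter)
    for sample, cluster, _counts in records:
        per_sample[sample][cluster] += 1
    return {
        sample: {c: n / sum(counts.values()) for c, n in counts.items()}
        for sample, counts in per_sample.items()
    }

def pseudobulk(records):
    """Sum raw counts over all cells of the same sample and cluster."""
    totals = defaultdict(Counter)
    for sample, cluster, counts in records:
        totals[(sample, cluster)].update(counts)  # adds counts gene by gene
    return {key: dict(genes) for key, genes in totals.items()}

print(cluster_proportions(cells))
print(pseudobulk(cells))
```

The pseudo-bulk sums use raw counts and treat each embryo as one replicate, which is what bulk-style tools expect; the proportions are likewise computed per sample rather than over the pooled cells.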