There was a similar post, but I'd like to dig into the scaling part in more depth here, as the current practice seems counter-intuitive to me.
This is from Seurat's basic integration vignette:
library(Seurat)
library(SeuratData)
# install dataset
InstallData("ifnb")
# load dataset
ifnb <- LoadData("ifnb")
# split the RNA measurements into two layers: one for control cells, one for stimulated cells
ifnb[["RNA"]] <- split(ifnb[["RNA"]], f = ifnb$stim)
# run standard analysis workflow
ifnb <- NormalizeData(ifnb)
ifnb <- FindVariableFeatures(ifnb)
ifnb <- ScaleData(ifnb)
Let's pause here for a moment and think about why the scaling was done on the whole merged matrix of normalized counts. Shouldn't it have been done on a per-sample basis? Wouldn't we want to 'equalize' the ranges of gene expression levels between samples before merging them?
# scale each sample's cells separately instead of the merged matrix
ifnb <- ScaleData(ifnb, split.by = 'orig.ident')
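To spell out what I mean by 'equalizing', here is a toy sketch in base R (my own illustration, not Seurat's implementation; ScaleData also does things like clipping and optional regression of covariates) of per-gene z-scoring over the merged data versus within each sample:
# toy sketch: one gene measured in 10 cells, 5 per sample, with a
# systematic offset between the two samples
set.seed(1)
expr <- c(rnorm(5, mean = 2), rnorm(5, mean = 6))
samp <- rep(c("CTRL", "STIM"), each = 5)
# z-scoring across the merged vector keeps the between-sample offset
# (CTRL cells end up mostly negative, STIM cells mostly positive)
scaled_merged <- as.numeric(scale(expr))
# z-scoring within each sample removes that offset before anything downstream
scaled_split <- as.numeric(unlist(tapply(expr, samp, scale)))
round(rbind(merged = scaled_merged, per_sample = scaled_split), 2)
My assumption is that split.by scales each group of cells separately, i.e. the second behaviour, which is exactly the 'equalizing' I have in mind.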
But when I do the scaling this way, the visualisation turns out different from what they get. Compare the un-integrated analysis in their case vs. mine:
And the integrated analysis:
And before you say that this is not a big difference: this is only a toy example; in my own datasets the changes are, let's say, quite drastic.
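For completeness, here is roughly what I'm assuming on both sides of the comparison downstream of the scaling step (reconstructed from memory of the vignette; the exact dims/resolution values and reduction names may not match what either of us actually ran):
# un-integrated analysis: PCA/UMAP directly on the (scaled) merged object
# (parameter values below are my guess at the vignette's choices)
ifnb <- RunPCA(ifnb)
ifnb <- FindNeighbors(ifnb, dims = 1:30, reduction = "pca")
ifnb <- FindClusters(ifnb, resolution = 2, cluster.name = "unintegrated_clusters")
ifnb <- RunUMAP(ifnb, dims = 1:30, reduction = "pca", reduction.name = "umap.unintegrated")
DimPlot(ifnb, reduction = "umap.unintegrated", group.by = c("stim", "seurat_clusters"))
# integrated analysis: CCA integration across the two layers, then UMAP on that reduction
ifnb <- IntegrateLayers(object = ifnb, method = CCAIntegration,
                        orig.reduction = "pca", new.reduction = "integrated.cca")
ifnb <- FindNeighbors(ifnb, reduction = "integrated.cca", dims = 1:30)
ifnb <- FindClusters(ifnb, resolution = 1)
ifnb <- RunUMAP(ifnb, dims = 1:30, reduction = "integrated.cca", reduction.name = "umap.cca")
DimPlot(ifnb, reduction = "umap.cca", group.by = c("stim", "seurat_clusters"))
The only difference between 'their' runs and mine is the ScaleData call, so whatever changes in the embeddings should come from the scaling choice alone.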