Data scaling in single-cell RNAseq - sample-wise or on the full set?
11 months ago
e.r.zakiev ▴ 230

There was a similar post, but I'd like to explore the scaling step in more depth here, as current practice seems counter-intuitive to me.

This is from Seurat's basic integration vignette:

library(Seurat)
library(SeuratData)

# install dataset
InstallData("ifnb")

# load dataset
ifnb <- LoadData("ifnb")
# split the RNA measurements into two layers one for control cells, one for stimulated cells

ifnb[["RNA"]] <- split(ifnb[["RNA"]], f = ifnb$stim)

# run standard analysis workflow
ifnb <- NormalizeData(ifnb)
ifnb <- FindVariableFeatures(ifnb)
ifnb <- ScaleData(ifnb)

Let's pause here for a moment and think about why the scaling was done on the whole merged matrix of normalized counts. Shouldn't it have been done on a per-sample basis? Wouldn't we want to 'equalize' the ranges of gene expression levels between samples before merging them?

ifnb <- ScaleData(ifnb, split.by = 'orig.ident')

But when I do the scaling this way, the visualisation is different from what they get in their plain analysis. Compare the un-integrated analysis in their case vs mine:

[Image: un-integrated analysis, vignette's default scaling]

[Image: un-integrated analysis, scaling with split.by]

And the integrated analysis:

[Image: integrated analysis, vignette's default scaling]

[Image: integrated analysis, scaling with split.by]

And before you say that this is not a big difference: this is only a toy example. In my own datasets the changes are quite drastic.
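
To make the two options concrete, here is a minimal base-R sketch on made-up numbers. It is not Seurat's internal code, only a plain per-gene z-score computed either across all cells or within each sample; the values and the names `expr`, `sample`, `z_global` and `z_split` are hypothetical.

```r
# Toy data: one gene measured in two samples (hypothetical values).
set.seed(1)
expr   <- c(rnorm(50, mean = 1, sd = 0.5),   # cells from sample "ctrl"
            rnorm(50, mean = 3, sd = 1.0))   # cells from sample "stim"
sample <- rep(c("ctrl", "stim"), each = 50)

# Global scaling: one mean and one SD for the gene across all cells.
z_global <- as.numeric(scale(expr))

# Per-sample scaling: mean and SD are computed within each sample.
z_split <- ave(expr, sample, FUN = function(x) (x - mean(x)) / sd(x))

# Compare the per-sample averages of the scaled values.
global_means <- tapply(z_global, sample, mean)
split_means  <- tapply(z_split,  sample, mean)
print(round(global_means, 2))
print(round(split_means, 2))
```

With global scaling the shift between the two samples survives in the scaled values, whereas per-sample scaling centers every sample on zero and so removes it, whether that shift is technical or biological. That may be part of why the downstream plots differ.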

scRNA-seq seurat • 664 views
11 months ago
bk11 ★ 3.0k

Let's pause here for a moment and think about why the scaling was done on the whole merged matrix of normalized counts. Shouldn't it have been done on a per-sample basis? Wouldn't we want to 'equalize' the ranges of gene expression levels between samples before merging them?

If you are using Seurat v5, the issue you raise has been addressed here. It finds variable features and performs normalization, scaling and dimensional reduction at the individual-sample level. If you find a batch effect (you always will), you can integrate the data using an appropriate approach discussed there.
