Question

scRNAseq Normalisation Question/Help

3.1 years ago

Alex Gibbs ▴ 90

Hi everyone,

Apologies if this has been posted before but I can't find an answer and am struggling to understand the literature on what to do.

A bit of background: I have 4 samples that I have sent for 10x single cell sequencing. I have run the raw data through the Cell Ranger pipeline to generate the raw count matrices. I am now stuck on the normalization process.

I imported the 4 samples into Seurat as 4 separate Seurat objects, using the same filters for each (min.cells = 3 and min.features = 200), and then combined them into one Seurat object with the merge function. After QC checks I filtered the cells (nFeature_RNA > 200 & nFeature_RNA < 6500 & percent.MT < 20) and ran the NormalizeData function with normalization.method = "LogNormalize".

To assess the normalization, I checked the expression of GAPDH, which looked good. However, the distributions of expression values did not look good. I also tried the other normalization methods, such as CLR with margin = 1 or margin = 2, and am still getting bad normalizations.

I have checked many articles/vignettes and am thinking perhaps I should be processing and normalizing these samples separately and then combining them into one object afterwards? Would I use the "LogNormalize" method on each sample and then integrate them through the 'IntegrateData' function, or should I use SCTransform and then combine afterwards?

My overall goal is to explore each sample and the cells within it, and also to compare cell clusters between samples, which is why I would like to combine them into one object.

Thank you very much in advance!!

scRNAseq Seurat • 2.4k views

updated 3.1 years ago by Friederike 9.0k • written 3.1 years ago by Alex Gibbs ▴ 90

"the distributions of expression values did not look good"

If by this you mean that the clustering looks really odd before integration, that's normal. Beyond that, I would just follow the documentation of whichever integration tool you use (Seurat, Harmony, Scanorama, etc.).

3.1 years ago by Griffen Wakelin ▴ 10

score 3 · Answer 1 · 2021-11-10

"the distributions of expression values did not look good"

Can you share what you're seeing?

"I should be processing and normalizing these samples separately and then combining them into one object afterwards?"

Small comment first: just because all samples are stored in the same object does not mean that they HAVE to be treated the exact same way.

To get to your actual question: It's typically more useful to define the best filtering settings for every batch separately. To find those, you may have to go all the way to clustering for each sample separately to identify clusters that may look suspicious.
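To make "filtering settings for every batch separately" a bit more concrete, here is a minimal sketch in plain Python (not Seurat code). It assumes a median ± 3 MAD rule on the number of detected genes per cell, which is a common outlier heuristic and my own choice for illustration, not something prescribed above. The function name, sample names and counts are all made up.

```python
from statistics import median


def mad_bounds(values, n_mads=3.0):
    """Return (lower, upper) as median -/+ n_mads * scaled MAD.

    Assumption: the 1.4826 factor scales the MAD so that it is comparable
    to a standard deviation for normally distributed data.
    """
    med = median(values)
    mad = 1.4826 * median([abs(v - med) for v in values])
    return med - n_mads * mad, med + n_mads * mad


# Hypothetical toy data: detected genes per cell, grouped by sample.
features_per_cell = {
    "sample1": [1800, 2000, 2100, 2200, 2500, 9000],
    "sample2": [600, 700, 750, 800, 900, 150],
}

# Thresholds are computed per sample, not once for the merged data.
bounds = {name: mad_bounds(vals) for name, vals in features_per_cell.items()}

# Keep only the cells that fall inside their own sample's bounds.
kept = {
    name: [v for v in vals if bounds[name][0] <= v <= bounds[name][1]]
    for name, vals in features_per_cell.items()
}

for name in features_per_cell:
    low, high = bounds[name]
    print(f"{name}: keep {low:.0f} to {high:.0f} features, {len(kept[name])} cells kept")
```

The bounds come out quite different for the two toy samples, which is the point: a single global cut-off may be too lenient for one sample and too strict for another. Clustering each sample, as described above, remains the more thorough check.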

"Would I use the "LogNormalize" method on each sample and then integrate them through the 'IntegrateData' function, or should I use SCTransform and then combine afterwards?"

The typical Seurat workflow is to normalize each object separately and then to do the integration (see the vignette). In fact, they have a section explicitly dealing with integration and SCTransform.
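One detail that may help when deciding between merged and separate processing: as far as the Seurat documentation describes it, "LogNormalize" works cell by cell (counts divided by that cell's total, multiplied by a scale factor of 10,000 by default, then log1p). The sketch below is plain Python rather than Seurat code, with a made-up function name and toy counts, and only illustrates that arithmetic.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Per-cell log-normalization: log1p(count / cell_total * scale_factor).

    Assumption for this sketch: a cell with zero total counts is returned
    as all zeros instead of raising a division error.
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


# Hypothetical toy data: each inner list holds the raw gene counts of one cell.
sample_a = [[10, 0, 5], [2, 2, 0]]
sample_b = [[100, 50, 0]]

# Normalizing each sample on its own ...
separate = [log_normalize(cell) for sample in (sample_a, sample_b) for cell in sample]

# ... gives the same values as normalizing the merged cells,
# because each cell only uses its own total count.
merged = [log_normalize(cell) for cell in sample_a + sample_b]

print(separate == merged)
```

So for this particular step, merged versus separate should not change the values. The choice matters for steps that pool information across cells, such as SCTransform's model fitting, variable feature selection and the integration itself.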

Whether you should use SCTransform or not is up to you. In addition to the original paper, you can read up on some potential downsides here.

While Seurat seems easy to get started with, it can become a bit unwieldy, and it can be difficult to understand what is actually being done and why. If you'd like to develop a better understanding of the actual steps, I highly recommend the Bioconductor resource OSCA and its chapter on multi-sample workflows. I also just came across this summary of why some labs choose to go with BioC rather than Seurat, which may give you some context, too.