Question

Forum: Correct method of merging/integrating multiple single-cell datasets using Seurat

2.7 years ago

rohitsatyam102 ▴ 920

There is more than one way to skin a cat.

So it is with Seurat: there is more than one way to analyze your scRNASeq data, and the choice is mostly guided by the data you have in hand. Given the two normalization strategies that Seurat provides, i.e. log-normalization and SCTransform (SCT), the analysis regimens can be classified as follows:

Say you have two scRNASeq samples, s_ctrl and s_treat, and you wish to carry out differential expression analysis after proper cell clustering.

Now you can possibly skin your scRNASeq data in the following ways:

LogNormalization

1. Merge the s_ctrl and s_treat matrices, perform log-normalization on the concatenated matrix, and then perform clustering and other downstream analysis.
2. Perform log-normalization separately on the s_ctrl and s_treat matrices, then merge the two matrices and perform clustering and other downstream analysis.
3. Integrate the s_ctrl and s_treat samples by performing log-normalization separately on each matrix and following the standard Seurat protocol for the rest of the analysis.
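
To make the difference between strategies 1 and 2 concrete, here is a minimal Python sketch (not Seurat code). It assumes log-normalization works the way Seurat's LogNormalize method is documented: each cell's counts are divided by that cell's total, multiplied by a scale factor (10,000 by default), and passed through log1p. The toy matrices and the helper name `log_normalize` are made up for illustration. Under that assumption the normalization step only ever looks at one cell at a time, so it should not matter whether the merge happens before or after it.

```python
import math

def log_normalize(cell_counts, scale_factor=10_000):
    """Per-cell log-normalization: log1p(count / cell_total * scale_factor)."""
    total = sum(cell_counts)
    return [math.log1p(c / total * scale_factor) for c in cell_counts]

# Toy count matrices (hypothetical data): rows are cells, columns are genes.
s_ctrl = [[10, 0, 5], [3, 7, 0]]
s_treat = [[0, 12, 8], [6, 6, 6]]

# Strategy 1: merge the raw counts, then normalize the concatenated matrix.
merged_then_normalized = [log_normalize(cell) for cell in s_ctrl + s_treat]

# Strategy 2: normalize each sample separately, then merge.
normalized_then_merged = (
    [log_normalize(cell) for cell in s_ctrl]
    + [log_normalize(cell) for cell in s_treat]
)

# Each cell is normalized using only its own total, so the order of
# merging and normalizing does not change the normalized values.
print(merged_then_normalized == normalized_then_merged)
```

This only covers the normalization step itself. Any difference between strategies 1 and 2 in practice would have to come from later steps that do look across cells (for example variable feature selection or scaling), which this sketch does not model.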

SCT Normalization

4. Merge the s_ctrl and s_treat matrices, perform SCT normalization on the concatenated matrix, and then perform clustering and other downstream analysis.
5. Perform SCT normalization on the s_ctrl and s_treat matrices separately, then merge them and perform clustering and other downstream analysis.
6. Integrate the s_ctrl and s_treat samples by performing SCT normalization separately on each matrix and following the standard Seurat protocol for the rest of the analysis.
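
Unlike per-cell log-normalization, SCT fits a model across all cells in the matrix it is given, so the order of merging and normalizing can change the values. The Python sketch below illustrates that dependence with analytic Pearson residuals and a fixed dispersion. This is a deliberately simplified stand-in, not the regularized regression that sctransform actually performs. The function name, the toy data, and `theta=100.0` are all assumptions chosen for illustration.

```python
import math

def pearson_residuals(counts, theta=100.0):
    """Pearson residuals under a negative binomial null model.

    Expected count mu[i][j] = cell_total[i] * gene_total[j] / grand_total,
    so every residual depends on all cells present in the matrix.
    theta is an assumed, fixed dispersion (not estimated from the data).
    """
    cell_totals = [sum(row) for row in counts]
    gene_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(cell_totals)
    residuals = []
    for row, cell_total in zip(counts, cell_totals):
        out = []
        for x, gene_total in zip(row, gene_totals):
            mu = cell_total * gene_total / grand_total
            out.append((x - mu) / math.sqrt(mu + mu * mu / theta))
        residuals.append(out)
    return residuals

# Toy count matrices (hypothetical data): rows are cells, columns are genes.
s_ctrl = [[10, 0, 5], [3, 7, 0]]
s_treat = [[0, 12, 8], [6, 6, 6]]

# Analogue of strategy 4: merge first, then compute residuals once.
merged = pearson_residuals(s_ctrl + s_treat)

# Analogue of strategy 5: compute residuals per sample, then merge.
separate = pearson_residuals(s_ctrl) + pearson_residuals(s_treat)

# Same cell, same gene, but a different value depending on the order.
print(round(merged[0][0], 3), round(separate[0][0], 3))
```

The first control cell gets a different residual for the first gene in the two runs, because the expected count is computed from a different set of cells each time. The sketch says nothing about which of strategies 4 and 5 is preferable; it only shows why they are not interchangeable.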

Strategies 3 and 6 are clearly discussed in the Seurat integration workflow here. However, no such clarity has been offered as to when merging is appropriate and when integration is needed. Some explanation is offered by the HBC Training material here, which states:

Generally, we always look at our clustering without integration before deciding whether we need to perform any alignment. Do not just always perform integration because you think there might be differences - explore the data.

and

Condition-specific clustering of the cells indicates that we need to integrate the cells across conditions to ensure that cells of the same cell type cluster together.

Also, the integration method expects “correspondences”, or shared biological states, among at least a subset of single cells across the groups.
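
The check for condition-specific clustering described in the quotes above is usually done by eye on a UMAP, but it can also be roughly quantified. The Python sketch below is one hypothetical way to do that: it takes per-cell cluster labels and condition labels (for example exported from the merged object's metadata) and flags clusters dominated by a single condition. The function names, the toy labels, and the 0.9 threshold are all assumptions, not part of Seurat or the HBC material.

```python
from collections import Counter, defaultdict

def condition_fractions(clusters, conditions):
    """Return {cluster: {condition: fraction of that cluster's cells}}."""
    per_cluster = defaultdict(Counter)
    for cluster, condition in zip(clusters, conditions):
        per_cluster[cluster][condition] += 1
    return {
        cluster: {cond: n / sum(counts.values()) for cond, n in counts.items()}
        for cluster, counts in per_cluster.items()
    }

def condition_specific_clusters(clusters, conditions, threshold=0.9):
    """List clusters where one condition makes up at least `threshold` of cells."""
    fractions = condition_fractions(clusters, conditions)
    return sorted(
        cluster for cluster, frac in fractions.items()
        if max(frac.values()) >= threshold
    )

# Toy labels (hypothetical): one entry per cell.
clusters = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
conditions = ["ctrl", "treat", "ctrl", "treat",
              "ctrl", "ctrl", "ctrl", "ctrl",
              "treat", "ctrl"]

fractions = condition_fractions(clusters, conditions)
flagged = condition_specific_clusters(clusters, conditions)
print(flagged)  # cluster 1 contains only ctrl cells
```

A flagged cluster is only a prompt to look closer. The threshold is arbitrary, the sketch ignores unequal sample sizes, and a cluster dominated by one condition may reflect a genuine condition-specific cell state rather than a technical effect that needs integration.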

Now, let's assume that our s_ctrl and s_treat samples overlap fairly well in the UMAP and that no condition-specific clustering (or stacking) is observed when we merge the matrices and perform clustering. Which strategy out of 1, 2, 4 and 5 is appropriate for our data? No systematic effort had been made to address that question until recently (a paper on bioRxiv), and it has remained unaddressed in the Seurat issues and Biostars posts listed below.

GitHub Issues: Issue 1, Issue 2, Issue 3, Issue 4, Issue 5

Biostars Issues: Post 1, Post 2

The bioRxiv paper mentioned above discusses these four strategies, observes over-merging with SCTransform under both strategies 4 and 5 (as shown below), and finds strategy 2 the most appropriate. The code used by the paper is shared here. However, I would like to gather thoughts from the scRNASeq community on which approach works well and when, and to invite further discussion of this neglected yet important choice, which affects downstream analysis.

[Figure from the bioRxiv paper; image not available]

seurat scrnaseq • 3.3k views
