There is more than one way to skin a cat.
So it is with Seurat: there is more than one way to analyze your scRNASeq data, and the choice is mostly guided by the data you have in hand. Given the two normalization strategies that Seurat provides, i.e. log-normalization and SCT, the analysis regimens can be classified as follows.
Say you have two scRNASeq samples, `s_ctrl` and `s_treat`, and you wish to carry out differential expression analysis after proper cell clustering.
You can then skin your scRNASeq data in the following ways:
LogNormalization
1. Merge the `s_ctrl` and `s_treat` matrices, perform log-normalization on the concatenated matrix, and then perform clustering and other downstream analysis.
2. Perform log-normalization separately on the `s_ctrl` and `s_treat` matrices, then merge the two matrices and perform clustering and other downstream analysis.
3. Integrate the `s_ctrl` and `s_treat` samples by performing log-normalization separately on each matrix and then following the standard Seurat protocol for further data analysis.
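One point that may help when comparing the first two options: Seurat's documented `LogNormalize` method works cell by cell (each cell's counts are divided by that cell's total, multiplied by a scale factor, and passed through `log1p`). The sketch below is plain Python rather than Seurat code, and it only illustrates that formula. The toy matrices and the helper name `log_normalize` are made up for illustration. It checks only the normalization step and says nothing about later dataset-level steps (variable-feature selection, scaling, PCA) or about how Seurat's `merge` treats already-normalized data.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Per-cell log-normalization: log1p(count / cell_total * scale_factor).

    `counts` is a list of cells, each a list of gene counts (cells x genes).
    Assumes every cell has a non-zero total count.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        normalized.append([math.log1p(c / total * scale_factor) for c in cell])
    return normalized

# Toy count matrices (hypothetical values); rows = cells, columns = genes.
s_ctrl = [[5, 0, 3], [1, 2, 7]]
s_treat = [[0, 4, 4], [9, 1, 0], [2, 2, 2]]

# Merge first, then normalize the concatenated matrix.
merged_then_normalized = log_normalize(s_ctrl + s_treat)

# Normalize each sample separately, then merge.
normalized_then_merged = log_normalize(s_ctrl) + log_normalize(s_treat)

print(merged_then_normalized == normalized_then_merged)
```

Because each cell is normalized using only its own total, the two orders give the same values in this toy case. If that also holds in a real Seurat run, any difference between the two options would presumably have to arise in the steps that follow normalization.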
SCT Normalization
4. Merge the `s_ctrl` and `s_treat` matrices, perform SCT normalization on the concatenated matrix, and then perform clustering and other downstream analysis.
5. Perform SCT normalization separately on the `s_ctrl` and `s_treat` matrices, then merge the two and perform clustering and other downstream analysis.
6. Integrate the `s_ctrl` and `s_treat` samples by performing SCT normalization separately on each matrix and then following the standard Seurat protocol for further data analysis.
Strategies 3 and 6 are clearly discussed in the Seurat Integration Workflow here. However, no comparable clarity has been offered on when merging is appropriate and when integration is. Some explanation is offered by the HBCTraining material here, which states that:
Generally, we always look at our clustering without integration before deciding whether we need to perform any alignment. Do not just always perform integration because you think there might be differences - explore the data.
and
Condition-specific clustering of the cells indicates that we need to integrate the cells across conditions to ensure that cells of the same cell type cluster together.
Also, the integration method expects “correspondences”, i.e. shared biological states, among at least a subset of single cells across the groups.
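The check described in these quotes, looking for condition-specific clusters before deciding on integration, is usually done by eye on a UMAP, but it can also be roughly quantified. The sketch below is a minimal plain-Python illustration, assuming you have exported one cluster label and one condition label per cell. The function names, the toy labels, and the 0.9 threshold are all hypothetical choices, not part of Seurat or the HBCTraining material.

```python
from collections import Counter, defaultdict

def condition_fractions(clusters, conditions):
    """Return {cluster: {condition: fraction of that cluster's cells}}."""
    counts = defaultdict(Counter)
    for cluster, condition in zip(clusters, conditions):
        counts[cluster][condition] += 1
    return {
        cluster: {cond: n / sum(c.values()) for cond, n in c.items()}
        for cluster, c in counts.items()
    }

def condition_specific_clusters(clusters, conditions, threshold=0.9):
    """Clusters in which a single condition makes up >= `threshold` of cells."""
    fractions = condition_fractions(clusters, conditions)
    return sorted(
        cluster for cluster, frac in fractions.items()
        if max(frac.values()) >= threshold
    )

# Hypothetical per-cell labels (one entry per cell).
clusters = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
conditions = ["ctrl", "treat", "ctrl", "treat",
              "ctrl", "ctrl", "ctrl", "ctrl",
              "treat", "ctrl"]

fractions = condition_fractions(clusters, conditions)
flagged = condition_specific_clusters(clusters, conditions)
print(fractions)
print(flagged)
```

In this toy input only cluster 1 is flagged, since all of its cells come from `ctrl`. A fixed threshold is a crude criterion: with unequal sample sizes it would make more sense to compare each cluster's composition against the overall proportion of cells per condition, and a condition-specific cluster can also reflect real biology rather than a batch effect.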
Now, let's assume that our `s_ctrl` and `s_treat` cells overlap fairly well in the UMAP and that no condition-specific clustering (or stacking) is observed when we merge the matrices and perform clustering. Which of strategies 1, 2, 4 and 5 is appropriate for our data? No systematic effort had been made to address that question until recently (a paper on bioRxiv), and it has remained unanswered in the Seurat issues and Biostars posts listed below.
GitHub issues: Issue 1, Issue 2, Issue 3, Issue 4, Issue 5
Biostars posts: Post 1, Post 2
The bioRxiv paper mentioned above discusses these four strategies, observes over-merging when using `SCTransform` (both strategies 4 and 5), as shown below, and finds strategy 2 the most appropriate. The code used by the paper is shared here. Still, I would like to gather thoughts from the scRNASeq community on which approach works well and when, and I invite further discussion of this neglected yet important choice, which affects downstream analysis.