Question

normalization in single cell

0

Entering edit mode

3 months ago

kayah ▴ 20

enter image description here

I'm curious that even though I get through all the NormalizeData process and ScaleData process, but 4 group don't have same cell counts. I want to compare four groups, and I think it’s important to adjust the cell numbers in each group to be equal in order to make a proper comparison and identify markers. Otherwise, bias could occur. I’ll share my code, so if there are any mistakes or areas that need adjustment, please let me know.

library(Seurat)
library(SeuratObject)
library(dplyr)
library(tidyverse)
library(patchwork)

WAT_M_Y <- Read10X(data.dir = "~/Desktop/GSE137869/WAT-M-Y/") 
WAT_M_Y <- CreateSeuratObject(counts = WAT_M_Y, 
                              project = "WAT_Male_Young",
                              min.cells = 3,
                              min.features = 200)
WAT_M_Y [["percent.mt"]]<-PercentageFeatureSet(WAT_M_Y, pattern =  "^Mt-")


WAT_M_O <- Read10X(data.dir = 
                     "~/Desktop/GSE137869/WAT-M-O/") 
WAT_M_O <- CreateSeuratObject(counts = WAT_M_O, 
                              project = "WAT_Male_Old",
                              min.cells = 3,
                              min.features = 200)
WAT_M_O [["percent.mt"]]<-PercentageFeatureSet(WAT_M_O, pattern =  "^Mt-")


WAT_F_Y <- Read10X(data.dir = 
                     "~/Desktop/GSE137869/WAT-F-Y/") 
WAT_F_Y <- CreateSeuratObject(counts = WAT_F_Y, 
                              project = "WAT_Female_Young",
                              min.cells = 3,
                              min.features = 200)
WAT_F_Y [["percent.mt"]]<-PercentageFeatureSet(WAT_F_Y, pattern =  "^Mt-")


WAT_F_O <- Read10X(data.dir = 
                     "~/Desktop/GSE137869/WAT-F-O/") 
WAT_F_O <- CreateSeuratObject(counts = WAT_F_O, 
                              project = "WAT_Female_Old",
                              min.cells = 3,
                              min.features = 200)
WAT_F_O [["percent.mt"]]<-PercentageFeatureSet(WAT_F_O, pattern =  "^Mt-")

WAT <- merge(WAT_M_Y, y = c(WAT_M_O, WAT_F_Y, WAT_F_O), 
             project = "WAT")

WAT@meta.data$type <- c(rep("Young", ncol(WAT_M_Y)),
                        rep("Old", ncol(WAT_M_O)),
                        rep("Young", ncol(WAT_F_Y)),
                        rep("Old", ncol(WAT_F_O)))
WAT@meta.data$sex <- c(rep("Male", ncol(WAT_M_Y)),
                       rep("Male", ncol(WAT_M_O)),
                       rep("Female", ncol(WAT_F_Y)),
                       rep("Female", ncol(WAT_F_O)))
view(WAT@meta.data)
VlnPlot(WAT,features = c("nFeature_RNA", "nCount_RNA", "percent.mt"))
VlnPlot(WAT,features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), 
        pt.size=0,
        group.by="orig.ident")


plot1 <- FeatureScatter(WAT, feature1 = "nCount_RNA", feature2 = "percent.mt") 
plot2 <- FeatureScatter(WAT, feature1 = "nCount_RNA", feature2 = "nFeature_RNA") 
plot1 + plot2

WAT$lowQC <- ifelse(WAT$nFeature_RNA>500 & 
                      WAT$percent.mt <10, "PASS", "FAIL")
plot1 <- FeatureScatter(WAT, feature1 = "nCount_RNA", feature2 = "percent.mt",
                        group.by = "lowQC", cols= c("PASS"="black", "FAIL"="red"))
plot2 <- FeatureScatter(WAT, feature1 = "nCount_RNA", feature2 = "nFeature_RNA",
                        group.by = "lowQC", cols= c("PASS"="black", "FAIL"="red"))
plot1 + plot2
table(WAT$lowQC)

WAT<- subset(WAT, subset = lowQC == "PASS")

WAT <- NormalizeData(WAT)
WAT_variable  <- FindVariableFeatures(WAT, selection.method = "vst", nfeatures = 2000) 
WAT_variable <- ScaleData(WAT_variable, vars.to.regress = c("percent.mt"))
VariableFeaturePlot(object= WAT_variable)
WAT_variable <- RunPCA(WAT_variable, features = VariableFeatures(WAT_variable), verbose = T) 
DimPlot(WAT_variable,
        reduction = "pca",
        group.by= "orig.ident",
        split.by = "orig.ident")
DimPlot(WAT_variable,split.by = "sex")

#batch correction 
library(harmony)
library(Rcpp)

WAT_variable2 <- RunHarmony(WAT_variable, group.by.vars = "sex")
#ElbowPlot(WAT_variable2, ndims = 50)
WAT_variable2 <- RunUMAP(WAT_variable2, dims = 1:10, reduction = "harmony") %>%
  FindNeighbors(., dims = 1:10, reduction = "harmony") %>%
  FindClusters(., resolution = 0.2)%>%
  FindClusters(., resolution = 0.5)%>%
  FindClusters(., resolution = 0.8)%>%
  FindClusters(., resolution = 1.0)%>%
  FindClusters(., resolution = 1.2)%>%
  FindClusters(., resolution = 1.4)

normalization scRNA-seq single-cell • 677 views

ADD COMMENT • link updated 3 months ago by Bastien Hervé 5.9k • written 3 months ago by kayah ▴ 20

score 3 · Accepted Answer · 2024-08-20

3

Entering edit mode

3 months ago

Bastien Hervé 5.9k

I think there is a missunderstanding of your expectation on normalization and scaling.

There is generally no need for downsampling the number of cells to be equivalent in each group before clustering.

The NormalizeData function will normalized gene counts within each cell and the ScaleData will scale and center your counts. No cells are been removed in those steps.

Harmony takes care of ajusting your prinicpal components in regards of an effect you are trying to regress (here sex). This step will only modifying your clustering and projection.

If you want to compare gene expression between 2 clusters in the same sample you can use FindMarkers, or if you want to compare gene expression between conditions you can use the AggregateExpression, which aggregate and normalize your counts, and if you have enough statistical power you can give the output to differentially expressed genes methods like DESeq2 or EdgeR.

ADD COMMENT • link 3 months ago by Bastien Hervé 5.9k

0

Entering edit mode

First of all, thank you very much for your kind response. I’m looking to quantitatively compare a specific cell cluster between conditions. Is there a normalization method that I can use for this?and I'm curious that is it a typical anlaysis method that comparing between cell clusters with different condition... thank you again..!!

ADD REPLY • link 3 months ago by kayah ▴ 20

0

Entering edit mode

What do you mean by "quantitatively compare" ? Do you want to compare cell numbers (compositional) or gene expression ?

For the former you can use scCODA, for the latter I explained it in the last paragraph of my answer.

ADD REPLY • link 3 months ago by Bastien Hervé 5.9k

0

Entering edit mode

To ask more specifically, I am currently conducting research on aging and focusing on the differences in a specific cluster between old and young groups. My question is, for example, if the number of cells in the male_old group is different and I want to compare the cluster difference with the male_young group, should I match the total number of cells before making the comparison? In this case, would it be appropriate to compare by looking at the proportion of a specific cluster in each condition, or should I unify the cell count across all groups and then check for differences in the clusters? I’m not sure which method to use, so I wanted to ask.

ADD REPLY • link 3 months ago by kayah ▴ 20

0

Entering edit mode

To summarize, I want to compare clusters between different conditions! Specifically, I want to identify which clusters differ and determine which genes within those clusters show significant differences, with the goal of finding potential biomarkers.

ADD REPLY • link 3 months ago by kayah ▴ 20

0

Entering edit mode

No, you don't need to downsample your number of cells, but I hope you have replicates for each condition.

If you do a pseudobulk expression analysis, the number of cells will be normalized at the size factor estimation step after aggregating your cells by clusters-samples.

If you are interested in gene expression comparison between conditions (old/yound) within each cluster you can have a look at this workflow, from seurat counts object to DESeq2 analysis.

ADD REPLY • link 3 months ago by Bastien Hervé 5.9k