I'm curious that even though I get through all the NormalizeData process and ScaleData process, but 4 group don't have same cell counts. I want to compare four groups, and I think it’s important to adjust the cell numbers in each group to be equal in order to make a proper comparison and identify markers. Otherwise, bias could occur. I’ll share my code, so if there are any mistakes or areas that need adjustment, please let me know.
library(Seurat)
library(SeuratObject)
library(dplyr)
library(tidyverse)
library(patchwork)
WAT_M_Y <- Read10X(data.dir = "~/Desktop/GSE137869/WAT-M-Y/")
WAT_M_Y <- CreateSeuratObject(counts = WAT_M_Y,
project = "WAT_Male_Young",
min.cells = 3,
min.features = 200)
WAT_M_Y [["percent.mt"]]<-PercentageFeatureSet(WAT_M_Y, pattern = "^Mt-")
WAT_M_O <- Read10X(data.dir =
"~/Desktop/GSE137869/WAT-M-O/")
WAT_M_O <- CreateSeuratObject(counts = WAT_M_O,
project = "WAT_Male_Old",
min.cells = 3,
min.features = 200)
WAT_M_O [["percent.mt"]]<-PercentageFeatureSet(WAT_M_O, pattern = "^Mt-")
WAT_F_Y <- Read10X(data.dir =
"~/Desktop/GSE137869/WAT-F-Y/")
WAT_F_Y <- CreateSeuratObject(counts = WAT_F_Y,
project = "WAT_Female_Young",
min.cells = 3,
min.features = 200)
WAT_F_Y [["percent.mt"]]<-PercentageFeatureSet(WAT_F_Y, pattern = "^Mt-")
WAT_F_O <- Read10X(data.dir =
"~/Desktop/GSE137869/WAT-F-O/")
WAT_F_O <- CreateSeuratObject(counts = WAT_F_O,
project = "WAT_Female_Old",
min.cells = 3,
min.features = 200)
WAT_F_O [["percent.mt"]]<-PercentageFeatureSet(WAT_F_O, pattern = "^Mt-")
WAT <- merge(WAT_M_Y, y = c(WAT_M_O, WAT_F_Y, WAT_F_O),
project = "WAT")
WAT@meta.data$type <- c(rep("Young", ncol(WAT_M_Y)),
rep("Old", ncol(WAT_M_O)),
rep("Young", ncol(WAT_F_Y)),
rep("Old", ncol(WAT_F_O)))
WAT@meta.data$sex <- c(rep("Male", ncol(WAT_M_Y)),
rep("Male", ncol(WAT_M_O)),
rep("Female", ncol(WAT_F_Y)),
rep("Female", ncol(WAT_F_O)))
view(WAT@meta.data)
VlnPlot(WAT,features = c("nFeature_RNA", "nCount_RNA", "percent.mt"))
VlnPlot(WAT,features = c("nFeature_RNA", "nCount_RNA", "percent.mt"),
pt.size=0,
group.by="orig.ident")
plot1 <- FeatureScatter(WAT, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(WAT, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
plot1 + plot2
WAT$lowQC <- ifelse(WAT$nFeature_RNA>500 &
WAT$percent.mt <10, "PASS", "FAIL")
plot1 <- FeatureScatter(WAT, feature1 = "nCount_RNA", feature2 = "percent.mt",
group.by = "lowQC", cols= c("PASS"="black", "FAIL"="red"))
plot2 <- FeatureScatter(WAT, feature1 = "nCount_RNA", feature2 = "nFeature_RNA",
group.by = "lowQC", cols= c("PASS"="black", "FAIL"="red"))
plot1 + plot2
table(WAT$lowQC)
WAT<- subset(WAT, subset = lowQC == "PASS")
WAT <- NormalizeData(WAT)
WAT_variable <- FindVariableFeatures(WAT, selection.method = "vst", nfeatures = 2000)
WAT_variable <- ScaleData(WAT_variable, vars.to.regress = c("percent.mt"))
VariableFeaturePlot(object= WAT_variable)
WAT_variable <- RunPCA(WAT_variable, features = VariableFeatures(WAT_variable), verbose = T)
DimPlot(WAT_variable,
reduction = "pca",
group.by= "orig.ident",
split.by = "orig.ident")
DimPlot(WAT_variable,split.by = "sex")
#batch correction
library(harmony)
library(Rcpp)
WAT_variable2 <- RunHarmony(WAT_variable, group.by.vars = "sex")
#ElbowPlot(WAT_variable2, ndims = 50)
WAT_variable2 <- RunUMAP(WAT_variable2, dims = 1:10, reduction = "harmony") %>%
FindNeighbors(., dims = 1:10, reduction = "harmony") %>%
FindClusters(., resolution = 0.2)%>%
FindClusters(., resolution = 0.5)%>%
FindClusters(., resolution = 0.8)%>%
FindClusters(., resolution = 1.0)%>%
FindClusters(., resolution = 1.2)%>%
FindClusters(., resolution = 1.4)
First of all, thank you very much for your kind response. I’m looking to quantitatively compare a specific cell cluster between conditions. Is there a normalization method that I can use for this?and I'm curious that is it a typical anlaysis method that comparing between cell clusters with different condition... thank you again..!!
What do you mean by "quantitatively compare" ? Do you want to compare cell numbers (compositional) or gene expression ?
For the former you can use scCODA, for the latter I explained it in the last paragraph of my answer.
To ask more specifically, I am currently conducting research on aging and focusing on the differences in a specific cluster between old and young groups. My question is, for example, if the number of cells in the male_old group is different and I want to compare the cluster difference with the male_young group, should I match the total number of cells before making the comparison? In this case, would it be appropriate to compare by looking at the proportion of a specific cluster in each condition, or should I unify the cell count across all groups and then check for differences in the clusters? I’m not sure which method to use, so I wanted to ask.
To summarize, I want to compare clusters between different conditions! Specifically, I want to identify which clusters differ and determine which genes within those clusters show significant differences, with the goal of finding potential biomarkers.
No, you don't need to downsample your number of cells, but I hope you have replicates for each condition.
If you do a pseudobulk expression analysis, the number of cells will be normalized at the size factor estimation step after aggregating your cells by clusters-samples.
If you are interested in gene expression comparison between conditions (old/yound) within each cluster you can have a look at this workflow, from seurat counts object to DESeq2 analysis.