I have 2 samples, each from a different condition. I will refer to them as sample1 and sample2. I created a Seurat object for each one and then merged the two objects. I have just started with scRNA-seq analysis and am not sure whether this is the correct way to go about it.
# Merge the two Seurat objects (this only combines them; batch correction comes later)
merged.Seurat <- merge(x = sample1.SeuratObject, y = sample2.SeuratObject, add.cell.ids = c("sample1", "sample2"))
merged.Seurat
I then followed the Satija lab tutorial to do the filtering + visualization (https://satijalab.org/seurat/articles/pbmc3k_tutorial)
I ran the non-linear dimensional reduction to visualize the dataset and saw that I needed to correct for batch effects (I followed a YouTube tutorial on this).
# Split the object by sample, since we see batch effects coming from each sample. Save the result as obj.list
obj.list <- SplitObject(merged.Seurat.filtered, split.by = "sample")
obj.list
# For each object in the list, run normalization and identify highly variable features
for (i in seq_along(obj.list)) {
  obj.list[[i]] <- NormalizeData(obj.list[[i]])
  obj.list[[i]] <- FindVariableFeatures(obj.list[[i]])
}
# Select integration features
features <- SelectIntegrationFeatures(object.list = obj.list)
features
# Find integration anchors across samples using Canonical Correlation Analysis (CCA) method
anchors <- FindIntegrationAnchors(object.list = obj.list, anchor.features = features)
# Use anchors to integrate the data
seurat.integrated <- IntegrateData(anchorset = anchors)
# Scale data, run PCA and UMAP and visualize integrated data
seurat.integrated <- ScaleData(seurat.integrated)
seurat.integrated <- RunPCA(seurat.integrated)
seurat.integrated <- RunUMAP(seurat.integrated, dims = 1:15)
# Create plots
integrated.UMAP <- DimPlot(seurat.integrated, reduction = "umap", group.by = "sample", cols = c("black", "red"))
integrated.UMAP
# Save the object here so it can be reloaded later without rerunning the computationally intensive steps above
saveRDS(seurat.integrated, file = "../seurat.integrated.analysis.rds")
To assign identities to the clusters I used the SingleR package (followed YouTube tutorial)
library(SingleR)
library(celldex)
# Using built in reference: set your ref object by loading the MouseRNAseqData
ref <- celldex::MouseRNAseqData()
# Data frame of the mapped cell labels: call the SingleR function and pass your Seurat Object as a single cell Experiment
results <- SingleR(test = as.SingleCellExperiment(seurat.integrated), ref = ref, labels = ref$label.main)
results
# Take labels column and append it to your meta data
seurat.integrated$singlr_labels <- results$labels
seurat.integrated[[]]
DimPlot(seurat.integrated, reduction = "umap", group.by = "singlr_labels", label = TRUE, label.size = 3, repel = TRUE)
Moving forward, I would like to subcluster some of the clusters, but I am a bit confused about how to go about it. Do I use FindSubCluster()? Will this give me the types of cells that differ between the samples in a specific cluster? Or do I use other packages to do this?
Guidance on this is highly appreciated, as I am lost with no mentorship from a bioinformatician in my lab.
Thank you, thank you!
First, is there a reason you did not use the Seurat v5 approach to data integration described here?
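For reference, the v5 layer-based workflow could look roughly like the sketch below. It is only an outline under assumptions: `integrate_v5` is a hypothetical wrapper name, the input is assumed to be the merged and filtered object from the question created with Seurat v5 (so that `merge()` left one counts layer per sample), and `dims = 1:15` is simply carried over from the question rather than a recommendation.

```r
# Hypothetical helper (not part of Seurat): wraps the Seurat v5
# layer-based integration steps in one function.
# Assumes `obj` is a merged Seurat v5 object whose RNA assay still has
# one layer per sample; if it does not, split the assay by sample first.
integrate_v5 <- function(obj, dims = 1:15) {
  stopifnot(requireNamespace("Seurat", quietly = TRUE))

  # Standard preprocessing; with split layers this runs per sample
  obj <- Seurat::NormalizeData(obj)
  obj <- Seurat::FindVariableFeatures(obj)
  obj <- Seurat::ScaleData(obj)
  obj <- Seurat::RunPCA(obj)

  # CCA-based integration, stored as a new reduction
  obj <- Seurat::IntegrateLayers(
    object = obj,
    method = Seurat::CCAIntegration,
    orig.reduction = "pca",
    new.reduction = "integrated.cca"
  )

  # Rejoin the per-sample layers once integration is done
  obj[["RNA"]] <- SeuratObject::JoinLayers(obj[["RNA"]])

  # Cluster and embed using the integrated reduction
  obj <- Seurat::FindNeighbors(obj, reduction = "integrated.cca", dims = dims)
  obj <- Seurat::FindClusters(obj)
  Seurat::RunUMAP(obj, reduction = "integrated.cca", dims = dims)
}

# Possible usage with the object from the question:
# seurat.integrated <- integrate_v5(merged.Seurat.filtered)
```

One practical difference from the anchor-based code in the question is that everything stays in a single object and a single assay, with the corrected embedding kept as a reduction.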
You can subcluster a specific cluster with Seurat itself; you do not necessarily need other packages. There are different ways of doing it. One way is to subset the cluster of interest from the whole dataset and rerun the whole upstream workflow on that subset, starting from the unprocessed data. Another way is, yes, to use the FindSubCluster() function in Seurat. Please see the discussion in "How to use FindSubCluster in Seurat?".
I am not sure what you are asking here. But you can check the distribution of cells in a specific cluster as follows (this will give you the number of cells coming from each of your samples):
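A minimal sketch of such a count, and of a FindSubCluster() call, is given below. The helper names `count_cells` and `subcluster_one` are hypothetical, the toy metadata is made up purely to show the output shape, and the graph name `"integrated_snn"` assumes that FindNeighbors() and FindClusters() have been run on the integrated assay, which the code in the question has not yet done.

```r
# Hypothetical helper: cross-tabulate cells by cluster and by sample.
# `meta` is a metadata data frame such as the one from seurat.integrated[[]].
count_cells <- function(meta, cluster.col, sample.col) {
  table(cluster = meta[[cluster.col]], sample = meta[[sample.col]])
}

# Toy metadata standing in for seurat.integrated[[]] (made-up values)
toy.meta <- data.frame(
  seurat_clusters = c("0", "0", "0", "1", "1", "2"),
  sample = c("sample1", "sample2", "sample2", "sample1", "sample1", "sample2")
)

counts <- count_cells(toy.meta, "seurat_clusters", "sample")
print(counts)

# Same table as the fraction of each sample falling in each cluster
print(prop.table(counts, margin = 2))

# Hypothetical wrapper around Seurat's FindSubCluster().
# `cluster` must match one of the current identities (Idents) of `obj`.
subcluster_one <- function(obj, cluster, graph.name = "integrated_snn",
                           resolution = 0.5) {
  stopifnot(requireNamespace("Seurat", quietly = TRUE))
  Seurat::FindSubCluster(
    obj,
    cluster = cluster,
    graph.name = graph.name,
    subcluster.name = "sub.cluster",
    resolution = resolution
  )
}

# Possible usage on the real object (cluster "3" is only an example):
# seurat.integrated <- FindNeighbors(seurat.integrated, dims = 1:15)
# seurat.integrated <- FindClusters(seurat.integrated, resolution = 0.5)
# seurat.integrated <- subcluster_one(seurat.integrated, cluster = "3")
# count_cells(seurat.integrated[[]], "sub.cluster", "sample")
```

The same `count_cells` call also works with the `singlr_labels` column in place of a cluster column. Note that these counts only describe how cells are distributed across samples; they do not test which genes or cell types differ between conditions.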