Even though the layers are distinct, Seurat still maintains a single features/cells validation for the assay. This means that when you manipulate individual layers like that, the object still keeps the widest feature set covering all features across all the layers, and any feature missing from an individual layer is simply treated as zero.
If you want to filter out features with low gene counts, you can build a list of features to keep and then just run:
filtered_seurat <- subset(filtered_seurat, features = keep.features)
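For what it's worth, keep.features could be built from the pooled counts along these lines; the object/assay names and the ">= 10 cells" threshold below are just placeholders:
## pool the counts.* layers and keep genes detected in at least 10 cells overall
tmp <- filtered_seurat
tmp[["RNA"]] <- JoinLayers(tmp[["RNA"]])
pooled.counts <- LayerData(tmp, assay = "RNA", layer = "counts")
n.cells.detected <- Matrix::rowSums(pooled.counts > 0)
keep.features <- names(n.cells.detected)[n.cells.detected >= 10]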
Additionally, I have a general question: how should one handle an analysis with multiple layers when filtering cells/features, assigning cell cycle scores, and identifying/removing doublets?
I run DoubletFinder on individual libraries before merging, mostly to conserve memory since I run exploratory analyses on a workstation. You can use the LayerData() function above to pull out individual counts. Here are some generic functions I use, adapted from DoubletFinder's GitHub.
## returns the expected multiplet rate (%) for a given number of recovered cells
get.10x.multiplets <- function(x){
  ## expected multiplet rates (%) by number of cells recovered
  expected.df <- data.frame(
    "Cells.Recovered" = c(500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000),
    "Multiplet.Rate" = c(0.4, 0.8, 1.6, 2.4, 3.2, 4.0, 4.8, 5.6, 6.4, 7.2, 8.0)
  )
  ## fit a linear model and interpolate to the requested library size
  lm.res <- lm(Multiplet.Rate ~ Cells.Recovered, data = expected.df)
  test.df <- data.frame("Cells.Recovered" = x)
  predict(lm.res, test.df)
}
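For example, a library with roughly 7,500 recovered cells interpolates to about a 6% expected multiplet rate:
get.10x.multiplets(7500)
## ~6.0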
## runs DoubletFinder on a single-library Seurat v5 object and returns the per-cell calls
get.10x.doublets <- function(x){
  require(DoubletFinder)
  ## standard pre-processing up to clustering
  x <- NormalizeData(x)
  x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 1000)
  x <- ScaleData(x, features = VariableFeatures(x))
  x <- RunPCA(x, features = VariableFeatures(x), verbose = FALSE)
  x <- RunUMAP(x, reduction = "pca", dims = 1:20, reduction.key = "UMAPpca_", reduction.name = "UMAP_PCA")
  x <- FindNeighbors(x, reduction = "pca", dims = 1:20)
  x <- FindClusters(x, resolution = 0.3)
  ## pK selection via parameter sweep
  x.sweep <- paramSweep(x, PCs = 1:20, sct = FALSE)
  x.sweep.stats <- summarizeSweep(x.sweep, GT = FALSE)
  x.bcmvn <- find.pK(x.sweep.stats)
  x.pK <- as.numeric(as.character(x.bcmvn$pK[which.max(x.bcmvn$BCmetric)]))
  ## expected doublet count, with and without adjustment for homotypic doublets
  homotypic.prop <- modelHomotypic(x$seurat_clusters)
  mlt.est <- get.10x.multiplets(nrow(x@meta.data))/100
  nExp_poi <- round(mlt.est*nrow(x@meta.data))
  nExp_poi.adj <- round(nExp_poi*(1-homotypic.prop))
  ## run DoubletFinder twice, reusing the pANN from the first run for the adjusted estimate
  x <- doubletFinder(x, PCs = 1:20, pN = 0.25, pK = x.pK, nExp = nExp_poi, sct = FALSE, reuse.pANN = FALSE)
  x <- doubletFinder(x, PCs = 1:20, pN = 0.25, pK = x.pK, nExp = nExp_poi.adj, sct = FALSE,
                     reuse.pANN = grep("pANN", colnames(x@meta.data), value = TRUE))
  ## collect the two classification columns plus the cell names
  doublet.df <- x@meta.data[, grep("DF.classifications", colnames(x@meta.data), value = TRUE)]
  colnames(doublet.df) <- c("Estimate", "Adj.Estimate")
  doublet.df$Cellname <- rownames(doublet.df)
  doublet.df
}
## calculate doublet probabilities using DoubletFinder - takes a while
doublet.res.list <- lapply(seurat.list, get.10x.doublets)
doublet.res <- do.call(rbind, doublet.res.list)
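seurat.list here can be your original per-library objects or, as suggested above, objects rebuilt from the counts layers of the merged object. A rough sketch of the latter, plus attaching the calls back by cell name; the object name merged and the new metadata column are placeholders:
## one Seurat object per counts layer (i.e. per library)
counts.layers <- grep("^counts", Layers(merged[["RNA"]]), value = TRUE)
seurat.list <- lapply(counts.layers, function(lyr) {
  CreateSeuratObject(counts = LayerData(merged, assay = "RNA", layer = lyr))
})
names(seurat.list) <- counts.layers
## after the lapply()/rbind() above, add the adjusted calls back to the merged object
merged$DF.Adj.Estimate <- doublet.res$Adj.Estimate[match(colnames(merged), doublet.res$Cellname)]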
If anyone has a better way or spots a mistake, please feel free to share.
I do all my filtering by defining the cells/features to keep and then running the relevant seuratobject <- subset(seuratobject, ...) command. Cell cycle scoring depends on which package you want to use, e.g. Seurat's own implementation, tricycle, etc. For Seurat's own, the vignettes on the website walk you through an example analysis.
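For example, with Seurat's bundled gene lists a minimal run looks roughly like this (object name is a placeholder; if your data layers are still split you may need to join them first):
## Seurat cell cycle scoring with the built-in S and G2/M gene lists
seuratobject <- NormalizeData(seuratobject)
seuratobject <- CellCycleScoring(
  seuratobject,
  s.features   = cc.genes.updated.2019$s.genes,
  g2m.features = cc.genes.updated.2019$g2m.genes,
  set.ident    = FALSE
)
## scores and phase calls land in the metadata
head(seuratobject[[c("S.Score", "G2M.Score", "Phase")]])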
Thank you so much, Yura, for your support! I understand, however, that I should still find the genes that satisfy the filter criterion in both layers, and not keep the ones that satisfy it in only one, right? So I would intersect the per-layer lists of features to keep before subsetting.
However, I wonder: don't I run the risk of losing a significant number of potentially important genes this way, if I only keep the genes that satisfy the filter in both layers? Or am I making the analysis harder for myself than it needs to be? Thank you so much for your help and clarification.
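For illustration, the two options I am weighing would look something like this (the layer names and the ">= 10 cells" threshold are placeholders):
## genes passing the count filter in each layer
pass.per.layer <- lapply(c("counts.1", "counts.2"), function(lyr) {
  m <- LayerData(filtered_seurat, assay = "RNA", layer = lyr)
  rownames(m)[Matrix::rowSums(m > 0) >= 10]
})
keep.strict  <- Reduce(intersect, pass.per.layer)  ## must pass in both layers
keep.lenient <- Reduce(union, pass.per.layer)      ## may pass in either layer
filtered_seurat <- subset(filtered_seurat, features = keep.strict)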