Filtering seurat object
1
0
Entering edit mode
5 months ago
marco.barr ▴ 160

Hello everyone, I'm trying to implement a gene-level filter on my Seurat object. In my assay, I have two layers: one for a control sample and the other for a patient sample. This is my code.

layers <- c("counts.ctl_raw_feature_bc_matrix", "counts.pt_raw_feature_bc_matrix")
for (layer in layers) {
    counts <- LayerData(filtered_seurat, assay = "RNA", layer = layer)
    nonzero <- counts > 0
    keep_genes <- Matrix::rowSums(nonzero) >= 10
    filtered_counts <- counts[keep_genes, ]
    filtered_seurat <- SetAssayData(filtered_seurat, assay = "RNA", slot = layer, new.data = filtered_counts)
}

However, when I update the filtered data directly on the assay, I get the following warnings:

Warning: Different features in new layer data than already exist for counts.ctl_raw_feature_bc_matrix
Warning: Different features in new layer data than already exist for counts.pt_raw_feature_bc_matrix

Despite reducing the size, when I check the number of features with num_features_filtered <- nrow(filtered_seurat) on the new filtered object, I obtain the same number of rows as the unfiltered one. I'm unsure if the code is working correctly or not. How can I resolve this?

Additionally, I have a general question: how should one handle analysis with multiple layers when performing filters, assigning cell cycle scores, and identifying/removing doublets? Thank you very much.

seurat scRNA • 1.4k views
ADD COMMENT
0
Entering edit mode
5 months ago

Despite the layers being unique, Seurat still maintains a single Features/Cells validation. This means that when you mess with individual layers like that, it will still maintain the widest set of features that captures all features in all the layers and just zero out any missing features in an individual layer.

If you want to filter features where there are low individual gene counts, you can make a list of Features to keep and then just run:

filtered_seurat <- subset(filtered_seurat, features = keep.features)

Additionally, I have a general question: how should one handle analysis with multiple layers when performing filters, assigning cell cycle scores, and identifying/removing doublets?

I run DoubletFinder on individual libraries before merging. This is more to conserve memory as I run exploratory analysis on a workstation. You can use the LayerData() function above to pull out individual counts. Here are some generic functions I use adapted from DoubletFinder's github.

## spits out the expected multiplet rate based on input value of library size
get.10x.multiplets <- function(x){
  expected.df <- data.frame(
    "Cells.Recovered" = c(500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000),
    "Multiplet.Rate" = c(0.4, 0.8, 1.6, 2.4, 3.2, 4.0, 4.8, 5.6, 6.4, 7.2, 8.0)
  )
  lm.res <- lm(Multiplet.Rate~Cells.Recovered, data = expected.df)
  test.df <- data.frame("Cells.Recovered" = x)
  predict(lm.res, test.df)
}

## runs DoubletFinder on a Seurat v5 object 
get.10x.doublets <- function(x){

  require(DoubletFinder)

  x <- NormalizeData(x)
  x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 1000)
  x <- ScaleData(x, features = VariableFeatures(x))
  x <- RunPCA(x, features = VariableFeatures(x), verbose = FALSE)
  x <- RunUMAP(x, reduction = "pca", dims = 1:20, reduction.key = "UMAPpca_", reduction.name = "UMAP_PCA")
  x <- FindNeighbors(x, reduction = "pca", dims = 1:20)
  x <- FindClusters(x, resolution = 0.3)

  x.sweep <- paramSweep(x, PCs = 1:20, sct = FALSE)
  x.sweep.stats <- summarizeSweep(x.sweep, GT = FALSE)
  x.bcmvn <- find.pK(x.sweep.stats)
  x.pK <- as.numeric(as.character(x.bcmvn$pK[which.max(x.bcmvn$BCmetric)]))
  homotypic.prop <- modelHomotypic(x$seurat_clusters)
  mlt.est <- get.10x.multiplets(nrow(x@meta.data))/100
  nExp_poi <- round(mlt.est*nrow(x@meta.data))
  nExp_poi.adj <- round(nExp_poi*(1-homotypic.prop))
  x <- doubletFinder(x, PCs = 1:20, pN = 0.25, pK = x.pK, nExp = nExp_poi, sct = FALSE, reuse.pANN = FALSE)
  x <- doubletFinder(x, PCs = 1:20, pN = 0.25, pK = x.pK, nExp = nExp_poi.adj, sct = FALSE, 
                     reuse.pANN = grep("pANN", colnames(x@meta.data), value = TRUE))
  doublet.df <- x@meta.data[,grep("DF.classifications", colnames(x@meta.data), value = TRUE)]
  colnames(doublet.df) <- c("Estimate", "Adj.Estimate")
  doublet.df$Cellname <- rownames(doublet.df)
  doublet.df
}

## calculate doublet probabilities using DoubletFinder - takes a while
doublet.res.list <- lapply(seurat.list, get.10x.doublets)
doublet.res <- do.call(rbind, doublet.res.list)

If anyone has a better way or spots a mistake, please feel free to share.

I do all my filtering by defining cells/features to keep and then running the relevant seuratobject <- subset(seuratobject, ...) command. Cell cycle scoring depends on what package you want to use eg: Seurat's own, tricycle etc. For Seurat's own, the vignettes on the site can walk you through an example analysis.

ADD COMMENT
0
Entering edit mode

thank you so much yura for your support! I understand, however, that I should still find the genes that satisfy the filter criterion in all two layers and not leave the ones that satisfy only in one, right? So do something like this:

layers <- c("counts.ctl_raw_feature_bc_matrix", "counts.pt_raw_feature_bc_matrix")
all_keep_genes <- list()

for (layer in layers) {
      counts <- LayerData(filtered_seurat, assay = "RNA", layer = layer)
      nonzero <- counts > 0
      keep_genes <- Matrix::rowSums(nonzero) >= 10
     all_keep_genes[[layer]] <- rownames(counts)[keep_genes]
}
 common_keep_genes <- Reduce(intersect, all_keep_genes)
 filtered_seurat <- subset(filtered_seurat, features = common_keep_genes)

for (layer in layers) {
   counts <- LayerData(filtered_seurat, assay = "RNA", layer = layer)
   filtered_counts <- counts[common_keep_genes, ]
  filtered_seurat <- SetAssayData(filtered_seurat, assay = "RNA", slot = layer, new.data = filtered_counts)

}

however I wonder: in this way don't I run the risk of a significant loss of potentially important genes if I find the genes that satisfy the filter only in both layers? Or am I making the analysis difficult for myself? Thank you so much for your help and clarification

ADD REPLY

Login before adding your answer.

Traffic: 3029 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6