Question

After doublet detection in 10x scRNA-Seq data, there is still evidence of doublets

0

Entering edit mode

6 weeks ago

yura.grabovska ▴ 760

I routinely run DoubletFinder on my 10x scRNA-Seq. This is the code for the 2 functions I use to run this and this works with a Seurat object. It should be fairly obvius how this works and this code is lifted more or less in its entirety from the tutorial of the package, with few additional steps added.

The issue that I have is after running and removing multiplets, I still notice populations that shouldn't necessarily exist - handfuls of cells in non-immune compartments expressing CD45, for example. When I look at those cells separately, they typically have higher nFeature and nCounts than the rest of the cells in that module suggesting that they are multiplets. However, it's not easy to filter for them systematically besides running DoubletFinder.

I am finding them often enough that it makes me think DoubletFinder is not working properly as they're more than just a handful of cells appearing sporadically but a systematic issue with most libraries I am running.

Is anyone able to take a look at the code and see if you spot any glaring issues? I have also fed this back to package authors.

## spits out the expected multiplet rate based on input value of library size (data taken from 10x site)
get.10x.multiplets <- function(x){
  expected.df <- data.frame(
    "Cells.Recovered" = c(500,1000,2000,3000,4000,5000,6000,7000,8000,9000,10000),
    "Multiplet.Rate" = c(0.4, 0.8, 1.6, 2.4, 3.2, 4.0, 4.8, 5.6, 6.4, 7.2, 8.0)
  )
  lm.res <- lm(Multiplet.Rate~Cells.Recovered, data = expected.df)
  test.df <- data.frame("Cells.Recovered" = x)
  return(predict(lm.res, test.df))
}

## runs DoubletFinder on a seurat object 
get.10x.doublets <- function(x){
  require(DoubletFinder)

  x <- NormalizeData(x)
  x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 500)
  x <- ScaleData(x, features = VariableFeatures(x))
  x <- RunPCA(x, features = VariableFeatures(x), verbose = FALSE)
  x <- RunUMAP(x, reduction = "pca", dims = 1:20, reduction.key = "UMAPpca_", reduction.name = "UMAP_PCA")
  x <- FindNeighbors(x, reduction = "pca", dims = 1:20)
  x <- FindClusters(x, resolution = 0.3)

  x.sweep <- paramSweep(x, PCs = 1:20, sct = FALSE)
  x.sweep.stats <- summarizeSweep(x.sweep, GT = FALSE)
  x.bcmvn <- find.pK(x.sweep.stats)
  x.pK <- as.numeric(as.character(x.bcmvn$pK[which.max(x.bcmvn$BCmetric)]))
  homotypic.prop <- modelHomotypic(x$seurat_clusters)
  mlt.est <- get.10x.multiplets(nrow(x@meta.data))/100
  nExp_poi <- round(mlt.est*nrow(x@meta.data))
  nExp_poi.adj <- round(nExp_poi*(1-homotypic.prop))

  x <- doubletFinder(x, PCs = 1:20, pN = 0.25, pK = x.pK, nExp = nExp_poi, sct = FALSE, reuse.pANN = FALSE)
  x <- doubletFinder(x, PCs = 1:20, pN = 0.25, pK = x.pK, nExp = nExp_poi.adj, sct = FALSE, 
                     reuse.pANN = grep("pANN", colnames(x@meta.data), value = TRUE))
  doublet.df <- x@meta.data[,grep("DF.classifications", colnames(x@meta.data), value = TRUE)]
  colnames(doublet.df) <- c("Estimate", "Adj.Estimate")
  doublet.df$Cellname <- rownames(doublet.df)
  return(doublet.df)
}

R DoubletFinder Seurat scRNA • 312 views

ADD COMMENT • link updated 6 weeks ago by fracarb8 ★ 1.7k • written 6 weeks ago by yura.grabovska ▴ 760

0

Entering edit mode

This may not necessarily be the case here, but in general, I would not assume these algorithms will detect all of the "true" doublets. In fact, if you try different methods you will probably get different lists of doublets (which hopefully should have a considerable overlap). I would suggest trying different methods (e.g. scDblFinder is another widely-used tool which also offers different methods). But be aware that these and other data QC cleaning steps are also often "manually" or "custom" curated especially if you have sound biologicall knowledge that the expression patterns are incompatible (e.g. the CD45 example you say).

ADD REPLY • link 6 weeks ago by Papyrus ★ 3.0k

0

Entering edit mode

Have you tried subsetting those compartments and see where these cells clusters without the 'noise' of other cell types?

ADD REPLY • link 6 weeks ago by fracarb8 ★ 1.7k