Question

Why does running scDblFinder before and after removing low-QC cells give different results?

1

22 months ago

Assa Yeroslaviz ★ 1.9k

I was wondering how well the tool should work on a SMART-Seq2 data set with "only" < 100 cells.

I'm getting a warning that this might cause a problem, but my question is a different one.

I ran scDblFinder on my sce object after removing the low-QC cells identified via addPerCellQCMetrics, and only two cells were identified as doublets.

For some reason I needed to repeat the analysis, and this time I first ran the filtering for doublets only after removing low-QC cells. This time, though, it identified 9 cells as doublets. I know that is not many, but it is still >10% of my data set.

I'm mainly interested in understanding whether I can trust the results for such a small data set and, if so, why there is such a big difference depending on how (or when) one runs the search.

thanks Assa

singleCellExperiment scDblFinder SMART-Seq • 1.7k views

updated 16 months ago by e.r.zakiev ▴ 250 • written 22 months ago by Assa Yeroslaviz ★ 1.9k

1

As a user of the tool, though without detailed knowledge of its internals, I would say it mainly expects droplet-based data, which has far more cells and hence at least some doublets. SMART-seq2 is plate-based afaik, so the doublet rate is expected to be low. I'm not sure you even need a doublet detector. I would look at the UMAP and clustering and check whether there are cells that sit between two clusters, or a cluster with odd markers (for example, markers of two lineages) that could indicate doublets, and then run this tool as confirmation. With only 100 cells, it might not even be necessary. Thinking aloud here.

22 months ago by ATpoint 87k
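
The manual check suggested in the comment above (looking for cells that carry markers of two lineages) can be sketched roughly as follows. This is only an illustration in plain Python, not part of scDblFinder, which is an R package. The cell IDs, the expression values, the choice of CD3E and CD19 as example markers, the `coexpressing_cells` helper, and the `min_expr` cutoff are all assumptions made for the example.

```python
# Hypothetical per-cell expression values (e.g. log-normalized counts) for two
# lineage markers. Gene choice and numbers are illustrative assumptions only.
expression = {
    "cell_01": {"CD3E": 2.4, "CD19": 0.0},
    "cell_02": {"CD3E": 0.0, "CD19": 3.1},
    "cell_03": {"CD3E": 2.0, "CD19": 2.7},  # expresses both markers
    "cell_04": {"CD3E": 0.1, "CD19": 0.0},
}


def coexpressing_cells(expression, marker_a, marker_b, min_expr=1.0):
    """Return the sorted IDs of cells in which both markers reach min_expr."""
    return sorted(
        cell
        for cell, genes in expression.items()
        if genes.get(marker_a, 0.0) >= min_expr
        and genes.get(marker_b, 0.0) >= min_expr
    )


# Cells flagged here are only candidates for a closer look, not confirmed doublets.
candidates = coexpressing_cells(expression, "CD3E", "CD19")
print(candidates)
```

Cells flagged this way would still need to be compared against the clustering and, if desired, against the doublet detector's calls.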

0

I have 10x droplet data and I was wondering at what stage of the QC and filtering the doublet detection should be applied.

The official vignette states that doublet detection should be performed directly after filtering out empty droplets, but I wonder if it's better to apply it after at least some QC, such as removing the lower mode (presumably dead cells) of a bimodal distribution like the one below?

[Image: bimodal distribution of counts per cell]

To clarify, in a distribution like the one above I would filter out all cells from the lower mode, i.e. all cells with a number of counts < ~15k, giving me a nice unimodal, quasi-normal distribution. Isn't it better to apply the doublet detection step at this stage?

[Image]

16 months ago by e.r.zakiev ▴ 250
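
The lower-mode filter described in the comment above amounts to a simple threshold on total counts per cell. A minimal sketch in plain Python, assuming made-up barcodes and count values and using the ~15k cutoff quoted in the comment (`drop_lower_mode` and `MIN_COUNTS` are hypothetical names, not functions or parameters of any package mentioned in this thread):

```python
# Hypothetical total counts per barcode; the values are made up for illustration.
total_counts = {
    "AAAC": 3_200,
    "AAAG": 41_000,
    "AACT": 9_800,
    "AAGT": 27_500,
    "ACCT": 15_000,
}

MIN_COUNTS = 15_000  # approximate cutoff between the two modes, per the comment


def drop_lower_mode(counts, min_counts):
    """Keep only cells whose total count is at least min_counts."""
    return {cell: n for cell, n in counts.items() if n >= min_counts}


kept = drop_lower_mode(total_counts, MIN_COUNTS)
print(sorted(kept))
```

The sketch only shows the filter itself; whether doublet detection should run before or after it is the open question of this comment.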