I was wondering how well the tool should work on a SMART-Seq2 data set with "only" < 100 cells.
I'm getting a warning that this might cause a problem, but my question is a different one.
I ran the scDblFinder command on my sce object after removing the low-QC cells identified via addPerCellQCMetrics, and only two cells were identified as doublets.
For some reason I needed to repeat the analysis, and this time I ran the doublet detection first and removed the low-QC cells only afterwards. This time, though, it identified 9 cells as doublets. I know that is not many, but it's still >10% of my data set.
I'm mainly interested in understanding whether I can trust the results for such a small data set and, if so, why there is such a big difference depending on how (or when) one runs the search.
Thanks, Assa
As a user of the tool, though without detailed knowledge of its internals, I would say it mainly expects droplet-based data, which has far more cells and hence at least some doublets. SMART-seq2 is plate-based afaik, so the doublet rate is expected to be low. I'm not sure you even need a doublet detector. I would look at how the UMAP and clustering look, and check whether there are cells that sit between two clusters or a cluster with odd markers (for example markers of two lineages) that could indicate doublets, and then run this tool as confirmation. With only 100 cells, it might not even be necessary to do it. Thinking aloud here.
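To put rough numbers on the "far more cells, hence some doublets" point, here is a minimal sketch. It assumes the rule of thumb often quoted for 10x droplet data, a doublet rate of about 1% per 1000 cells captured; that figure and the function name are my assumptions for illustration, not something taken from the tool itself, and the rule is not meant for plate-based data.

```python
def expected_doublets(n_cells, rate_per_1000=0.01):
    """Expected number of doublets under a linear rule of thumb.

    Assumption: the doublet rate grows by `rate_per_1000` for every
    1000 cells captured, as is often quoted for droplet-based data.
    """
    doublet_rate = rate_per_1000 * n_cells / 1000
    return doublet_rate * n_cells


# Compare a plate-sized experiment with typical droplet-sized ones.
for n in (100, 1000, 10000):
    print(n, "cells ->", round(expected_doublets(n), 1), "expected doublets")
```

Under that assumption, a 100-cell experiment would expect about 0.1 doublets, versus about 1000 in a 10,000-cell droplet run, which is why 9 calls out of fewer than 100 cells looks suspicious to me.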
I have 10x droplet data and I was wondering at what stage of the QC and filtering the doublet detection should be applied.
The official vignette states that doublet detection should be performed directly after filtering out empty droplets, but I wonder if it's better to apply it after at least some QC, like removing the lower mode (presumably dead cells) of a bimodal distribution like the one below?
To clarify, in a distribution like the one above I would filter out all cells from the lower mode, i.e. all cells with a total count < ~15k, giving me a nice unimodal, quasi-normal distribution. Isn't it better to apply the doublet detection step at this stage?
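Just to make the filtering step I mean concrete, here is a minimal sketch in plain Python. The cell names and counts are made up for illustration, the function name is hypothetical, and the 15,000 cutoff is only the approximate value I read off my own distribution.

```python
def drop_lower_mode(total_counts, min_counts=15_000):
    """Keep only cells whose total count reaches the cutoff.

    `total_counts` maps a cell identifier to its total count.
    """
    return {cell: n for cell, n in total_counts.items() if n >= min_counts}


# Hypothetical per-cell totals: two cells in the lower mode, two in the upper.
counts = {"cell_a": 3200, "cell_b": 18500, "cell_c": 22100, "cell_d": 9800}
kept = drop_lower_mode(counts)
print(sorted(kept))
```

The question is then whether doublet detection should see all four cells or only the ones that survive this filter.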