Hello!
I've come across a really weird observation in the quality control of my scRNAseq data that I really can't explain. I am working on scRNAseq data of T cells, obtained with BD Rhapsody (microwell-based assay). We have several batches of sequencing (for biological replicates); for some of them we only did whole transcriptome sequencing (WTA), and for others we did whole transcriptome + VDJ sequencing (WTA_VDJ). The weird observation is the following: when I plot the counts per cell (nCount_RNA) against the number of genes detected per cell (nFeature_RNA) I see:
- for batches with WTA only there is one curve (e.g. batch C below)
- for batches with WTA_VDJ there are two curves (e.g. batch D below), so about half the cells have fewer genes detected for the same total count
Below is an example with two batches, but I see this consistently across multiple batches (WTA batches have 1 curve, WTA_VDJ batches have 2 curves). There is nothing in the covariates (e.g. mt/rb content, cell state/cluster, TCR repertoire, ...) that can explain these two curves aside from the sequencing method (WTA vs WTA_VDJ).
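For anyone who wants to reproduce the kind of plot I'm describing, here is a minimal sketch of the QC scatter on synthetic data (the count matrix and all names here are placeholders, not my actual BD Rhapsody output; the metrics mirror Seurat's nCount_RNA/nFeature_RNA):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, so this runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic stand-in for a cells x genes UMI count matrix
# (placeholder data, not the real experiment).
counts = rng.poisson(lam=0.3, size=(500, 2000))

# Per-cell QC metrics, analogous to Seurat's nCount_RNA / nFeature_RNA.
n_count = counts.sum(axis=1)          # total UMIs per cell
n_feature = (counts > 0).sum(axis=1)  # number of genes detected per cell

fig, ax = plt.subplots()
ax.scatter(n_count, n_feature, s=5, alpha=0.5)
ax.set_xlabel("nCount_RNA")
ax.set_ylabel("nFeature_RNA")
fig.savefig("qc_scatter.png", dpi=150)
```

With the real data I make one such plot per batch (or color points by batch); the WTA_VDJ batches are the ones where the point cloud splits into two curves.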
I really can't understand this, as we only have one cell type in the data (T cells), the cells come from the same tissue type (although from different biological replicates), and we work on healthy tissue (and we don't see many differences between biological replicates in the downstream analyses). At first glance these double curves don't seem to impact downstream analyses (cells from WTA or WTA_VDJ are well mixed across clusters, and cells from the upper/lower curves within WTA_VDJ are also well mixed).
This still bugs me because I'm not sure how it may impact other downstream analyses (e.g. differential expression), and I really can't explain why the WTA_VDJ protocol would leave half of the cells with fewer genes detected and the other half with more. Has anyone had experience with this?
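In case it helps: to check the covariates I split the WTA_VDJ cells into the two curves by fitting log(nFeature) ~ log(nCount) and taking the sign of the residual. A sketch of that split on synthetic data (the two-population structure is simulated here just to show the method; all names are placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells = 1000

# Simulate two subpopulations with the same total counts but a different
# genes-per-count relationship (placeholder for a real WTA_VDJ batch).
n_count = rng.integers(2000, 20000, size=n_cells).astype(float)
upper = rng.random(n_cells) < 0.5
n_feature = np.where(
    upper,
    0.9 * n_count ** 0.8,   # "upper" curve: more genes for the same count
    0.5 * n_count ** 0.8,   # "lower" curve: fewer genes for the same count
) * rng.normal(1.0, 0.02, size=n_cells)

# Fit a single line in log-log space and split cells by residual sign.
x, y = np.log(n_count), np.log(n_feature)
slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)
assigned_upper = resid > 0

# With well-separated curves this recovers the simulated membership.
agreement = (assigned_upper == upper).mean()
print(f"label agreement: {agreement:.2f}")
```

On the real data I then compared the resulting upper/lower labels against mt/rb content, clusters, TCR repertoire, etc., and none of them tracked the split.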