Dear Community,
scRNA-seq analysis has its own conventions. For example, CD56 is a canonical protein marker for identifying natural killer (NK) cells. In scRNA-seq analysis, however, NCAM1 (the gene encoding CD56) is not commonly used. Instead, GNLY and NKG7 are popular markers for identifying the NK cell cluster in t-SNE or UMAP plots.
To look into this, I examined several scRNA-seq datasets generated with different technologies, such as 10x Chromium (v1, v2, v3) and Smart-seq2. It turns out that NCAM1 expression is very low even in FACS-gated (CD56+) NK cells, while GNLY and NKG7 are detected very well in the same cells.
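For anyone who wants to repeat this kind of check, here is a minimal sketch of a per-gene detection rate (the fraction of cells with at least one count). The function name `detection_rate` and the toy count matrix are made up for illustration; they are not taken from any of the datasets mentioned above, and a real analysis would read the counts from the dataset itself.

```python
import numpy as np

def detection_rate(counts, gene_names, genes):
    """Fraction of cells with at least one count for each requested gene.

    counts     : cells x genes array of raw counts (hypothetical input)
    gene_names : column labels of `counts`
    genes      : genes to report
    """
    counts = np.asarray(counts)
    column = {g: i for i, g in enumerate(gene_names)}
    return {g: float((counts[:, column[g]] > 0).mean()) for g in genes}

# Toy counts for 5 cells; the numbers are invented, not real measurements.
gene_names = ["NCAM1", "GNLY", "NKG7"]
counts = np.array([
    [0, 12, 8],
    [0, 30, 15],
    [1, 7, 0],
    [0, 22, 9],
    [0, 18, 11],
])

rates = detection_rate(counts, gene_names, gene_names)
for gene, rate in rates.items():
    print(f"{gene}: detected in {rate:.0%} of cells")
```

Comparing this fraction between a gated population and the remaining cells, per technology, is one simple way to see how strongly a marker is affected by dropout.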
My question is: why are some genes more prone to the dropout effect? What are the biological or technical explanations for this?
Thank you in advance for your contributions.
Dear Friederike, thank you so much for the very nice explanations. It makes sense now.
Additionally, I found this article, which suggests that full-length protocols such as Smart-seq2 have gene detection biases that depend specifically on gene length. This is not the case for UMI-based methods, however, which are influenced by other factors, as you suggested above.
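To make the gene-length point concrete, here is a toy sketch, not a model of any real protocol. It assumes the number of sampled fragments per gene is Poisson-distributed, with an expected value that scales with transcript length for full-length protocols and is independent of length for UMI-based ones. The function name and all parameter values are hypothetical.

```python
import math

def detection_probability(molecules, capture_rate, length_kb=1.0,
                          length_dependent=False):
    """P(at least one read/UMI) under a simple Poisson sampling assumption.

    molecules        : mRNA copies of the gene in the cell (hypothetical)
    capture_rate     : expected detections per molecule (per kb when
                       length_dependent is True); hypothetical value
    length_dependent : True mimics full-length protocols, where longer
                       transcripts yield more fragments; False mimics UMI
                       counting, where each molecule counts once
    """
    expected = molecules * capture_rate
    if length_dependent:
        expected *= length_kb
    return 1.0 - math.exp(-expected)

# Same abundance (2 molecules), different transcript lengths.
for length_kb in (1.0, 5.0):
    full_length = detection_probability(2, 0.1, length_kb, length_dependent=True)
    umi = detection_probability(2, 0.1, length_kb, length_dependent=False)
    print(f"{length_kb} kb: full-length {full_length:.2f}, UMI {umi:.2f}")
```

Under these assumptions the longer transcript is detected more often with the length-dependent setting, while the UMI-style setting gives the same probability for both lengths.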
You raise a very good point -- different single-cell platforms will come with their own set of limitations and pitfalls.