I would like advice on a puzzling issue with our 10X single-cell RNA sequencing data analysis (gene expression, 3′ v3.1).
I am working with 10X single cell RNASeq data from mouse brain tissue and trying to annotate/assign “cells” to cell types. My approach:
• use the standard Seurat workflow to cluster cells into ~30 clusters;
• determine top differentially expressed genes for each cluster;
• analyze the lists of top DE genes for the presence of known cell type markers and manually assign cell type identity to each original cluster.
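To make the last step concrete, here is a minimal sketch of the marker-overlap idea, written in plain Python for illustration rather than as the R/Seurat code actually used. The marker panel, the cluster names, and the `suggest_cell_type` helper are all hypothetical; only Snap25 comes from this post, and the real assignment was done by hand.

```python
# Hypothetical marker panel for illustration; only Snap25 is taken from the post.
KNOWN_MARKERS = {
    "neurons": {"Snap25", "Syt1", "Rbfox3"},
    "astrocytes": {"Aqp4", "Gfap", "Slc1a3"},
    "oligodendrocytes": {"Mbp", "Plp1", "Mog"},
    "microglia": {"Cx3cr1", "P2ry12", "Tmem119"},
}


def suggest_cell_type(top_de_genes, markers=KNOWN_MARKERS):
    """Count how many known markers of each cell type appear among a
    cluster's top DE genes and return the best match plus all counts."""
    de = set(top_de_genes)
    hits = {cell_type: len(genes & de) for cell_type, genes in markers.items()}
    best = max(hits, key=hits.get)
    # A cluster with no marker hits at all is left for manual inspection.
    return (best if hits[best] > 0 else "unassigned"), hits


# Made-up top DE gene lists for three clusters.
top_de = {
    "cluster_0": ["Snap25", "Syt1", "Gad1", "Rbfox3"],
    "cluster_1": ["Aqp4", "Gfap", "Snap25", "Slc1a3"],
    "cluster_2": ["Xist", "Malat1"],
}

annotation = {cluster: suggest_cell_type(genes)[0] for cluster, genes in top_de.items()}
print(annotation)
```

In this toy input, `cluster_1` lists Snap25 among its top genes yet is still suggested as astrocytes, because it has more astrocyte marker hits than neuronal ones.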
This works quite well and results in re-classification of all cells into 8-10 conventional “cell types” for this tissue, such as “neurons”, “astrocytes” etc. For each “cell type” it is then possible to select one very strong marker gene (please see the violin plot below) to confirm its identity.
However, this doesn’t work for the “neurons”! Conventional markers of neurons such as Snap25 (and others; see violin plot below) are expressed in all “cell types” (and only slightly higher in “neurons” compared to other “cell types”). “Non-neuronal cells” are expected to have very low expression of neuronal markers. It looks as though “non-neuronal” clusters are either contaminated with “neurons” or with neuronal RNA.
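To put numbers on what the violin plot shows, one could tabulate, for each assigned “cell type”, the fraction of cells with any Snap25 counts and the mean count. The sketch below is plain Python with made-up values, not my actual data; `marker_summary` is a hypothetical helper, and in practice the counts and labels would be exported from the Seurat object.

```python
from collections import defaultdict


def marker_summary(counts, labels):
    """Per-group detection rate and mean count for one marker gene.

    counts: one count per cell for the marker (e.g. Snap25)
    labels: the assigned cell type for each cell, in the same order
    """
    grouped = defaultdict(list)
    for count, label in zip(counts, labels):
        grouped[label].append(count)
    return {
        label: {
            "pct_expressing": sum(v > 0 for v in values) / len(values),
            "mean": sum(values) / len(values),
        }
        for label, values in grouped.items()
    }


# Made-up Snap25 counts for eight cells, purely for illustration.
snap25 = [5, 6, 4, 7, 3, 4, 0, 5]
labels = ["neurons"] * 4 + ["astrocytes"] * 4

summary = marker_summary(snap25, labels)
for cell_type, stats in summary.items():
    print(cell_type, stats)
```

A table like this makes it easier to say whether the marker is detected in nearly every non-neuronal cell at a lower level, or only in a subset of those cells at a high level.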
The issue persists regardless of normalization strategy (log-transform/SCT) or doublet removal (by DoubletFinder).
What might explain this, namely high expression of neuronal markers in almost every cell in the dataset, even though other markers very clearly point to the presence of non-neuronal cells (which is also expected from the experimental design)?