I have an integrated dataset (3 developmental stages, integrated with the sctransform method). I used FindConservedMarkers to identify marker genes for each cluster. However, for most clusters, the top marker genes show negative logFC in all three stages. I'm confused. Shouldn't the marker genes be highly expressed in the clusters they represent?
Could you help me interpret the marker genes showing negative logFC? Thanks.
By the way, when I set only.pos = TRUE, I lost most of the marker genes, leaving many clusters with 0 marker genes.
We need your code to properly answer this, as it's going to depend on how you're grouping your cells. Why do you feel that marker genes can only be positive? Most people only show positive markers, as they are typically clearer and easier to interpret, but it's entirely possible that the best markers for a cluster are the absence of other genes. Usually, people display this by highlighting that those markers are up in other clusters relative to the one in question.
Thank you, Jared. My code is standard: FindConservedMarkers(obj_combined, ident.1 = 0, grouping.var = "stage", verbose = F). I mean how do I label the cluster with these lowly expressed genes? For example, the top marker genes for cluster 0 are stem cell marker genes but they are lowly expressed in cluster 0 (logfc < 0). Do I label cluster 0 "not stem cell"?
To be clear, this will find genes that separate cluster 0 from all other clusters and that are conserved across your developmental stages. If you're getting negative markers across all 3, then those genes are consistently lower in cluster 0 than in all other clusters at every stage.
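As a rough sketch of how to split such a result by sign, here is a base R example on a mock table. The stage names (`early`, `mid`, `late`), gene names, and values are invented for illustration, and the `<group>_avg_log2FC` and `max_pval` column names assume a recent Seurat version (older versions name the fold-change column differently), so check `colnames()` on your own output first.

```r
# Mock of a FindConservedMarkers() result for one cluster.
# All names and values here are hypothetical, for illustration only.
markers <- data.frame(
  early_avg_log2FC = c(-1.2, 0.8, -0.5, 1.1),
  mid_avg_log2FC   = c(-0.9, 0.6,  0.3, 0.9),
  late_avg_log2FC  = c(-1.5, 0.7, -0.4, 1.3),
  max_pval         = c(1e-50, 1e-20, 1e-3, 1e-30),
  row.names        = c("GeneA", "GeneB", "GeneC", "GeneD")
)

# Collect the per-stage fold-change columns (assumed naming pattern).
fc_cols <- grep("_avg_log2FC$", colnames(markers), value = TRUE)
fc <- as.matrix(markers[, fc_cols, drop = FALSE])

# Genes higher in the cluster at every stage vs. lower at every stage.
up_in_all   <- rownames(markers)[rowSums(fc > 0) == length(fc_cols)]
down_in_all <- rownames(markers)[rowSums(fc < 0) == length(fc_cols)]

# Rank the consistently up genes by their worst per-stage p-value.
up_ranked <- up_in_all[order(markers[up_in_all, "max_pval"])]

print(up_ranked)
print(down_in_all)
```

Genes in `down_in_all` are the "negative markers": they tell you what the cluster is not expressing relative to everything else, while `up_ranked` is the list you would normally look at first when naming a cluster.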
Cell type annotation is another question entirely. There are multiple ways to go about it. The simplest (yet often the most confusing and ambiguous) is to do it manually, using specific marker genes chosen from your biological expertise. This is fine if you have a clear idea of what cell types are in your sample and they have robust markers that are picked up well by single-cell RNA-seq. More often, there are at least a few clusters that are confusing or only minutely different from others, and those are tougher to pin down.
There are also automated methods that try to predict cell type based on a reference, either other scRNA-seq studies with labeled cells or bulk RNA-seq of sorted cells.
The OSCA book has an entire chapter devoted to cell type annotation methods (manual and automated) that will likely be worth your time. See also the SingleR book.
Thank you, Jared. I noticed some pipelines annotating cell types using top marker genes of the cluster. That's why I'm asking. It looks like cluster marker genes are not as useful in cell type annotation.
It can be done, but like I said, it can be an infuriating, confusing, and often pretty subjective process. When possible, I annotate each cell on its own with a reference-based method and then label each cluster with its majority label. Occasionally a few small populations still need to be labeled manually.
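A minimal base R sketch of that majority-label step, assuming you already have one predicted label per cell from whatever reference-based tool you used. The cluster IDs and labels below are made up for illustration.

```r
# Hypothetical per-cell table: cluster assignment plus a reference-based label.
cells <- data.frame(
  cluster = c("0", "0", "0", "1", "1", "1", "1", "2", "2"),
  label   = c("progenitor", "progenitor", "stem",
              "stem", "stem", "stem", "progenitor",
              "differentiated", "differentiated"),
  stringsAsFactors = FALSE
)

# Cross-tabulate clusters (rows) against predicted labels (columns).
counts <- table(cells$cluster, cells$label)

# For each cluster, take the label with the most cells.
majority <- colnames(counts)[apply(counts, 1, which.max)]
names(majority) <- rownames(counts)

# Fraction of cells in each cluster that agree with the majority label.
purity <- apply(counts, 1, max) / rowSums(counts)

# Attach the cluster-level label back to every cell.
cells$cluster_label <- unname(majority[cells$cluster])

print(majority)
print(round(purity, 2))
```

Clusters with low `purity` are the ones worth a manual look, since the majority label may be hiding a mixed or poorly matched population.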