6.9 years ago
John
Hi all, As far as I have seen in the literature, single-cell RNA-seq data are always subgrouped by unsupervised clustering using the top 100/200/300 most variable genes. Is there any way to subgroup single cells using known genes that are specific to a cell type? That could give more accurate subgroups, right?
Thanks in advance!
Well, just for the scRNA-seq community: when one chooses the top 100/200/300 genes based on variance and then performs clustering, this is not unsupervised at all. It is supervised clustering based on highly variable genes.
When you say "specific to cell-type", do you mean that you want to relate your scRNA data to tissue-specific data so that you could, for example, segregate your scRNA population by tissue based on their different expression patterns? There was a recent question posted on this, here: Normalizing transcriptome data by tissue type
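Other clustering methods include k-means and PAM. t-SNE is often used alongside them, although strictly speaking it is a dimensionality-reduction and visualization method rather than a clustering method.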
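To make the two gene-selection strategies discussed above concrete, here is a minimal sketch in plain Python. The expression values, the gene names (GeneA to GeneD, GeneX) and the helper functions are all made up for illustration; they do not come from any scRNA-seq package or from the paper discussed in this thread. The sketch only shows how the feature set is chosen before clustering, either by variance or from a list of known marker genes.

```python
from statistics import pvariance

# Toy expression matrix (hypothetical values): genes as keys, one value per cell.
expression = {
    "GeneA": [0.0, 0.1, 9.8, 10.2],
    "GeneB": [5.0, 5.1, 4.9, 5.0],
    "GeneC": [8.0, 7.5, 0.2, 0.1],
    "GeneD": [1.0, 1.2, 1.1, 0.9],
}

def top_variable_genes(expr, n):
    """Return the n gene names with the highest variance across cells."""
    ranked = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)
    return ranked[:n]

def select_marker_genes(expr, markers):
    """Return the known marker genes that are present in the matrix."""
    return [g for g in markers if g in expr]

def cell_profiles(expr, genes):
    """Build one feature vector per cell from the selected genes."""
    n_cells = len(next(iter(expr.values())))
    return [[expr[g][i] for g in genes] for i in range(n_cells)]

# Strategy 1: pick features by variance (no cell-type knowledge used).
hvg = top_variable_genes(expression, 2)

# Strategy 2: pick features from a marker list (assumed prior knowledge;
# "GeneX" is deliberately absent from the matrix and gets dropped).
markers = select_marker_genes(expression, ["GeneC", "GeneX"])

hvg_profiles = cell_profiles(expression, hvg)
marker_profiles = cell_profiles(expression, markers)
```

Either set of per-cell profiles could then be passed to a clustering method such as k-means or PAM. The only difference between the two strategies is which genes are allowed to drive the grouping.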
Dear Kevin,
What you describe as supervised clustering is called unsupervised clustering in the following paper.
Statement: "The “autoAnalysis()” command was used to perform unsupervised clustering, principal component analysis, and expression heat mapping of the remaining 64 cells using the top 400 most variable genes as determined by ANOVA"
ARTICLE : Integrative Single-Cell Transcriptomics Reveals Molecular Networks Defining Neuronal Maturation During Postnatal Neurogenesis
Can you please help me by explaining the difference between unsupervised and supervised clustering?
Thanks in advance
Depends on your perspective, but for me that is not unsupervised clustering:
Therefore, the clustering is biased because it is generated from a set of highly variable genes, which will separate the sample cohort better. If it were entirely unbiased, they would have performed the clustering on all genes that passed QC.
That said, they may have used the term unbiased in the sense that the clustering was performed on a hypothesis-free basis. Even so, the clustering is biased by using only highly variable genes. It is a neat trick to separate your cohort better.
Also, forgive me, I would be wary of using a function called autoAnalysis(). We need less automation and more human brains looking over things.