Question

Alternative for unsupervised clustering in scRNA Data

6.9 years ago

John ▴ 270

Hi all, As far as I have seen in the literature, single-cell RNA-seq data are always subgrouped by unsupervised clustering on the top 100/200/300 most variable genes. Is there any way to subgroup single cells using known genes that are specific to a cell type? That could give more accurate subgroups, right?
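To make the question concrete, here is a minimal sketch of what subgrouping by known genes might look like. Everything in it is hypothetical: the marker names, the cell types, the expression values, and the scoring rule (mean expression of each marker set) are assumptions for illustration, not an established method or a real tool's API.

```python
# Hypothetical marker sets: cell type -> genes assumed to be specific to it.
markers = {
    "TypeA": ["MarkerA1", "MarkerA2"],
    "TypeB": ["MarkerB1", "MarkerB2"],
}

# Toy expression values (made up): cell -> gene -> expression.
cells = {
    "cell1": {"MarkerA1": 8.0, "MarkerA2": 6.0, "MarkerB1": 0.5, "MarkerB2": 0.0},
    "cell2": {"MarkerA1": 0.0, "MarkerA2": 1.0, "MarkerB1": 7.0, "MarkerB2": 9.0},
    "cell3": {"MarkerA1": 5.0, "MarkerA2": 7.5, "MarkerB1": 1.0, "MarkerB2": 0.5},
}

def assign_cell_type(profile, marker_sets):
    """Label a cell with the type whose markers have the highest mean expression."""
    scores = {
        cell_type: sum(profile.get(g, 0.0) for g in genes) / len(genes)
        for cell_type, genes in marker_sets.items()
    }
    return max(scores, key=scores.get)

labels = {name: assign_cell_type(profile, markers) for name, profile in cells.items()}
print(labels)
```

A rule like this forces every cell into one of the predefined types, so cells that express none of the markers, or belong to a type not in the list, would still receive a label.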

Thanks in advance!

RNA-Seq unsupervised clustering gene expression • 2.0k views

6.9 years ago by John ▴ 270

Well, just a note for the scRNA community: when one chooses the top 100/200/300 genes based on variance and then performs clustering, this is not unsupervised at all. It is supervised clustering based on highly variable genes.
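For readers unfamiliar with the selection step, a minimal sketch of ranking genes by variance could look like the following. The gene names and values are made up, and the helper `top_variable_genes` is hypothetical; real pipelines usually normalise the data first and may use a different dispersion measure. The ranking itself uses only the expression values, not any cell-type labels.

```python
from statistics import pvariance

# Toy expression matrix (hypothetical values): gene -> expression across 4 cells.
expression = {
    "GeneA": [0.0, 9.0, 0.5, 8.5],  # varies strongly between cells
    "GeneB": [5.0, 5.1, 4.9, 5.0],  # nearly constant
    "GeneC": [1.0, 1.0, 7.0, 7.5],  # varies strongly between cells
    "GeneD": [2.0, 2.2, 1.9, 2.1],  # nearly constant
}

def top_variable_genes(expr, n):
    """Return the n gene names with the highest variance across cells."""
    ranked = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)
    return ranked[:n]

selected = top_variable_genes(expression, 2)
print(selected)
```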

When you say "specific to cell-type", do you mean that you want to relate your scRNA data to tissue-specific data so that you could, for example, segregate your scRNA population by tissue based on their different expression patterns? There was a recent question posted on this, here: Normalizing transcriptome data by tissue type

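Other clustering methods include k-means and PAM. t-SNE is often mentioned alongside them, but it is strictly a dimensionality-reduction method that is used to visualize the groups rather than to define them.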

6.8 years ago by Kevin Blighe 88k

Dear Kevin,

What you describe as supervised clustering is called unsupervised clustering in the following paper.

Statement: "The “autoAnalysis()” command was used to perform unsupervised clustering, principal component analysis, and expression heat mapping of the remaining 64 cells using the top 400 most variable genes as determined by ANOVA"

Article: Integrative Single-Cell Transcriptomics Reveals Molecular Networks Defining Neuronal Maturation During Postnatal Neurogenesis

Can you please help me understand the difference between unsupervised and supervised clustering?

Thanks in advance

6.8 years ago by John ▴ 270

Depends on your perspective, but for me that is not unsupervised clustering:

1. They look at their original dataset
2. They decide to filter out genes based on high/low variance for whatever reason
3. They perform hierarchical clustering using the highly-variable genes
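The three steps above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's actual procedure: the data are made up, the gene filter is a plain variance ranking rather than ANOVA, and the clustering is a hand-written average-linkage agglomerative routine (the hypothetical `hierarchical_clusters` helper) instead of whatever autoAnalysis() does internally.

```python
from itertools import combinations
from math import dist
from statistics import pvariance

# Step 1: toy dataset (hypothetical values), rows = genes, columns = cells c0..c3.
cells = ["c0", "c1", "c2", "c3"]
expression = {
    "GeneA": [0.0, 9.0, 0.5, 8.5],
    "GeneB": [5.0, 5.1, 4.9, 5.0],
    "GeneC": [1.0, 7.0, 1.5, 7.5],
    "GeneD": [2.0, 2.2, 1.9, 2.1],
}

# Step 2: keep only the most variable genes.
n_top = 2
top = sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)[:n_top]

# One coordinate vector per cell, restricted to the selected genes.
profiles = {c: [expression[g][i] for g in top] for i, c in enumerate(cells)}

def average_linkage(a, b):
    """Mean Euclidean distance between all cell pairs across two clusters."""
    return sum(dist(profiles[x], profiles[y]) for x in a for y in b) / (len(a) * len(b))

def hierarchical_clusters(names, k):
    """Agglomerative clustering: merge the closest pair until k clusters remain."""
    clusters = [(n,) for n in names]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=lambda p: average_linkage(*p))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return sorted(tuple(sorted(c)) for c in clusters)

# Step 3: cluster the cells on the selected genes only.
result = hierarchical_clusters(cells, 2)
print(top, result)
```

Changing `n_top` to the total number of genes runs the same clustering on every gene, which is one way to see how much the filter in step 2 shapes the grouping.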

Therefore, the clustering is biased because it is generated from a set of highly variable genes that will better segregate the sample cohort. If it were entirely unbiased, then they would have performed the clustering on all genes that passed QC.

That said, they may have used the term unsupervised in the sense that the clustering was performed on a hypothesis-free basis. Even so, the clustering is biased by only using highly variable genes. It's a neat trick to better segregate your cohort.

Also, forgive me, I would be wary of using a function called autoAnalysis(). We need less automation and more human brains looking over things.

6.8 years ago by Kevin Blighe 88k