Question

General approach for unsupervised clustering of bulk RNAseq samples and deriving/applying a gene signature

21 months ago

Mat ▴ 80

PCA on the most variable genes didn't reveal any grouping of the samples (they all fall into one cluster). I am therefore looking for alternative ways to group the samples, and I am not sure what the best approach is for each of the three steps below.

1. Perform unsupervised clustering on bulk RNAseq data to derive molecular subtypes

Correct for library size and apply a variance-stabilizing transformation (DESeq2)
Select genes (e.g. by variance or a unimodality test)
Apply k-means or hierarchical clustering (the latter on a sample distance matrix)
Decide on the best number of clusters, e.g. using a sum of squared errors (SSE) scree plot and/or correlation with clinical variables

==> What other preprocessing steps are recommended before clustering? E.g. z-scoring, quantile normalization?
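For illustration, the steps listed above can be sketched in Python with numpy only. This is a rough sketch, not a recommended pipeline: log2 CPM is swapped in here as a simple stand-in for the DESeq2 variance-stabilizing transformation, the k-means is a plain Lloyd's implementation, and the simulated counts, the function names and the choice of 100 genes are all assumptions made up for the example.

```python
import numpy as np

def log_cpm(counts, prior=1.0):
    """Library-size normalise a genes x samples count matrix to log2 CPM."""
    lib_size = counts.sum(axis=0, keepdims=True)
    return np.log2(counts / lib_size * 1e6 + prior)

def top_variable_genes(expr, n_top):
    """Row indices of the n_top genes with the highest variance across samples."""
    return np.argsort(expr.var(axis=1))[::-1][:n_top]

def zscore_genes(expr):
    """Centre and scale each gene (row) across samples."""
    sd = expr.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # leave constant genes at zero instead of dividing by 0
    return (expr - expr.mean(axis=1, keepdims=True)) / sd

def sq_dist(X, centers):
    """Squared Euclidean distance from every sample to every centre."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means on a samples x features matrix; returns (labels, SSE)."""
    rng = np.random.default_rng(seed)
    best_labels, best_sse = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = sq_dist(X, centers).argmin(axis=1)
            # keep the old centre if a cluster happens to become empty
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        d = sq_dist(X, centers)
        labels = d.argmin(axis=1)
        sse = d[np.arange(len(X)), labels].sum()
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels, best_sse

# Simulated counts (hypothetical): 500 genes x 40 samples, where the first
# 50 genes are 4x higher in samples 20..39.
rng = np.random.default_rng(1)
base = rng.gamma(shape=2.0, scale=50.0, size=(500, 1))
mean = np.repeat(base, 40, axis=1)
mean[:50, 20:] *= 4.0
counts = rng.poisson(mean)

expr = log_cpm(counts)
selected = top_variable_genes(expr, 100)
X = zscore_genes(expr[selected]).T              # samples x genes for clustering
sse_by_k = {k: kmeans(X, k)[1] for k in range(1, 6)}  # values for a scree plot
labels2, _ = kmeans(X, 2)
```

Plotting `sse_by_k` against k gives the scree plot mentioned above. Z-scoring each gene gives every selected gene equal weight in the Euclidean distance, which is one reason it is often applied before k-means; whether that is desirable depends on the data.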

2. Extract a gene signature that describes each of the clusters

Look for significant gene expression differences between clusters using the likelihood ratio test (DESeq2), and manually select genes based on a heatmap ==> Is there a better/easier way to do this?
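One possible way to avoid the manual heatmap selection is to rank genes per cluster with a one-vs-rest statistic and keep the top hits. The sketch below uses a Welch-style t-score on log-scale expression, which is a deliberately simple stand-in and not the DESeq2 likelihood ratio test; `cluster_signature`, the toy data and the cutoff of 10 genes per cluster are assumptions for illustration.

```python
import numpy as np

def cluster_signature(expr, labels, n_per_cluster=20):
    """For each cluster, pick the genes most up-regulated versus all other samples.

    expr:   genes x samples matrix on a log-like scale
    labels: one cluster label per sample (each group needs >= 2 samples)
    Returns a dict mapping cluster label -> array of gene row indices.
    """
    labels = np.asarray(labels)
    signature = {}
    for c in np.unique(labels).tolist():
        inside = expr[:, labels == c]
        outside = expr[:, labels != c]
        diff = inside.mean(axis=1) - outside.mean(axis=1)
        # Welch-style standard error; the small constant avoids division by zero
        se = np.sqrt(inside.var(axis=1, ddof=1) / inside.shape[1]
                     + outside.var(axis=1, ddof=1) / outside.shape[1]) + 1e-8
        score = diff / se
        signature[c] = np.argsort(score)[::-1][:n_per_cluster]
    return signature

# Toy data (hypothetical): 200 genes x 30 samples, three clusters of 10 samples.
# Genes 0-9 are raised in cluster 0, genes 10-19 in cluster 1, genes 20-29 in cluster 2.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 10)
expr = rng.normal(size=(200, 30))
for c in range(3):
    expr[c * 10:(c + 1) * 10, labels == c] += 4.0

signature = cluster_signature(expr, labels, n_per_cluster=10)
```

Because the clusters were derived from the same expression data, any test statistics computed this way are optimistic, so the score is better read as a ranking than as a significance level.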

3. Classify a second, independent bulk RNAseq dataset (different sequencing protocol) using the gene signature

Cluster the samples on the genes in the gene signature, using the number of clusters and the preprocessing steps from step 1, and manually assign cluster names based on a heatmap ==> Is there a better/easier way to do this?
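An alternative to re-clustering and naming clusters by eye is a nearest-centroid classifier: compute one centroid per subtype on the signature genes in the first dataset, then assign each new sample to the centroid it correlates with best. The sketch below is a minimal version under stated assumptions: gene rows are in the same order in both datasets, each gene is z-scored within its own cohort to absorb protocol-specific offsets and scales (which only makes sense if the second cohort contains a similar mix of subtypes), and all names and simulated data are hypothetical.

```python
import numpy as np

def zscore_genes(expr):
    """Centre and scale each gene (row) across the samples of one cohort."""
    sd = expr.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0
    return (expr - expr.mean(axis=1, keepdims=True)) / sd

def fit_centroids(expr, labels, genes):
    """Mean z-scored profile of the signature genes for each class."""
    z = zscore_genes(expr[genes])
    classes = np.unique(labels)
    centroids = np.stack([z[:, labels == c].mean(axis=1) for c in classes], axis=1)
    return classes, centroids            # centroids: signature genes x classes

def predict(expr_new, genes, classes, centroids):
    """Assign each new sample to the class whose centroid it correlates with most."""
    z = zscore_genes(expr_new[genes])
    zc = z - z.mean(axis=0, keepdims=True)
    cc = centroids - centroids.mean(axis=0, keepdims=True)
    # Pearson correlation between every new sample and every centroid
    corr = (zc.T @ cc) / (np.linalg.norm(zc, axis=0)[:, None]
                          * np.linalg.norm(cc, axis=0)[None, :])
    return classes[corr.argmax(axis=1)], corr

rng = np.random.default_rng(0)

def simulate(n_per_class, scale=1.0, offset=0.0):
    """Toy cohort: 60 genes, 3 classes, 10 marker genes raised in each class."""
    labels = np.repeat(np.arange(3), n_per_class)
    expr = rng.normal(size=(60, labels.size))
    for c in range(3):
        expr[c * 10:(c + 1) * 10, labels == c] += 4.0
    return expr * scale + offset, labels

train_expr, train_labels = simulate(10)
# Second cohort with a different scale and gene-specific offsets, mimicking
# a different sequencing protocol.
gene_offset = rng.normal(scale=3.0, size=(60, 1))
new_expr, new_labels = simulate(5, scale=2.0, offset=gene_offset)

signature_genes = np.arange(30)          # rows of the 30 marker genes
classes, centroids = fit_centroids(train_expr, train_labels, signature_genes)
predicted, corr = predict(new_expr, signature_genes, classes, centroids)
```

The `corr` matrix can also be inspected per sample: a sample whose best correlation is low, or barely above the second best, is a candidate for being left unassigned rather than forced into a subtype.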

clustering RNAseq DESeq2 • 1.7k views

What sample sizes are you analyzing? The approach will depend on the scale of the study.

21 months ago by rpolicastro 13k

PCA didn't reveal any clustering.

21 months ago by Mat ▴ 80

Roughly 200 samples

21 months ago by Mat ▴ 80