General approach for unsupervised clustering of bulk RNAseq samples and deriving/applying a gene signature
21 months ago
Mat

PCA of the top variable genes didn't reveal any grouping of the samples (they all fall into a single cluster). I am therefore looking for alternative ways to group the samples, and I am not sure what the best approach is for each of the following three steps.

1. Perform unsupervised clustering on bulk RNAseq data to derive molecular subtypes

  • Correct for library size and apply a variance-stabilizing transformation (DESeq2)
  • Select genes (e.g. by variance or a unimodality test)
  • Apply k-means or hierarchical clustering to a distance matrix
  • Choose the number of clusters, e.g. from a sum of squared errors (SSE) scree plot and/or based on correlation with clinical variables
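
The four steps above could be sketched roughly as follows. This is a minimal Python/NumPy illustration under several assumptions: DESeq2's `vst()` is an R function, so the sketch substitutes median-of-ratios size factors (the same idea DESeq2 uses for library size) followed by log2(x + 1); genes are rows and samples are columns; the data are simulated; and `median_of_ratios_log`, `top_variable` and `kmeans` are hypothetical helper names, with a hand-written k-means rather than a library one.

```python
import numpy as np

def median_of_ratios_log(counts):
    """Median-of-ratios size factors, then log2(normalized + 1).
    Assumption: a stand-in for DESeq2's VST, not the VST itself."""
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts.astype(float))      # genes x samples
    log_geo_mean = log_counts.mean(axis=1)            # -inf if any count is 0
    ok = np.isfinite(log_geo_mean)                    # use genes with no zeros
    size_factors = np.exp(np.median(log_counts[ok] - log_geo_mean[ok, None], axis=0))
    return np.log2(counts / size_factors + 1.0)

def top_variable(expr, n_top):
    """Keep the n_top genes (rows) with the highest variance across samples."""
    idx = np.argsort(expr.var(axis=1))[::-1][:n_top]
    return expr[idx]

def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means on the rows of X (samples x features).
    Returns (labels, SSE) for the best of n_init random starts."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Keep the old center if a cluster ends up empty.
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels, sse = d2.argmin(axis=1), d2.min(axis=1).sum()
        if best is None or sse < best[1]:
            best = (labels, sse)
    return best

# Simulated toy data (not real): 500 genes x 40 samples in two groups,
# with the first 20 genes up 4-fold in the second group.
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 20)
mu = np.tile(rng.uniform(50, 500, size=500)[:, None], (1, 40))
mu[:20, group == 1] *= 4
counts = rng.poisson(mu)

expr = top_variable(median_of_ratios_log(counts), n_top=100)
# Z-score each gene, then cluster the samples (rows of the transpose).
z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
X = z.T
sse_by_k = {k: kmeans(X, k)[1] for k in range(1, 7)}  # for an SSE scree plot
labels, _ = kmeans(X, 2)
```

The `sse_by_k` values are what would go into the scree (elbow) plot. When PCA shows no obvious structure the elbow is often ambiguous, so it may also be worth checking how stable the clusters are when re-clustering random subsamples.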

==> What other preprocessing steps are recommended for clustering? E.g. z-scoring, quantile normalization?
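
The two options mentioned in the question could be sketched as follows, assuming a log-scale genes × samples matrix: per-gene z-scoring, which gives every gene equal weight in Euclidean distances, and quantile normalization, which forces every sample onto the same distribution. The helper names are hypothetical, ties in quantile normalization are broken by position (a simplification), and the toy matrix is made up.

```python
import numpy as np

def zscore_genes(expr):
    """Center and scale each gene (row) across samples; constant genes become 0."""
    mu = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, keepdims=True)
    return np.divide(expr - mu, sd, out=np.zeros(expr.shape), where=sd > 0)

def quantile_normalize(expr):
    """Give every sample (column) the same distribution: the mean of the sorted
    columns. Ties are broken by position, a simplification."""
    ranks = np.argsort(np.argsort(expr, axis=0), axis=0)   # rank within each column
    reference = np.sort(expr, axis=0).mean(axis=1)          # mean value per rank
    return reference[ranks]

# Toy log-scale matrix: 4 genes (rows) x 3 samples (columns), made up.
expr = np.array([[5.0, 2.0, 3.0],
                 [4.0, 1.0, 4.0],
                 [3.0, 4.0, 6.0],
                 [4.0, 2.0, 8.0]])
qn = quantile_normalize(expr)
z = zscore_genes(qn)
```

Z-scoring changes the relative weight of genes, while quantile normalization can also remove genuine global differences between samples, so whether either step helps is likely dataset-dependent.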

2. Extract a gene signature that describes each of the clusters

Look for significant gene expression differences between clusters using a likelihood ratio test (DESeq2), then manually select genes based on a heatmap ==> Is there a better/easier way to do this?
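
One possibly simpler alternative to manual selection from a heatmap is to rank genes one-vs-rest for each cluster and keep the top N automatically. The sketch below swaps in a Welch t-statistic on log-scale expression instead of the DESeq2 likelihood ratio test; `one_vs_rest_signature` and `n_per_cluster` are hypothetical names, each cluster is assumed to have at least two samples, and the toy data are simulated.

```python
import numpy as np

def one_vs_rest_signature(expr, labels, n_per_cluster=10):
    """For each cluster, rank genes (rows) by a Welch t-statistic comparing the
    cluster with all other samples, and keep the top up-regulated genes."""
    signature = {}
    for c in np.unique(labels):
        a, b = expr[:, labels == c], expr[:, labels != c]
        se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                     + b.var(axis=1, ddof=1) / b.shape[1])
        t = (a.mean(axis=1) - b.mean(axis=1)) / np.maximum(se, 1e-8)
        top = np.argsort(t)[::-1][:n_per_cluster]
        signature[c] = top[t[top] > 0]   # gene indices with positive t only
    return signature

# Simulated toy data (not real): 30 genes x 15 samples, 3 clusters;
# gene c is up-regulated in cluster c.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 5)
expr = rng.normal(5.0, 0.1, size=(30, 15))
for c in range(3):
    expr[c, labels == c] += 2.0
sig = one_vs_rest_signature(expr, labels, n_per_cluster=1)
```

Because the clusters were derived from the same expression data, p-values from any such test will be optimistic, so the statistic is best treated as a ranking for picking signature genes rather than as evidence that the clusters are real.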

3. Classify a second, independent bulk RNAseq dataset (different sequencing protocol) using the gene signature

Cluster the samples on the signature genes, using the same number of clusters and the same preprocessing steps as in step 1, then manually assign cluster names based on a heatmap ==> Is there a better/easier way to do this?
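
Instead of re-clustering the second dataset and naming clusters by eye, one option is a nearest-centroid classifier: compute the mean z-scored profile of the signature genes for each cluster in dataset 1, z-score the same genes within dataset 2, and assign each new sample to the centroid it correlates with best. Z-scoring within each dataset is meant to absorb per-gene offsets between sequencing protocols, but it assumes the cluster proportions are broadly similar in both datasets. This is a sketch: gene rows are assumed to be matched between the datasets, the helper names are hypothetical, and the data are simulated.

```python
import numpy as np

def zscore_rows(x):
    """Center and scale each gene (row) within one dataset."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

def train_centroids(expr1, labels1, genes):
    """Per-cluster mean profile of the signature genes, z-scored within dataset 1."""
    z = zscore_rows(expr1[genes])
    classes = np.unique(labels1)
    centroids = np.stack([z[:, labels1 == c].mean(axis=1) for c in classes], axis=1)
    return classes, centroids                      # centroids: genes x clusters

def classify(expr2, genes, classes, centroids):
    """Z-score the same genes within dataset 2, then assign each sample to the
    centroid with the highest Pearson correlation."""
    z = zscore_rows(expr2[genes])
    zs = (z - z.mean(axis=0)) / z.std(axis=0)      # standardize each sample
    cs = (centroids - centroids.mean(axis=0)) / centroids.std(axis=0)
    corr = zs.T @ cs / len(genes)                  # Pearson r: samples x clusters
    return classes[corr.argmax(axis=1)], corr

# Simulated toy data (not real). Assumption: rows are the same genes in the
# same order in both datasets; dataset 2 gets per-gene offsets to mimic a
# different sequencing protocol.
rng = np.random.default_rng(2)

def simulate(labels, offset):
    x = rng.normal(0.0, 0.3, size=(20, len(labels))) + offset[:, None]
    for c in range(3):
        x[2 * c:2 * c + 2, labels == c] += 2.0     # genes 2c, 2c+1 mark cluster c
    return x

labels1 = np.repeat([0, 1, 2], 10)
labels2 = np.tile([0, 1, 2], 5)
expr1 = simulate(labels1, np.zeros(20))
expr2 = simulate(labels2, rng.uniform(-3.0, 3.0, size=20))
genes = np.arange(6)                               # signature = first 6 genes
classes, centroids = train_centroids(expr1, labels1, genes)
pred, corr = classify(expr2, genes, classes, centroids)
```

The `corr` matrix also gives a per-sample confidence: samples whose best and second-best correlations are close could be flagged as unassigned rather than forced into a cluster.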


What sample sizes are you analyzing? The approach will depend on the scale of the study.

PCA didn't reveal any clustering.

Roughly 200 samples
