I want to identify subgroups in a cancer dataset. Before using ConsensusClusterPlus, I have a few questions:
1) What is the minimum sample size required to run ConsensusClusterPlus? I have data for 19 samples. Should I use it for clustering, or should I go for another method (e.g. PCA)?
2) While going through the manual, I found that it takes expression values as input (normalised or unnormalised?). I also have z-scores (derived from the expression data) from another analysis of the same 19 samples. Can I use the z-scores directly and perform clustering with ConsensusClusterPlus?
You can use consensus clustering on 19 samples; there is no intrinsic minimum sample size. For this and other clustering methods, the results typically depend heavily on how you select informative genes. See the ConsensusClusterPlus manual for one gene selection approach: picking the most variable genes by median absolute deviation (MAD). The data should be normalized. Z-scores might be OK, but that depends very much on how they were computed (sample-wise or gene-wise?). If you are using a typical expression readout (such as normalized read counts from RNA-Seq or intensities from microarrays), then using those normalized expression levels (not the Z-scores) for an informative subset of the genes, with a Pearson correlation distance measure, is probably a good place to start looking for sample groupings.
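To make the gene-selection and distance steps above concrete, here is a minimal sketch in plain Python (not the package's own R code). The toy matrix and the helper names `mad`, `top_mad_genes`, `sample_profile` and `pearson_distance` are made up for illustration; with real data you would keep far more genes.

```python
from math import sqrt
from statistics import mean, median


def mad(values):
    """Unscaled median absolute deviation of one gene across samples.

    R's mad() multiplies by a constant (1.4826 by default); that scaling
    does not change the ranking of genes.
    """
    m = median(values)
    return median(abs(v - m) for v in values)


def top_mad_genes(expr, n):
    """Return the n gene names with the largest MAD.

    expr maps gene name -> list of normalized values, one per sample.
    """
    return sorted(expr, key=lambda g: mad(expr[g]), reverse=True)[:n]


def sample_profile(expr, genes, j):
    """Values of sample j over the chosen genes."""
    return [expr[g][j] for g in genes]


def pearson_distance(x, y):
    """1 - Pearson correlation between two sample profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1 - cov / (sx * sy)


# Toy data, invented for this example: 5 genes x 4 samples.
expr = {
    "geneA": [1.0, 1.1, 0.9, 1.0],  # nearly flat
    "geneB": [2.0, 8.0, 2.5, 7.5],  # variable
    "geneC": [5.0, 5.0, 5.1, 4.9],  # nearly flat
    "geneD": [9.0, 1.0, 8.0, 2.0],  # variable
    "geneE": [3.0, 6.0, 2.0, 7.0],  # variable
}

selected = top_mad_genes(expr, 3)

# Samples 0 and 2 look alike on the selected genes; sample 1 does not.
d_similar = pearson_distance(sample_profile(expr, selected, 0),
                             sample_profile(expr, selected, 2))
d_different = pearson_distance(sample_profile(expr, selected, 0),
                               sample_profile(expr, selected, 1))
print(selected, round(d_similar, 3), round(d_different, 3))
```

The flat genes drop out of the selection, so the distance between samples is driven only by genes that actually vary. This is also why sample-wise versus gene-wise z-scoring matters: it changes which genes look variable.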
I wouldn't bother with consensus clustering for really small sample sizes like that, because your statistical power is poor. IMO it is better to use standard hierarchical clustering with aheatmap or ComplexHeatmap.
I usually run consensus clustering on larger sample sizes, at least over 60. I think the sample-resampling method that the Monti algorithm uses isn't going to behave well until you have a larger number of samples.
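The resampling idea mentioned above can be sketched as follows, in plain Python and under simplifying assumptions: each run clusters a random subset of the samples, and the consensus for a pair of samples is the fraction of runs containing both in which they ended up in the same cluster. The `runs` below are hand-written, hypothetical cluster labels, not output from any real clustering, and `pair_consensus` is an illustrative name.

```python
def pair_consensus(runs, a, b):
    """Fraction of runs that sampled both a and b and put them together.

    runs is a list of dicts mapping sample name -> cluster label; a sample
    missing from a dict was not drawn in that resampling run.
    """
    sampled = 0
    together = 0
    for labels in runs:
        if a in labels and b in labels:
            sampled += 1
            if labels[a] == labels[b]:
                together += 1
    return together / sampled if sampled else float("nan")


# Hypothetical labels from four resampling runs over samples s1..s4.
# Label numbers are arbitrary within a run; only co-membership matters.
runs = [
    {"s1": 0, "s2": 0, "s3": 1},
    {"s1": 0, "s3": 1, "s4": 1},
    {"s1": 1, "s2": 1, "s4": 0},
    {"s2": 0, "s3": 0, "s4": 1},
]

for a, b in [("s1", "s2"), ("s2", "s3"), ("s1", "s4")]:
    print(a, b, pair_consensus(runs, a, b))
```

With only 19 samples, each subsample is small and every pair is seen together in relatively few distinct configurations, which may be part of the concern about small sample sizes.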
Hi Ahill
Thanks for your valuable answer.
I will go with the MAD approach for variable selection. The z-scores were computed sample-wise. First, I will start with the normalised intensities and try to get results. Then I will try subgrouping with the z-scores and see how far the results from the two approaches agree. I am having a bit of trouble understanding the plots and results. I just ran the sample data through ConsensusClusterPlus and got the following results.
k cluster clusterConsensus
2 1 0.90794831578128
2 2 0.758432628514517
3 1 0.624620046443652
3 2 0.911135863955618
3 3 0.986412256470072
4 1 0.890835574988102
4 2 0.886960582630877
4 3 0.666394932640416
4 4 0.98295225849986
5 1 0.86123474251129
5 2 0.884872156152216
5 3 0.556828374192177
5 4 0.839098318290865
5 5 1
6 1 0.825649752799388
6 2 0.937773728911312
6 3 0.649644539921365
6 4 0.726792776419238
6 5 0.698201730147844
6 6 1
How do I decide on k and the sample membership based on the clusterConsensus values? Is there a need to fix a threshold and then choose a specific k?
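One rough way to compare the k values, without assuming any particular cutoff, is to summarize the clusterConsensus table above per k. The sketch below is plain Python with the values copied from that table; using the mean and the minimum as summaries is an assumption made for illustration, not something the package prescribes.

```python
from statistics import mean

# clusterConsensus values copied from the table above, grouped by k.
cluster_consensus = {
    2: [0.90794831578128, 0.758432628514517],
    3: [0.624620046443652, 0.911135863955618, 0.986412256470072],
    4: [0.890835574988102, 0.886960582630877, 0.666394932640416,
        0.98295225849986],
    5: [0.86123474251129, 0.884872156152216, 0.556828374192177,
        0.839098318290865, 1.0],
    6: [0.825649752799388, 0.937773728911312, 0.649644539921365,
        0.726792776419238, 0.698201730147844, 1.0],
}

# For each k: (mean cluster consensus, consensus of the weakest cluster).
summary = {k: (mean(v), min(v)) for k, v in cluster_consensus.items()}

for k in sorted(summary):
    avg, worst = summary[k]
    print(f"k={k}  mean={avg:.3f}  min={worst:.3f}")

best_by_mean = max(summary, key=lambda k: summary[k][0])
best_by_min = max(summary, key=lambda k: summary[k][1])
print("highest mean at k =", best_by_mean)
print("highest minimum at k =", best_by_min)
```

On these numbers the weakest cluster is most stable at k = 2 while the average is highest at k = 4, so the table alone does not single out one k. The consensus CDF and delta area plots that the package writes out are normally read alongside it.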
Similarly, how do I decide the item membership based on these results?
   k cluster  item itemConsensus
1  2       1 28031     0.5002183
2  2       1 28003     0.4185504
3  2       1 28042     0.4727976
4  2       1 43012     0.5462791
5  2       1  LAL5     0.4682668
6  2       1 08018     0.5090733
7  2       1 57001     0.5897417
8  2       1 22010     0.5834408
9  2       1 01007     0.2090324
10 2       1 01003     0.2036311
Thanks in advance A
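For item membership, the itemConsensus rows shown above only cover cluster 1 at k = 2, so a full assignment cannot be derived from them. As a partial sketch in plain Python, the snippet below ranks these ten items by their consensus with cluster 1 and flags the ones under a cutoff. The 0.3 cutoff and the variable names are assumptions for illustration only.

```python
# itemConsensus for k = 2, cluster 1, copied from the rows above.
# Item IDs are kept as strings because of leading zeros and "LAL5".
item_consensus_k2_c1 = {
    "28031": 0.5002183,
    "28003": 0.4185504,
    "28042": 0.4727976,
    "43012": 0.5462791,
    "LAL5": 0.4682668,
    "08018": 0.5090733,
    "57001": 0.5897417,
    "22010": 0.5834408,
    "01007": 0.2090324,
    "01003": 0.2036311,
}

CUTOFF = 0.3  # hypothetical threshold, chosen only for this example

# Rank items from strongest to weakest association with cluster 1.
ranked = sorted(item_consensus_k2_c1.items(),
                key=lambda kv: kv[1], reverse=True)

# Items whose consensus with cluster 1 falls below the cutoff.
weak = [item for item, value in ranked if value < CUTOFF]

for item, value in ranked:
    flag = "  <- weak for cluster 1" if value < CUTOFF else ""
    print(f"{item:>6}  {value:.3f}{flag}")
```

With the complete table you would compare each item's value across all clusters at the chosen k rather than apply a single cutoff. In the R package itself, the per-sample assignment for a given k is stored in the result object as `consensusClass`, for example `results[[2]][["consensusClass"]]`, where `results` is whatever name you gave the output of `ConsensusClusterPlus()`.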