Hi,
I have a fairly vague question, but maybe you guys can help me out. I clustered a large proteomic expression matrix into several subclusters (hclust, Euclidean distance, complete linkage), which I then used for a differential expression and enrichment analysis (DE done via limma, GSEA via fgsea).
My question is this: while the cluster structure is of course valid from a statistical point of view, I want to understand whether, from a biological point of view, there is a way to quantify the (dis)similarity of the enriched gene sets in the subclusters. Or, to put it the other way around, I want to find out whether the enriched sets for e.g. clusters 1 and 2 are widely different, or whether they share enough sets that, from an enrichment point of view, they would be better treated as a combined cluster (1+2). Is there something similar to an R² metric for gene sets that gives me an idea of how well they capture the underlying biology, and whether a combined set is synergistic compared to the single sets (e.g. "R²"clus(1+2) > "R²"clus1 + "R²"clus2)?
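To make the idea a bit more concrete, here is a minimal sketch of the kind of comparison I have in mind. It assumes I have already reduced each cluster's enrichment result to a set of significantly enriched pathway names; the pathway names below are placeholders, not my real results. The Jaccard index is just one simple overlap measure I could imagine using here, not something I know to be the right choice, and the helper `jaccard` is my own illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|: 0 means disjoint, 1 means identical."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets: define the overlap as 0 to avoid dividing by zero.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical significantly enriched pathways per cluster (placeholder names).
enriched = {
    "cluster1": {"PATHWAY_A", "PATHWAY_B", "PATHWAY_C"},
    "cluster2": {"PATHWAY_B", "PATHWAY_C", "PATHWAY_D"},
    "cluster3": {"PATHWAY_E"},
}

# Pairwise overlap between the enriched sets of every pair of clusters.
similarity = {
    (c1, c2): jaccard(enriched[c1], enriched[c2])
    for c1, c2 in combinations(sorted(enriched), 2)
}

for (c1, c2), score in similarity.items():
    print(f"{c1} vs {c2}: {score:.2f}")
```

A high score for a pair would suggest that the two clusters are enriched for largely the same sets. It does not tell me whether merging them captures the biology better, which is the part I am really unsure about.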
I hope my remarks are somewhat clear.
Thanks!