I have expression data across multiple time points. I've clustered the genes using k-means (after per-gene standardization). Two of the k-means clusters seem interesting, and I wish to compare them.
I thought of performing enrichment analysis for each of the clusters, and then seeing which of the enriched sets identify each of the clusters uniquely.
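To make the per-cluster step concrete, here is a minimal sketch of what I mean by enrichment analysis: an upper-tail hypergeometric test per gene set, followed by Benjamini-Hochberg adjustment across sets. The function names (`hypergeom_enrichment_p`, `bh_adjust`) are hypothetical, the background is assumed to be all genes that went into the clustering, and real tools may differ in their choice of background and correction.

```python
from math import comb

def hypergeom_enrichment_p(cluster_genes, set_genes, background):
    """Upper-tail hypergeometric p-value: P(overlap >= observed overlap).

    Assumption: `background` is the universe of genes that were clustered.
    """
    background = set(background)
    cluster = set(cluster_genes) & background
    gene_set = set(set_genes) & background
    N, n, K = len(background), len(cluster), len(gene_set)
    k = len(cluster & gene_set)  # observed overlap
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each cluster would be run separately against the same background, giving one list of adjusted p-values per cluster.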
Is there a correct, well-established way of doing this?
Are there pitfalls to look for?
For example, if a specific set (e.g. a biological pathway) has an adjusted p-value of 0.04 in one cluster but 0.06 in the other, then it makes no sense to consider this set unique to only one of the clusters.
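One alternative to comparing two thresholded p-values would be to test the difference directly: for a given set, build a 2x2 table of (in set / not in set) by (cluster A / cluster B) and apply Fisher's exact test. The sketch below is only an illustration of that idea, not something I know to be the established method. The names `fisher_two_sided` and `compare_clusters` are hypothetical, and the two-sided p-value is computed by summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over tables that are as probable as, or less probable than, the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

def compare_clusters(cluster_a, cluster_b, gene_set):
    """Test whether membership in `gene_set` differs between the two clusters."""
    cluster_a, cluster_b, gene_set = set(cluster_a), set(cluster_b), set(gene_set)
    a = len(cluster_a & gene_set)
    b = len(cluster_a - gene_set)
    c = len(cluster_b & gene_set)
    d = len(cluster_b - gene_set)
    return fisher_two_sided(a, b, c, d)
```

With this, a set scoring 0.04 in one cluster and 0.06 in the other would most likely show no significant difference between the clusters, which is the pitfall I am worried about.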
Clarification:
This is time-course data, so there is no need to cluster the samples, as they are already ordered by time. I'd rather not post my own data, but I have found a published image that demonstrates this:
Suppose one wants to know which molecular functions identify cluster III, versus the ones that identify cluster IV. How should that be done?
What I intend to do is to view the enrichments of the two gene lists (one per cluster) in a network view (with ShinyGO or a similar tool), like this one:
The network view is not a rigorous statistical framework for comparing enrichments. However, imho, it can give a good bird's-eye overview of the differences between the two clusters. My hope is that if GO term X has a p-value of 0.049 in cluster 1 but 0.051 in cluster 2, then cluster 2 will still contain other significant terms that are similar to GO term X. Conversely, if I get a strong network of interrelated terms in cluster 1, and this specific network is missing in cluster 2, then I think that is reasonable evidence for a difference in enrichments.
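The check I have in mind could be sketched roughly as follows: measure term similarity by the Jaccard overlap of the terms' gene lists, and flag terms that are significant in cluster 1 but have no sufficiently similar significant term in cluster 2. The helper names and the 0.2 similarity cutoff are assumptions for illustration only, and this is not necessarily how any particular tool builds its network.

```python
def jaccard(a, b):
    """Jaccard similarity between two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def unmatched_terms(sig_terms_1, sig_terms_2, term_genes, min_similarity=0.2):
    """Terms significant in cluster 1 with no similar significant term in cluster 2.

    `term_genes` maps each term name to its gene list (hypothetical input).
    `min_similarity` is an arbitrary cutoff chosen for this sketch.
    """
    unmatched = []
    for t1 in sig_terms_1:
        best = max(
            (jaccard(term_genes[t1], term_genes[t2]) for t2 in sig_terms_2),
            default=0.0,
        )
        if best < min_similarity:
            unmatched.append(t1)
    return unmatched
```

A term that just misses the significance threshold in cluster 2 would still count as matched here as long as a related term passes, so only whole groups of related terms that are absent from cluster 2 get flagged.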
How does that sound?