I am working with PCA results derived from PLINK output files for a dataset comprising 674 samples. After performing K-Means clustering on the PCA data, I observed the following:
K-Means Clustering Results: The K-Means algorithm identified three well-defined clusters that are clearly separated in the PCA plot.
Severity-Based Coloring: When I colored the PCA plot based on severity categories (Mild, Moderate, Severe), I noticed that the clusters include samples from all severity groups. Specifically:
- Each of the three clusters contains samples across the severity categories.
- No specific clustering pattern emerges related to the severity groups.
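One way to put a number on "no severity pattern" is a chi-square test of independence on the cluster × severity contingency table. A minimal sketch, assuming the PC scores are already in a NumPy array (one row per sample) and severity is a parallel list of labels; `cluster_vs_severity` is a hypothetical helper, and scikit-learn, pandas and SciPy are assumed to be installed:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.cluster import KMeans


def cluster_vs_severity(pcs, severity, n_clusters=3, seed=0):
    """Cluster samples on PC scores, then test cluster membership vs severity.

    pcs: (n_samples, n_pcs) array, e.g. the PC1 and PC2 columns.
    severity: sequence of labels such as "Mild", "Moderate", "Severe".
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    clusters = km.fit_predict(pcs)
    # Rows: K-Means clusters; columns: severity categories.
    table = pd.crosstab(clusters, np.asarray(severity),
                        rownames=["cluster"], colnames=["severity"])
    # Chi-square test of independence between the two classifications.
    chi2, p, dof, _expected = chi2_contingency(table.values)
    return table, chi2, p, dof
```

A large p-value is consistent with cluster membership being unrelated to severity, but it does not prove that no association exists.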
Question
Given the observation that severity groups are distributed across the identified clusters, what could be the reasons for not observing severity-specific clustering? Could it indicate that severity does not directly influence the PCA clusters, or might there be other factors at play?
How can I further analyze or adjust my approach to potentially uncover any severity-specific clustering patterns? Are there additional methods or considerations that might help in understanding the relationship between severity and clustering in this context?
You could try looking at other PC combinations. At the moment you are only looking at the first two PC axes, so a severity-related effect, if it is real, is likely small and may show up on later PCs.
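Checking other PC combinations can be done systematically by plotting every pair among the first few PCs, coloured by severity. A minimal sketch with matplotlib, assuming the PC scores are in a NumPy array whose columns are PC1, PC2, … and severity is a parallel list of labels; `plot_pc_pairs` is a hypothetical helper:

```python
from itertools import combinations

import matplotlib
matplotlib.use("Agg")  # render off-screen; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np


def plot_pc_pairs(pcs, severity, n_pcs=4, ncols=3):
    """Scatter every pair among the first n_pcs PCs, coloured by severity."""
    severity = np.asarray(severity)
    pairs = list(combinations(range(n_pcs), 2))
    nrows = -(-len(pairs) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 4 * nrows),
                             squeeze=False)
    flat = axes.ravel()
    for ax, (i, j) in zip(flat, pairs):
        for level in np.unique(severity):
            mask = severity == level
            ax.scatter(pcs[mask, i], pcs[mask, j], s=8, label=level)
        ax.set_xlabel(f"PC{i + 1}")
        ax.set_ylabel(f"PC{j + 1}")
    # Hide any unused panels in the last row.
    for ax in flat[len(pairs):]:
        ax.set_visible(False)
    flat[0].legend(title="Severity")
    return fig, pairs
```

Usage: `fig, pairs = plot_pc_pairs(pcs, severity, n_pcs=6)` followed by `fig.savefig("pc_pairs.png")`.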
Thanks for the suggestion, dthorbur.
Also, when I applied ANOVA to each PC pair individually (I have 10 PC eigenvectors in total), none of the pairs showed a significant difference between severity groups (all ANOVA p-values > 0.10). Given that, how should I interpret the results? Should the severity aspect be removed from the story entirely? Is it all due to population stratification? This is a GWAS study, but all samples come from a single hospital.
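For reference, per-PC tests like the ones described above can be sketched with SciPy's `f_oneway`. One-way ANOVA is univariate, so this sketch assumes the test is run on each PC separately rather than on pairs; `anova_per_pc` is a hypothetical helper. With 10 PCs tested, uncorrected p-values would overstate significance, although that does not change anything when every p-value is above 0.10.

```python
import numpy as np
from scipy.stats import f_oneway


def anova_per_pc(pcs, severity):
    """One-way ANOVA of each PC's scores across severity groups.

    Returns a list of (pc_name, F, p) tuples, one per column of pcs.
    """
    severity = np.asarray(severity)
    levels = np.unique(severity)
    results = []
    for k in range(pcs.shape[1]):
        # One array of PC-k scores per severity level.
        groups = [pcs[severity == level, k] for level in levels]
        stat, p = f_oneway(*groups)
        results.append((f"PC{k + 1}", float(stat), float(p)))
    return results
```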
Any comments/suggestions are appreciated!
I think you should use a PERMANOVA instead of a series of ANOVAs, since it is built for this use case: testing for group differences across many variables (here, all the PCs) at once.
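A from-scratch sketch of PERMANOVA on the PC scores, using Euclidean distances between samples, the usual pseudo-F statistic and a label-permutation p-value. It is illustrative only; maintained implementations exist, e.g. `permanova` in scikit-bio's `skbio.stats.distance` module or `adonis2` in the R package vegan. The function name and defaults below are assumptions.

```python
import numpy as np


def permanova(pcs, groups, n_perm=999, seed=0):
    """PERMANOVA on Euclidean distances between samples.

    pcs: (n_samples, n_pcs) PC scores; groups: severity label per sample.
    Returns (pseudo_F, p_value).
    """
    x = np.asarray(pcs, dtype=float)
    groups = np.asarray(groups)
    n = x.shape[0]
    # Squared Euclidean distances between all pairs of samples.
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    levels = np.unique(groups)
    a = len(levels)
    # Total sum of squares: (1/N) * sum over pairs i<j of d_ij^2.
    ss_total = sq[np.triu_indices(n, k=1)].sum() / n

    def pseudo_f(labels):
        ss_within = 0.0
        for level in levels:
            idx = np.flatnonzero(labels == level)
            sub = sq[np.ix_(idx, idx)]
            # sub.sum() counts each within-group pair twice, hence / 2.
            ss_within += sub.sum() / 2 / len(idx)
        ss_between = ss_total - ss_within
        return (ss_between / (a - 1)) / (ss_within / (n - a))

    f_obs = pseudo_f(groups)
    rng = np.random.default_rng(seed)
    # Count permuted labelings whose pseudo-F is at least as large.
    hits = sum(pseudo_f(rng.permutation(groups)) >= f_obs
               for _ in range(n_perm))
    return f_obs, (hits + 1) / (n_perm + 1)
```

With 999 permutations the smallest attainable p-value is 0.001, and for 674 samples the full distance matrix fits comfortably in memory.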
If you have data on population stratification, you can analyse the data with stratification as a random effect in a mixed-effects model. To me, though, it sounds like severity is not an important factor in your study. That is still worth reporting as a result, to show you've looked into it.
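A minimal sketch of the mixed-model idea with statsmodels, assuming a data frame with one row per sample, a PC score column, a `severity` column and a hypothetical `stratum` column encoding population stratification (for example an ancestry group). Modelling a PC as the response with severity as a fixed effect is only one possible framing; which variable is the outcome depends on the study question. `severity_mixed_model` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def severity_mixed_model(df, pc="PC1"):
    """Fit pc ~ severity with a random intercept for each stratum.

    df needs the PC column, 'severity', and a hypothetical 'stratum'
    column holding the population-stratification label.
    """
    model = smf.mixedlm(f"{pc} ~ C(severity)", data=df, groups=df["stratum"])
    return model.fit()
```

Calling `print(severity_mixed_model(df).summary())` shows the severity coefficients (relative to the first level alphabetically, here "Mild") alongside the estimated between-stratum variance.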
The output PCA results: [plot not included]