Dear community,
I finally got some gene expression data from cancer patients, with healthy patients as controls. I want to investigate whether the expression data form distinct clusters, so I performed PCA for dimensionality reduction and plotted the result with sns.scatterplot:
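For anyone who wants to reproduce this kind of plot, here is a minimal sketch of the step. It is not the exact code behind the figure: the data are synthetic, the names `pca_project`, `plot_scores`, `X` and `labels` are placeholders, and PCA is done with a plain NumPy SVD (a library PCA such as scikit-learn's should give the same projection up to the sign of each component).

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project samples (rows of X) onto the top principal components."""
    Xc = X - X.mean(axis=0)  # center every gene (column)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return scores, explained[:n_components]


def plot_scores(scores, labels):
    """Scatter plot of PC1 vs. PC2, colored by group label."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.scatterplot(x=scores[:, 0], y=scores[:, 1], hue=labels)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    plt.show()


# Synthetic stand-in for a samples-by-genes expression matrix (assumption).
rng = np.random.default_rng(0)
n_per_group, n_genes = 20, 500
cancer = rng.normal(0.0, 1.0, size=(n_per_group, n_genes))
healthy = rng.normal(0.0, 1.0, size=(n_per_group, n_genes))
cancer[:, :50] += 3.0  # shift 50 made-up "genes" in the cancer group

X = np.vstack([cancer, healthy])
labels = np.array(["cancer"] * n_per_group + ["healthy"] * n_per_group)

scores, explained = pca_project(X)
# plot_scores(scores, labels)  # uncomment to draw the figure
```

With real data, `X` would be the (normalized) expression matrix with one row per patient, and `labels` the known diagnosis of each patient.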
You can see that the cancer patients form one cluster (blue) and the healthy patients form another (orange). But:
1.) Is it valid to use PCA to identify clusters, or do I need other methods, such as t-SNE plots or k-means clustering?
2.) How can I show that the clusters are significantly different? Can I calculate p-values? Is it also possible to plot confidence ellipses?
I would be grateful for any help!
This is a case where you don’t need a statistical test to find out whether the clusters are distinct.
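If you still want confidence ellipses on the plot, a minimal sketch of one common way to get them is below. It assumes the points of each group are roughly bivariate normal in the PC1/PC2 plane, and the function name `confidence_ellipse_params` and the demo data are made up for illustration. Note that such an ellipse only describes the spread of one group's points; it is not a test of whether two groups differ.

```python
import numpy as np


def confidence_ellipse_params(points, level=0.95):
    """Center, full axis lengths and rotation (degrees) of a 2-D
    confidence ellipse, assuming roughly bivariate-normal points."""
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)  # 2x2 sample covariance

    # eigh returns eigenvalues in ascending order; the last is the major axis.
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Chi-square quantile with 2 degrees of freedom has a closed form.
    scale = -2.0 * np.log(1.0 - level)

    width = 2.0 * np.sqrt(scale * eigvals[1])   # full length, major axis
    height = 2.0 * np.sqrt(scale * eigvals[0])  # full length, minor axis
    major = eigvecs[:, 1]
    angle = np.degrees(np.arctan2(major[1], major[0]))
    return center, width, height, angle


# Demo on synthetic 2-D points (stand-in for one group's PC scores).
rng = np.random.default_rng(1)
demo = rng.multivariate_normal([0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]], size=5000)
center, width, height, angle = confidence_ellipse_params(demo)
```

To draw it, the values can be passed to `matplotlib.patches.Ellipse(xy=center, width=width, height=height, angle=angle, fill=False)` and added to the scatter plot's axes with `ax.add_patch(...)`, once per group.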