Hello.
I am analyzing a set of samples from different experimental groups, and I use Principal Component Analysis (PCA) to separate the groups visually. However, I notice that certain groups have the highest scores on certain principal components, e.g. PC1, and I would like to show this statistically.
Assume I have a factor for experimental groups:
[ A A A A A A B B B B B B C C C C C C C C C C C C ]
I want to relate this factor to PC1, which has scores such as:
[ -0.12 -0.52 -0.12 ... etc ... 0.64 0.11 0.69 0.33 ]
As can be seen, group "C" has higher scores. What is the most appropriate, or most commonly used, statistical test to show this?
Currently, I have tried a simple Pearson correlation, coding group C as 1 and all other groups as 0. This, however, is not ideal if there is a lot of variance between groups.
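For what it is worth, a Pearson correlation against a 0/1 indicator (a point-biserial correlation) carries the same information as a pooled-variance two-sample t-test, since t = r * sqrt(n - 2) / sqrt(1 - r^2). Below is a minimal sketch of that identity using only the standard library. The PC1 scores are invented for illustration (they are not the data from this post), and the helper names `pearson_r` and `pooled_t` are hypothetical.

```python
import math

# Hypothetical PC1 scores, invented for illustration (6 x A, 6 x B, 12 x C).
scores = [-0.41, -0.28, -0.35, -0.10, -0.22, -0.47,   # group A
          -0.30,  0.02, -0.15, -0.38, -0.09, -0.26,   # group B
           0.42,  0.15,  0.58,  0.31,  0.49,  0.27,   # group C
           0.63,  0.36,  0.54,  0.21,  0.45,  0.39]   # group C
is_c = [0] * 12 + [1] * 12   # 1 = group C, 0 = group A or B


def mean(v):
    return sum(v) / len(v)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_t(g1, g0):
    """Student's two-sample t statistic with pooled variance."""
    n1, n0 = len(g1), len(g0)
    m1, m0 = mean(g1), mean(g0)
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)
    sp2 = ss / (n1 + n0 - 2)
    return (m1 - m0) / math.sqrt(sp2 * (1 / n1 + 1 / n0))


n = len(scores)
r = pearson_r(is_c, scores)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
t_pooled = pooled_t([s for s, g in zip(scores, is_c) if g == 1],
                    [s for s, g in zip(scores, is_c) if g == 0])

# The two t statistics agree up to floating-point error.
print(r, t_from_r, t_pooled)
```

So the 0/1 correlation is effectively a t-test that assumes equal variances in the two groups, which may be why it feels unsatisfying when the group variances differ.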
I also thought of doing logistic regression, I tried it but it fails if the groups are perfectly separated and isn't really useful for small sample sizes.
So I am going to try z-scoring the values and then running Welch's t-test to obtain the p-value (if the number of samples were the same in each group, it could be a paired t-test). However, I didn't really find any examples online of others doing the same, maybe this https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015128/ but nothing clear.
Am I justified in using such a statistical test, or should I take another approach?
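The Welch's t-test idea could be sketched as follows, assuming NumPy and SciPy are available. The PC1 scores are again invented for illustration, and pooling A and B into a single "rest" group is an assumption about the comparison of interest, not something stated above.

```python
import numpy as np
from scipy import stats

# Hypothetical PC1 scores, invented for illustration (6 x A, 6 x B, 12 x C).
pc1 = np.array([-0.41, -0.28, -0.35, -0.10, -0.22, -0.47,
                -0.30,  0.02, -0.15, -0.38, -0.09, -0.26,
                 0.42,  0.15,  0.58,  0.31,  0.49,  0.27,
                 0.63,  0.36,  0.54,  0.21,  0.45,  0.39])
groups = np.array(["A"] * 6 + ["B"] * 6 + ["C"] * 12)

in_c = groups == "C"

# Welch's t-test: group C versus everything else, unequal variances allowed.
welch = stats.ttest_ind(pc1[in_c], pc1[~in_c], equal_var=False)

# Same test after z-scoring all PC1 scores together.
z = (pc1 - pc1.mean()) / pc1.std(ddof=1)
welch_z = stats.ttest_ind(z[in_c], z[~in_c], equal_var=False)

print(welch.statistic, welch.pvalue)
print(welch_z.statistic, welch_z.pvalue)
```

Both calls give the same statistic and p-value, because the t-test is unchanged by shifting and rescaling all values together, so the z-scoring step is optional. Note also that a paired t-test needs matched observations (for example the same sample measured twice), so equal group sizes alone would not make it applicable.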
I'm not sure what you mean by "I also thought of doing logistic regression, I tried it but it fails if the groups are perfectly separated and isn't really useful for small sample sizes." Logistic regression should work, and your groups being perfectly separated should make it work even better.
Hello. It should, but there is a problem with the fitting algorithm: it does not seem to converge if the groups are perfectly separated. See the regression in this image: https://files.catbox.moe/kyjhk5.png
It looks good; however, I am getting an error message:
My data looks like this
You can try it for yourself and see what happens.
It is not just me who has had this problem, see:
https://stats.stackexchange.com/questions/254124/why-does-logistic-regression-become-unstable-when-classes-are-well-separated
https://stats.stackexchange.com/questions/11109/how-to-deal-with-perfect-separation-in-logistic-regression
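To illustrate why the fit does not converge, here is a minimal sketch using only the standard library. It runs plain gradient ascent on the unpenalized logistic log-likelihood. That is not the algorithm statistical packages normally use, the data are invented and perfectly separated at x = 0, and `fit_slope` is a hypothetical helper. With perfect separation the likelihood keeps improving as the slope grows, so no finite maximum-likelihood estimate exists.

```python
import math

# Hypothetical, perfectly separated data: x < 0 is class 0, x > 0 is class 1.
x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_slope(n_iter, lr=0.5):
    """Gradient ascent on the log-likelihood of a no-intercept logistic model."""
    b = 0.0
    for _ in range(n_iter):
        # Gradient of the log-likelihood with respect to the slope b.
        grad = sum((yi - sigmoid(b * xi)) * xi for xi, yi in zip(x, y))
        b += lr * grad
    return b


b_100 = fit_slope(100)
b_1000 = fit_slope(1000)
b_10000 = fit_slope(10000)

# The slope never settles: more iterations always give a larger value.
print(b_100, b_1000, b_10000)
```

The fitted curve can still look fine in a plot, as in the image above, because the predicted probabilities approach 0 and 1 while the coefficient itself drifts towards infinity.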