Hello, I have two sets of genes that I identified in a previous analysis. Let's call these sets "A" and "B". A contains 10 genes ("g1", "g2", "g3", "g4", ..., "g10") and B contains 14 genes ("g11", "g12", ..., "g24"). In each sample (transcriptome) I want to compare the expression distribution of set A with that of set B. These gene sets are predictive of survival: I know that when A is strongly expressed relative to B, the patient has poor survival. I thought of using the Kolmogorov-Smirnov test (ks.test) to compare the distributions of A vs B. It works very well: all the patients whose p-value is significant show a different survival. Do you think using ks.test here is statistically correct? Do you recommend other methods to classify each individual patient based on these two sets of genes? Any other suggestions are more than welcome. Thank you
Thank you Jean Karim. I am aware of the assumptions underlying the KS test, which is why I posted here. The data set is small, so I can't use any ML approach. Do you know an alternative to the KS test that doesn't assume independence? In other words: if you have only one sample, in which you want to compare two sets of genes, which test do you suggest?
You can still use ML approaches when the data set is small. It depends on how small is small. For example, if you want to assign a probability to each of two classes (e.g. good/bad prognosis), you could try logistic regression. If you still want a statistical test for a difference between set A and set B, go for a permutation test. You could keep the KS statistic if it works well for you, but compute the p-value using permutations.
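To make the permutation idea concrete, here is a minimal sketch in plain Python for a single sample. It is only one way to do it, not a definitive implementation: the expression values are invented for illustration, and the function names `ks_statistic` and `permutation_ks_test` are hypothetical. Note that shuffling the A/B labels treats the genes as exchangeable under the null, which may not hold for co-expressed genes, so read the p-value with that in mind.

```python
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n_a, n_b = len(a_sorted), len(b_sorted)
    i = j = 0
    d_max = 0.0
    while i < n_a and j < n_b:
        x = min(a_sorted[i], b_sorted[j])
        # Step past every value equal to x in both sets (handles ties).
        while i < n_a and a_sorted[i] == x:
            i += 1
        while j < n_b and b_sorted[j] == x:
            j += 1
        d_max = max(d_max, abs(i / n_a - j / n_b))
    return d_max


def permutation_ks_test(a, b, n_perm=2000, seed=0):
    """Permutation p-value for the KS statistic within one sample.

    The A/B labels are shuffled across the pooled genes, and the p-value is
    the share of shuffles whose statistic is at least the observed one.
    """
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Invented expression values for one patient (assumption, not real data):
# 10 genes in set A, 14 genes in set B.
set_a = [8.1, 7.9, 8.4, 9.0, 7.5, 8.8, 8.2, 7.7, 9.1, 8.0]
set_b = [5.2, 4.9, 6.1, 5.5, 4.7, 5.9, 6.3, 5.0, 4.8, 5.6, 6.0, 5.3, 5.1, 5.8]

observed, p_value = permutation_ks_test(set_a, set_b)
print(f"KS statistic = {observed:.3f}, permutation p-value = {p_value:.4f}")
```

You would run this once per patient. The KS statistic is two-sided, so it does not tell you whether A is higher or lower than B; if the direction matters (A high relative to B means poor survival), you could swap in a signed statistic such as the difference in medians and keep the same shuffling loop.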
Thank you Jean Karim!