Hello, I have a normalized expression matrix with many genes, and clinical data for the same samples. Some of the clinical variables are numeric (age, LDL levels, tumor size...), and some are categorical (sex, response to therapy, tumor subtype...).
What kind of tests can I use to assess correlation or association between certain genes and these clinical features? I am using Spearman correlation for the numeric-numeric comparisons, but what should I do for the categorical-numeric comparisons? I have read that some people recommend ANOVA. If so, would this be correct:
exp_clinic is a data frame whose columns hold all the information (gene expression and clinical features), with one row per sample.
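# gene_cols: character vector with the names of the gene columns in exp_clinic (placeholder name)
res <- lapply(gene_cols, function(g) {
  # one-way ANOVA of this gene's expression against tumor subtype
  aov(exp_clinic[[g]] ~ SubtypeTumor, data = exp_clinic)
})
names(res) <- gene_cols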
I would then keep the genes with a p-value below 0.05 and an R-squared above 0.5 in the ANOVA results, to get the genes most associated with SubtypeTumor.
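To make the question concrete, here is a minimal sketch of how I imagine pulling those two numbers out of each fit. It runs on made-up toy data, not my real matrix: the columns geneA and geneB, the vector gene_cols and the helper anova_stats are all placeholder names. I am assuming that R-squared for a one-way ANOVA can be taken as the factor's sum of squares divided by the total sum of squares. The p.adjust step is something I have not mentioned above; I am assuming some multiple-testing correction is needed when testing many genes.

```r
# Toy data standing in for exp_clinic (all names and values are made up).
set.seed(1)
n <- 60
exp_clinic <- data.frame(
  SubtypeTumor = factor(rep(c("A", "B", "C"), each = n / 3)),
  geneA = rnorm(n),                                  # no subtype effect built in
  geneB = rnorm(n) + rep(c(0, 3, 6), each = n / 3)   # strong subtype effect built in
)
gene_cols <- c("geneA", "geneB")  # placeholder vector of gene column names

# For one gene: fit the one-way ANOVA, then extract p-value and R-squared.
anova_stats <- function(g) {
  fit <- aov(exp_clinic[[g]] ~ SubtypeTumor, data = exp_clinic)
  tab <- summary(fit)[[1]]           # ANOVA table: factor row, then residuals row
  ss  <- tab[["Sum Sq"]]
  c(p_value   = tab[["Pr(>F)"]][1],  # p-value of the SubtypeTumor term
    r_squared = ss[1] / sum(ss))     # assumed definition: SS_factor / SS_total
}

# One row per gene, with columns p_value and r_squared.
results <- as.data.frame(t(sapply(gene_cols, anova_stats)))

# Assumed extra step: Benjamini-Hochberg adjustment across all tested genes.
results$p_adjusted <- p.adjust(results$p_value, method = "BH")

# Genes sorted from most to least significant.
results[order(results$p_value), ]
```

With my real data, gene_cols would be the names of all gene columns, and I would filter results on the p-value and R-squared thresholds described above.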
Would this be a correct way of doing it? Or should I use another method and, if so, which one and how?
Thank you