I have a gene expression dataset that I want to investigate. In particular, I would like to understand whether there is any correlation between each gene's expression and some quantitative or qualitative data (say, correlation between gene 'XPTO', body mass index, and race).
One possible way to test this would be through logistic regression, but is this a good approach, or are there caveats that I should know about when using such a method?
My question is the following: which methods would you advise to measure such correlations, and why?
(This question was crossposted on Stackexchange)
Thank you so much, this is very interesting and useful! But if I understood correctly, the authors only consider quantitative variables for the association (e.g. BMI), not qualitative ones, correct? How would you proceed if you had qualitative data instead?
Then you can choose DESeq2, edgeR, or limma, all of which allow multi-factor designs. I'm not sure whether mixing quantitative and qualitative data is easily feasible, but you could try adding the qualitative data to the model as well.
Example
You apply your model to each gene separately and extract the p-value and correlation metrics. I have never tried it, but it should work.
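To make the "one model per gene" idea concrete, here is a minimal sketch in Python. It is not what DESeq2, edgeR, or limma do internally: it fits a plain ordinary least squares model per gene, with a quantitative covariate (BMI) and a dummy-coded qualitative covariate in the same design matrix. Everything in it is an assumption for illustration: the data are simulated, the expression values are assumed to be already normalised and on a log scale, and the names `design_matrix`, `fit_gene`, `bh_adjust`, and `group` are hypothetical.

```python
import numpy as np
from scipy import stats


def design_matrix(bmi, group):
    """Intercept + quantitative covariate + dummy-coded qualitative covariate."""
    levels = sorted(set(group))
    # Drop the first level (the reference) so the matrix stays full rank.
    dummies = [np.array([1.0 if g == lev else 0.0 for g in group])
               for lev in levels[1:]]
    return np.column_stack([np.ones(len(bmi)), np.asarray(bmi, dtype=float)] + dummies)


def fit_gene(y, X):
    """OLS for one gene: BMI slope, its t-test p-value, F-test p-value for the factor."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss_full = np.sum((y - X @ beta) ** 2)
    df = n - p
    cov = (rss_full / df) * np.linalg.inv(X.T @ X)
    t_bmi = beta[1] / np.sqrt(cov[1, 1])
    p_bmi = 2 * stats.t.sf(abs(t_bmi), df)
    # Compare against the reduced model without the group dummies (columns 2..p-1).
    X0 = X[:, :2]
    beta0 = np.linalg.lstsq(X0, y, rcond=None)[0]
    rss_reduced = np.sum((y - X0 @ beta0) ** 2)
    q = p - 2
    f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df)
    p_group = stats.f.sf(f_stat, q, df)
    return beta[1], p_bmi, p_group


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Simulated data (an assumption, not real measurements): 5 genes x 60 samples.
rng = np.random.default_rng(0)
n_samples = 60
bmi = rng.normal(25, 4, n_samples)
group = np.array(["A", "B", "C"] * 20)       # hypothetical qualitative variable
expr = rng.normal(size=(5, n_samples))       # stands in for log-scale expression
expr[0] += 0.5 * (bmi - 25)                  # gene 0 is built to track BMI

X = design_matrix(bmi, group)
results = [fit_gene(expr[i], X) for i in range(expr.shape[0])]
padj_bmi = bh_adjust([r[1] for r in results])

for i, (slope, p_bmi, p_group) in enumerate(results):
    print(f"gene {i}: slope={slope:.3f} p_bmi={p_bmi:.3g} "
          f"padj_bmi={padj_bmi[i]:.3g} p_group={p_group:.3g}")
```

Because thousands of genes are tested, the p-values need a multiple-testing correction, which is why the sketch ends with a Benjamini-Hochberg adjustment. For raw RNA-seq counts, the packages mentioned above are the better choice, since they model count data and share information across genes, which plain OLS does not.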