Hi all,
I am trying to correlate a continuous physiological trait with gene expression from RNA-seq, specifically to identify genes associated with a change in the trait between 2 treatments (n = 5 per treatment). I've already run a standard differential expression analysis, but since I have trait and expression data for each individual, I thought it could be interesting to attempt correlations. In the literature, I've seen linear regressions used, with each gene's count data as the explanatory variable and the trait as the response variable.
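A minimal sketch of the per-gene regression described above, in Python with NumPy/SciPy. It assumes the counts have already been normalized and log-transformed (raw counts are a poor fit for an ordinary linear regression), and it adds a Benjamini-Hochberg correction because one test is run per gene. The function names and the simulated data (200 genes, 10 individuals, gene 0 planted as trait-associated) are made up for illustration.

```python
import numpy as np
from scipy import stats


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # step-up: each q-value is the minimum over itself and all larger p-values
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q


def per_gene_regression(expr, trait):
    """Fit trait ~ expression separately for every gene.

    expr  : (n_genes, n_samples) normalized, log-scale expression (assumed)
    trait : (n_samples,) physiological trait values
    Returns slope, R^2, raw p-value and BH q-value per gene.
    Genes with constant expression must be filtered out first
    (linregress cannot fit a constant x).
    """
    expr = np.asarray(expr, dtype=float)
    trait = np.asarray(trait, dtype=float)
    fits = [stats.linregress(gene, trait) for gene in expr]  # x = gene, y = trait
    slopes = np.array([f.slope for f in fits])
    r2 = np.array([f.rvalue ** 2 for f in fits])
    pvals = np.array([f.pvalue for f in fits])
    return slopes, r2, pvals, bh_adjust(pvals)


# Simulated example: 200 genes x 10 individuals (2 treatments x 5)
rng = np.random.default_rng(0)
expr = rng.normal(8, 1, size=(200, 10))
trait = rng.normal(0, 1, size=10)
expr[0] = 8 + 2 * trait + rng.normal(0, 0.1, size=10)  # planted trait-associated gene

slopes, r2, p, q = per_gene_regression(expr, trait)
for g in np.argsort(p)[:5]:
    print(f"gene {g}: slope={slopes[g]:.2f} R2={r2[g]:.2f} p={p[g]:.2g} q={q[g]:.2g}")
```

Pooling all 10 samples like this ignores treatment: a gene that is simply differentially expressed can correlate with the trait just because both shift with treatment, which is why the question of how to incorporate treatments matters.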
So my question is: how best to incorporate multiple treatments into this approach?
1. Regress each gene against the trait within each treatment separately, then compare/contrast the genes that correlate with the trait in each treatment?
2. Regress genes against the trait for both treatments together in the same analysis? For this, I have seen studies transform the physiological trait values in the experimental condition into a measure of difference from the control group.
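A rough sketch of both options, under stated assumptions (log-scale normalized expression, simulated data, made-up function names). Option 1 fits trait ~ expression within each treatment, with only 5 points per fit, so power is very low. For option 2, note that if the same control-group mean is subtracted from every sample's trait value, the regression slope and p-value do not change, so this sketch swaps in a different pooled model instead: trait ~ expression + treatment, where the treatment term absorbs the overall shift in the trait. That covariate model is a substitution, not the transformation used in the studies mentioned.

```python
import numpy as np
from scipy import stats


def ols_pvalues(X, y):
    """OLS fit of y on design matrix X; returns coefficients and
    two-sided t-test p-values."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, 2 * stats.t.sf(np.abs(beta / se), dof)


def option1_separate(expr, trait, treated):
    """Option 1: trait ~ expression within each treatment separately.
    Returns an (n_genes, 2) array of slope p-values: [control, treated]."""
    out = np.empty((expr.shape[0], 2))
    for j, mask in enumerate([~treated, treated]):
        for i, gene in enumerate(expr[:, mask]):
            out[i, j] = stats.linregress(gene, trait[mask]).pvalue
    return out


def pooled_with_treatment(expr, trait, treated):
    """Pooled alternative for option 2: trait ~ expression + treatment on all
    samples; returns the p-value of the expression slope for each gene."""
    t = treated.astype(float)
    pvals = np.empty(expr.shape[0])
    for i, gene in enumerate(expr):
        X = np.column_stack([np.ones_like(gene), gene, t])
        _, p = ols_pvalues(X, trait)
        pvals[i] = p[1]
    return pvals


# Simulated example: treatment shifts the trait by +3, and gene 0 tracks the
# individual-level variation in the trait within both treatments.
rng = np.random.default_rng(1)
treated = np.array([False] * 5 + [True] * 5)
baseline = rng.normal(0, 1, size=10)
trait = baseline + 3 * treated
expr = rng.normal(8, 1, size=(100, 10))
expr[0] = 8 + baseline + rng.normal(0, 0.05, size=10)

p_sep = option1_separate(expr, trait, treated)
p_pooled = pooled_with_treatment(expr, trait, treated)
print("genes with p < 0.05 in both treatments:",
      np.flatnonzero((p_sep < 0.05).all(axis=1)))
print("top pooled gene:", np.argmin(p_pooled))
```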
Any advice or alternative suggestions welcome. Thanks.
My suggestion would be to try WGCNA on your datasets. You can identify modules based on gene expression correlation and also perform a module-trait relationship analysis. You can find details here
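WGCNA itself is an R package; the Python sketch below only illustrates the module-trait step as I understand it, and is not the package's code. It assumes modules have already been assigned (network construction and module detection are skipped), takes each module's eigengene as the first principal component of its standardized genes (roughly what `moduleEigengenes()` does in WGCNA), and correlates it with the trait. The module labels "A" and "B" and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def module_eigengene(expr_module):
    """First principal component (over samples) of one module's genes.
    expr_module: (n_genes_in_module, n_samples)."""
    x = np.asarray(expr_module, dtype=float)
    # standardize each gene across samples
    z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, ddof=1, keepdims=True)
    # rows are genes, so the first right singular vector holds the sample scores
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    eig = vt[0]
    # the SVD sign is arbitrary: orient it with the module's average expression
    if np.corrcoef(eig, z.mean(axis=0))[0, 1] < 0:
        eig = -eig
    return eig


def module_trait_table(expr, modules, trait):
    """Pearson r and p-value between each module eigengene and the trait."""
    table = {}
    for label in np.unique(modules):
        r, p = stats.pearsonr(module_eigengene(expr[modules == label]), trait)
        table[label] = (float(r), float(p))
    return table


# Simulated example: 60 genes x 10 samples, genes 0-19 share a trait-linked signal
rng = np.random.default_rng(2)
trait = rng.normal(0, 1, size=10)
driver = trait + rng.normal(0, 0.2, size=10)
expr = rng.normal(0, 1, size=(60, 10))
expr[:20] += 2 * driver
modules = np.array(["A"] * 20 + ["B"] * 40)  # hypothetical module labels

table = module_trait_table(expr, modules, trait)
for label, (r, p) in table.items():
    print(f"module {label}: r={r:.2f}, p={p:.2g}")
```

With only 10 samples, each module-trait correlation still rests on 10 points, so the sample-size concern raised in the reply applies here too.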
Thanks for your response, it's much appreciated. I considered WGCNA, but doesn't it require much larger sample sizes (> 15) to produce reliable results?
Also, I was just curious about performing regressions between genes and traits more generally. I guess WGCNA may be more appropriate for an entire transcriptome. But to correlate a small number of genes of interest with a trait across multiple treatments, would you favour option 1 or 2 from the original post (or another method)?
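One further alternative, not raised in the thread, for a handful of genes of interest: fit a single model with an interaction term, trait ~ expression + treatment + expression:treatment. The interaction coefficient tests directly whether the gene-trait slope differs between treatments, which is a formal version of the compare/contrast step in option 1. Below is a minimal ordinary least squares sketch; the simulated gene (slope +1 in control, -1 in treated) and the function name are made up, and with 10 samples and 4 parameters there are only 6 residual degrees of freedom.

```python
import numpy as np
from scipy import stats


def interaction_test(gene_expr, trait, treated):
    """Fit trait ~ b0 + b1*expr + b2*treated + b3*expr*treated by OLS.

    b1 is the expression slope in the control group and b1 + b3 the slope in
    the treated group, so b3 tests whether the slope differs by treatment.
    Returns (b1, b3, p-value of b3)."""
    x = np.asarray(gene_expr, dtype=float)
    t = np.asarray(treated, dtype=float)
    y = np.asarray(trait, dtype=float)
    X = np.column_stack([np.ones_like(x), x, t, x * t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]  # 10 samples - 4 parameters = 6
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    p = 2 * stats.t.sf(np.abs(beta / se), dof)
    return beta[1], beta[3], p[3]


# Simulated gene of interest: slope +1 in control, -1 in treated
rng = np.random.default_rng(3)
treated = np.repeat([0, 1], 5)
x = rng.normal(8, 1, size=10)
trait = np.where(treated == 1, -(x - 8), x - 8) + 2 * treated + rng.normal(0, 0.1, size=10)

b1, b3, p_int = interaction_test(x, trait, treated)
print(f"control slope={b1:.2f}, treated slope={b1 + b3:.2f}, interaction p={p_int:.2g}")
```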
I would try both 1 and 2 from your original post. That is how we perform research.