Hi all,
I'm using the DEP R package to analyse proteins across different conditions, including differential expression (DE) analysis: https://bioconductor.org/packages/devel/bioc/vignettes/DEP/inst/doc/DEP.html
The package uses limma for the DE analysis.
My experiment is structured as:
sample - disease_state (disease / healthy) - environment (env1 / env2)
I wish to perform DE analysis for:
1. env1 vs env2 samples
2. within diseased, env1 vs env2 samples
3. within healthy, env1 vs env2 samples
For 1, I'm just ignoring the disease_state factor and performing differential expression analysis across conditions env1 vs env2. Is this the correct approach?
For 2, I keep only the diseased samples and then perform differential expression analysis across conditions env1 vs env2. Is this correct? Or do the healthy samples also need to be included somehow, so as not to lose information?
Please do share any articles that explain the fundamentals of why one of these approaches is incorrect.
I'm not familiar with protein data or DEP, but if the package uses limma then you can take a look at the article "A guide to creating design matrices for gene expression experiments". If your study design is a 2 by 2 experiment, simply merge the two factors into one factor with four levels; that's what the article suggests. Hope it helps.
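To make the merged-factor suggestion concrete, here is a minimal sketch in plain limma (not through DEP's own wrapper functions). The sample table, the simulated intensity matrix, and the names `targets`, `group`, `contrast_matrix`, `env_in_disease` and `env_in_healthy` are all made up for illustration; only the factor levels come from the question.

```r
library(limma)

# Hypothetical annotation: 3 replicates per disease_state x environment cell
targets <- data.frame(
  disease_state = rep(c("disease", "healthy"), each = 6),
  environment   = rep(rep(c("env1", "env2"), each = 3), times = 2)
)

# Merge the two factors into a single four-level factor
group <- factor(
  paste(targets$disease_state, targets$environment, sep = "_"),
  levels = c("disease_env1", "disease_env2", "healthy_env1", "healthy_env2")
)

# Means model: one coefficient per merged group, no intercept
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# env1 vs env2 within each disease state (comparisons 2 and 3)
contrast_matrix <- makeContrasts(
  env_in_disease = disease_env1 - disease_env2,
  env_in_healthy = healthy_env1 - healthy_env2,
  levels = design
)

# Simulated log2 intensities standing in for real protein data
set.seed(1)
expr <- matrix(rnorm(100 * nrow(targets)), nrow = 100,
               dimnames = list(paste0("protein", 1:100), NULL))

# Fit on ALL samples, then extract the contrasts of interest
fit  <- lmFit(expr, design)
fit2 <- eBayes(contrasts.fit(fit, contrast_matrix))
top_disease <- topTable(fit2, coef = "env_in_disease", number = 5)
```

Because the model is fitted on all four groups, each protein's residual variance is estimated from every sample, which is the usual argument for not subsetting to the diseased samples before testing.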
Thank you for the suggestion. Merging the 2 factors seems to be a good approach.
However, for comparison 1 (env1 vs env2), is it okay to just fit a single factor (environment only)? Or should the factors be combined and the subgroups averaged within env1 and env2, i.e. some equivalent of the (disease_env1 + healthy_env1)/2 vs (disease_env2 + healthy_env2)/2 contrast formula given in section 7.2 of the link you provided?
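A small worked example may help show what is at stake between the two options. The group means and sample sizes below are invented and deliberately unbalanced; this is a sketch of the arithmetic only, not of what DEP or limma does internally.

```r
# Hypothetical per-group means and sample sizes for one protein
means <- c(disease_env1 = 10, disease_env2 = 8,
           healthy_env1 = 6,  healthy_env2 = 5)
n     <- c(disease_env1 = 8,  disease_env2 = 2,
           healthy_env1 = 2,  healthy_env2 = 8)

# Averaged contrast: both disease states count equally within each environment
averaged <- (means[["disease_env1"]] + means[["healthy_env1"]]) / 2 -
            (means[["disease_env2"]] + means[["healthy_env2"]]) / 2   # 1.5

# Single-factor comparison: each environment mean is weighted by sample size,
# so the env1 mean is dominated by diseased samples and env2 by healthy ones
env1 <- c("disease_env1", "healthy_env1")
env2 <- c("disease_env2", "healthy_env2")
pooled <- weighted.mean(means[env1], n[env1]) -
          weighted.mean(means[env2], n[env2])                         # 3.6
```

With equal sample sizes in all four groups the two estimates coincide; with unbalanced groups, the single-factor estimate also picks up part of the disease vs healthy difference.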