Question

DE analysis with multiple factors

2.2 years ago

Abhishek • 0

Hi all,

I'm using the DEP R package to analyse proteins (including DE analysis) across different conditions: https://bioconductor.org/packages/devel/bioc/vignettes/DEP/inst/doc/DEP.html

The package uses limma for DE analysis.

My experiment is structured as:

sample - disease_state (disease / healthy) - environment (env1 / env2)

I wish to perform DE analysis for:

1. env1 vs env2 samples
2. within diseased, env1 vs env2 samples
3. within healthy, env1 vs env2 samples

For 1, I'm just ignoring the disease_state factor and performing differential expression analysis across conditions env1 vs env2. Is this the correct approach?

For 2, I filter to keep only the diseased samples and then perform differential expression analysis across conditions env1 vs env2. Is this correct? Or do all the healthy samples also need to be included somehow so as not to lose information?

Please do share any articles that explain the fundamentals of why one of these approaches is incorrect.

DE DEP limma • 1.0k views

2.2 years ago by Abhishek • 0

I'm not familiar with protein data or DEP, but if the package uses limma you can take a look at the article "A guide to creating design matrices for gene expression experiments". If your study design is a 2 by 2 experiment, simply merge the two factors into one factor, as the article suggests. Hope it helps.
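
The merged-factor idea can be sketched outside limma to show what the resulting contrasts compute. The Python below only illustrates the arithmetic for a single protein: the intensity values are made up, and the names `merge_factors`, `contrasts` and `estimate` are hypothetical, not part of DEP or limma (which are R packages and also compute moderated statistics across all proteins).

```python
from statistics import mean

# Made-up log2 intensities for one protein: (disease_state, environment, value).
samples = [
    ("disease", "env1", 10.0), ("disease", "env1", 12.0),
    ("disease", "env2", 8.0), ("disease", "env2", 9.0),
    ("healthy", "env1", 7.0), ("healthy", "env1", 9.0),
    ("healthy", "env2", 6.0), ("healthy", "env2", 8.0),
]

def merge_factors(disease_state, environment):
    """Combine the two factors into a single four-level group label."""
    return f"{disease_state}_{environment}"

# Collect the values of each merged group and take the group (cell) means.
groups = {}
for state, env, value in samples:
    groups.setdefault(merge_factors(state, env), []).append(value)
cell_means = {name: mean(values) for name, values in groups.items()}

# Each contrast is a set of weights over the four cell means.
contrasts = {
    "env1_vs_env2_in_disease": {"disease_env1": 1, "disease_env2": -1},
    "env1_vs_env2_in_healthy": {"healthy_env1": 1, "healthy_env2": -1},
    "env1_vs_env2_average": {
        "disease_env1": 0.5, "healthy_env1": 0.5,
        "disease_env2": -0.5, "healthy_env2": -0.5,
    },
}

def estimate(weights, means):
    """Weighted sum of cell means, i.e. the estimated contrast."""
    return sum(w * means[g] for g, w in weights.items())

estimates = {name: estimate(w, cell_means) for name, w in contrasts.items()}
for name, value in estimates.items():
    print(f"{name}: {value:+.2f}")
```

All three comparisons are read off the same four-group fit, so no samples have to be dropped for the within-disease or within-healthy comparison.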

2.2 years ago by jkim ▴ 190

Thank you for the suggestion. Merging the 2 factors seems to be a good approach.

However, for comparison 1 (env1 vs env2), is it okay to fit just the one factor? Or should the factors be combined and env1 and env2 averaged over the subgroups, i.e. some equivalent of the (disease_env1 + healthy_env1)/2 vs (disease_env2 + healthy_env2)/2 contrast formula specified in 7.2 of the article you linked?
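
One way to see what is at stake between these two options is a small numeric sketch. The Python below uses made-up, deliberately unbalanced values for one protein in which disease shifts abundance but environment does nothing; `pooled_env_mean` and `averaged_env_mean` are hypothetical helper names, and limma itself is not involved.

```python
from statistics import mean

# Made-up, deliberately unbalanced log2 intensities for one protein:
# disease raises abundance by about 4 units, environment has no effect.
cells = {
    ("disease", "env1"): [13.0, 14.0, 15.0],
    ("healthy", "env1"): [10.0],
    ("disease", "env2"): [14.0],
    ("healthy", "env2"): [9.0, 10.0, 11.0],
}

def pooled_env_mean(env):
    """Mean over all samples of one environment, ignoring disease_state."""
    values = [x for (_, e), xs in cells.items() if e == env for x in xs]
    return mean(values)

def averaged_env_mean(env):
    """Average of the disease and healthy cell means for one environment."""
    return mean(mean(xs) for (_, e), xs in cells.items() if e == env)

one_factor_diff = pooled_env_mean("env1") - pooled_env_mean("env2")
averaged_diff = averaged_env_mean("env1") - averaged_env_mean("env2")

print(f"ignoring disease_state: {one_factor_diff:+.2f}")
print(f"averaged contrast:      {averaged_diff:+.2f}")
```

With these numbers the one-factor comparison reports a difference of 2, purely because env1 contains more diseased samples, while the averaged contrast reports 0. In a balanced design the two point estimates coincide, but the one-factor model still leaves the disease effect in the residual variance.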

2.2 years ago by Abhishek • 0

score 1 · Answer 1 · 2022-09-20

2.2 years ago

Matthias Zepper 5.0k

As a rule of thumb, never exclude samples: they still contribute information when the variance is estimated. This article by Ji & Lui is a good primer on the problem. Instead, use a GLM approach and include both factors as terms in your model. If you would like to test for the most appropriate model and a possible interaction, you can use the glht function from the multcomp package.
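
To make the point about variance concrete, here is a rough sketch of an ordinary pooled variance estimate for one protein, computed once from all four groups and once from the diseased samples only. The values are made up, `pooled_variance` is a hypothetical helper, and the sketch assumes equal variance across groups. It is not what limma does in full, since limma additionally moderates the per-protein variances with empirical Bayes.

```python
from statistics import mean

# Made-up log2 intensities for one protein, three replicates per group.
groups = {
    "disease_env1": [10.0, 11.0, 12.0],
    "disease_env2": [8.0, 8.5, 9.0],
    "healthy_env1": [7.0, 8.0, 9.0],
    "healthy_env2": [6.0, 7.5, 9.0],
}

def pooled_variance(selected):
    """Within-group sum of squares divided by the residual degrees of freedom."""
    ss = 0.0
    n = 0
    for name in selected:
        values = groups[name]
        m = mean(values)
        ss += sum((x - m) ** 2 for x in values)
        n += len(values)
    df = n - len(selected)  # one mean is estimated per group
    return ss / df, df

# All samples in one model versus only the diseased samples.
full_var, full_df = pooled_variance(list(groups))
subset_var, subset_df = pooled_variance(["disease_env1", "disease_env2"])

print(f"all samples:   variance {full_var:.3f} on {full_df} residual df")
print(f"diseased only: variance {subset_var:.3f} on {subset_df} residual df")
```

The env1 vs env2 difference within disease is the same in both cases. What changes is the variance estimate it is tested against, which rests on twice as many residual degrees of freedom when the healthy samples stay in the model.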

2.2 years ago by Matthias Zepper 5.0k

Thank you for the response. To avoid excluding samples, as you suggested, can't I use the same limma-based DEP package but combine the 2 factors into 1 (based on jkim's suggestion)?

2.2 years ago by Abhishek • 0