4.7 years ago
tianshenbio
I have a dataset with two factors (Stage and Form). Stage has four levels and Form has two levels:
Sample     Stage  Form
DS1_Wr60   Wr60   DS
DS2_Wr60   Wr60   DS
DS3_Wr60   Wr60   DS
DS4_Wr60   Wr60   DS
WS1_Wr60   Wr60   WS
WS2_Wr60   Wr60   WS
WS3_Wr60   Wr60   WS
WS4_Wr60   Wr60   WS
DS1_PP50   PP50   DS
DS2_PP50   PP50   DS
DS3_PP50   PP50   DS
DS4_PP50   PP50   DS
WS1_PP50   PP50   WS
WS2_PP50   PP50   WS
WS3_PP50   PP50   WS
WS4_PP50   PP50   WS
DS1_P15    P15    DS
DS2_P15    P15    DS
DS3_P15    P15    DS
DS4_P15    P15    DS
WS1_P15    P15    WS
WS2_P15    P15    WS
WS3_P15    P15    WS
WS4_P15    P15    WS
DS1_P50    P50    DS
DS2_P50    P50    DS
DS3_P50    P50    DS
DS4_P50    P50    DS
WS1_P50    P50    WS
WS2_P50    P50    WS
WS3_P50    P50    WS
WS4_P50    P50    WS
I tried to identify DE genes between stages using two different designs:
1. design = ~ Stage
2. design = ~ Stage + Form
Results:
1. > resultsNames(dds_out)
[1] "Intercept" "Stage_P50_vs_P15" "Stage_PP50_vs_P15" "Stage_Wr60_vs_P15"
2. > resultsNames(dds_out)
[1] "Intercept" "Stage_P50_vs_P15" "Stage_PP50_vs_P15" "Stage_Wr60_vs_P15"
[5] "Form_WS_vs_DS"
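A minimal base-R sketch (no DESeq2 needed) of where these coefficient names come from, assuming the sample names above encode Stage after the underscore and Form in the first two letters; the parsing rule and the object names `cd`, `mm1`, `mm2` are hypothetical. R sorts factor levels alphabetically by default, so P15 and DS become the reference levels folded into the intercept, and DESeq2 reports the remaining design-matrix columns under names such as Stage_P50_vs_P15 and Form_WS_vs_DS.

```r
# Hypothetical parsing rule: Stage follows the underscore, Form is the first two letters
stages  <- c("Wr60", "PP50", "P15", "P50")
samples <- unlist(lapply(stages, function(s)
  paste0(rep(c("DS", "WS"), each = 4), rep(1:4, times = 2), "_", s)))

cd <- data.frame(
  Stage = factor(sub("^.*_", "", samples)),  # default levels are sorted: P15, P50, PP50, Wr60
  Form  = factor(substr(samples, 1, 2)),     # default levels are sorted: DS, WS
  row.names = samples
)

# Each Stage x Form cell holds 4 samples, so the layout is balanced
print(table(cd$Stage, cd$Form))

# Design matrices for the two formulas; the first level of each factor
# (P15 and DS) is folded into the intercept
mm1 <- model.matrix(~ Stage, data = cd)
mm2 <- model.matrix(~ Stage + Form, data = cd)
print(colnames(mm1))
print(colnames(mm2))
```

Design 2 only adds one column (FormWS); the three Stage columns mean the same thing in both designs, which is why the coefficient names match even though the results differ.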
I noticed that the results for the same comparison (for example, "Stage_P50_vs_P15") differ between the two designs. How do designs 1 and 2 work, and how do they differ? Which design should I use if I want DE genes between stages (considering the effect of Stage only)?
You posted about this before. To keep this thread focused: what is the question you want to answer? Both designs are valid, but which one to use depends on the question. If you are interested only in Stage, use design 1. Please describe exactly what this experiment is and what you want to answer.
Hi, thank you for your reply. "Form" indicates that the organism was reared at two temperatures (WS and DS), and "Stage" indicates four developmental stages of the organism (Wr60, PP50, P15, and P50). There are four biological replicates for each combination of the two factors. I want to find out how genes are differentially expressed between two stages. Since both designs compare stages, I wonder how design 1 differs from design 2.
Your second model assesses the effect of stage in a manner that is 'controlled' for the different baseline expression levels you'd get due to the different forms.
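To see why the same Stage comparison gives different results under the two designs, here is a hedged sketch that uses ordinary least squares (`lm`) on one simulated gene on a log scale, not DESeq2's negative binomial GLM; the effect sizes, noise level, and seed are hypothetical. In a balanced layout like this one, the P50-vs-P15 estimate is identical under both formulas, but with Form in the model its large effect is removed from the residual variance, so the standard error and p-value shrink.

```r
set.seed(42)

# Hypothetical balanced layout: 4 stages x 2 forms x 4 replicates = 32 samples
cd <- expand.grid(rep   = 1:4,
                  Form  = factor(c("DS", "WS")),
                  Stage = factor(c("P15", "P50", "PP50", "Wr60")))

# Hypothetical log-scale effects; Form (temperature) shifts expression strongly
stage_eff <- c(P15 = 0, P50 = 1, PP50 = 2, Wr60 = 3)
form_eff  <- c(DS = 0, WS = 4)
cd$y <- 5 + stage_eff[as.character(cd$Stage)] +
  form_eff[as.character(cd$Form)] + rnorm(nrow(cd), sd = 0.5)

fit1 <- lm(y ~ Stage, data = cd)         # design 1
fit2 <- lm(y ~ Stage + Form, data = cd)  # design 2: Form as covariate

# The same P50-vs-P15 coefficient under both models
c1 <- coef(summary(fit1))["StageP50", ]
c2 <- coef(summary(fit2))["StageP50", ]
print(rbind("~ Stage" = c1, "~ Stage + Form" = c2))
```

In DESeq2 the log fold changes themselves can also differ slightly between the designs, because the GLM is fitted to counts on the log scale and the per-gene dispersion estimates depend on the design, so some difference between the two result tables is expected.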
As per russhh: with your formula
~ Stage + Form
DESeq2 'adjusts' the statistical inferences for your Stage variable based on Form. That is, in this model, Form is treated as a covariate. This is how we 'adjust' for variables in regression modeling. Say I wanted to adjust for smoking status and menopausal status while testing Arthritis against my gene's expression; my model would be:
~ Smoking + Menopause + Arthritis
The p-values for Arthritis will be adjusted for the estimated effects of Smoking and Menopause.

Thank you for your reply, Kevin Blighe. In your example you mentioned that the p-values for Arthritis are 'adjusted' for the effects of smoking and menopause. Do you mean that the effects of smoking and menopause are 'eliminated/reduced' so that the DE results reveal the effect of arthritis only? Is this the same as removing a batch effect? In my case, I want to examine the effect of Stage only, but Form definitely also affects gene expression, so it should be treated as a covariate. Thus design 2 (~ Stage + Form) would be more appropriate for my purpose, since it accounts for the effect of Form. Am I correct?
Yes, that is correct.
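A sketch of fitting design 2 in DESeq2 and extracting Form-adjusted Stage comparisons, assuming DESeq2 from Bioconductor is installed. The negative binomial counts below are simulated stand-ins for the real count matrix (means and dispersions are hypothetical), and object names such as `res_P50_vs_PP50` are illustrative. `results(dds, name = ...)` returns a coefficient listed by `resultsNames()`, while `contrast = c("Stage", "P50", "PP50")` returns any pair of stages, including pairs that do not involve the reference level P15.

```r
library(DESeq2)

set.seed(1)
stages  <- c("Wr60", "PP50", "P15", "P50")
samples <- unlist(lapply(stages, function(s)
  paste0(rep(c("DS", "WS"), each = 4), rep(1:4, times = 2), "_", s)))
cd <- data.frame(Stage = factor(sub("^.*_", "", samples)),
                 Form  = factor(substr(samples, 1, 2)),
                 row.names = samples)

# Hypothetical simulated counts (1000 genes x 32 samples) standing in for real data
gene_mu <- 2^runif(1000, 3, 12)   # per-gene mean
disp    <- 0.05 + 2 / gene_mu     # per-gene dispersion, decreasing with the mean
counts  <- matrix(rnbinom(1000 * 32, mu = rep(gene_mu, 32), size = rep(1 / disp, 32)),
                  nrow = 1000, dimnames = list(paste0("gene", 1:1000), samples))

# Design 2: Stage effects estimated with Form as a covariate
dds <- DESeqDataSetFromMatrix(countData = counts, colData = cd,
                              design = ~ Stage + Form)
dds <- DESeq(dds)
print(resultsNames(dds))

# A coefficient listed by resultsNames(), and an arbitrary pair of stages
res_P50_vs_P15  <- results(dds, name = "Stage_P50_vs_P15")
res_P50_vs_PP50 <- results(dds, contrast = c("Stage", "P50", "PP50"))
summary(res_P50_vs_PP50)
```

Calling `results(dds)` with no `name` or `contrast` reports the last variable in the design formula, which for `~ Stage + Form` is Form_WS_vs_DS rather than a Stage comparison, so it is safer to request the Stage comparison explicitly.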
When we adjust for batch by including batch in a design formula, it is exactly the same as, for example, including BMI or smoking status in the design formula. However, this does not adjust the actual expression data for these covariates; it only 'adjusts' the statistical inferences we make from the expression data in the context of the design formula (ultimately, it is the p-values that are modified). If we want to actually modify the expression data and remove the effects of batch or anything else, we need to apply other methods.

This is clear, thank you so much!