I have a count matrix from an RNA-seq experiment that I'd like to normalize using DESeq2 and perform DE analysis on. My code is below:
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)
My experiment ran over two time periods: week 1 (treated vs untreated) and week 2 (untreated vs untreated). Samples were collected at the end of week 1 and week 2 without replacement. So essentially, in week 2 we should see the reversal of any genes upregulated in week 1 (and the data is clustering this way).
I have two possible coldata files:
coldata1
sample_id   condition  week
treated1    treated    1
treated2    treated    1
treated3    treated    1
untreated1  untreated  1
untreated2  untreated  1
untreated3  untreated  1
treated4    treated    2
treated5    treated    2
treated6    treated    2
untreated4  untreated  2
untreated5  untreated  2
untreated6  untreated  2
coldata2
sample_id   condition  week
treated1    treatedA   1
treated2    treatedA   1
treated3    treatedA   1
untreated1  untreated  1
untreated2  untreated  1
untreated3  untreated  1
treated4    treatedB   2
treated5    treatedB   2
treated6    treatedB   2
untreated4  untreated  2
untreated5  untreated  2
untreated6  untreated  2
So coldata2 would have three treatment levels instead of two. I'm a bit lost on which is better, and on the best way to fill in the design argument. I was thinking about treating it as a time series, but since the treatment was reversed, I'm not sure that's appropriate.
Any help would be greatly appreciated! Apologies if this is not clear; please let me know and I'll try to re-explain.
Edit, for clarification:
During week 1: treated vs untreated samples. End of week 1: harvested half of the samples and isolated RNA, etc. During week 2: untreated (had been treated in week 1) vs untreated (had been untreated in week 1). End of week 2: harvested the rest of the samples and terminated the experiment.
Did you also mean treated vs untreated for week 2?
No, both were untreated. Essentially, week 2 was reversing the environmental stressor that was performed during week 1. It wasn't a staggered reversal.
The design formula depends on your research question. I suggest you take a look at this vignette to clarify which design best answers your question. With respect to coldata2, I suggest you collapse the levels of both variables (i.e. treatedA, treatedB, untreated, 1, and 2) into a single factor by creating a new column called group.
Best regards!
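To make the group suggestion concrete, here is a minimal sketch, assuming the sample table looks like coldata2 from the question and that cts has one column per sample_id. The level names (e.g. treatedA_1), the choice of untreated_1 as the reference level, and the two contrasts are illustrative assumptions rather than the only sensible choices; which comparisons you extract depends on your question.

```r
# Rebuild a sample table mirroring coldata2 from the question.
coldata <- data.frame(
  sample_id = c(paste0("treated", 1:3), paste0("untreated", 1:3),
                paste0("treated", 4:6), paste0("untreated", 4:6)),
  condition = rep(c("treatedA", "untreated", "treatedB", "untreated"), each = 3),
  week      = rep(c(1, 2), each = 6),
  stringsAsFactors = FALSE
)
rownames(coldata) <- coldata$sample_id

# Collapse condition and week into a single factor, e.g. "treatedA_1".
coldata$group <- factor(paste(coldata$condition, coldata$week, sep = "_"))

# Assumption: week-1 untreated samples serve as the reference level.
coldata$group <- relevel(coldata$group, ref = "untreated_1")

# Inspect the design that ~ group implies (base R only, no DESeq2 needed).
design_matrix <- model.matrix(~ group, data = coldata)
print(levels(coldata$group))
print(colnames(design_matrix))

# The DESeq2 step runs only if DESeq2 is installed and a count matrix `cts`
# (columns named by sample_id) already exists in the session.
if (exists("cts") && requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = cts[, rownames(coldata)],
                                        colData   = coldata,
                                        design    = ~ group)
  dds <- DESeq2::DESeq(dds)

  # Week 1: treated vs untreated.
  res_week1 <- DESeq2::results(dds,
                               contrast = c("group", "treatedA_1", "untreated_1"))
  # Week 2: previously treated vs never treated (both untreated in week 2).
  res_week2 <- DESeq2::results(dds,
                               contrast = c("group", "treatedB_2", "untreated_2"))
}
```

With a single group factor, each pairwise comparison is pulled out by naming the two levels in contrast, so you can compare the week 1 and week 2 results to check whether the week 1 changes revert.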