Hello all,
Hope you’re well.
I would really appreciate it if you could take some time to give me feedback on my experimental design; it would be very valuable to me. I am doing single-nucleus RNA sequencing and using the DESeq2 package for my DE analysis.
My sample information is as follows:
Case: 5 individuals (4 regions per individual: region1, region2, region3, region4)
Control: 4 + 5 individuals (4 individuals with 2 regions each: region1 and region2; 5 individuals with 2 regions each: region3 and region4)
I expect region1 in cases to respond differently to the disease than the other regions (2, 3, and 4). So I am interested in comparing case vs control within each region, and in how the case effect in regions 2, 3, and 4 differs from that in region 1. I know I can model the interaction of region and condition, and I did so, but I got very few DEGs for the interaction coefficients. I think the reason is my small sample size.
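To make the interaction model concrete, here is a minimal sketch in base R of the sample layout described above and the coefficients such a design produces. The individual IDs and the column names (`individual`, `region`, `condition`) are placeholders I made up for illustration, and sex and batch are left out to keep it short.

```r
# Toy sample table mirroring the layout above; IDs are hypothetical.
make_rows <- function(ids, regions, condition) {
  rows <- expand.grid(individual = ids, region = regions,
                      stringsAsFactors = FALSE)
  rows$condition <- condition
  rows
}
coldata <- rbind(
  make_rows(paste0("case", 1:5),  paste0("region", 1:4),   "case"),
  make_rows(paste0("ctrlA", 1:4), c("region1", "region2"), "control"),
  make_rows(paste0("ctrlB", 1:5), c("region3", "region4"), "control")
)
# Reference levels: region1 and control.
coldata$region    <- factor(coldata$region, levels = paste0("region", 1:4))
coldata$condition <- factor(coldata$condition, levels = c("control", "case"))

# Main effects plus the region-by-condition interaction.
mm <- model.matrix(~ region + condition + region:condition, data = coldata)
colnames(mm)

# With this coding:
#   case vs control in region1 = conditioncase
#   case vs control in region2 = conditioncase + regionregion2:conditioncase
#   regionregion2:conditioncase alone = how the region2 effect differs from region1
contrast_region2 <- as.numeric(
  colnames(mm) %in% c("conditioncase", "regionregion2:conditioncase")
)
contrast_region2
```

As far as I understand, DESeq2 names its coefficients differently from `model.matrix` (check `resultsNames(dds)`), so a numeric contrast like the one above would need to be rebuilt against those names before being passed to `results()`.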
I was wondering which approach makes more sense:
Approach 1: Integrate all samples from all regions and conditions, and then use the design below: design = ~ 0 + sex + batch + group_id + region
The issue here is that without an interaction term, I cannot extract a coefficient that gives the log fold change for case vs control in each region.
Approach 2: Integrate each region separately, and then build the design below:
design = ~ 0 + sex + group_id
and then see whether the 3 sets of DEGs for each region are correlated.
I also have some questions regarding my experimental design:
First, I was wondering whether it makes sense to include “individual” as a variable to account for repeated measures, as I have more than one tissue sample per individual (from different regions of the brain and spinal cord).
Second, the controls for regions 3 and 4, plus one case sample from region 3, were processed together in one batch, later than the other samples. I was wondering whether including just that one case alongside the controls is enough to avoid confounding batch with condition, given that I am comparing case vs control within each region.
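To make both questions concrete, here is a minimal base R sketch that checks whether the corresponding design matrices are full rank. The individual IDs, the batch labels, and the choice of `case1` as the late-processed case sample are all hypothetical; only the layout follows my description above.

```r
# Toy sample table mirroring my layout; IDs are hypothetical.
make_rows <- function(ids, regions, condition) {
  rows <- expand.grid(individual = ids, region = regions,
                      stringsAsFactors = FALSE)
  rows$condition <- condition
  rows
}
coldata <- rbind(
  make_rows(paste0("case", 1:5),  paste0("region", 1:4),   "case"),
  make_rows(paste0("ctrlA", 1:4), c("region1", "region2"), "control"),
  make_rows(paste0("ctrlB", 1:5), c("region3", "region4"), "control")
)

# First question: every individual is either case or control, so condition
# is nested within individual. Rank below the column count means the
# design ~ individual + condition is not full rank.
mm_ind <- model.matrix(~ individual + condition, data = coldata)
c(columns = ncol(mm_ind), rank = qr(mm_ind)$rank)

# Second question: late batch = region3/region4 controls plus one region3 case.
coldata$batch <- "batch1"
late_ctrl <- coldata$condition == "control" &
  coldata$region %in% c("region3", "region4")
late_case <- coldata$individual == "case1" & coldata$region == "region3"
coldata$batch[late_ctrl | late_case] <- "batch2"

# Samples per region, batch, and condition.
table(coldata$region, coldata$batch, coldata$condition)

# Rank check of ~ batch + condition within one region. Only meaningful for
# regions that contain both batches (region3 and region4 here).
batch_check <- function(r) {
  sub <- coldata[coldata$region == r, ]
  mm <- model.matrix(~ batch + condition, data = sub)
  c(columns = ncol(mm), rank = qr(mm)$rank)
}
batch_check("region3")
batch_check("region4")
```

In this toy layout, batch and condition coincide completely in region4, while in region3 they are separated only by the single late-processed case sample, which is the situation I am unsure about.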
I appreciate any feedback.
Thanks,
Paria
the DESeq2 vignette addresses this more or less explicitly.
more generally, there is no correct answer to your question in the abstract. there is only what maximizes statistical power for the data you have in hand.
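for example, the interactions section of the vignette describes combining the two factors into a single grouping factor and then pulling out each within-region comparison with a contrast. a minimal sketch, assuming hypothetical column names (`region`, `condition`, `group`) and an existing DESeqDataSet called `dds`; the sample labels are toy values, and the DESeq2 calls are left commented out because they need your real counts:

```r
# Toy labels only (hypothetical): 4 regions, 2 controls and 2 cases each.
region    <- factor(rep(paste0("region", 1:4), each = 4))
condition <- factor(rep(c("control", "case"), times = 8),
                    levels = c("control", "case"))

# One level per region/condition combination.
group <- factor(paste(region, condition, sep = "_"))
levels(group)

# With a DESeqDataSet `dds` whose colData carries these columns (not run):
# dds$group <- group
# design(dds) <- ~ sex + batch + group
# dds <- DESeq(dds)
# resultsNames(dds)
# # case vs control within region1:
# results(dds, contrast = c("group", "region1_case", "region1_control"))
```

whether sex and batch can stay in the design depends on whether the resulting model matrix is full rank for your samples.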