Hi, I’m doing my first RNAseq analysis and am struggling to choose the correct model for my analysis aims. The study design: I have 100 study volunteers, each of whom has been treated with one of five specific treatments. Samples were taken before treatment (time = 1) and at three time points after treatment (times 2-4).
Volunteer Treatment Time
1 A 1
1 A 2
1 A 3
1 A 4
2 B 1
2 B 2
2 B 3
2 B 4
…
I want to compare treatments in general as well as treatment in relation to time points.
I was considering the following models:
1.) Volunteer + treatment:time
2.) Volunteer + treatment + treatment:time
3.) Treatment + time
4.) Treatment + treatment:time
For options 1) and 2), I’m not sure how much of an issue it is that the information of treatment and volunteer is redundant to some extent (as the treatment is the same across all samples of that volunteer). However, some volunteer-specific effects can be expected and I would like to control for them. Are any of these the correct option or do you have other suggestions? I’m particularly unsure how to handle the fact that the first time point for each volunteer is essentially a control for that specific volunteer and treatment.
Thanks!
You can't include volunteer in the model. You'll also have to decide whether you expect a linear relationship between time and outcome. If you don't, use time as a factor, which will result in a lot of parameters in your model. In that case I think including the interaction between treatment and time might be too much, depending on the amount of data that you have.
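To put a rough number on "a lot of parameters", here is a small sketch in plain Python. It assumes the usual reference-level coding with an intercept, and the 5 treatments and 4 time points from the question, both treated as factors. The helper name `n_coefficients` is made up for illustration and is not part of any package.

```python
def n_coefficients(n_treatments, n_times, interaction):
    """Count coefficients for ~ treatment + time, optionally with treatment:time.

    Assumes both variables are factors with reference-level coding and an intercept.
    """
    # Intercept plus one coefficient per non-reference level of each factor.
    main = 1 + (n_treatments - 1) + (n_times - 1)
    # The interaction adds one coefficient per pair of non-reference levels.
    extra = (n_treatments - 1) * (n_times - 1) if interaction else 0
    return main + extra


without_interaction = n_coefficients(5, 4, interaction=False)
with_interaction = n_coefficients(5, 4, interaction=True)
print(without_interaction)  # 8
print(with_interaction)     # 20
```

So the interaction takes the model from 8 to 20 coefficients, before any volunteer terms are added.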
Thanks for your feedback. Is the reason I can't use volunteer because of the partial overlap between volunteer and treatment or due to other reasons? Would it maybe be an option to use ~Volunteer + time (which would maybe implicitly capture treatments)?
No, I don't expect a linear correlation between time and outcome. I'm already using the time points as a factor.
Take a look at this section: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups
In your case, group is treatment.
Could you expand on why volunteer can't be used in the model? To me it looks entirely appropriate to include a volunteer baseline in this model.
Because each volunteer received only one treatment, so
~ volunteer + treatment
would be redundant: once you know the volunteer you also know the treatment, so the treatment columns of the design matrix are linear combinations of the volunteer columns. See the link above for the section of the vignette that deals with this problem. I would recommend using DESeq2, the updated version of DESeq. Are the treatments different dosages of the same treatment, or completely different treatments?
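To see the redundancy concretely, here is a minimal sketch in plain Python (not DESeq2 code). The toy data of 4 volunteers, 2 treatments and 2 samples each is made up for illustration, and the helpers `dummy_columns` and `matrix_rank` are hypothetical names. It builds the design matrix for ~ volunteer + treatment with reference-level coding and checks whether it has full column rank.

```python
from fractions import Fraction


def dummy_columns(values, levels):
    """Indicator columns for a factor, dropping the first (reference) level."""
    return [[1 if v == level else 0 for v in values] for level in levels[1:]]


def matrix_rank(rows):
    """Matrix rank via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Toy data (assumption): volunteers 1-2 got treatment A, volunteers 3-4 got B.
volunteer = [1, 1, 2, 2, 3, 3, 4, 4]
treatment = ["A", "A", "A", "A", "B", "B", "B", "B"]

columns = [[1] * len(volunteer)]                         # intercept
columns += dummy_columns(volunteer, [1, 2, 3, 4])        # volunteer 2, 3, 4
columns += dummy_columns(treatment, ["A", "B"])          # treatment B
design = [list(row) for row in zip(*columns)]            # one row per sample

n_columns = len(design[0])
rank = matrix_rank(design)
print(n_columns, rank)  # 5 4
```

The matrix has 5 columns but rank 4, because the treatment B column equals the sum of the volunteer 3 and volunteer 4 columns. That is the "not full rank" situation the vignette section addresses.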
Thanks, I was thinking about trying both and comparing the results.
The treatments are completely different treatments.
You'll need to nest volunteer within treatment. See the link Asaf posted.
Basically, you'll use the formula
~treatment + time + treatment:volunteer + treatment:time
You'll need to renumber the volunteers so they are numbered 1, 2, 3, etc. within each treatment (so you'll have a treatment A volunteer 1 and a treatment B volunteer 1). If you've got a different number of volunteers in each treatment group, you'll need to remove empty columns with: