Hi,
I'm having trouble performing differential analysis using DESeq2 on a data set as I'm getting this error:
One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed.
Currently my design formula is ~cell+days+group. I'm wanting to use 'cell' as a blocking factor as some samples are from the same cell line. I'm also wanting to account for the 'days' factor whilst ultimately measuring the difference in gene expression between groups using contrast.
My colData is set up like this:
row.names cell days group
sample1 p1 16.5 A_PRE
sample2 p1 17.5 A_BLK
sample3 p2 17.0 B_PRE
sample4 p3 15.5 A_PRE
sample5 p3 16.5 A_BLK
sample6 p3 17.0 A_T
sample7 p4 17.0 B_PRE
sample8 p4 18.0 B_BLK
sample9 p5 17.5 A_BLK
sample10 p5 18.0 A_T
sample11 p6 18.5 B_BLK
sample12 p6 19.5 B_E
sample13 p7 16.0 A_BLK
sample14 p7 18.5 A_T
sample15 p8 19.0 B_E
sample16 p9 16.0 A_PRE
sample17 p9 17.5 A_BLK
sample18 p10 16.0 A_PRE
sample19 p10 17.5 A_BLK
sample20 p10 18.5 A_T
sample21 p11 16.0 A_PRE
sample22 p11 19.5 A_T
sample23 p12 20.0 B_E
sample24 p13 16.5 B_PRE
sample25 p13 18.5 B_BLK
sample26 p13 19.0 B_E
sample27 p14 19.5 B_E
sample28 p15 17.5 B_PRE
sample29 p15 19.0 B_BLK
sample30 p16 15.5 B_PRE
sample31 p16 18.0 B_BLK
I've looked over the DESeq2 vignette and I think the issue may be caused by the fact that some of the data is confounded - days 18 and 18.5 both only fall into the groups A_T and B_BLK. I've looked into adding an additional column to account for nested factors; however, I'm confused about how to apply this to my dataset.
Any help is really appreciated, thanks!
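The "linear combinations" error means the design matrix has fewer independent columns than coefficients. A minimal sketch of that check on the colData above follows. It is plain Python rather than DESeq2's own code, and it rests on two assumptions that are not stated in the post: 'days' is treated as numeric, and factors use treatment (reference-level) coding. The helper names (`treatment_columns`, `design_matrix`, `matrix_rank`) are made up for illustration.

```python
from fractions import Fraction

# colData copied from the post: sample, cell, days, group.
COLDATA_TEXT = """\
sample1 p1 16.5 A_PRE
sample2 p1 17.5 A_BLK
sample3 p2 17.0 B_PRE
sample4 p3 15.5 A_PRE
sample5 p3 16.5 A_BLK
sample6 p3 17.0 A_T
sample7 p4 17.0 B_PRE
sample8 p4 18.0 B_BLK
sample9 p5 17.5 A_BLK
sample10 p5 18.0 A_T
sample11 p6 18.5 B_BLK
sample12 p6 19.5 B_E
sample13 p7 16.0 A_BLK
sample14 p7 18.5 A_T
sample15 p8 19.0 B_E
sample16 p9 16.0 A_PRE
sample17 p9 17.5 A_BLK
sample18 p10 16.0 A_PRE
sample19 p10 17.5 A_BLK
sample20 p10 18.5 A_T
sample21 p11 16.0 A_PRE
sample22 p11 19.5 A_T
sample23 p12 20.0 B_E
sample24 p13 16.5 B_PRE
sample25 p13 18.5 B_BLK
sample26 p13 19.0 B_E
sample27 p14 19.5 B_E
sample28 p15 17.5 B_PRE
sample29 p15 19.0 B_BLK
sample30 p16 15.5 B_PRE
sample31 p16 18.0 B_BLK
"""
ROWS = [line.split() for line in COLDATA_TEXT.strip().splitlines()]


def treatment_columns(values):
    """One 0/1 indicator column per level, skipping the first (reference) level."""
    levels = list(dict.fromkeys(values))  # levels in order of first appearance
    return [[Fraction(int(v == level)) for v in values] for level in levels[1:]]


def design_matrix(rows, include_cell=True):
    """Design matrix rows for ~cell+days+group, or ~days+group without cell."""
    cells = [r[1] for r in rows]
    days = [Fraction(r[2]) for r in rows]  # assumption: days is numeric
    groups = [r[3] for r in rows]
    columns = [[Fraction(1)] * len(rows)]  # intercept
    if include_cell:
        columns += treatment_columns(cells)
    columns.append(days)
    columns += treatment_columns(groups)
    return [list(sample) for sample in zip(*columns)]


def matrix_rank(matrix):
    """Rank by exact Gaussian elimination (Fractions avoid rounding issues)."""
    m = [row[:] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


full = design_matrix(ROWS)
full_columns, full_rank = len(full[0]), matrix_rank(full)
reduced = design_matrix(ROWS, include_cell=False)
reduced_columns, reduced_rank = len(reduced[0]), matrix_rank(reduced)

print("~cell+days+group:", full_columns, "columns, rank", full_rank)
print("~days+group:     ", reduced_columns, "columns, rank", reduced_rank)
```

Under these assumptions the full design comes out rank-deficient while the design without 'cell' does not. The reason is visible in the table: every cell line appears only in A_* groups or only in B_* groups, so the A-versus-B part of 'group' is already determined by 'cell'. That nesting, rather than the day 18/18.5 overlap, appears to be the dependence here. The comparison only locates the problem; it is not a recommendation to drop 'cell'. In R the analogous check would be comparing `qr(model.matrix(~cell+days+group, colData))$rank` with the number of columns of that matrix.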
Even before I get to the 18/18.5 scenario you describe, I see a problem - using all three variables results in 2 singleton groups for p1. You either need to perform DE using each factor separately or ensure that each combination has at least 3 samples (or at the very least, isn't a singleton).
Sorry, I've not come across the issue of singletons before; however, each 'p' is unique to both a day and group, so this stands for all p's. Why is this not an issue with regards to matched patient samples across groups, as normally there is only one sample per patient in these cases?
I don't understand your question.
What do you mean by "this" and what matched patient scenario are you talking about?
This is a very complicated study design. A deep understanding of the experiment is needed and, honestly, I would consult a statistician.
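On the singleton point raised earlier in the thread, the counts can be tabulated directly from the colData. The sketch below is plain Python, not part of DESeq2, and the variable names are illustrative only. It counts how many samples share each cell/days/group combination and how many samples each group has on its own.

```python
from collections import Counter

# colData copied from the original post: sample, cell, days, group.
COLDATA_TEXT = """\
sample1 p1 16.5 A_PRE
sample2 p1 17.5 A_BLK
sample3 p2 17.0 B_PRE
sample4 p3 15.5 A_PRE
sample5 p3 16.5 A_BLK
sample6 p3 17.0 A_T
sample7 p4 17.0 B_PRE
sample8 p4 18.0 B_BLK
sample9 p5 17.5 A_BLK
sample10 p5 18.0 A_T
sample11 p6 18.5 B_BLK
sample12 p6 19.5 B_E
sample13 p7 16.0 A_BLK
sample14 p7 18.5 A_T
sample15 p8 19.0 B_E
sample16 p9 16.0 A_PRE
sample17 p9 17.5 A_BLK
sample18 p10 16.0 A_PRE
sample19 p10 17.5 A_BLK
sample20 p10 18.5 A_T
sample21 p11 16.0 A_PRE
sample22 p11 19.5 A_T
sample23 p12 20.0 B_E
sample24 p13 16.5 B_PRE
sample25 p13 18.5 B_BLK
sample26 p13 19.0 B_E
sample27 p14 19.5 B_E
sample28 p15 17.5 B_PRE
sample29 p15 19.0 B_BLK
sample30 p16 15.5 B_PRE
sample31 p16 18.0 B_BLK
"""
rows = [line.split() for line in COLDATA_TEXT.strip().splitlines()]

# Samples per full cell/days/group combination.
combo_counts = Counter((cell, days, group) for _, cell, days, group in rows)
singleton_combos = [combo for combo, n in combo_counts.items() if n == 1]

# Samples per level of each variable taken on its own.
samples_per_group = Counter(group for _, _, _, group in rows)
samples_per_cell = Counter(cell for _, cell, _, _ in rows)
single_sample_cells = sorted(c for c, n in samples_per_cell.items() if n == 1)

print("combinations:", len(combo_counts), "singletons:", len(singleton_combos))
print("samples per group:", dict(samples_per_group))
print("cell lines with one sample:", single_sample_cells)
```

For this table every cell/days/group combination holds exactly one sample, which matches the follow-up remark that the singleton issue applies to all the p's and not only p1. Each group on its own has at least five samples, while four cell lines contribute a single sample each.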