Hi all,
I have a longitudinal study consisting of two groups of samples (Healthy vs Sick). Each subject has two time points (Baseline and Post). Because of the longitudinal nature of the study, I know I need to control for repeated sampling when looking for potential biomarkers using DESeq2.
Each individual has a unique ID.
Here is a simplified example of my data:
condition timepoint individual
healthy baseline 001
healthy post 001
healthy baseline 002
healthy post 002
sick baseline 003
sick post 003
sick baseline 004
sick post 004
Since I'm interested in the effect of condition but want to control for individual, I first tried:
~ individual + condition
but this gives the "model matrix is not full rank" error. I read the relevant section of the DESeq2 vignette (https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#model-matrix-not-full-rank) but I'm still unsure which terms my formula needs.
I also can't tell whether the timepoint term needs to be included, and adding it to the design formula doesn't fix the problem anyway. I feel like there is some "linear combination" I'm just not seeing here.
Any help is appreciated.
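It's not full rank because individual and condition are confounded: every individual belongs to exactly one condition, so the condition column of the model matrix can be written as a linear combination of the individual columns. I believe you would need at least one healthy sample from individual 003 or 004, or one sick sample from individual 001 or 002.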
Right, but he can't do that: patient 003 is sick, so there is no healthy sample from that individual.
Correct, I just wanted to explain why the OP's design was producing a model matrix that is not full rank.
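To make the "linear combination" concrete, here is a small sketch. It is written in plain Python purely for illustration (in R, `model.matrix(~ individual + condition, coldata)` on a hypothetical `coldata` data frame would show the same columns). It assumes `individual` is a factor with 001 as the reference level and `healthy` as the reference condition, builds the treatment-coded design matrix for `~ individual + condition` from the example table, and computes its rank by exact Gaussian elimination.

```python
from fractions import Fraction

# Example sample table from the question: (condition, timepoint, individual)
samples = [
    ("healthy", "baseline", "001"), ("healthy", "post", "001"),
    ("healthy", "baseline", "002"), ("healthy", "post", "002"),
    ("sick", "baseline", "003"), ("sick", "post", "003"),
    ("sick", "baseline", "004"), ("sick", "post", "004"),
]

def rank(matrix):
    """Matrix rank via Gaussian elimination on exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        # Find a row at or below r with a nonzero entry in column c
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from every other row
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Treatment coding for ~ individual + condition (assumed reference levels: 001, healthy)
columns = ["(Intercept)", "individual002", "individual003",
           "individual004", "conditionsick"]
X = [
    [1, int(ind == "002"), int(ind == "003"), int(ind == "004"), int(cond == "sick")]
    for cond, _timepoint, ind in samples
]

print(len(columns), rank(X))  # -> 5 4
# The dependency: conditionsick == individual003 + individual004 in every row
print(all(row[4] == row[2] + row[3] for row in X))  # -> True
```

Five columns but rank 4: the condition column carries no information beyond the individual columns, which is why DESeq2 cannot estimate both. Adding `timepoint` does not help, because it does not break that dependency.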
This is a helpful clarification, thank you. Using the "workaround" of renumbering individuals within each group (healthy and sick), I can get DESeq2 to run, but now I think I need to add an interaction term (something like ~ individual + condition + individual*condition; I'm not sure yet whether that will work).
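On the interaction question: in R, `a*b` already expands to `a + b + a:b`, so `~ individual + condition + individual*condition` is the same formula as `~ individual*condition`. The DESeq2 vignette's section on group-specific condition effects with individuals nested within groups instead suggests a design of the form `~ grp + grp:ind.n + grp:cnd`, where `ind.n` is the within-group renumbered individual (the same "workaround" as above) and `cnd` is the within-individual factor. Mapped onto this study, that would be `~ condition + condition:ind.n + condition:timepoint`, with `ind.n` stored as a factor. The sketch below (again plain Python, for illustration only, not DESeq2 code) builds that matrix with the column layout R's `model.matrix` would be expected to produce and checks that it has full column rank.

```python
from fractions import Fraction

# Example table with individuals renumbered within each condition:
# healthy 001,002 -> ind.n 1,2 ; sick 003,004 -> ind.n 1,2
samples = [
    ("healthy", "baseline", "1"), ("healthy", "post", "1"),
    ("healthy", "baseline", "2"), ("healthy", "post", "2"),
    ("sick", "baseline", "1"), ("sick", "post", "1"),
    ("sick", "baseline", "2"), ("sick", "post", "2"),
]

def rank(matrix):
    """Matrix rank via Gaussian elimination on exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Columns for ~ condition + condition:ind.n + condition:timepoint
columns = [
    "(Intercept)", "conditionsick",
    "conditionhealthy:ind.n2", "conditionsick:ind.n2",
    "conditionhealthy:timepointpost", "conditionsick:timepointpost",
]

def design_row(cond, timepoint, ind_n):
    """One row of the nested design (treatment coding, assumed reference levels)."""
    healthy, sick = cond == "healthy", cond == "sick"
    second, post = ind_n == "2", timepoint == "post"
    return [1, int(sick), int(healthy and second), int(sick and second),
            int(healthy and post), int(sick and post)]

X = [design_row(*s) for s in samples]
print(len(columns), rank(X))  # -> 6 6
```

In this parameterization each `condition:timepointpost` term is that group's Post vs Baseline change, and, assuming the level names above (check with `resultsNames(dds)`), a call like `results(dds, contrast=list("conditionsick.timepointpost", "conditionhealthy.timepointpost"))` would test whether that change differs between sick and healthy, following the vignette's `contrast=list(...)` example. Because both groups have the same number of individuals here, no all-zero columns appear; with unbalanced groups the vignette describes removing such columns from a custom model matrix.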