Question

Trouble Understanding DESeq2 Design Formula ("Model Matrix is not Full Rank")

0

9 months ago

Rachel • 0

Hi all,

I have a longitudinal study consisting of two groups of samples (Healthy vs Sick). Each subject has two time points (Baseline and Post). Because of the longitudinal nature of the study, I know I need to control for repeated sampling when looking for potential biomarkers using DESeq2.

Each individual has a unique ID.

Here is a simplified example of my data:

condition    timepoint  individual
healthy      baseline   001
healthy      post       001
healthy      baseline   002
healthy      post       002
sick         baseline   003
sick         post       003
sick         baseline   004
sick         post       004

Since I'm interested in the effect of Condition but want to control for Individual, first I tried:

~ individual + condition

but this results in the "matrix is not full rank" error. I read the associated DESeq2 vignette but am still having trouble understanding what terms my formula needs to have (https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#model-matrix-not-full-rank).

I can't determine whether the time point term needs to be included as well, though including it in the design formula still causes problems anyway. I feel like there is some "linear combination" I'm just not seeing or understanding here.

Any help is appreciated.

deseq2 biomarkers 16S • 969 views

9 months ago by Rachel • 0

1

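It's not full rank because individual and condition are confounded: every individual belongs to exactly one condition, so the condition column is a linear combination of the individual columns. To fit ~ individual + condition, I believe you would need at least one healthy sample from individual 003 or 004, or one sick sample from individual 001 or 002.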
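
To make the confounding concrete, here is a minimal base R sketch that rebuilds the toy sample table from the question and checks the rank of the design matrix. It does not call DESeq2 itself; the data frame name `coldata` and the object `mm` are placeholders chosen for illustration.

```r
# Toy sample table copied from the question (8 samples, 4 individuals).
coldata <- data.frame(
  condition  = factor(rep(c("healthy", "sick"), each = 4)),
  timepoint  = factor(rep(c("baseline", "post"), times = 4)),
  individual = factor(rep(c("001", "002", "003", "004"), each = 2))
)

# Expand the formula into a design matrix with base R.
mm <- model.matrix(~ individual + condition, data = coldata)

ncol(mm)      # 5 coefficients requested
qr(mm)$rank   # only 4 linearly independent columns

# The hidden linear combination: the 'conditionsick' column is the sum of
# the indicator columns for the two sick individuals.
all(mm[, "conditionsick"] == mm[, "individual003"] + mm[, "individual004"])  # TRUE
```

Since there are more columns than the rank, at least one coefficient cannot be estimated, which is what the "not full rank" message is reporting.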

9 months ago by jv ★ 1.8k

1

Right, but the OP can't do that. Patient 003 is sick. There is no healthy version of that individual.

9 months ago by swbarnes2 14k

0

Correct, I just wanted to expand on why the design set by the OP produces a model matrix that is not full rank.

9 months ago by jv ★ 1.8k

0

This is helpful clarification, thank you. Using the "workaround" where I changed individual numbering by group (healthy and sick) I can get DESeq to run, but now I'm thinking I need to add an interaction term (something like ~ individual + condition + individual*condition; not sure if that'll work yet).
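
One note on the formula syntax, as a small base R sketch: in an R formula, `a*b` is shorthand for `a + b + a:b`, so writing the main effects next to the `*` term does not add anything new. The object names `f_long`, `f_short`, and `labels_long` are placeholders for illustration.

```r
# The formula as written in the comment above.
f_long  <- ~ individual + condition + individual*condition
# The shorter form using only the '*' operator.
f_short <- ~ individual*condition

# terms() lists the terms R actually fits after expanding the formula.
labels_long <- attr(terms(f_long), "term.labels")
labels_long
# "individual" "condition" "individual:condition"

# Both spellings expand to the same set of terms.
identical(labels_long, attr(terms(f_short), "term.labels"))  # TRUE
```

So the real question is whether the `individual:condition` interaction columns are estimable with this sample layout, not whether the main effects are listed twice.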

9 months ago by Rachel • 0

score 0 · Answer 1 · 2024-02-06

0

9 months ago

swbarnes2 14k

Read the vignette. Your individuals are nested in condition. So either drop individual from the analysis, or use the workaround in the vignette.
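
For reference, a rough sketch of how I read the vignette's workaround when applied to the toy table in the question. Individuals are renumbered within each condition (the column name `ind.n` follows the vignette; `coldata` and `mm_nested` are placeholder names), and the design uses interaction terms instead of the raw individual ID. Only base R is used here to check the rank; the same formula would be what you pass as `design` when building the DESeq2 dataset.

```r
# Toy sample table from the question.
coldata <- data.frame(
  condition  = factor(rep(c("healthy", "sick"), each = 4)),
  timepoint  = factor(rep(c("baseline", "post"), times = 4)),
  individual = factor(rep(c("001", "002", "003", "004"), each = 2))
)

# Renumber individuals within each condition: 001/003 -> 1, 002/004 -> 2.
coldata$ind.n <- factor(rep(rep(1:2, each = 2), times = 2))

# Nested design: condition, individual-within-condition, and a
# condition-specific timepoint effect.
mm_nested <- model.matrix(
  ~ condition + condition:ind.n + condition:timepoint,
  data = coldata
)

colnames(mm_nested)
ncol(mm_nested)      # 6 coefficients
qr(mm_nested)$rank   # 6, so the matrix is full rank
```

As far as I understand the vignette, this design lets you test the timepoint effect within each condition and whether that effect differs between conditions, while still accounting for the pairing of samples by individual. If the groups have unequal numbers of individuals, the vignette describes an extra step to remove all-zero columns from the model matrix.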

9 months ago by swbarnes2 14k