Question

DESeq2 model matrix not full rank, potentially confounding factors?

20 months ago

sg197 ▴ 40

Hi,

I'm having trouble performing differential analysis using DESeq2 on a data set as I'm getting this error:

One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed.

Currently my design formula is ~cell+days+group. I want to use 'cell' as a blocking factor, since some samples come from the same cell line. I also want to account for the 'days' factor, and ultimately test for differences in gene expression between groups using contrasts.

My colData is set up like this:

row.names         cell  days group
sample1        p1 16.5 A_PRE
sample2        p1 17.5 A_BLK
sample3        p2 17.0 B_PRE
sample4        p3 15.5 A_PRE
sample5        p3 16.5 A_BLK
sample6        p3 17.0   A_T
sample7        p4 17.0 B_PRE
sample8        p4 18.0 B_BLK
sample9        p5 17.5 A_BLK
sample10       p5 18.0   A_T
sample11       p6 18.5 B_BLK
sample12       p6 19.5   B_E
sample13       p7 16.0 A_BLK
sample14       p7 18.5   A_T
sample15       p8 19.0   B_E
sample16       p9 16.0 A_PRE
sample17       p9 17.5 A_BLK
sample18      p10 16.0 A_PRE
sample19      p10 17.5 A_BLK
sample20      p10 18.5   A_T
sample21      p11 16.0 A_PRE
sample22      p11 19.5   A_T
sample23      p12 20.0   B_E
sample24      p13 16.5 B_PRE
sample25      p13 18.5 B_BLK
sample26      p13 19.0   B_E
sample27      p14 19.5   B_E
sample28      p15 17.5 B_PRE
sample29      p15 19.0 B_BLK
sample30      p16 15.5 B_PRE
sample31      p16 18.0 B_BLK

I've looked over the DESeq2 vignette and I think the issue may be that some of the data is confounded: days 18 and 18.5 both fall only into the groups A_T and B_BLK. I've looked into adding an additional column to account for nested factors, but I'm not sure how to apply this to my dataset.
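
One way to probe this is to rebuild the model matrix from the table above and compute its rank. The sketch below uses plain Python rather than R, purely as an illustration. `dummy_columns` and `matrix_rank` are hypothetical helpers, the treatment coding (first level dropped) is assumed to mirror R's default, and `days` is assumed to be numeric (as a factor it would add more columns). On this table, cell + group alone already comes out one short of full rank, before `days` is added: every cell line sits entirely within A or entirely within B, so the A-versus-B part of `group` is a sum of `cell` columns.

```python
from fractions import Fraction

# colData from the post as (cell, days, group) triples, in sample order.
RAW = """
p1 16.5 A_PRE    p1 17.5 A_BLK    p2 17.0 B_PRE    p3 15.5 A_PRE
p3 16.5 A_BLK    p3 17.0 A_T      p4 17.0 B_PRE    p4 18.0 B_BLK
p5 17.5 A_BLK    p5 18.0 A_T      p6 18.5 B_BLK    p6 19.5 B_E
p7 16.0 A_BLK    p7 18.5 A_T      p8 19.0 B_E      p9 16.0 A_PRE
p9 17.5 A_BLK    p10 16.0 A_PRE   p10 17.5 A_BLK   p10 18.5 A_T
p11 16.0 A_PRE   p11 19.5 A_T     p12 20.0 B_E     p13 16.5 B_PRE
p13 18.5 B_BLK   p13 19.0 B_E     p14 19.5 B_E     p15 17.5 B_PRE
p15 19.0 B_BLK   p16 15.5 B_PRE   p16 18.0 B_BLK
"""
tokens = RAW.split()
COLDATA = [(tokens[i], float(tokens[i + 1]), tokens[i + 2])
           for i in range(0, len(tokens), 3)]


def dummy_columns(values):
    """Indicator columns for a factor, dropping the first level (assumed coding)."""
    levels = sorted(set(values))
    return [[1 if v == level else 0 for v in values] for level in levels[1:]]


def matrix_rank(columns):
    """Exact rank via Gaussian elimination over rationals (hypothetical helper)."""
    rows = [[Fraction(x) for x in row] for row in zip(*columns)]
    rank = 0
    for col in range(len(columns)):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # this column depends on the ones already processed
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


cells = [cell for cell, _, _ in COLDATA]
days = [day for _, day, _ in COLDATA]
groups = [group for _, _, group in COLDATA]
intercept = [1] * len(COLDATA)

# ~ cell + group, then the same with numeric days appended.
cell_group = [intercept] + dummy_columns(cells) + dummy_columns(groups)
with_days = cell_group + [days]

print(len(cell_group), "columns, rank", matrix_rank(cell_group))  # 21 columns, rank 20
print(len(with_days), "columns, rank", matrix_rank(with_days))    # 22 columns, rank 21

# Which groups occur on each day (the 18 / 18.5 observation).
groups_by_day = {}
for _, day, group in COLDATA:
    groups_by_day.setdefault(day, set()).add(group)
print(sorted(groups_by_day[18.0]), sorted(groups_by_day[18.5]))
```

The day/group overlap at 18 and 18.5 is real, but under these assumptions it is not what triggers the error. The dependency exists with `days` left out entirely.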

Any help is really appreciated, thanks!

DESeq2 differential-expression RNA-seq • 1.3k views

Even before I get to the 18/18.5 scenario you describe, I see a problem - using all three variables results in 2 singleton groups for p1. You either need to perform DE using each factor separately or ensure that each combination has at least 3 samples (or at the very least, isn't a singleton).
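
To make the singleton point concrete, here is a minimal sketch that tallies the table from the question. It is plain Python for illustration only, and the variable names are hypothetical. It counts how many samples share each cell/days/group combination and how many samples each group has overall.

```python
from collections import Counter

# colData from the question as (cell, days, group) triples, in sample order.
RAW = """
p1 16.5 A_PRE    p1 17.5 A_BLK    p2 17.0 B_PRE    p3 15.5 A_PRE
p3 16.5 A_BLK    p3 17.0 A_T      p4 17.0 B_PRE    p4 18.0 B_BLK
p5 17.5 A_BLK    p5 18.0 A_T      p6 18.5 B_BLK    p6 19.5 B_E
p7 16.0 A_BLK    p7 18.5 A_T      p8 19.0 B_E      p9 16.0 A_PRE
p9 17.5 A_BLK    p10 16.0 A_PRE   p10 17.5 A_BLK   p10 18.5 A_T
p11 16.0 A_PRE   p11 19.5 A_T     p12 20.0 B_E     p13 16.5 B_PRE
p13 18.5 B_BLK   p13 19.0 B_E     p14 19.5 B_E     p15 17.5 B_PRE
p15 19.0 B_BLK   p16 15.5 B_PRE   p16 18.0 B_BLK
"""
tokens = RAW.split()
COLDATA = [(tokens[i], float(tokens[i + 1]), tokens[i + 2])
           for i in range(0, len(tokens), 3)]

# Samples per full (cell, days, group) combination.
combo_counts = Counter(COLDATA)
singletons = [combo for combo, n in combo_counts.items() if n == 1]
print(len(singletons), "of", len(combo_counts), "combinations are singletons")
# 31 of 31 combinations are singletons

# Samples per group, which is what a between-group contrast compares.
group_counts = Counter(group for _, _, group in COLDATA)
for group, n in sorted(group_counts.items()):
    print(group, n)
```

Every three-way combination holds exactly one sample, while each group on its own has five or six.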

20 months ago by Ram 45k

Sorry, I haven't come across the issue of singletons before. However, each 'p' appears only once per day and group, so this holds for all p's. Why is this not an issue with regards to matched patient samples across groups, where there is normally only one sample per patient?
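
For what it's worth, the structure in the table is that each cell line belongs to only one of A or B. That is the "individuals nested within groups" situation the DESeq2 vignette describes, where individuals are renumbered within each group (its `ind.n` column). The sketch below only derives such columns from the table, in plain Python for illustration. The names `arm`, `timepoint` and `cell_n` are hypothetical, and whether the vignette's nested design actually suits this experiment is a separate judgment call.

```python
from collections import defaultdict

# colData from the question as (cell, days, group) triples, in sample order.
RAW = """
p1 16.5 A_PRE    p1 17.5 A_BLK    p2 17.0 B_PRE    p3 15.5 A_PRE
p3 16.5 A_BLK    p3 17.0 A_T      p4 17.0 B_PRE    p4 18.0 B_BLK
p5 17.5 A_BLK    p5 18.0 A_T      p6 18.5 B_BLK    p6 19.5 B_E
p7 16.0 A_BLK    p7 18.5 A_T      p8 19.0 B_E      p9 16.0 A_PRE
p9 17.5 A_BLK    p10 16.0 A_PRE   p10 17.5 A_BLK   p10 18.5 A_T
p11 16.0 A_PRE   p11 19.5 A_T     p12 20.0 B_E     p13 16.5 B_PRE
p13 18.5 B_BLK   p13 19.0 B_E     p14 19.5 B_E     p15 17.5 B_PRE
p15 19.0 B_BLK   p16 15.5 B_PRE   p16 18.0 B_BLK
"""
tokens = RAW.split()
COLDATA = [(tokens[i], float(tokens[i + 1]), tokens[i + 2])
           for i in range(0, len(tokens), 3)]

# Split 'group' (e.g. "A_PRE") into a hypothetical arm ("A") and timepoint ("PRE").
arms_per_cell = defaultdict(set)
for cell, _, group in COLDATA:
    arm, timepoint = group.split("_")
    arms_per_cell[cell].add(arm)

# True when no cell line appears in both arms, i.e. cell is nested within arm.
nested = all(len(arms) == 1 for arms in arms_per_cell.values())
print("cell nested within arm:", nested)

# Renumber cell lines within each arm, in order of first appearance.
cell_n = {}
cells_in_arm = defaultdict(int)
for cell, _, group in COLDATA:
    arm = group.split("_")[0]
    if cell not in cell_n:
        cells_in_arm[arm] += 1
        cell_n[cell] = cells_in_arm[arm]

# Extended table: (cell, days, arm, timepoint, cell_n).
extended = [(cell, day, *group.split("_"), cell_n[cell])
            for cell, day, group in COLDATA]
for row in extended[:4]:
    print(row)
print(dict(cells_in_arm))  # number of cell lines in each arm
```

The two arms have different numbers of cell lines (7 in A, 9 in B), and timepoints T and E each occur in one arm only. Any design built on these columns would therefore contain all-zero columns that need removing, as the vignette's section on levels without samples describes.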

20 months ago by sg197 ▴ 40

I don't understand your question.

Why is this not an issue with regards to...?

What do you mean by "this" and what matched patient scenario are you talking about?

20 months ago by Ram 45k

This is a very complicated study design. A deep understanding of the experiment is needed and, honestly, I would consult a statistician.

20 months ago by Asaf 10k