DESeq2: error in checkFullRank(modelMatrix)
5 weeks ago
tilia • 0

Hi everyone,

I get this well-known error when running DESeq2:

Error in checkFullRank(modelMatrix):
the model matrix is not full rank, so the model cannot be fit as specified. One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed.

This is my colData:

SampleID  Phenotype  Batch      Condition
Sample01  LLF        LLFbatch1  A
Sample02  LLF        LLFbatch1  A
Sample03  LLF        LLFbatch1  A
Sample04  LLF        LLFbatch1  B
Sample05  LLF        LLFbatch1  B
Sample06  LLF        LLFbatch1  B
Sample07  LLF        LLFbatch2  A
Sample08  LLF        LLFbatch2  A
Sample09  LLF        LLFbatch2  A
Sample10  LLF        LLFbatch2  B
Sample11  LLF        LLFbatch2  B
Sample12  LLF        LLFbatch2  B
Sample13  WT         WTbatch1   A
Sample14  WT         WTbatch1   A
Sample15  WT         WTbatch1   A
Sample16  WT         WTbatch1   B
Sample17  WT         WTbatch1   B
Sample18  WT         WTbatch1   B
Sample19  WT         WTbatch2   A
Sample20  WT         WTbatch2   A
Sample21  WT         WTbatch2   A
Sample22  WT         WTbatch2   B
Sample23  WT         WTbatch2   B
Sample24  WT         WTbatch2   B

I was asked to explore DEGs for Condition B vs. A.

I already analyzed the phenotypes separately, for example "B vs. A" in WT only, taking the batch effect into account (~ Batch + Condition). I also tried ~ Phenotype + Condition on all samples, which works but ignores the batch information.

Now I want to consider all sources of variation, which are the phenotype and the two batches within each phenotype. There are always three biological replicates for each combination, e.g. three samples of Condition 'A' for phenotype 'LLF' in batch 'LLFbatch1' (Sample01 to Sample03).

This is my design formula:

design= ~ Phenotype + Batch + Condition

I have read the section “Model matrix not full rank” in the DESeq2 vignette and I think this might be an example of "Group-specific condition effects, individuals nested within groups". Even though I understand this example, I still don't understand why my design also produces this error. A solution was given by introducing an additional "ind.n" column. However, I am not sure how to apply this to my situation.

What can I do? Can anyone help please?

DESeqDataSet design DESeq2 RNA-Seq • 459 views

Were the samples actually sequenced/generated in (a) two total batches (batch1 and batch2), or (b) four total batches (two per phenotype: LLFbatch1, LLFbatch2, WTbatch1, WTbatch2)? If it was (a), you should reformulate the batch column to be batch1 and batch2 (i.e., remove the phenotype prefix) and then the model will be full rank. If (b), then the model matrix isn't full rank because of perfect collinearity: batch is nested within phenotype.
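
To make the two cases concrete, here is a minimal base-R sketch that rebuilds the sample table from the question and checks a few candidate designs for full rank. The helper `is_full_rank` and the columns `BatchShared` and `batch.n` are names made up for this illustration (`batch.n` plays the role of `ind.n` in the vignette example); which design is appropriate depends on how the batches were really generated.

```r
# Sample table rebuilt from the question (base R only; DESeq2 is not needed for this check).
coldata <- data.frame(
  Phenotype = factor(rep(c("LLF", "WT"), each = 12)),
  Batch     = factor(paste0(rep(c("LLF", "WT"), each = 12),
                            rep(rep(c("batch1", "batch2"), each = 6), times = 2))),
  Condition = factor(rep(rep(c("A", "B"), each = 3), times = 4))
)

# Hypothetical helper: TRUE if the design's model matrix has full column rank.
is_full_rank <- function(design, data) {
  mm <- model.matrix(design, data)
  qr(mm)$rank == ncol(mm)
}

# Case (a): only two real batches, so drop the phenotype prefix.
coldata$BatchShared <- factor(sub("^(LLF|WT)", "", as.character(coldata$Batch)))
ok_a <- is_full_rank(~ Phenotype + BatchShared + Condition, coldata)

# Case (b), option 1: the four-level Batch already identifies the phenotype,
# so leaving Phenotype out gives one pooled B vs. A effect adjusted for batch.
ok_b1 <- is_full_rank(~ Batch + Condition, coldata)

# Case (b), option 2: vignette-style nesting. batch.n (assumed name) numbers
# the batches within each phenotype, like ind.n in the vignette.
coldata$batch.n <- factor(sub("^.*batch", "", as.character(coldata$Batch)))
nested <- ~ Phenotype + Phenotype:batch.n + Phenotype:Condition
ok_b2 <- is_full_rank(nested, coldata)

print(c(a = ok_a, b1 = ok_b1, b2 = ok_b2))
print(colnames(model.matrix(nested, coldata)))
```

Any of these formulas could then be passed as the `design` when building the DESeqDataSet. Note that the nested design estimates a separate B vs. A effect for each phenotype rather than a single pooled one, as in the vignette example.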

5 weeks ago

The reason this model cannot be fit is that your Phenotype is a linear combination of your batches. That is, LLF = LLFbatch1 + LLFbatch2 and WT = WTbatch1 + WTbatch2. Because of this, under a standard design the linear model is unable to distinguish which part of an effect comes from the phenotype and which part is a batch effect that happens to be shared by, for example, LLFbatch1 and LLFbatch2.
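
The dependency can be checked directly in base R, without running DESeq2. The sketch below rebuilds the sample table from the question (the construction with `rep` is just a shortcut for typing it out) and compares the rank of the model matrix with its number of columns.

```r
# Sample table rebuilt from the question.
coldata <- data.frame(
  Phenotype = factor(rep(c("LLF", "WT"), each = 12)),
  Batch     = factor(paste0(rep(c("LLF", "WT"), each = 12),
                            rep(rep(c("batch1", "batch2"), each = 6), times = 2))),
  Condition = factor(rep(rep(c("A", "B"), each = 3), times = 4))
)

# Every batch occurs in only one phenotype, i.e. Batch is nested in Phenotype.
print(table(coldata$Phenotype, coldata$Batch))

# Model matrix for the design from the question.
mm <- model.matrix(~ Phenotype + Batch + Condition, coldata)
n_cols  <- ncol(mm)       # 6 coefficients requested
mm_rank <- qr(mm)$rank    # only 5 are linearly independent

# The redundant column: the WT indicator equals the sum of the two WT batch indicators.
wt_is_sum <- all(mm[, "PhenotypeWT"] ==
                 mm[, "BatchWTbatch1"] + mm[, "BatchWTbatch2"])

print(c(columns = n_cols, rank = mm_rank))
print(wt_is_sum)
```

A rank smaller than the number of columns is the condition that the full-rank check complains about, so at least one of the terms has to be dropped or recoded.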
