Hi everyone,
I got this well-known error when using DESeq2:
Error in checkFullRank(modelMatrix):
the model matrix is not full rank, so the model cannot be fit as specified. One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed.
This is my colData:
SampleID Phenotype Batch Condition
Sample01 LLF LLFbatch1 A
Sample02 LLF LLFbatch1 A
Sample03 LLF LLFbatch1 A
Sample04 LLF LLFbatch1 B
Sample05 LLF LLFbatch1 B
Sample06 LLF LLFbatch1 B
Sample07 LLF LLFbatch2 A
Sample08 LLF LLFbatch2 A
Sample09 LLF LLFbatch2 A
Sample10 LLF LLFbatch2 B
Sample11 LLF LLFbatch2 B
Sample12 LLF LLFbatch2 B
Sample13 WT WTbatch1 A
Sample14 WT WTbatch1 A
Sample15 WT WTbatch1 A
Sample16 WT WTbatch1 B
Sample17 WT WTbatch1 B
Sample18 WT WTbatch1 B
Sample19 WT WTbatch2 A
Sample20 WT WTbatch2 A
Sample21 WT WTbatch2 A
Sample22 WT WTbatch2 B
Sample23 WT WTbatch2 B
Sample24 WT WTbatch2 B
I was asked to explore DEGs for Condition B vs. A.
I already analyzed the phenotypes separately, for example "B vs. A" in WT, taking the batch effect into account (~ Batch + Condition). I also tried ~ Phenotype + Condition, which works but ignores the batch information.
Now I want to account for all sources of variation: phenotype and the two batches within each phenotype. Each combination has three biological replicates, e.g. three Condition 'A' samples for phenotype 'LLF' in batch 'LLFbatch1' (Sample01 to Sample03).
This is my design formula:
design= ~ Phenotype + Batch + Condition
I have read the section “Model matrix not full rank” in the DESeq2 vignette and I think this might be an example of "Group-specific condition effects, individuals nested within groups". Even though I understand this example, I still don't understand why my design also produces this error. A solution was given by introducing an additional "ind.n" column. However, I am not sure how to apply this to my situation.
What can I do? Can anyone help please?
Were the samples actually sequenced/generated in (a) two batches in total (batch1 and batch2, each containing both phenotypes), or (b) four batches in total (two per phenotype: LLFbatch1, LLFbatch2, WTbatch1, WTbatch2)? If it was (a), you should recode the Batch column as batch1 and batch2 (i.e., remove the phenotype prefix), and then the model will be full rank. If it was (b), the model matrix isn't full rank because of perfect collinearity: batch is nested within phenotype, so a sample's batch already determines its phenotype, and the Phenotype column is a linear combination of the Batch columns.
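To make the collinearity concrete, here is a minimal sketch in base R (no DESeq2 needed) that rebuilds the colData from the question and compares the number of model matrix columns with the matrix rank for a few candidate designs. The helper `rank_of` and the recoded column `Batch.n` are hypothetical names introduced only for this illustration; `Batch.n` follows the same idea as the `ind.n` column in the vignette. Which design is appropriate depends on how the batches were actually generated and on the question being asked, so treat this as a way to check designs rather than a recommendation.

```r
# Rebuild the colData from the question: 24 samples,
# 2 phenotypes x 2 batches per phenotype x 2 conditions x 3 replicates.
coldata <- data.frame(
  Phenotype = factor(rep(c("LLF", "WT"), each = 12)),
  # Hypothetical recoding: batch number *within* each phenotype (1 or 2)
  Batch.n   = factor(rep(rep(c("1", "2"), each = 6), times = 2)),
  Condition = factor(rep(rep(c("A", "B"), each = 3), times = 4))
)
# Original four-level batch labels: LLFbatch1, LLFbatch2, WTbatch1, WTbatch2
coldata$Batch <- factor(paste0(coldata$Phenotype, "batch", coldata$Batch.n))

# Hypothetical helper: a design is full rank when rank == number of columns.
rank_of <- function(formula) {
  mm <- model.matrix(formula, data = coldata)
  c(columns = ncol(mm), rank = qr(mm)$rank)
}

# Design from the question: PhenotypeWT = BatchWTbatch1 + BatchWTbatch2,
# so one column is redundant.
rank_of(~ Phenotype + Batch + Condition)                        # 6 columns, rank 5

# Case (a): only two batches overall, shared by both phenotypes.
rank_of(~ Phenotype + Batch.n + Condition)                      # 4 columns, rank 4

# Case (b), option 1: the four batch levels already encode phenotype,
# so Phenotype can be dropped when only an overall B vs. A effect is wanted.
rank_of(~ Batch + Condition)                                    # 5 columns, rank 5

# Case (b), option 2: nested batches with phenotype-specific condition
# effects, analogous to the vignette's ~ grp + grp:ind.n + grp:cnd.
rank_of(~ Phenotype + Phenotype:Batch.n + Phenotype:Condition)  # 6 columns, rank 6
```

With option 1, the overall B vs. A comparison in DESeq2 would typically be extracted with something like `results(dds, contrast = c("Condition", "B", "A"))` after setting the design to `~ Batch + Condition`. With option 2, the model has one condition coefficient per phenotype instead of a single overall one.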