Hello,
I am currently having an error when I am using the estimateDisp function on edgeR, my pheno table and design matrix looks like this:
GROUP AGE GENDER
1 no_woman 26.00000 woman
2 no_man 22.18962 man
2 yes_woman 27.86650 woman
3 no_woman 21.55446 woman
4 no_woman 24.60362 woman
5 no_woman 26.47429 woman
6 no_woman 24.00000 woman
7 no_woman 29.08834 woman
8 no_woman 20.85291 woman
9 yes_man 26.28884 man
10 no_man 30.54005 man
11 no_man 25.92953 man
[...]
And that is my matrix:
Group <- y_transcripts$samples$group
Age <- y_transcripts$samples$AGE
Gender <- y_transcripts$samples$GENDER
design <- model.matrix(~0+Group+Gender+Age)
Contrasts <- makeContrasts((Groupyes_woman-Groupno_woman)-(Groupyes_man-Groupno_man),levels=colnames(design))
design
Groupno_man Groupno_woman Groupyes_man Groupyes_woman Genderwoman Age
1 0 1 0 0 1 26.00000
2 1 0 0 0 0 22.18962
3 0 0 0 1 1 27.86650
4 0 1 0 0 1 21.55446
5 0 1 0 0 1 24.60362
6 0 1 0 0 1 26.47429
7 0 1 0 0 1 24.00000
8 0 1 0 0 1 29.08834
9 0 1 0 0 1 20.85291
10 0 0 1 0 0 26.28884
11 1 0 0 0 0 30.54005
[....]
attr(,"assign")
[1] 1 1 1 1 2 3
attr(,"contrasts")
attr(,"contrasts")$Group
[1] "contr.treatment"
attr(,"contrasts")$Gender
[1] "contr.treatment"
Afterwards I filtered by minimum expression using defaults and normalized the library size
keep <- filterByExpr(y_transcripts, group=Group)
y_transcripts <- y_transcripts[keep, , keep.lib.sizes=FALSE]
y_transcripts <- calcNormFactors(y_transcripts)
I got an error when using estimateDisp
y_transcripts <- estimateDisp(y_transcripts, design, robust=TRUE)
> Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05, :
> Design matrix not of full rank. The following coefficients not estimable:
> Genderwoman
When I exclude Gender from the design matrix it works, I don't know what is my mistake.
I might be wrong but think this comes from variable Age being numerical and not categorial, so that when you build your design matrix, it doesn't have all possible combination of factors, that is what "not of full rank" is trying to tell you. For example you have a combination:
no_woman 26.00000 woman
(whatever that is supposed to mean) but you don't have e.g.no_woman 26.00000 man
oryes_woman 26.00000 woman
. Of course you couldn't, because each age value is float and unique. You could try to leave out Age from the formula likedesign <- model.matrix(~Group+Gender)
and it might work, I am not sure if and how edgeR can deal with non-categorial variables, so you could investigate that further. If it cannot dealt with that, you could try to discretize the age into ranges.Edit :So I found this: https://support.bioconductor.org/p/109814/ with a similar setting, indicating it should work just fine. Try to follow the work-flow given there.
I have been trying to solve the issue, even without Age as a covariate I still get the same error. With the design matrix you suggested containing the intercept, its not possible to build up a contrast matrix.
I think you should follow the advice by Gordon, it seems like gender was encoded twice, if you encode group only with levels yes and no instead of yes_woman, no_woman, etc. it might work.