This is a really important point.
Yes, I believe the method remains valid despite the unbalanced numbers in the design. Hopefully I can show you this is the case (and hopefully I'm not completely wrong...):
library(Biobase)
library(limma)
set.seed(1)
Unbalanced designs in standard 'limma' workflow:
Randomly sample some differential expression:
Assume all of the 500 genes have mean of 0 in normal
and mean in medium and high sampled as follows:
coefs <- matrix(c(rep(0, 500), rnorm(1000, mean = 0, sd = 10)), ncol = 3)
head(coefs)
[,1] [,2] [,3]
[1,] 0 11.3496509 8.500435
[2,] 0 11.1193185 -9.253130
[3,] 0 -8.7077763 8.935812
[4,] 0 2.1073159 -9.410097
[5,] 0 0.6939565 5.389521
[6,] 0 -16.6264885 -1.819744
So for the first gene we expect mean of: 0 in normal; 11.35 in medium; 8.50 in high
Take all genes to have sd of 0.1 within the classes so the original coefs should be easily fitted
labels <- factor(
c(rep('Normal', 111), rep('Medium', 58), rep('High', 31)),
levels = c('Normal', 'Medium', 'High')
)
design <- model.matrix(~ -1 + labels)
colnames(design) <- levels(labels)
# sample some linear model coefficients for the genes
assay.mat <- t(apply(
coefs,
1,
function(coef.row){
means = design %*% coef.row
rnorm(length(means), mean = means, sd = 0.1)
}
))
You can see that this should give you some statistical differences by eye, if needs be (not shown here):
par(mfrow = c(2,2))
for(i in 1:4){
plot(assay.mat[i, ] ~ labels, ylim = c(-20, 20))
}
Now we run the limma analysis workflow:
-- Convert assay.mat to an eset
eset <- ExpressionSet(
assayData = assay.mat
)
Note the subtle change to the sign of the contrasts (compared with your contrast matrix)
contrast.matrix <- makeContrasts(
MN = Medium - Normal,
HN = High - Normal,
HM = High - Medium,
levels = design
)
fit <- lmFit(eset, design)
fit.cont <- contrasts.fit(fit, contrast.matrix)
fit.eb <- eBayes(fit.cont)
The raw coefficients are close to what we wanted:
head(fit$coefficients, 2) # the fitted coefficients
Normal Medium High
1 -0.012117565 11.3503072 8.514598
2 -0.001331074 11.1288127 -9.240092
head(coefs[1:2, ]) # the sampled coefficients
[,1] [,2] [,3]
[1,] 0 11.34965 8.500435
[2,] 0 11.11932 -9.253130
What should the expected contrast value be?
expected.contrasts <- data.frame(
MN = coefs[, 2] - coefs[, 1],
HN = coefs[, 3] - coefs[, 1],
HM = coefs[, 3] - coefs[, 2]
)
The fitted contrasts were:
head(as.data.frame(fit.eb$coefficients), n = 2)
MN HN HM
1 11.36242 8.526716 -2.835709
2 11.13014 -9.238761 -20.368905
... and the expected contrasts were
head(expected.contrasts, n = 2)
MN HN HM
1 11.34965 8.500435 -2.849216
2 11.11932 -9.253130 -20.372448
Which is smashing, so the limma workflow holds up despite the unbalanced numbers.
The applicability of limma to proteomic datasets is outwith my knowledge though: do the data even look normal?
I have some reservations about your research question though. The most important contrasts are surely i) 'disease' vs 'normal' (which you haven't tested); and ii) severe disease vs moderate disease (which you have tested).
Just to make sure, why did you apply eBayes twice, and to fit2 (which isn't created in the code you included)?