Hi,
I used edgeR for differential expression analysis, comparing each of 5 conditions against a baseline condition.
The design was a simple linear model with the condition factor variable re-ordered so that the baseline is the first value. The code was the following:
design <- model.matrix(~ condition, data = y$samples)
y <- estimateDisp(y, design, robust=TRUE)
fit <- glmFit(y,design)
conditionA_minus_base <- glmTreat(fit, coef = "conditionA", lfc = minlfc) ### coefficient corresponds to A - baseline
status_A <- decideTestsDGE(conditionA_minus_base, adjust.method = "fdr", p.value = pvalue) ### 1 = up, -1 = down, 0 = not significant
up_A <- rownames(y)[status_A == 1]
down_A <- rownames(y)[status_A == -1]
However, when I checked the down-regulated genes, they were enriched for many terms that are known to be up-regulated. All in all, the directions seem to be reversed for a large majority of genes. I have checked the labeling and pre-processing steps many times. Could you please let me know whether the 1 and -1 values should be the other way around?
I tested two designs against each other:
design1 <- model.matrix(~ condition, data = y$samples)
design2 <- model.matrix(~ 0 + condition, data = y$samples)
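The results are the same from the following two calls (fit1 and fit2 here stand for the glmFit results obtained with design1 and design2, respectively):
conditionA_minus_base1 <- glmTreat(fit1, coef = "conditionA", lfc = minlfc) ### coefficient corresponds to A - baseline
conditionA_minus_base2 <- glmTreat(fit2, contrast = c(-1,1,0,0,0,0), lfc = minlfc)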
where contrast = c(-1,1,0,0,0,0) corresponds to -1 * Baseline + 1 * Condition A
Thank you.
As long as the ordering is correct then what you're doing should work. The most common mistake here is when making the condition column in y$samples. Triple check that nothing is swapped there (hint: if you aren't already, load this from a text file).
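
A quick sanity check along these lines may help. This is only a sketch in base R on made-up data: the samples data frame stands in for y$samples, and the sample names, condition labels and counts are all hypothetical. It checks that the baseline really is the reference level, that the design column is named as expected, and that the sign of A minus baseline agrees with the raw data for a gene whose direction is already known.

```r
# Toy sample sheet (hypothetical), standing in for y$samples
samples <- data.frame(
  sample    = paste0("s", 1:6),
  condition = c("base", "base", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Set the reference level explicitly instead of relying on the default ordering
samples$condition <- relevel(factor(samples$condition), ref = "base")

levels(samples$condition)   # the first level is the reference used by ~ condition
table(samples$condition)    # number of samples per condition

design <- model.matrix(~ condition, data = samples)
colnames(design)            # "(Intercept)" "conditionA" "conditionB"

# Made-up counts for one gene that should be higher in A than in baseline
counts <- matrix(c(10, 12, 40, 44, 11, 9), nrow = 1,
                 dimnames = list("gene1", samples$sample))

# Mean log2 count per condition, then A minus baseline
grp_mean <- tapply(log2(counts["gene1", ] + 1), samples$condition, mean)
grp_mean["A"] - grp_mean["base"]   # positive means higher in A than in baseline
```

With the real data, the same comparison can be made on a handful of genes with a known direction. If the sign from the raw counts disagrees with the logFC reported by glmTreat, the condition labels are the first thing to suspect, since a coefficient named conditionA in a ~ condition design is A minus the reference level.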