Hello, I am attempting to run a differential expression using limma voom on a set of 10 samples, 5 in condition A and 5 in condition B.
I'm getting some bizarre results, namely that:
- All the input genes to the program after expression filtering (9461) come out as differentially expressed
- My volcano plot is exceedingly lopsided (e.g. no negative log fold changes)
- There’s no ‘logFC’ column in the topTable output.
I'm following the documentation verbatim to the best of my knowledge.
Experimental design: samples were split and treated with either treatment A or treatment B then RNA sequenced.
- The ratio of the maximum to minimum library size is 10.
- My plan was to create a design matrix which includes a coefficient for the A vs. B difference (like p 41 of the limma voom guide)
- My plan was to run voomWithQualityWeights. The voom mean-variance trend indicates that adequate gene filtering was performed.
The ordering of the rows in the design matrix match the ordering of the rows in the counts matrix. My voom: mean-variance trend after running voomWithQualityWeights indicated that genes were adequately filtered.
counts <- read.csv("counts.txt")
dge <- DGEList(counts = counts)
# make the design matrix
A <- c()
BvsA <- c()
for (i in colnames(counts)) {
A <- append(A, 1)
if grepl("B", i, fixed = TRUE) {
BvsA <- append(BvsA, 0)
} else {
BvsA <- append(BvsA, 1)
}
}
design <- cbind(A, BvsA)
# filter lowly expressed genes
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge, method = "TMM")
vwts <- voomWithQualityWeights(counts = dge, design = design,
plot = TRUE, method = "genebygene") # dge vs logCPM
vfit2 <- lmFit(vwts)
vfit2 <- eBayes(vfit2)
voomDEG <- topTable(vfit2, number = Inf, sort.by = "B", resort.by = "logFC", p.value = 0.1, adjust = "BH")
I'm so confused why all the genes are coming out as differentially expressed. If I run raw correlation analyses on the samples (B vs. A) I get a great correlation, which would indicate that not all the genes are DEG.
Does anyone have any indications for what's going on here?
thank you so much!!! noted on the design matrix and the importance of specifying the coefficient in topTable!