Question

Error when using edgeR with only two samples

0

Entering edit mode

2.5 years ago

Apex92 ▴ 320

Dear all,

I have only two samples (control vs KD) from an RNA-Seq experiment and want to see which genes are differentially expressed in the experiment condition.

Based on what has been discussed here I take the approach below:

designMat <- matrix(1,2,1)
dgList <- estimateGLMCommonDisp(dgList, design=designMat, method="deviance", robust = TRUE, subset=NULL)

fit <- glmFit(dgList)
lrt <- glmLRT(fit)

edgeR_result <- topTags(lrt, n=Inf)

Everything works fine except the lrt <- glmLRT(fit) which gives me this error:

Error in glmLRT(fit) : 
  Need at least two columns for design, usually the first is the intercept column

Got stuck with this for a while and could not find any solution for it so far - could you please help me with this?

By the way: I also tried with designMat <- matrix(1,2,2) but this gives dispersion error setting dispersion to NA.

Thank you in advance.

RNA-seq DGE sequencing edgeR • 1.0k views

ADD COMMENT • link 2.5 years ago by Apex92 ▴ 320

score 4 · Answer 1 · 2022-06-07

Please read Section 2.12 "What to do if you have no replicates" of the edgeR User's Guide. You are trying to implement the third suggestion of that section. Please note the caveats of that section: using edgeR without replicates cannot be considered to be reliable, although it is nevertheless much better than the common practice of using Poisson regression in that circumstance.

The point of the suggestion is that you have to simplify the design matrix temporarily in order to estimate the dispersion, but then you go back to the full design matrix to find DE genes. So you need

group <- factor(c("control","KD"))
design <- model.matrix(~group)
design.reduced <- matrix(1,2,1)
dgList <- estimateGLMCommonDisp(dgList, design.reduced,
               method="deviance", robust = TRUE, subset=NULL)
fit <- glmFit(dgList, design)
lrt <- glmLRT(fit)
edgeR_result <- topTags(lrt, n=Inf)

Here is a complete example. In this simulation, only the first gene is differentially expressed.

> ngenes <- 10000
> mu <- matrix(50, ngenes, 2)
> mu[1,] <- c(10,1000)
> phi <- 0.1
> counts <- matrix(rnbinom(ngenes*2, mu=mu, size=1/phi),ngenes,2)
> dgList <- DGEList(counts)

Then the analysis. Only the first gene is found to be significantly DE:

> dgList <- calcNormFactors(dgList)
> group <- factor(c("control","KD"))
> design <- model.matrix(~group)
> design.reduced <- matrix(1,2,1)
> dgList <- estimateGLMCommonDisp(dgList, design.reduced,
+           method="deviance", robust = TRUE, subset=NULL)
> fit <- glmFit(dgList, design)
> lrt <- glmLRT(fit)
> topTags(lrt)
Coefficient:  groupKD 
     logFC logCPM   LR   PValue      FDR
1     6.82  10.03 56.9 4.58e-14 4.58e-10
2219  3.68   5.93 15.8 7.02e-05 3.19e-01
5944  3.04   6.97 14.6 1.36e-04 3.19e-01
4489  3.15   6.55 14.4 1.44e-04 3.19e-01
4253 -3.06   6.75 14.3 1.60e-04 3.19e-01
2301 -2.92   6.63 13.1 2.99e-04 4.98e-01
5643  2.81   6.77 12.5 4.10e-04 5.29e-01
6091 -2.70   7.20 12.4 4.23e-04 5.29e-01
1834 -2.80   6.25 11.4 7.36e-04 7.04e-01
5112 -2.79   6.09 11.0 8.99e-04 7.04e-01