Error when using edgeR with only two samples
1
0
Entering edit mode
2.5 years ago
Apex92 ▴ 320

Dear all,

I have only two samples (control vs KD) from an RNA-Seq experiment and want to see which genes are differentially expressed in the experiment condition.

Based on what has been discussed here I take the approach below:

designMat <- matrix(1,2,1)
dgList <- estimateGLMCommonDisp(dgList, design=designMat, method="deviance", robust = TRUE, subset=NULL)

fit <- glmFit(dgList)
lrt <- glmLRT(fit)

edgeR_result <- topTags(lrt, n=Inf)

Everything works fine except the lrt <- glmLRT(fit) which gives me this error:

Error in glmLRT(fit) : 
  Need at least two columns for design, usually the first is the intercept column

Got stuck with this for a while and could not find any solution for it so far - could you please help me with this?

By the way: I also tried with designMat <- matrix(1,2,2) but this gives dispersion error setting dispersion to NA.

Thank you in advance.

RNA-seq DGE sequencing edgeR • 1.0k views
ADD COMMENT
4
Entering edit mode
2.5 years ago
Gordon Smyth ★ 7.7k

Please read Section 2.12 "What to do if you have no replicates" of the edgeR User's Guide. You are trying to implement the third suggestion of that section. Please note the caveats of that section: using edgeR without replicates cannot be considered to be reliable, although it is nevertheless much better than the common practice of using Poisson regression in that circumstance.

The point of the suggestion is that you have to simplify the design matrix temporarily in order to estimate the dispersion, but then you go back to the full design matrix to find DE genes. So you need

group <- factor(c("control","KD"))
design <- model.matrix(~group)
design.reduced <- matrix(1,2,1)
dgList <- estimateGLMCommonDisp(dgList, design.reduced,
               method="deviance", robust = TRUE, subset=NULL)
fit <- glmFit(dgList, design)
lrt <- glmLRT(fit)
edgeR_result <- topTags(lrt, n=Inf)

Here is a complete example. In this simulation, only the first gene is differentially expressed.

> ngenes <- 10000
> mu <- matrix(50, ngenes, 2)
> mu[1,] <- c(10,1000)
> phi <- 0.1
> counts <- matrix(rnbinom(ngenes*2, mu=mu, size=1/phi),ngenes,2)
> dgList <- DGEList(counts)

Then the analysis. Only the first gene is found to be significantly DE:

> dgList <- calcNormFactors(dgList)
> group <- factor(c("control","KD"))
> design <- model.matrix(~group)
> design.reduced <- matrix(1,2,1)
> dgList <- estimateGLMCommonDisp(dgList, design.reduced,
+           method="deviance", robust = TRUE, subset=NULL)
> fit <- glmFit(dgList, design)
> lrt <- glmLRT(fit)
> topTags(lrt)
Coefficient:  groupKD 
     logFC logCPM   LR   PValue      FDR
1     6.82  10.03 56.9 4.58e-14 4.58e-10
2219  3.68   5.93 15.8 7.02e-05 3.19e-01
5944  3.04   6.97 14.6 1.36e-04 3.19e-01
4489  3.15   6.55 14.4 1.44e-04 3.19e-01
4253 -3.06   6.75 14.3 1.60e-04 3.19e-01
2301 -2.92   6.63 13.1 2.99e-04 4.98e-01
5643  2.81   6.77 12.5 4.10e-04 5.29e-01
6091 -2.70   7.20 12.4 4.23e-04 5.29e-01
1834 -2.80   6.25 11.4 7.36e-04 7.04e-01
5112 -2.79   6.09 11.0 8.99e-04 7.04e-01
ADD COMMENT
0
Entering edit mode

Dear Gordon - thank you for your complete and comprehensive answer, it helped a lot.

I agree with you that having more replicates s much better than trying edgeR with only two samples - but I have no choice rather than trying this approach.

Thanks again.

ADD REPLY

Login before adding your answer.

Traffic: 1592 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6