I have only two samples, with no replicates, for an edgeR analysis, but the results look abnormal. I set the common dispersion to 0.4, and most FDR values are equal to 1.
> raw.data<-read.table(file="48_50_1",header=T)
> d<-raw.data[,2:3]
> rownames(d)<-raw.data[,1]
> group<-factor(c("L","H"))
> design<-model.matrix(~group)
> d<-DGEList(counts=d,group=group)
> dim(d)
[1] 22928 2
> d <- calcNormFactors(d)
> keep <- rowSums(cpm(d)>100) >= 2
> d <- d[keep,]
> dim(d)
[1] 184 2
> d<-estimateGLMCommonDisp(d,design,method="deviance",robust="TRUE",subset=NULL)
Warning message:
In estimateGLMCommonDisp.default(y = y$counts, design = design, :
No residual df: setting dispersion to NA
> d$samples$lib.size <- colSums(d$counts)
> d
An object of class "DGEList"
$counts
low high
AATK 45 77
ABCB1 56 120
ARHGAP4 261 383
C3 920 628
COL20A1 26 33
179 more rows ...
$samples
group lib.size norm.factors
low L 62017 1.046646
high H 83398 0.955433
$AveLogCPM
[1] 9.742665 10.243595 12.110485 13.423755 8.758667
179 more elements ...
$common.dispersion
[1] NA
$design
(Intercept) groupL
1 1 1
2 1 0
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$group
[1] "contr.treatment"
> d$common.dispersion=0.4
> et <- exactTest(d)
> top<-topTags(et)
> top
Comparison of groups: L-H
logFC logCPM PValue FDR
LOC101119889 -4.609729 10.669772 0.004828116 0.8883734
LOC101112298 1.983919 8.613781 0.201633226 1.0000000
LOC101121082 -1.577096 12.194037 0.257687080 1.0000000
PER2 1.556200 11.766520 0.265899128 1.0000000
GPT -1.374615 12.820088 0.321135663 1.0000000
LOC101121333 -1.354515 10.613348 0.336138933 1.0000000
LOC101103050 -1.416573 8.305962 0.355318326 1.0000000
LOC101112300 1.420906 7.982203 0.380212348 1.0000000
LOC101110133 -1.174812 10.631434 0.406782402 1.0000000
LOC101108558 -1.141834 12.802543 0.407262691 1.0000000
> summary(de <- decideTestsDGE(et))
[,1]
-1 0
0 184
1 0
> plotMDS(d)
Error in plotMDS.DGEList(d) : Need at least 3 columns of data
This is what the sessionInfo() gives
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.2.4 limma_3.16.8
I found the data you pasted hard to read. However, it is not unusual for most FDR values to equal 1; that depends on the number of tests and on the p-values. Details on how the FDR is computed are in this answer: How to estimate the false discovery rate (FDR) using bootstrap
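To illustrate how the number of tests drives the FDR, here is a minimal base-R sketch using the ten p-values from the topTags output in the question. It assumes the FDR column is a Benjamini-Hochberg adjustment over the 184 genes kept after filtering, and that the 174 p-values not shown are all larger than these ten and do not pull any of the adjusted values below 1.

```r
# The ten smallest p-values, copied from the topTags output in the question
p <- c(0.004828116, 0.201633226, 0.257687080, 0.265899128, 0.321135663,
       0.336138933, 0.355318326, 0.380212348, 0.406782402, 0.407262691)

# Number of genes tested after filtering (from dim(d) in the question)
n_tests <- 184

# Benjamini-Hochberg adjustment; n tells p.adjust how many tests were run
# in total, even though only the ten smallest p-values are supplied here
fdr <- p.adjust(p, method = "BH", n = n_tests)
round(fdr, 7)
```

The smallest p-value is multiplied by 184, giving roughly 0.888, and every other value is capped at 1, which is the pattern seen in the question's table.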
I'm sorry, I'm just a newbie to edgeR. Because there are no replicates, the step d<-estimateGLMCommonDisp(d,design,method="deviance",robust="TRUE",subset=NULL) does not return a dispersion value, so I set it by hand to 0.4, as suggested by the edgeR manual. But topTags lists only 10 genes, and essentially none of them is significant, whereas the cufflinks results contain 69 significant genes. So my question is whether there is a problem with this edgeR analysis.
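The "No residual df" warning follows directly from the design: two samples and two coefficients leave nothing over to estimate variability from. A small base-R sketch of that arithmetic, using the same group and design as in the question:

```r
# Same group factor and design matrix as in the question
group <- factor(c("L", "H"))
design <- model.matrix(~group)

# Residual degrees of freedom = number of samples - number of
# independent coefficients (the rank of the design matrix)
residual_df <- nrow(design) - qr(design)$rank
residual_df
```

With zero residual degrees of freedom there is no information in the data about the dispersion, which is why a value has to be supplied by hand.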
I'm a newbie to edgeR, too. However, after reading your code carefully, I think you may have misread the manual, which led to your problem. In your script you set the dispersion directly to 0.4 (d$common.dispersion=0.4), but 0.4 is the value meant for the BCV, and the dispersion is equal to BCV^2. Used that way, your dispersion would be 0.16, which is less conservative and should give you a longer list of DE genes.
According to the solutions offered in section 2.11, "What to do if you have no replicates", the standard way to set the dispersion manually takes two steps (the code below is copied directly from the manual, option 2):
In your case, you can try adding the first and last lines to your script and see whether a larger number of DE genes is detected. Hope this helps.
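A minimal sketch of this suggestion, assuming edgeR is installed. The counts are simulated stand-ins for the real data (the gene names and the mean of 100 are made up for illustration), and the BCV of 0.4 is simply the value discussed in this thread, not a recommendation for any particular data set.

```r
library(edgeR)

# Step 1: choose a BCV value by hand (0.4 is the value discussed above)
bcv <- 0.4

# Simulated counts for two unreplicated samples, standing in for the
# real data; gene names and the mean of 100 are hypothetical
set.seed(1)
counts <- matrix(rnbinom(400, size = 1 / bcv^2, mu = 100), ncol = 2,
                 dimnames = list(paste0("gene", 1:200), c("low", "high")))
d <- DGEList(counts = counts, group = factor(c("L", "H")))
d <- calcNormFactors(d)

# Step 2: pass the squared BCV as the dispersion (0.4^2 = 0.16)
et <- exactTest(d, dispersion = bcv^2)
topTags(et)
```

In the script from the question, the equivalent change would be to replace the d$common.dispersion=0.4 line with a bcv assignment and call exactTest(d, dispersion=bcv^2).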