I'm trying to decide whether to use edgeR's QLFTest or ExactTest in a 2 x 2 experimental design. I'm finding that while the two methods largely agree, there are distinct subsets of genes in my data that are only detected by one method or the other.
My experiment includes two genotypes (WT and B8) and two time intervals (0, 24 Hours). Each condition has three replicates. I have several questions I'd like to answer with this data:
Which genes have a significant difference between 0 Hours and 24 Hours? in WT samples; which genes have a significant difference between 0 and 24 hours in B8 samples?
Which genes have a significant difference at 0 Hours between WT and B8; which at 24 Hours?
- Which genes with a significant change between 0 and 24 Hours in WT remain unchanged in B8?
For the Quasi Likelihood I'm using the following design matrix:
design
WT_0H WT_24H B8_0H B8_24H
B8_0_hr_1_S9 0 0 1 0
B8_0_hr_3_S10 0 0 1 0
B8_0_hr_4_S11 0 0 1 0
B8_24hr_1_S12 0 0 0 1
B8_24hr_2_S13 0 0 0 1
B8_24hr_3_S14 0 0 0 1
WT_0_hr_1_S1 1 0 0 0
WT_0_hr_2_S2 1 0 0 0
WT_0_hr_3_S3 1 0 0 0
WT_24_hr_1_S6 0 1 0 0
WT_24_hr_2_S7 0 1 0 0
WT_24_hr_3_S8 0 1 0 0
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$group
[1] "contr.treatment"
And contrast matrix:
Contrasts
Levels Hour_0 Hour_24 WT B8
WT_0H -1 0 -1 0
WT_24H 0 -1 1 0
B8_0H 1 0 0 -1
B8_24H 0 1 0 1
And estimating differential expression with the following call:
dge<-calcNormFactors(dge)
dge<-estimateDisp(dge, desigDEn)
qlf<-glmQLFit(dge, design)
qlt<-glmQLFTest(qlf, contrast = cmat)
deg<-as.data.frame(topTags(qlt, n=Inf))
degQ<-data.table(deg)
For the Exact Test I'm using the following calling sequence:
dge$samples$group<-factor(
paste(
dge $samples$genotype,
dge$samples$interval, sep="_"
), levels=c("WT_0H","WT_24H","B8_0H", "B8_24H")
)
et<-exactTest(dge, pair=c('WT_0H', 'WT_24H'))
deg<-as.data.frame(topTags(et, n=Inf))
degE<-data.table(deg)
For the contrast between WT 0 Hours and WT 24 Hours the exact test detects 4174 significant genes and the QLTest detects 3924 genes (FDR < 0.05, | log2 fold change| > 1).
> et_genes<-degE[FDR < 0.05 & abs(logFC) > 1, ]$gene_id
> length(et_genes)
[1] 4174
> qt_genes<-degQ[FDR < 0.05 & abs(logFC.WT) > 1, ]$gene_id
> length(qt_genes)
[1] 3924
This in and of itself is not surprising, however there are 571 genes that have an FDR < 0.05 by the exact test only, and 321 genes that have and FDR < 0.05 by the QLFTest only.
> length(setdiff(et_genes, qt_genes))
[1] 571
> length(setdiff(qt_genes, et_genes))
[1] 321
My understanding was that the QLFTest was less prone to type 1 error, and I would have expected all (or nearly all) of the genes detected using the QLFTest to be a subset of the genes detected by the exact test. I was hoping someone could identify some resources that might help me decide which testing approach is more appropriate for my data, and my questions.