Hi there, I'm new to working with EdgeR's generalized linear models, but I have a data set where I'm trying to get complex comparisons between groups, times, and treatments, so here I am...
I've figured out that the best approach for me is to use the cell means model. Right now I have three time points, two groups and two treatments at each time point.
Here is my code:
#countsDGE_d3 is my count matrix
>key_3d
sample group condition
F.3dayInjury.IP1 F.3dayInjury forelimb injury
F.3dayInjury.IP2 F.3dayInjury forelimb injury
F.3dayInjury.IP3 F.3dayInjury forelimb injury
F.Intact.IP1 F.Intact forelimb no_lesion
F.Intact.IP2 F.Intact forelimb no_lesion
F.Intact.IP3 F.Intact forelimb no_lesion
H.3dayInjury.IP1 H.3dayInjury hindlimb injury
H.3dayInjury.IP2 H.3dayInjury hindlimb injury
H.3dayInjury.IP3 H.3dayInjury hindlimb injury
H.Intact.IP1 H.Intact hindlimb no_lesion
H.Intact.IP2 H.Intact hindlimb no_lesion
H.Intact.IP3 H.Intact hindlimb no_lesion
H.Intact.IP4 H.Intact hindlimb no_lesion
design3d <- model.matrix(~0+key_3d$sample)
colnames(design3d) <- levels(key_3d$sample)
countsDGE_d3 <- calcNormFactors(countsDGE_d3)
countsDGE_d3 <- estimateDisp(countsDGE_d3, design3d, robust = TRUE)
countsDGE_d3$common.dispersion
fit <- glmQLFit(countsDGE_d3, design3d)
cont.matrix <- makeContrasts(NvsINJinFL=F.3dayInjury-F.Intact,
NvsINJinHL=H.3dayInjury-H.Intact,
Diff=(F.3dayInjury-F.Intact)-(H.3dayInjury-H.Intact),
levels=design3d)
qlf <- glmQLFTest(fit, contrast=cont.matrix[,"NvsINJinFL"])
topTags(qlf, n = 20)
table <- topTags(qlf, n = Inf)
write.csv( as.data.frame(table), file="All_FLDay3_InjuredvsIntact_results.csv")
I then repeat running glmQLFTest on all three contrasts and export the topTags table.
My question is, due to the data being pretty noisy, I used the RUVseq package to calculate the factors of unwanted variation for my samples. I would like to incorporate these into my design matrix and into my contrasts. How do I construct this model matrix and contrast matrix?
Does this make sense? But then what should my contrasts be?
design3d <- model.matrix(~0+key_3d$sample+W_1, data=pData_3d)
Where pData_3d is a matrix of the factors of unwanted variation from:
seq_set <- newSeqExpressionSet(as.matrix(dge_counts, phenoData = key$sample))
design <- model.matrix(~0 + key$sample)
colnames(design) <- levels(key$sample)
dge <- DGEList(counts = dge_counts, group = key$sample)
dge <- calcNormFactors(dge, method="upperquartile", rubust = TRUE)
dge <- estimateGLMCommonDisp(dge, design)
dge <- estimateGLMTagwiseDisp(dge, design)
fit <- glmFit(dge, design)
res <- residuals(fit, type = "deviance")
#use all the genes to estimate the factors of unwanted variation.
seqUQ <- betweenLaneNormalization(seq_set, which="upper")
controls <- rownames(seq_set)
seqRUVr <- RUVr(seqUQ, controls, k=1, res)
Thank you!