Hi everybody,
I am trying to compare gene expression using RNA Seq data from 50 and 50 human tissue samples with two phenotypically similar diseases that may exhibit distinct pathophysiological differences on the transcriptomic level (= my hypothesis).
Several, but not all subjects in the study provided more than one sample, in one case even one from each disease type. Thus, I would normally want to correct for the subject as a random effect.
Additional covariates include gender and a measure of disease severity.
In order to be able to incorporate both my fixed and random factors (as a blocking factor), I first used limma/voom
design <- model.matrix(~1+ targets$disease + targets$gender + targets$severity)
y <- voom(x,design,plot=F, normalize="quantile")
corfit <- duplicateCorrelation(y, design, block=targets$Patient)
y <- voom(x,design,plot=FALSE, block=targets$Patient, correlation=corfit$consensus, normalize="quantile")
fit=lmFit(y, design, block=targets$Patient, correlation=corfit$consensus)
fit <- eBayes(fit)
But ran into problems:
1) weird results with many ribosomal proteins differentially expressed (does not make sense biologically)
2) difficulty to filter properly
In both cases, I was recommended to use DESeq2 instead.
As per the recommendation here by Devon, I have first incorporated the subject factor as a fixed factor (and not random), even though I have my doubts about doing so. Because of overlaps, I could not correct for gender and subject at the same time ("not full rank"), which is not optimal either.
dds <- DESeqDataSetFromMatrix(countData = countData, colData = targets, design = ~ subject + severity + disease) # + gender does not work
Do you have any idea how to incorporate our random effect "subject" properly in DESeq2, and, if so, how to circumvent the "not full rank" error with gender/subject ?
Many thanks!
Hi Devon,
many thanks for your comment - so the model looks ok to you in its present form?
I am debating the question of whether and how to include the random factor with a several colleagues - could you point me to some information that supports your claim about the limited usefulness of random factors in the context of shrinking variance?
Many thanks!
I don't see anything obviously wrong with the model. Regarding random effects, the question becomes "What is the purpose of random effects and how is that drastically different from the shrinkage performed by common RNAseq packages...particularly given the many difficulties and issues surrounding mixed-effect models?" For more information, google "DESeq random effect" and see the posts from Mike Love and Simon Anders about this and the occasional reference therein.