I would like to reach to the community to get some input regarding a particular experimental design that is troubling me for RNA-Seq analysis. I will discuss this with a statistician, but I also want to make sure that I am modeling it right in DESeq (edgeR or limma for that matter).
All patients are taken a biopsy at baseline of the tumor (Time=PRE), afterwards are randomized to TREATMENT or CONTROL. After some time another sample is collected (Time: POST). I want to compare the effect of the Treatment, accounting for the baseline of each patient.I have followed the DESEQ2 vignette Section: "Group-specific condition effects, individuals nested within groups". Briefly:
metadata <- data.frame(
SAMPLE = c("PATIENT6_PRE", "PATIENT6_POST", "PATIENT12_PRE", "PATIENT12_POST",
"PATIENT5_PRE", "PATIENT5_POST", "PATIENT8_PRE", "PATIENT8_POST",
"PATIENT15_PRE", "PATIENT15_POST", "PATIENT16_PRE", "PATIENT16_POST",
"PATIENT17_PRE", "PATIENT17_POST", "PATIENT19_PRE", "PATIENT19_POST"),
Time = c("PRE", "POST", "PRE", "POST", "PRE", "POST", "PRE", "POST", "PRE", "POST",
"PRE", "POST", "PRE", "POST", "PRE", "POST"),
Patient = c("PATIENT6", "PATIENT6", "PATIENT12", "PATIENT12", "PATIENT5", "PATIENT5",
"PATIENT8", "PATIENT8", "PATIENT15", "PATIENT15", "PATIENT16", "PATIENT16",
"PATIENT17", "PATIENT17", "PATIENT19", "PATIENT19"),
TUMOR_TYPE = c("A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "B", "B", "D", "D", "B", "B"),
Intervention = c("TREATMENT", "TREATMENT", "TREATMENT", "TREATMENT", "TREATMENT", "TREATMENT",
"CONTROL", "CONTROL", "TREATMENT", "TREATMENT", "CONTROL", "CONTROL",
"TREATMENT", "TREATMENT", "CONTROL", "CONTROL"),
ind.n = c(1, 1, 2, 2, 3, 3, 1, 1, 4, 4, 2, 2, 5, 5, 3, 3)
)
dds<- DESeqDataSetFromMatrix(countData = round(data),
colData = metadata,
design = ~Patient + Time)
design <- model.matrix(~ Intervention + Intervention:ind.n + Intervention:Time, data = metadata)
design_clean <- design[,!colSums(design)==0]
dds <- DESeq(dds, full= design_clean, betaPrior = FALSE)
resultsNames(dds)
[1] "Intercept" "InterventionCONTROL" "InterventionTREATMENT.ind.n2" "InterventionCONTROL.ind.n2" "InterventionTREATMENT.ind.n3" "InterventionCONTROL.ind.n3"
[7] "InterventionTREATMENT.ind.n4" "InterventionTREATMENT.ind.n5" "InterventionTREATMENT.POST" "InterventionCONTROL.POST"
res <- results(dds, contrast = list(c("InterventionTREATMENT.TissueResection","InterventionCONTROL.TissueResection")))
If this is correct I would be comparing the effect of Treatment, accounting for the baseline for each patient?. I also wanted to account for the tumor type but, since I am already "blocking" the Patient term, would this be necessary?. I would really appreciate any input on how to analyze this design! Thanks a lot!.
I completely agree with most points. The PCA shows that the biggest driver of difference is the patients themselves, with PRE and POST timepoints for each patient clustering closer to each other. I also should point out that the "tumor type" is actually different subtypes of the same cancer. I am still interested to see if the treatment produced a common gene signature or some common changes among these cases. Is there any particular model you think I could you to tease this out? Thanks a lot!