Hi, there!
I am trying to build a contrast matrix in order to fit a linear model. The comparison is between different histologic types of tumors: benign or BL, early stage, and late stage. The goal is to investigate whether FFPE (formalin-fixed, paraffin-embedded) material differs from fresh-frozen material in terms of methylation pattern (we're using Illumina's EPIC array). To that end, we collected an FFPE and a fresh-frozen sample from each patient.
The basic experiment looks something like this:
> clindata
   Subject Material_Source  Tumor_stage        ID2
1     P235            FFPE Benign_or_BL  P235_FFPE
2     P432            FFPE Benign_or_BL  P432_FFPE
3     P421            FFPE        Early  P421_FFPE
4      P93            FFPE        Early   P93_FFPE
5     P876            FFPE        Early  P876_FFPE
6     P543            FFPE         Late  P543_FFPE
7     P532            FFPE         Late  P532_FFPE
8     P152            FFPE         Late  P152_FFPE
9     P235           Fresh Benign_or_BL P235_Fresh
10    P432           Fresh Benign_or_BL P432_Fresh
15    P421           Fresh        Early P421_Fresh
16     P93           Fresh        Early  P93_Fresh
17    P876           Fresh        Early P876_Fresh
24    P543           Fresh         Late P543_Fresh
25    P532           Fresh         Late P532_Fresh
26    P152           Fresh         Late P152_Fresh
Here clindata$Subject is the patient ID, and the following two columns give the source of material and the tumor stage, respectively. clindata$ID2 is a concatenation of the values in clindata$Subject and clindata$Material_Source.
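For anyone who wants to reproduce the layout, here is a minimal sketch that rebuilds the sample sheet in base R. It assumes the values shown in the table above; the helper vectors subject and stage are just illustrative names, and the original row names are not reproduced.

```r
# Rebuild the sample sheet from the table (original row names are not kept).
subject <- c("P235", "P432", "P421", "P93", "P876", "P543", "P532", "P152")
stage   <- c("Benign_or_BL", "Benign_or_BL", "Early", "Early", "Early",
             "Late", "Late", "Late")

clindata <- data.frame(
  Subject          = rep(subject, times = 2),
  Material_Source  = rep(c("FFPE", "Fresh"), each = length(subject)),
  Tumor_stage      = rep(stage, times = 2),
  stringsAsFactors = FALSE
)

# ID2 is Subject and Material_Source joined with an underscore.
clindata$ID2 <- paste(clindata$Subject, clindata$Material_Source, sep = "_")

# Each patient should contribute exactly one FFPE and one fresh-frozen sample.
pairing <- table(clindata$Subject, clindata$Material_Source)
stopifnot(all(pairing == 1))
```

The final check simply confirms that the data are fully paired, which is what the rest of the question relies on.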
So, now comes my question: how do I build the contrast matrix for a comparison between tumor stages while accounting for the patient and material source variables?
My idea is the following:
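# preparing data: factors whose levels are used to name the design columns
TS <- factor(clindata$Tumor_stage)
SubMS <- factor(clindata$ID2)
# designing the matrix: one column per tumor stage, plus ID2 columns (first level dropped)
design <- model.matrix(~0+Tumor_stage+ID2, data=clindata)
colnames(design) <- c(levels(TS), levels(SubMS)[-1])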
I can run the lmFit() and makeContrasts() functions after that, together with the array data. Of course, the n for each group is rather small, but this is just an example (more samples will be added to each group in the final experiment). My questions are:
Does that design make sense?
Would you suggest anything different? For example: (A) treat all three variables separately, instead of merging two of the covariates into one; or (B) use only "Subject" as a covariate, since the pairwise comparison would already account for one sample being FFPE and the other fresh-frozen.
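To make the alternatives concrete, here is a rough base-R sketch (not a recommendation) that builds the design from my idea above and one possible reading of options (A) and (B), then compares the number of columns with the matrix rank. The formulas used for (A) and (B), the choice of Fresh as the reference level, and the helper name rank_report are assumptions for illustration only; the sample sheet is rebuilt from the table so the snippet runs on its own.

```r
# Rebuild the sample sheet from the table in this post.
subject <- c("P235", "P432", "P421", "P93", "P876", "P543", "P532", "P152")
stage   <- c("Benign_or_BL", "Benign_or_BL", "Early", "Early", "Early",
             "Late", "Late", "Late")
clindata <- data.frame(
  Subject         = factor(rep(subject, times = 2)),
  Material_Source = factor(rep(c("FFPE", "Fresh"), each = 8),
                           levels = c("Fresh", "FFPE")),  # assumed reference: Fresh
  Tumor_stage     = factor(rep(stage, times = 2))
)
clindata$ID2 <- paste(clindata$Subject, clindata$Material_Source, sep = "_")

# Design from the idea above: tumor stage plus the merged Subject/source ID.
design_idea <- model.matrix(~ 0 + Tumor_stage + ID2, data = clindata)

# One reading of option (A): all three variables as separate terms.
design_A <- model.matrix(~ Tumor_stage + Material_Source + Subject,
                         data = clindata)

# One reading of option (B): Subject as the pairing factor plus material source.
design_B <- model.matrix(~ Subject + Material_Source, data = clindata)

# Hypothetical helper: number of samples, columns, and rank of a design.
rank_report <- function(design) {
  c(samples = nrow(design), columns = ncol(design), rank = qr(design)$rank)
}

print(rbind(idea = rank_report(design_idea),
            A    = rank_report(design_A),
            B    = rank_report(design_B)))
```

When the rank is lower than the number of columns, not all coefficients can be estimated. In this toy layout that appears to be the case for my original idea (ID2 is unique per sample) and for this version of (A) (each patient has a single tumor stage), while the (B) formula comes out full rank, so this may matter when choosing between the options.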
Any help is greatly appreciated here. Thanks!