Hi, I am trying to create an appropriate design matrix to find genes that are differentially expressed between cancer and normal samples. The table below is the information file for my dataset:
    Sample  Subtype  Cancer
    A       Normal   Normal
    B       Normal   Normal
    C       Normal   Normal
    D       stage 1  Cancer
    E       stage 1  Cancer
    F       stage 2  Cancer
    G       stage 2  Cancer
    H       stage 2  Cancer
    I       NA       Biopsy
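To make the grouping concrete, here is a minimal sketch that rebuilds the table above as an R data frame. The name `information` and its columns follow the table header; storing the subtype of sample I as a missing value (`NA`) is an assumption. It shows that `factor()` orders the levels alphabetically, so Biopsy, not Normal, becomes the first (reference) level, and that the Biopsy group contains a single sample.

```r
# Sample annotation from the table (column names as in its header row)
information <- data.frame(
  Sample  = LETTERS[1:9],
  Subtype = c(rep("Normal", 3), rep("stage 1", 2), rep("stage 2", 3), NA),
  Cancer  = rep(c("Normal", "Cancer", "Biopsy"), times = c(3, 5, 1)),
  stringsAsFactors = FALSE
)

# factor() sorts levels alphabetically, so Biopsy (not Normal) is the first level
f <- factor(information$Cancer)
print(levels(f))   # "Biopsy" "Cancer" "Normal"
print(table(f))    # sample counts per group: Biopsy 1, Cancer 5, Normal 3
```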
At the moment I create the design matrix and fit the model with these commands:
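    library(limma)
    f <- factor(information$Cancer)    # groups: Biopsy, Cancer, Normal
    design <- model.matrix(~f)         # intercept model, Biopsy is the reference
    fit <- lmFit(exp_sample, design)   # one linear model per gene
    fit <- eBayes(fit)                 # moderated statistics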
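For reference, this is what `model.matrix(~f)` produces with all nine samples. The base-R sketch below (limma is not needed) uses one simulated gene whose values are made up for illustration; the names `group`, `cont`, `y`, `beta` and `est` are hypothetical. It checks that, because Biopsy is the reference level, the Cancer vs Normal difference equals `fCancer - fNormal`, i.e. the contrast vector (0, 1, -1).

```r
# One group label per sample, in table order (A-C Normal, D-H Cancer, I Biopsy)
group <- rep(c("Normal", "Cancer", "Biopsy"), times = c(3, 5, 1))
f <- factor(group)          # levels sorted alphabetically: Biopsy, Cancer, Normal
design <- model.matrix(~f)  # same call as above
print(colnames(design))     # "(Intercept)" "fCancer" "fNormal"

# With an intercept, Biopsy is the reference level:
#   (Intercept) = Biopsy mean, fCancer = Cancer - Biopsy, fNormal = Normal - Biopsy
# so Cancer vs Normal is fCancer - fNormal, i.e. the contrast (0, 1, -1)
cont <- matrix(c(0, 1, -1), ncol = 1,
               dimnames = list(colnames(design), "CancerVsNormal"))

# Sanity check on one simulated gene (made-up values, for illustration only)
set.seed(1)
y <- rnorm(length(f), mean = 8)
beta <- coef(lm(y ~ f))
est <- drop(t(cont) %*% beta)
print(all.equal(unname(est),
                mean(y[f == "Cancer"]) - mean(y[f == "Normal"])))
```

If this three-group parameterisation were kept, the same matrix could be passed to limma's `contrasts.fit(fit, cont)` on the `lmFit()` output, with `eBayes()` called afterwards on the contrast fit.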
However, I am not sure how to construct a contrast matrix that compares only two levels (Cancer vs Normal). Also, do I need to discard the Biopsy sample or not? Any advice would be great! Many thanks.
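One possible way to restrict the comparison to two levels, sketched under the assumption that the Biopsy sample is simply dropped: set Normal as the reference level, then either read coefficient 2 of an intercept model directly, or use a group-means design (`~0 + f2`) with an explicit Cancer minus Normal contrast. The limma part runs only if limma is installed and uses a simulated 100-gene matrix; `expr`, `keep`, `f2`, `design1`, `design2` and `cont2` are hypothetical names, with `expr` standing in for `exp_sample[, keep]`.

```r
# Keep only the Normal and Cancer samples (A-H); sample I (Biopsy) is dropped
group <- rep(c("Normal", "Cancer", "Biopsy"), times = c(3, 5, 1))
keep <- group != "Biopsy"
# Put Normal first so it becomes the reference (intercept) level
f2 <- factor(group[keep], levels = c("Normal", "Cancer"))

# Option 1: intercept model; coefficient 2 ("f2Cancer") is already Cancer - Normal
design1 <- model.matrix(~f2)

# Option 2: group-means model plus an explicit Cancer - Normal contrast
design2 <- model.matrix(~0 + f2)
colnames(design2) <- levels(f2)          # "Normal", "Cancer"
cont2 <- matrix(c(-1, 1), ncol = 1,
                dimnames = list(colnames(design2), "CancerVsNormal"))
print(colnames(design1))
print(cont2)

# limma steps, only if limma is installed; expression values are simulated
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  expr <- matrix(rnorm(100 * sum(keep), mean = 8), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), LETTERS[1:8]))
  fit1 <- limma::eBayes(limma::lmFit(expr, design1))
  fit2 <- limma::lmFit(expr, design2)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit2, cont2))
  # Both parameterisations should give the same Cancer vs Normal statistics
  tt1 <- limma::topTable(fit1, coef = 2, number = Inf, sort.by = "none")
  tt2 <- limma::topTable(fit2, coef = 1, number = Inf, sort.by = "none")
  print(all.equal(tt1$logFC, tt2$logFC))
}
```

The two designs are different parameterisations of the same model, so the intercept form avoids a contrast matrix altogether. For an ordinary least-squares fit, keeping a single Biopsy array as its own group adds one parameter and one observation, so the residual degrees of freedom (6) and the Cancer vs Normal estimates should stay the same either way.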