Hi All,
I was reading a paper that performed differential expression analysis for each sample vs. samples from a control group. Here was the code used:
library(limma)
# Keep all controls plus the single patient at index b
expresion<-expression[,c(controles,patientes[b])]
control<-colnames(expression)[controles]
case<-colnames(expression)[patientes[b]]
# Sample labels (column 1) and group membership (column 2)
controls<-cbind(paste("Control_",1:length(control),sep=""),rep("control",length(control)))
cases<-cbind(paste("Patient_",1:length(case),sep=""),rep("case",length(case)))
targets<-rbind(controls, cases)
# Two-column design: one indicator for the controls, one for the case
design<-cbind(CONTROL=c(rep(1,length(control)),rep(0,length(case))), CASE=c(rep(0,length(control)),rep(1,length(case))))
rownames(design)<-targets[,1]
cont.matrix<-makeContrasts(CASEvsCONTROL=CASE-CONTROL,levels=design)
# Per-gene linear model, then the case-vs-control contrast with moderated statistics
fit<-lmFit(expresion,design) ##getting DEGs from IQR
fit2<-contrasts.fit(fit, cont.matrix)
fit2<-eBayes(fit2)
Here, the case is a single patient sample compared against multiple samples in the control group. This would output a "patient-specific" profile: a list of differentially expressed genes for further analysis in a patient-specific manner. Can I apply the same method to my datasets?
Thanks!
Thanks - really nice explanation. If I read the OP's code correctly, s/he is subsetting the expression matrix to retain one patient and all the controls, and presumably s/he's looping through the patients one by one in this way. One could partially address your concerns by not subsetting the data and instead using a design matrix where each patient is assigned to a distinct group and all controls are assigned to the "control" group. Then you set up a contrast matrix comparing each patient vs the controls. In this way the estimates of variation are based on all samples and only the testing is done "1 patient vs all controls". Am I getting it right...?
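To make the suggestion concrete, here is a minimal sketch of what such a design could look like. It uses simulated data, and the object names (n_ctrl, n_pat, group, patients) are placeholders of mine, not from the OP's code. The contrast matrix is built by hand in base R, so the limma calls at the end only run if limma is installed.

```r
# Simulated expression matrix: 100 genes, 5 controls, 3 patients (hypothetical data)
set.seed(1)
n_genes <- 100; n_ctrl <- 5; n_pat <- 3
expression <- matrix(
  rnorm(n_genes * (n_ctrl + n_pat)), nrow = n_genes,
  dimnames = list(paste0("gene", 1:n_genes),
                  c(paste0("Control_", 1:n_ctrl), paste0("Patient_", 1:n_pat))))

# One group for all controls, one separate group per patient
group <- factor(c(rep("CONTROL", n_ctrl), paste0("P", 1:n_pat)),
                levels = c("CONTROL", paste0("P", 1:n_pat)))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# One contrast per patient: that patient's group minus the control group
patients <- setdiff(levels(group), "CONTROL")
cont.matrix <- sapply(patients, function(p) {
  v <- setNames(numeric(ncol(design)), colnames(design))
  v[p] <- 1
  v["CONTROL"] <- -1
  v
})
colnames(cont.matrix) <- paste0(patients, "vsCONTROL")

# Fit once on all samples, then test each patient against the controls
if (requireNamespace("limma", quietly = TRUE)) {
  fit  <- limma::lmFit(expression, design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, cont.matrix))
  print(limma::topTable(fit2, coef = "P1vsCONTROL", number = 5))
  print(unique(fit$df.residual))  # residual df behind the variance estimate
}
```

Checking fit$df.residual should show how many residual degrees of freedom the variance estimate actually rests on. As far as I can tell, a group containing a single sample adds none, so that is worth looking at before relying on the "based on all samples" part.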
Hmmmm.... I'm not sure how the variance is estimated in that case. I think normally the variance is estimated for each condition separately. Might need to ask @gordonsmyth's opinion.