In the past, I've used GeneSpring to analyze microarray data. Now I'm teaching myself how to do it all (normalization, statistical testing and clustering) with R. I've worked out everything up to the very end and have several specific questions. Also, please let me know if there's an easier or better way to do something.
I started off with a simple experiment: 25 CEL files - 9 normal samples, 16 Parkinson's disease samples.
http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE7621
My goal is to RMA the data, perform a t-test and cluster the results at a given statistical cutoff.
Here's the code I've been using:
library(simpleaffy)
raw.data<-read.affy("sampledesc.txt")
rma.data<-call.exprs(raw.data,"rma")
results<-pairwise.comparison(rma.data,"class",c("normal","pd"),raw.data)
results.f<-pairwise.filter(results,tt=0.1,fc=1)
hmap.pc(results.f,rma.data,col="bwr",scluster=standard.pearson,pcluster=standard.pearson))
I can't figure out how to cluster the samples using euclidian distance. I can't find any documentation on the different types of metrics available or how to call them (scluster=standard.euclidian doesn't work). Also, I'd like to reference the expression changes to the normal samples such that the normals all show expression around 0 and the PD samples reflect changes up or down from normal. I'm not sure how to do this in R.
For a more complex data set, I couldn't figure out how to use the genefilter package to do an anova (hence the use of the stats package). For this example, I'm using an experiment with 48 CEL files (3 samples at each dose and day) divided into 4 doses (control, doses 1-3) collected on 4 days.
http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE18753
My goal is to RMA the data, perform a one-way anova to identify genes differentially expressed between days and/or doses, and cluster the results at a given statistical cutoff.
Here's the code I've been using:
library(affy)
library(simpleaffy)
library(stats)
pd<-read.AnnotatedDataFrame("sampledesc.txt",sep="\t",header=T)
raw.data<-ReadAffy(filenames=pData(pd)$FileName,phenoData=pd)
rma.data<-call.exprs(raw.data,"rma")
lm.coef<-function(y) lm(y ~ factor(Day) * factor(Dose), data=pData(pd))$coefficients
eff<-esApply(rma.data,1,lm.coef)
Everything works up to this point. However, I can't figure out how to filter the anova results (say, to only look at genes <0.001) and cluster them in a heat map.
Any examples or advice would be greatly appreciated!