Hey,
Download the raw CEL files and process them with the oligo and limma packages. I do not believe you need any files other than the CEL files.
Here is how I processed similar arrays (your CEL files will be located in a directory called SampleFiles/):
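#biocLite has been retired; on current Bioconductor releases, install the packages with BiocManager if needed:
#install.packages("BiocManager"); BiocManager::install(c("limma", "oligo"))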
require("limma")
#'oligo' is more suited for the Gene ST Affymetrix arrays
require("oligo")
#Disable scientific notation
options(scipen=999)
targetinfo <- readTargets("Targets.txt", sep="\t")
CELFiles <- list.celfiles("SampleFiles/", full.names = TRUE)
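#Raw intensity data
project <- read.celfiles(CELFiles)
#Labels for the QC plots (assumption: taken from the CEL file names; use a column of targetinfo instead if you prefer)
samplenames <- sampleNames(project)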
#Background correct, normalize, and calculate gene expression
project.bgcorrect.norm.avg <- rma(project, background=TRUE, normalize=TRUE, target="core")
project.bgcorrect.norm.avg.Exons <- rma(project, background=TRUE, normalize=TRUE, target="probeset")
#Perform some diagnostics on the arrays
#Generate chip images to diagnose spatial artifacts for each array and a merged boxplot of intensities
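#pdf() fails if the output directory does not exist, so create it first
dir.create("Output", showWarnings=FALSE)
pdf("Output/ChipImageQC.pdf")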
image(project)
dev.off()
pdf("Output/BoxPlotQC.pdf")
par(mar=c(5,5,5,5), cex=1, cex.axis=0.8, mfrow=c(2,1))
boxplot(project, which="all", transfo=log2, main="Raw chip fluorescent intensities", names=samplenames, las=2)
#rma() already returns log2 values, so do not log-transform them again
boxplot(project.bgcorrect.norm.avg, transfo=identity, main="Background-corrected, RMA normalised, log2 expression values\nAll probes", names=samplenames, las=2)
dev.off()
#Write out the normalised expression values
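#exprs() extracts the feature-by-sample matrix; col.names=NA keeps the header aligned with the row names
write.table(exprs(project.bgcorrect.norm.avg), "NormalisedCounts.GeneSummarised.tsv", sep="\t", quote=FALSE, col.names=NA)
write.table(exprs(project.bgcorrect.norm.avg.Exons), "NormalisedCounts.ExonSummarised.tsv", sep="\t", quote=FALSE, col.names=NA)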
For annotation, see the working example in this answer: "A: Affymetrix Human Genome U133 Plus 2.0 Array".
Also see the biomaRt vignette (section 4.1).
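As a rough idea of what the annotation step can look like, here is a minimal sketch using biomaRt. It assumes human samples and uses the Ensembl attribute name for the HG-U133 Plus 2.0 array from the linked example; annotate_probes is a hypothetical helper name of my own, not a biomaRt function, and the attribute for your array will likely differ.

```r
# Sketch: map Affymetrix probe IDs to gene identifiers with biomaRt.
# Assumptions: human samples and the HG-U133 Plus 2.0 attribute name;
# annotate_probes is a hypothetical helper, not part of biomaRt.
annotate_probes <- function(probe_ids,
                            array_attr = "affy_hg_u133_plus_2",
                            dataset = "hsapiens_gene_ensembl") {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("Install biomaRt first: BiocManager::install('biomaRt')")
  }
  # Connect to the Ensembl BioMart (needs internet access)
  mart <- biomaRt::useMart("ensembl", dataset = dataset)
  # Query the probe-to-gene mappings for the requested probe IDs
  biomaRt::getBM(
    attributes = c(array_attr, "ensembl_gene_id", "hgnc_symbol"),
    filters = array_attr,
    values = probe_ids,
    mart = mart
  )
}

# Example call (commented out because it queries Ensembl over the network):
# annot <- annotate_probes(c("1007_s_at", "1053_at"))
```

For a Gene ST array, you would pass rownames(exprs(project.bgcorrect.norm.avg)) as probe_ids and look up the matching attribute name with biomaRt::listAttributes() (search the names for "affy").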