Entering edit mode
5.1 years ago
Natasha
▴
40
I am following the tutorial given here to perform RMA normalization of dataset GSE1133 using the following code in R. cdf libraries of platform gpl1073 and gpl1074 have been installed following the description given here
library(Biobase)
library(GEOquery)
library(affy)
library(limma)
library(hgu133a.db)
library(hgu133acdf)
library(gpl1073cdf)
library(gpl1074cdf)
# Filter samples of platform 1073
eset <- getGEO("GSE1133", GSEMatrix=TRUE, getGPL=TRUE)
if(length(eset) > 1)
idx <- grep("GPL1073", attr(eset, "names")) else idx <- 1
eset <- eset[[idx]]
phenoData(eset)
files <- sampleNames(eset) # only cel files of platform GPL1073
setwd("/home/GSE1133")
#Download data
getGEOSuppFiles("GSE1133")
setwd("/home/GSE1133/GSE1133")
untar("GSE1133_RAW.tar", exdir="data")
cels = list.files("data/", pattern="CEL")
pattern <- paste(files, sep="", collapse="|")
cels <- grep(pattern, cels, value=TRUE)
# RMA normalization
raw.data = ReadAffy(verbose=FALSE, filenames=cels, cdfname="gpl1073cdf")
data.rma.norm = rma(raw.data)
However, after running this code I get an error that says some of the cel files are not valid. Could someone look into this? I am accessing the files that are downloaded using GEOquery and I am not sure why this error occurs.
Error: the following are not valid files:
GSM18584.CEL.gz
GSM18585.CEL.gz
:
:
Hi I tried to unzip using (sapply(paste("data", cels, sep = "/"), gunzip) and run the code again. Get the same error for some reason
Additionally, I do not think your current working directory is correct.
Your current working directory is:
/home/GSE1133/GSE1133
but your CEL files are stored in thedata
directory.However, when you run
cels = list.files("data/", pattern="CEL")
, thecels
variable will only containing the filenames, e.g. GSM18584.CEL rather than data/GSM18584.CELWhy don't you try the following instead:
raw.data = ReadAffy(verbose=FALSE, filenames=paste("data", cels, sep = "/"), cdfname="gpl1073cdf")
Hi, Thanks a lot. That was my mistake. I could run the code and perform normalization
However, I am not sure how to map the probe ids to gene ids.
The data is from platform https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL1073
Any suggestion on how the probes can be mapped to gene ids?
That is a separate question, but the annotation is available at the link that you have provided. Click on
Download full table...
- the gene names can be found in that file, but you will have to parse them out.@Kevin Blighe Thanks a lot for the response. I created a new post for this question here.