Hello everyone, I wonder why there is a difference between the number of probes I get when I process GEO data from the CEL files and the number listed in the SOFT files. For example, GSM272923 has only 18382 probes in the data table near the end of its GEO page, but when I downloaded and processed the CEL file, there were 54675 probes.
Your help would be really appreciated!
Sample GSM272923 from series GSE10810: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM272923
My code to process the CEL files:
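library(affy)     # ReadAffy, rma, exprs
library(R.utils)  # gunzip (assumed source; GEOquery also provides a gunzip)
RAWstr = "GSE10810_RAW.tar"
# Strip the "_RAW.tar" suffix to get the extraction directory, i.e. ./RAWsets/GSE10810
RAWdirectory = paste("./RAWsets", substr(RAWstr, 1, nchar(RAWstr) - 8), sep = "/")
tardir = paste("./RAWsets", RAWstr, sep = "/")
untar(tardir, exdir = RAWdirectory, verbose = FALSE)
# Decompress every .CEL.gz file in place
cels = list.files(RAWdirectory, pattern = "\\.CEL\\.gz$", full.names = TRUE)
sapply(cels, gunzip)
# Read the decompressed CEL files (file names without the ".gz" suffix) and run RMA
data.raw = ReadAffy(filenames = substr(cels, 1, nchar(cels) - 3), verbose = FALSE, cdfname = "hgu133plus2cdf")
data.matrix = rma(data.raw)
tmp = exprs(data.matrix)  # expression matrix, one row per probe set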
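To see exactly which IDs differ between the two sources, a comparison along these lines might help. This is only a minimal sketch: `compare_probe_ids` is a hypothetical helper, not something from affy or GEOquery, and the toy IDs are made up. The commented lines at the end assume GEOquery is installed and that the SOFT table uses the usual `ID_REF` column.

```r
# Hypothetical helper: summarise how two sets of probe set IDs overlap.
compare_probe_ids <- function(cel_ids, soft_ids) {
  cel_ids <- unique(as.character(cel_ids))
  soft_ids <- unique(as.character(soft_ids))
  list(
    n_cel = length(cel_ids),                          # IDs from the CEL-derived matrix
    n_soft = length(soft_ids),                        # IDs from the SOFT data table
    n_shared = length(intersect(cel_ids, soft_ids)),  # IDs present in both
    only_in_cel = setdiff(cel_ids, soft_ids)          # IDs missing from the SOFT table
  )
}

# Toy example with made-up IDs
toy <- compare_probe_ids(c("a_at", "b_at", "c_at"), c("a_at", "c_at"))
toy$only_in_cel

# With the real data (assumes GEOquery is installed and `tmp` exists from the code above):
# library(GEOquery)
# soft_table <- Table(getGEO("GSM272923"))
# str(compare_probe_ids(rownames(tmp), soft_table$ID_REF))
```

Looking at `only_in_cel` for the real sample should show whether the missing IDs follow a pattern, for instance control probe sets or probe sets removed by a filtering step before submission.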