Question on number of probes in CEL and SOFT from GEO
7.3 years ago
landscape95 ▴ 190

Hello everyone, I wonder why the number of probes differs between the CEL files and the SOFT files when I process data from GEO. For example, sample GSM272923 has only 18382 probes in the data table near the end of its GEO page, but the CEL file I downloaded contains 54675 probes.

Your help would be really appreciated!

Sample GSM272923 from series GSE10810: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM272923

My code to process the CEL files:

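  library(affy)     # ReadAffy(), rma(), exprs()
  library(R.utils)  # gunzip()

  RAWstr = "GSE10810_RAW.tar"
  # Extraction directory named after the series, i.e. ./RAWsets/GSE10810
  RAWdirectory = paste("./RAWsets", substr(RAWstr, 1, nchar(RAWstr) - 8), sep = "/")
  tardir = paste("./RAWsets", RAWstr, sep = "/")
  untar(tardir, exdir = RAWdirectory, verbose = FALSE)
  cels = list.files(RAWdirectory, pattern = "\\.CEL\\.gz$", full.names = TRUE)
  sapply(cels, gunzip)  # decompress each file in place
  # Read the decompressed CEL files (file names without the ".gz" suffix)
  data.raw = ReadAffy(filenames = substr(cels, 1, nchar(cels) - 3), verbose = FALSE, cdfname = "hgu133plus2cdf")
  data.matrix = rma(data.raw)  # RMA: background correction, normalization, summarization
  tmp = exprs(data.matrix)     # expression matrix, one row per probe set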
7.3 years ago

The original submitters apparently filtered the data. That is not uncommon.
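One way to check this yourself is to compare the probe IDs in the GEO sample table with the row names of your RMA matrix. The following is only a sketch: `compare_probes` and `soft_probe_ids` are hypothetical helper names, the toy IDs stand in for real data, and the GEO lookup assumes the GEOquery package is installed and that the sample table has the usual `ID_REF` column.

```r
# Hypothetical helper: summarise the overlap between two sets of probe IDs.
compare_probes <- function(cel_ids, soft_ids) {
  cel_ids  <- unique(as.character(cel_ids))
  soft_ids <- unique(as.character(soft_ids))
  list(
    n_cel       = length(cel_ids),
    n_soft      = length(soft_ids),
    n_shared    = length(intersect(cel_ids, soft_ids)),
    only_in_cel = setdiff(cel_ids, soft_ids)  # probes missing from the GEO table
  )
}

# Hypothetical helper: fetch the probe IDs of a GEO sample table.
# Needs network access and GEOquery; defined here but not called.
soft_probe_ids <- function(gsm_acc) {
  if (!requireNamespace("GEOquery", quietly = TRUE)) {
    stop("GEOquery is required for this step")
  }
  gsm <- GEOquery::getGEO(gsm_acc)
  as.character(GEOquery::Table(gsm)$ID_REF)
}

# Toy illustration with made-up subsets of probe IDs.
cel_ids  <- c("1007_s_at", "1053_at", "117_at", "121_at")
soft_ids <- c("1007_s_at", "117_at")
res <- compare_probes(cel_ids, soft_ids)
print(res$only_in_cel)
```

With the real data you would call something like `compare_probes(rownames(tmp), soft_probe_ids("GSM272923"))`, where `tmp` is the matrix from the question. If every ID in the GEO table also appears in the CEL-derived matrix, that is consistent with the submitters having kept a filtered subset.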

7.3 years ago
theobroma22 ★ 1.2k

Those 18K probes could be what remained after the submitters filtered on a significance test (a P-value cutoff), and they may still include duplicated probes.

The paper should answer your question; check what is described in its Materials and Methods (M&M) section.
