I am a student who wants to analyze microarray data. I would like to use the values in the expression matrix, but first I need to know what each value represents.
Here is an example. Suppose we use the dataset GSE20307. We can download two files, GSE20307_RAW.tar and GSE20307_series_matrix.txt.gz. I set the working folder and ran the following code.
library(GEOquery)  # getGEO()
library(oligo)     # read.celfiles()
m_20307 <- getGEO(filename = 'GSE20307_series_matrix.txt.gz')
m_20307_anno <- pData(m_20307)
m_20307_anno <- m_20307_anno[,c('geo_accession', 'original diagnosis:ch1')]
m_20307_anno <- m_20307_anno[m_20307_anno$`original diagnosis:ch1` %in% c('healthy control', 'systemic JIA'),]
rownames(m_20307_anno) <- NULL
colnames(m_20307_anno) <- c('Sample', 'State')
m_20307_anno$State <- ifelse(m_20307_anno$State == 'healthy control', 'Normal', 'Disease')
m_20307_anno$Sample <- paste0(m_20307_anno$Sample, '.CEL')
# GSE20307 expression
# Platform: Affymetrix Human Genome U133 Plus 2.0 Array
m_20307_cel <- read.celfiles(paste0('GSE20307/', m_20307_anno$Sample), pkgname = 'pd.hg.u133.plus.2')
m_20307_exp <- exprs(m_20307_cel)
At the end, I wanted to check the probe IDs by calling rownames() on m_20307_exp. However, I only found a series of sequential numbers and could not identify any probe IDs.
I'm curious about what could be wrong with this code.
You can find the array probe IDs in the GPL570 design: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL570
It seems my question was not communicated clearly, which led to a misunderstanding. I expected to find probe IDs like 100_af in the rownames() of the matrix returned by exprs(), but there are none at all. So my question is how to fix this, and whether there is a problem in the code.
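Sequential rownames are what raw CEL data look like: exprs() on the object returned by read.celfiles() holds one intensity per feature (cell) on the array, and probeset IDs only appear once features are summarized into probesets. The toy sketch below, in base R, uses made-up intensities and hypothetical probeset names (ps1_at, ps2_at) to illustrate that step with median polish, which is the summarization method RMA uses.

```r
# Toy illustration with HYPOTHETICAL data: raw CEL data hold one intensity per
# feature (array cell), so rows are indexed by feature number, not probeset ID.
set.seed(1)

# Hypothetical feature-to-probeset map: 6 features belonging to 2 probesets.
feature_map <- data.frame(
  fid = 1:6,
  probeset = c("ps1_at", "ps1_at", "ps1_at", "ps2_at", "ps2_at", "ps2_at")
)

# Made-up feature-level log2 intensities for 3 samples.
raw <- matrix(rnorm(6 * 3, mean = 8), nrow = 6,
              dimnames = list(feature_map$fid, paste0("S", 1:3)))
rownames(raw)  # "1" "2" "3" "4" "5" "6": feature indices, not probeset IDs

# Summarize one probeset with median polish (stats::medpolish);
# the per-sample value is the overall effect plus that sample's column effect.
summarize_probeset <- function(x) {
  fit <- medpolish(x, trace.iter = FALSE)
  fit$overall + fit$col
}

# Group feature rows by probeset, summarize each group, one row per probeset.
groups <- split(seq_len(nrow(raw)), feature_map$probeset)
summarized <- t(sapply(groups, function(idx) {
  summarize_probeset(raw[idx, , drop = FALSE])
}))
rownames(summarized)  # "ps1_at" "ps2_at"
```

With the real data, the equivalent step is oligo's rma(), which also does background correction and quantile normalization before summarizing. Something like m_20307_rma <- rma(m_20307_cel) followed by rownames(exprs(m_20307_rma)) should then list probeset IDs ending in _at; the name m_20307_rma is only an illustrative choice.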
Someone else will help you with the R code, but the file you are reading above has the _at IDs in column 1.
Thank you for suggesting alternative ways to solve the problem. Once I have verified the results in R, I will add comments below; please feel free to review them if needed.
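Once the expression matrix has probeset IDs as rownames, linking it to the platform table comes down to a match() on that first ID column. The sketch below uses placeholder IDs, symbols and values (all hypothetical). With real data, GEOquery's Table(getGEO("GPL570")) should return the GPL570 platform table; that it contains a "Gene Symbol" column, as the GEO web view shows, is an assumption worth checking with colnames().

```r
# HYPOTHETICAL probeset-level matrix (placeholder IDs and values).
expr <- matrix(c(7.1, 7.4, 9.0, 8.8), nrow = 2, byrow = TRUE,
               dimnames = list(c("ps2_at", "ps1_at"), c("S1", "S2")))

# HYPOTHETICAL platform table: column 1 holds the probeset ID, as in a GPL table.
platform <- data.frame(
  ID = c("ps1_at", "ps2_at", "ps3_at"),
  `Gene Symbol` = c("GENE_A", "GENE_B", "GENE_C"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Look up each expression row in the platform's ID column (order-independent).
symbol <- platform$`Gene Symbol`[match(rownames(expr), platform$ID)]

# Combine IDs, symbols and expression values into one table.
annotated <- data.frame(ID = rownames(expr), Symbol = symbol, expr,
                        row.names = NULL, stringsAsFactors = FALSE)
annotated
```

match() is used rather than assuming both tables share the same row order, since the platform table lists every probeset on the array while the expression matrix may be ordered differently.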