How can I see probe IDs in a microarray?
9 months ago
wyt1995 ▴ 40

I am a student who wants to analyze microarray data. I would like to use the values in the expression matrix, but I need to know what each value represents.

Here is an example. Suppose we have a dataset named GSE20307. Two files can be downloaded for it, GSE20307_RAW.tar and GSE20307_series_matrix.txt.gz. I specified the folder and ran the following code.

library(GEOquery) # getGEO()
library(oligo)    # read.celfiles(), rma()
m_20307 <- getGEO(filename = 'GSE20307_series_matrix.txt.gz')
m_20307_anno <- pData(m_20307)
m_20307_anno <- m_20307_anno[,c('geo_accession', 'original diagnosis:ch1')]
m_20307_anno <- m_20307_anno[m_20307_anno$`original diagnosis:ch1` %in% c('healthy control', 'systemic JIA'),]
rownames(m_20307_anno) <- NULL
colnames(m_20307_anno) <- c('Sample', 'State')
m_20307_anno$State <- ifelse(m_20307_anno$State == 'healthy control', 'Normal', 'Disease')
m_20307_anno$Sample <- paste0(m_20307_anno$Sample, '.CEL')
# GSE20307 expression # Affymetrix Human Genome U133 Plus 2.0 Array
m_20307_cel <- read.celfiles(paste0('GSE20307/', m_20307_anno$Sample), pkgname = 'pd.hg.u133.plus.2')
m_20307_exp <- exprs(m_20307_cel)

Finally, I wanted to check the probe IDs by calling rownames() on the resulting m_20307_exp. However, I only found a series of sequential numbers and could not identify any probe IDs.

I'm curious about what could be wrong with this code.

R microarray probe Affymetrix • 904 views

You can find the array probe IDs in the GPL570 design: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL570


It seems my question was not clear, which led to a misunderstanding. I expected to find probe IDs ending in _at (for example 1007_s_at) in the rownames() of the matrix returned by exprs(), but there are none at all. My question is how to fix this, and whether there is a problem in the code.


Someone else will help you with the R code, but the series matrix file you are reading above has the _at IDs in column 1.

$ zgrep "_at" GSE20307_series_matrix.txt.gz | awk -F "\t" '{print $1}' - | head -10
!Sample_characteristics_ch1
"1007_s_at"
"1053_at"
"117_at"
"121_at"
"1255_g_at"
"1294_at"
"1316_at"
"1320_at"
"1405_i_at"
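
The same check can be done from R without the shell. This is only a minimal sketch: it assumes the series matrix follows the usual GEO layout, where metadata lines start with `!` and the first column of the data table holds the probe IDs. `read_probe_ids` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: return the probe IDs (first column) of a GEO series matrix.
# Assumption: metadata lines start with "!" and the data table is tab-separated.
read_probe_ids <- function(path) {
  tab <- read.delim(path, comment.char = "!", check.names = FALSE,
                    stringsAsFactors = FALSE)
  as.character(tab[[1]])
}

# Usage with the file from the question (it must be in the working directory):
# head(read_probe_ids("GSE20307_series_matrix.txt.gz"), 10)
```

If the file has already been loaded with getGEO() as in the question, featureNames(m_20307) from Biobase should return the same IDs.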

Thank you for suggesting an alternative way to solve the problem. I will verify the results in R and add a comment below; please feel free to review it if needed.

8 months ago
wyt1995 ▴ 40

When exprs() is called directly on m_20307_cel, it is expected that the rownames do not show probe IDs. read.celfiles() only reads the raw, probe-level intensities from the .CEL files, and these still have to be processed so that the values are matched to probe IDs. oligo::rma() does this, so it needs to be called before exprs().

m_20307_cel <- read.celfiles(paste0('GSE20307/', m_20307_anno$Sample), pkgname = 'pd.hg.u133.plus.2')
m_20307_rma <- oligo::rma(m_20307_cel) # Add this code
m_20307_exp <- exprs(m_20307_rma) # rownames are now probe IDs

'm_20307_cel' is an ExpressionFeatureSet, and 'm_20307_rma' is an ExpressionSet.
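
To see this difference directly, the two objects can be compared side by side. The sketch below is only an illustration: `describe_eset` is a hypothetical helper, and it assumes the object works with exprs() from Biobase, as both classes mentioned here do.

```r
library(Biobase) # exprs()

# Hypothetical helper: report the class, the number of rows, and the first
# rownames of the expression matrix stored in an eSet-like object.
describe_eset <- function(x) {
  m <- exprs(x)
  list(class     = as.character(class(x))[1],
       n_rows    = nrow(m),
       first_ids = head(rownames(m)))
}

# Usage with the objects from this thread (assumes both already exist):
# describe_eset(m_20307_cel) # raw data, one row per probe
# describe_eset(m_20307_rma) # after rma(), one row per probe ID
```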
