How to annotate probes in a GEO series matrix?
4.9 years ago

Hello,

I am working with a GEO series matrix file (GSE48452, platform GPL11532), which corresponds to the Affymetrix Human Gene 1.1 ST array (HuGene-1_1-st). I want to annotate the probes, for example with gene symbols, in order to create a data table like the following:

                    Sample1  Sample2  Sample3  Sample4  Sample5
    #CLASS:CANCER   case     case     case     case
    #CLASS:SEX      F        F        M        M        F        M        F        M
    Gene Symbol
    Gene1           -3.06    -2.25    -1.15    -6.64     0.4
    Gene2           -1.36    -0.67    -0.17    -0.97    -2.0
    Gene3            1.61    -0.27     0.71    -0.62     0.14
    Gene4            0.93     1.29    -0.23    -0.74    -2

How can I map the probes to gene symbols while maintaining the order?

I was using the following R script without success, and I don't know how to proceed:

getGEOdataObjects <- function(x, getGSEobject=FALSE){
  # Load the GEOquery package (it must already be installed)
  library(GEOquery)
  # Use the getGEO() function to download the GEO data for the id stored in x
  GSEDATA <- getGEO(x, GSEMatrix=TRUE, AnnotGPL=FALSE)
  # Inspect the object by printing a summary of the expression values for the first 2 columns
  print(summary(exprs(GSEDATA[[1]])[, 1:2]))

  # Get the eset object
  eset <- GSEDATA[[1]]
  # Save the objects generated for future use in the current working directory
  save(GSEDATA, eset, file=paste(x, ".RData", sep=""))

  # check whether we want to return the list object we downloaded on GEO or
  # just the eset object with the getGSEobject argument
  if(getGSEobject) return(GSEDATA) else return(eset)
}
# Store the dataset ids in a vector GEO_DATASETS just in case you want to loop through several GEO ids
GEO_DATASETS <- c("GSE48452")

# Use the function we created to return the eset object
eset <- getGEOdataObjects(GEO_DATASETS[1])
# Inspect the eset object to get the annotation GPL id
eset 
# Get the annotation GPL id (see Annotation: GPL11532)
gpl <- getGEO('GPL11532', destdir=".")
Meta(gpl)$title

# Inspect the table of the gpl annotation object
colnames(Table(gpl))

# Get the gene symbol and entrez ids to be used for annotations
Table(gpl)[1:10, c(1, 2, 6, 12)]
dim(Table(gpl))

# Get the gene expression data for all the probes with a gene symbol
geneProbes <- which(!is.na(Table(gpl)$Symbol))
probeids <- as.character(Table(gpl)$ID[geneProbes])

probes <- intersect(probeids, rownames(exprs(eset)))
length(probes)

geneMatrix <- exprs(eset)[probes, ]

inds <- which(Table(gpl)$ID %in% probes)
# Check you get the same probes
head(probes)
head(as.character(Table(gpl)$ID[inds]))

# Create the expression matrix with gene ids
geneMatTable <- cbind(geneMatrix, Table(gpl)[inds, c(1, 2, 6, 12)])
head(geneMatTable)

# Save a copy of the expression matrix as a csv file
write.csv(geneMatTable, paste(GEO_DATASETS[1], "_DataMatrix.csv", sep=""), row.names=TRUE)

Thank you in advance for your help!

R microarrays • 2.7k views

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
Thank you!

4.9 years ago
MatthewP ★ 1.4k

I recommend using the left_join function from the R package dplyr.
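As a rough sketch of how that could look, here is a small example on toy data rather than the real GSE48452 objects. The probe IDs, values, and annotation strings are made up. It assumes the platform table has no ready-made Symbol column and instead keeps the symbol as the second "//"-separated field of a gene_assignment column. That is an assumption about GPL11532, so check it with colnames(Table(gpl)) first. left_join keeps the rows of its first argument in their original order, so the expression matrix order is preserved.

```r
library(dplyr)

# Toy stand-in for exprs(eset): rows are probe IDs, columns are samples
expr <- matrix(
  c(-3.06, -1.36,  1.61,
    -2.25, -0.67, -0.27),
  nrow = 3,
  dimnames = list(c("7892503", "7892501", "7892502"),
                  c("Sample1", "Sample2"))
)

# Toy stand-in for Table(gpl); the "accession // symbol // description"
# layout of gene_assignment is an assumption, and "---" marks no annotation
annot <- data.frame(
  ID = c("7892501", "7892502", "7892503"),
  gene_assignment = c("NM_000001 // GeneA // description A // 1p36 // 1",
                      "---",
                      "NM_000003 // GeneC // description C // 2q11 // 3"),
  stringsAsFactors = FALSE
)

# Use the second "//"-separated field as the symbol; NA when it is missing
fields <- strsplit(annot$gene_assignment, " // ", fixed = TRUE)
annot$Symbol <- vapply(
  fields,
  function(f) if (length(f) >= 2) f[2] else NA_character_,
  character(1)
)

# Move the probe IDs into a column so they can serve as the join key
expr_df <- data.frame(ID = rownames(expr), expr, stringsAsFactors = FALSE)

# left_join keeps every row of expr_df, in expr_df's order
annotated <- left_join(expr_df, annot[, c("ID", "Symbol")], by = "ID")
print(annotated)
```

With the real data, both ID columns need the same type before joining (for example, as.character(Table(gpl)$ID)), because rownames(exprs(eset)) are character strings. Probes without a symbol come back as NA and can be dropped afterwards with filter(!is.na(Symbol)).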
