How to annotate GEO microarray datasets with GEOquery?
8.2 years ago
grayapply2009 ▴ 300

Hi, I read the GSE file into R as follows.

library(GEOquery)
gse <- getGEO("GSE4928", GSEMatrix=TRUE)

Now I want to convert all probe IDs to gene symbols and write the entire annotated dataset back to my computer. What should I do? By the way, which dataset is stored in gse? SOFT formatted family file, MINiML formatted family file or Series Matrix File?

annotation microarray geo • 7.0k views
8.2 years ago

Take a look at this code:

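gse = gse[[1]]                        # getGEO() returns a list of ExpressionSets; take the first one
head(fData(gse))                      # feature (probe) annotation, which comes from the GPL
symbols = fData(gse)[,'Gene Symbol']  # gene symbol column; its name depends on the platform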

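Now you have the gene symbols. Because you called getGEO() with GSEMatrix=TRUE, the result is parsed from the Series Matrix file, not from the SOFT or MINiML family file. After choosing the first element of the list with gse[[1]], the gse object is an ExpressionSet that contains the information from the Series Matrix file and from the GPL (platform) file.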


Thank you Sean. Now how can I export the Series Matrix file with the probes replaced by gene symbols?


After that, you can use:

expr_mat = exprs(gse)        # get the expression matrix
rownames(expr_mat) = symbols # Annotate the row names with gene symbols
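
To write the annotated data to disk, one option is to keep the probe IDs and gene symbols as ordinary columns instead of row names, since several probes often map to the same symbol and some have none. The sketch below uses a small made-up matrix in place of exprs(gse) and a made-up vector in place of fData(gse)[,'Gene Symbol']. The helper name annotate_matrix and the output file name are hypothetical.

```r
# Toy stand-ins: in the thread these would be exprs(gse) and
# fData(gse)[,'Gene Symbol'] (the values here are made up).
expr_mat <- matrix(c(7.1, 8.3, 5.2, 6.4, 9.0, 9.5), nrow = 3, byrow = TRUE,
                   dimnames = list(c("probe_1", "probe_2", "probe_3"),
                                   c("GSM_A", "GSM_B")))
symbols <- c("TP53", "TP53", "")

# Hypothetical helper: put probe IDs and symbols in front of the expression values.
annotate_matrix <- function(expr_mat, symbols) {
  stopifnot(length(symbols) == nrow(expr_mat))
  data.frame(ProbeID = rownames(expr_mat),
             Symbol  = as.character(symbols),
             expr_mat,
             check.names = FALSE,
             stringsAsFactors = FALSE,
             row.names = NULL)
}

annotated <- annotate_matrix(expr_mat, symbols)

out_file <- file.path(tempdir(), "GSE_annotated.csv")  # hypothetical file name
write.csv(annotated, out_file, row.names = FALSE)
```

With a real series, the call would be annotate_matrix(exprs(gse), symbols), and out_file would point to wherever you want the CSV saved.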

Thank you, gaoce. That works perfectly. Where can I find a tutorial for this? The GEOquery documentation doesn't seem to cover what you two showed me above.


Unfortunately, this solution is not a general one, because not all GEO series come with gene symbols or other annotation. That means it is best to understand why the steps above work, so that when you have another data set you can follow the same logic to come up with your own solution. That said, I am always interested in documentation improvements, so if you'd like to contribute, GEOquery is on GitHub, where I can accept pull requests.
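
One way to follow that logic on a new series is to check which columns the platform annotation actually provides before indexing by name. The sketch below is not part of GEOquery: find_symbol_column is a hypothetical helper, and its search pattern is only a guess at common naming variants. It runs on a made-up data frame standing in for fData(gse).

```r
# Hypothetical helper (not a GEOquery function): return the names of columns
# that look like they hold gene symbols, matching case-insensitively.
find_symbol_column <- function(feature_data, pattern = "symbol") {
  grep(pattern, colnames(feature_data), ignore.case = TRUE, value = TRUE)
}

# Toy stand-in for fData(gse); real column names vary by platform.
fdata <- data.frame(ID = c("p1", "p2"),
                    GENE_SYMBOL = c("BRCA1", "EGFR"),
                    Description = c("a", "b"),
                    stringsAsFactors = FALSE)

candidates <- find_symbol_column(fdata)
if (length(candidates) == 0) {
  message("No symbol-like column; inspect colnames(fData(gse)) by hand.")
} else {
  symbols <- fdata[[candidates[1]]]
}
```

With a real series, the call would be find_symbol_column(fData(gse)). If nothing matches, the platform table may carry only accessions or sequences, and the mapping to symbols has to come from another source.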


Hi, can you help me with a similar problem? I used the code Sean posted here and changed the series number to that of the dataset I want to use, but I get an error:

> symbols = fData(gse)[,'Gene Symbol']
Error in `[.data.frame`(fData(gse), , "Gene Symbol") : 
  undefined columns selected

I looked in the fData(gse) corresponding to my own GSE file and there isn't a Gene Symbol column.

The ID of the dataset I'm using is GSE65106. Could you help me? I would really appreciate it. Best regards
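
The error means the platform table attached to this series has no column named 'Gene Symbol'. A first step is to list the columns that are present. A second option is to ask getGEO() for NCBI's curated annotation file with AnnotGPL = TRUE, which exists only for some platforms. The sketch below wraps both in a hypothetical function, list_annotation_columns. It needs the GEOquery package and network access when called, and it makes no claim about which columns GSE65106 will return.

```r
# Hypothetical wrapper: fetch a series and report its feature annotation columns.
# Requires the GEOquery and Biobase packages and network access when called.
list_annotation_columns <- function(accession, use_annot_gpl = FALSE) {
  if (!requireNamespace("GEOquery", quietly = TRUE)) {
    stop("GEOquery is not installed")
  }
  gse_list <- GEOquery::getGEO(accession, GSEMatrix = TRUE,
                               AnnotGPL = use_annot_gpl)
  # getGEO() returns one ExpressionSet per platform in the series
  lapply(gse_list, function(eset) colnames(Biobase::fData(eset)))
}

# Example calls (not run here):
# list_annotation_columns("GSE65106")
# list_annotation_columns("GSE65106", use_annot_gpl = TRUE)
```

If a symbol-like column appears in either listing, use that exact name in place of 'Gene Symbol' in the code above.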
