I've performed RMA normalization on the raw intensity (CEL) files of dataset GSE1133. The normalized output has the following format:
GSM18584.CEL GSM18585.CEL GSM18586.CEL GSM18587.CEL
AFFX-18SRNAMur/X00686_3_at 10.324639 10.309749 7.978267 7.784038
AFFX-18SRNAMur/X00686_5_at 9.080051 9.401111 5.540294 5.539700
The data is from platform https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL1073
I would like to map the probe ids to gene ids. I had a look at the table presented in the above link.
The table header lists the following columns:
Data table header descriptions
ID                 Probe Set Name
Identifier_Source  Identifier_Source
Description        Description
CLONE_ID           clone identifier
Sequence_Type      Sequence Type
SEQUENCE
SPOT_ID            Column added by GEO staff to facilitate sequence tracking in Entrez GEO
GB_ACC             GenBank Accession Number
I also downloaded the complete file. I could find gene names in it, but no mappings to identifiers such as Entrez Gene IDs. I have also read that the UCSC Genome Browser (http://genome.ucsc.edu) can be used, but I am not sure which of its tools to use.
Could someone suggest how to proceed?
This will not work in this case, because the samples the user is interested in are not from the U133 chip; they appear to be from a customised chip called 'GNF1M' (GPL1073).
Natasha, the easiest way is probably to download the 'Annotation SOFT table...' from HERE, read that into R, and then match up this annotation data with your expression matrix. Gene symbols are in column 3 of this annotation file.
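For anyone who wants to see the matching step spelled out, here is a minimal sketch of the lookup logic, written in Python only for illustration (in R the same idea is a `match()` or `merge()` on the probe ID column). It assumes the annotation file is tab-delimited, that SOFT metadata lines start with `^`, `!` or `#`, and that the probe ID is in column 1 and the gene symbol in column 3 as described above. The function names, the inline sample table, and the `probe_*` / `SYMBOL_1` values are all made up for the example and are not taken from GPL1073.

```python
import csv
import io


def read_annotation(handle, id_col=0, symbol_col=2):
    """Build {probe_id: gene_symbol} from a tab-delimited annotation table.

    Assumption: lines starting with '^', '!' or '#' are SOFT metadata and the
    first remaining line is the column header.
    """
    table_lines = (line for line in handle
                   if line.strip() and line[0] not in "^!#")
    reader = csv.reader(table_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header row
    mapping = {}
    for fields in reader:
        # Keep only probes that actually have a symbol in the chosen column.
        if len(fields) > symbol_col and fields[symbol_col]:
            mapping[fields[id_col]] = fields[symbol_col]
    return mapping


def annotate(expression_rows, mapping, missing="NA"):
    """Prepend the gene symbol (or `missing`) to each (probe_id, values...) row."""
    return [(mapping.get(row[0], missing), *row) for row in expression_rows]


# Hypothetical stand-in for the downloaded annotation file.
annotation_text = (
    "!platform_table_begin\n"
    "ID\tGene title\tGene symbol\n"
    "probe_1\thypothetical title\tSYMBOL_1\n"
    "probe_2\t\t\n"
    "!platform_table_end\n"
)

# Hypothetical stand-in for the RMA-normalized expression matrix.
expression = [
    ("probe_1", 10.3, 7.9),
    ("probe_2", 9.1, 5.5),
    ("probe_3", 8.0, 8.1),
]

mapping = read_annotation(io.StringIO(annotation_text))
annotated = annotate(expression, mapping)
for row in annotated:
    print(*row, sep="\t")
```

Probes without a symbol in the annotation (control probes, for example) and probes absent from the annotation both come out as `NA` here, so no rows of the expression matrix are dropped. To use a different identifier column, change `symbol_col` after checking the header of the file you actually downloaded.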
Glad to know the answer :)