Converting Affymetrix Probes To Gene Ids
11.4 years ago
Josh ▴ 140

I downloaded the CGP cell line project expression data and would like to convert the Affymetrix probe IDs to official gene symbols. The platform is HG U133A v2 and the dataset has around 22,000 probes in total. What's the best way to do this? I tried IDconverter, but it froze after around 100 genes. When I used DAVID to convert to official gene symbols, the results contained only about 9,800 genes. Using DAVID to convert to Entrez IDs returned about 24,000 IDs, because some probes map to multiple Entrez gene IDs. How should I deal with these duplicated Entrez IDs, or is there a better way to do the conversion altogether? Thanks!

affymetrix conversion entrez • 72k views

I tried to figure this out today. The code provided by Diwan is for rat; which annotation package you need depends on the organism your samples come from (human, rat, mouse, etc.), and the results also depend on your R and Bioconductor versions. I am using R 3.3.2 and Bioconductor 3.4. The following code works for me, but with keytype = "PROBEID" I do not get results for all probe IDs, only for a few genes.

However, the Affymetrix ID information is also available in the Thermo Fisher annotation files: https://www.thermofisher.com/us/en/home/life-science/microarray-analysis/microarray-data-analysis/genechip-array-annotation-files.html

## Converting PROBEIDs to gene names and symbols
## The annotation package depends on the organism and chip; results depend on the R and Bioconductor versions
## (biocLite was replaced by BiocManager::install() from Bioconductor 3.8 onwards)
source("http://bioconductor.org/biocLite.R")
biocLite("hgu95av2.db")
library("AnnotationDbi")
library("hgu95av2.db")    ## for human (HG-U95Av2 chip)
keytypes(hgu95av2.db)     ## lists the valid keytype values
select(hgu95av2.db, keys=c("1007_s_at","1053_at"), columns=c("SYMBOL","ENTREZID", "GENENAME"), keytype="PROBEID") ## small test example
PROBES <- as.character(GSE22483$ID_REF)
## The keys are probe IDs, so keytype must be "PROBEID", not "UNIGENE"
OUT <- select(hgu95av2.db, keys=PROBES, columns=c("SYMBOL", "ENTREZID", "GENENAME"), keytype="PROBEID")
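
If only a few probes come back annotated, one thing worth checking is how many of your probe IDs actually exist in the annotation package you loaded; a low overlap may mean the package is for a different chip than your data. The following is only a sketch in base R: `probe_match_rate` is a hypothetical helper, and the two toy vectors stand in for `GSE22483$ID_REF` and for `keys(hgu95av2.db, keytype = "PROBEID")`.

```r
# Hypothetical helper: fraction of probe IDs found among an annotation package's keys.
probe_match_rate <- function(probes, known_keys) {
  probes <- trimws(as.character(probes))  # stray whitespace breaks exact matching
  mean(probes %in% known_keys)
}

# Toy data standing in for the dataset's probe column and the package's PROBEID keys.
probes     <- c("1007_s_at", "1053_at ", "117_at", "AFFX-BioB-5_at")
known_keys <- c("1007_s_at", "1053_at", "117_at")

rate <- probe_match_rate(probes, known_keys)
print(rate)
```

A rate well below 1 suggests trying the package that matches your array before worrying about the conversion itself.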

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.


You said that "it depends on R version and Bioconductor version". Could you please explain what changes between versions, and how to check or control for that in order to choose a method and get reliable, repeatable results?
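
As far as I know, the chip annotation packages are rebuilt with each Bioconductor release from the gene databases current at that time, so the same probe can map to a different symbol depending on which release you have installed. A minimal sketch for recording what you used, assuming a reasonably recent setup: `BiocVersion` is only present on newer Bioconductor installations, so the code falls back to `NA` when it is missing.

```r
# Base R: record the interpreter version and every attached package version.
r_version <- R.version.string
session   <- sessionInfo()

# Bioconductor release (major.minor), only if the BiocVersion package is installed.
bioc_version <- if (requireNamespace("BiocVersion", quietly = TRUE)) {
  as.character(packageVersion("BiocVersion")[, 1:2])
} else {
  NA_character_
}

print(r_version)
print(bioc_version)
print(session)  # keep this output alongside your results
```

Reporting the version of the annotation package itself, for example with `packageVersion("hgu133a2.db")`, makes the mapping repeatable for others.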

  1. You state first that you want official gene symbols (presumably HUGO?), but then talk about Entrez IDs.
  2. The brief answer to your question is "BioMart". Please search this site for that term, there are many answers to questions virtually identical to this one.

Analogous mapping questions have been asked continuously (here and elsewhere) for at least a decade because no one (?) ever made a decent 3' UTR probe set, which would have had much cleaner gene mappings, including paralogue resolution.


4 years on so this will never happen?

9.8 years ago
Diwan ▴ 650

In R, for example, if I want to convert the Affymetrix IDs 1368587_at and 1385248_a_at (rat2302 chip) to their gene IDs, I use the following:

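library("annotate")      # attaches AnnotationDbi, which provides select()
library("rat2302.db")    # use your own chip's package here, e.g. hgu133a.db (hgu133a2.db for HG-U133A 2.0)

# Calling AnnotationDbi::select explicitly avoids clashes with select() from other loaded packages
AnnotationDbi::select(rat2302.db, keys=c("1368587_at","1385248_a_at"), columns=c("SYMBOL","ENTREZID", "GENENAME"), keytype="PROBEID")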

For all probes, create a vector of probes and then use select:

PROBES <- as.character(FCMATRIX$probe)
OUT <- AnnotationDbi::select(rat2302.db, keys=PROBES, columns=c("SYMBOL", "ENTREZID", "GENENAME"), keytype="PROBEID")

# Install your chip's .db package from Bioconductor
# (biocLite was replaced by BiocManager::install() from Bioconductor 3.8 onwards)
source("http://bioconductor.org/biocLite.R")

biocLite("hgu133a.db")

HTH

Diwan


For anyone wavering between this and biomaRt: I've worked with biomaRt in the past and, although it is very useful and programmatically accessible, in practice the server goes down often and you frequently find yourself waiting between queries. Downloading a database to select against, as done here, is preferable.


Hi Diwan

After installing the annotate package, I ran your script but got an error:

Error in select(rat2302.db, c("1368587_at", "1385248_a_at"), c("SYMBOL",  :
  unused argument (c("SYMBOL", "ENTREZID", "GENENAME"))

I'm new to R; could you please explain what the problem is?
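
One possible cause (an assumption, since the poster's session is not shown) is that another attached package defines its own `select()`, which then masks the one from AnnotationDbi; an "unused argument" error is typical when the wrong function receives the call. A rough way to check in base R is sketched below; `who_defines` is a hypothetical helper name.

```r
# Hypothetical helper: list every attached package or environment that defines `name`.
who_defines <- function(name) utils::find(name, mode = "function")

# The first entry printed is the select() that R will actually call.
print(who_defines("select"))

# Sanity check with a function every session has: sum() lives in base.
print(who_defines("sum"))
```

If a package other than AnnotationDbi comes first, writing `AnnotationDbi::select(...)` in the call sidesteps the masking.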

11.4 years ago

If you are an R user, consider: http://www.bioconductor.org/packages/release/data/annotation/html/hgu133a2.db.html

Details on the use can be seen in the AnnotationDbi vignettes.

Alternatively, consider the biomaRt package and see the biomaRt user guide

7.6 years ago

You can use BioMart:

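library("biomaRt")
ensembl <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
# For other chips swap the first attribute; listAttributes(ensembl) shows the available affy_* names
affy_ensembl <- c("affy_hg_u133_plus_2", "ensembl_gene_id")
# No filters are set, so this returns the mapping for every probe on the chip
probe_map <- getBM(attributes=affy_ensembl, mart=ensembl, uniqueRows=TRUE)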

The problem with converting probe IDs to Entrez or Ensembl gene IDs is that one probe ID can map to more than one gene ID, and vice versa.
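
To see how large this problem is for a given mapping, one can count both kinds of multiplicity. This is a minimal sketch, assuming the `getBM()` result has been stored as a two-column data frame; the toy table and the column names `probe` and `gene` are made up for illustration.

```r
# Toy probe-to-gene table standing in for a getBM() result (IDs are made up).
map <- data.frame(
  probe = c("p1", "p2", "p2", "p3", "p4"),
  gene  = c("g1", "g2", "g3", "g1", NA),
  stringsAsFactors = FALSE
)

# Drop unmapped rows (NA or empty string) and exact duplicates.
map <- unique(map[!is.na(map$gene) & map$gene != "", ])

genes_per_probe <- table(map$probe)  # a count above 1 marks an ambiguous probe
probes_per_gene <- table(map$gene)   # a count above 1 marks a gene with several probes

ambiguous_probes  <- names(genes_per_probe)[genes_per_probe > 1]
multi_probe_genes <- names(probes_per_gene)[probes_per_gene > 1]

print(ambiguous_probes)
print(multi_probe_genes)
```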

The solution is:

  1. Discard probe IDs that map to more than one Ensembl gene ID
  2. Take the mean or max of multiple probe IDs that map to the same Ensembl or Entrez ID
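
The two steps above can be sketched in base R as follows. This is an illustration under assumptions rather than a definitive recipe: the expression values, probe names, and gene names are toy data, and the mean is used for step 2 (the max would work the same way).

```r
# Toy normalized expression matrix: rows are probes, columns are samples (values made up).
expr <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    7, 8),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2"))
)

# Toy mapping: p2 hits two genes (ambiguous); p1 and p3 both hit g1.
map <- data.frame(
  probe = c("p1", "p2", "p2", "p3", "p4"),
  gene  = c("g1", "g2", "g3", "g1", "g4"),
  stringsAsFactors = FALSE
)

# Step 1: drop probes that map to more than one gene.
n_genes <- table(map$probe)
map <- map[map$probe %in% names(n_genes)[n_genes == 1], ]

# Step 2: average the remaining probes of each gene, sample by sample.
expr <- expr[map$probe, , drop = FALSE]
sums <- rowsum(expr, group = map$gene)
gene_expr <- sums / as.vector(table(map$gene)[rownames(sums)])

print(gene_expr)
```

Note that this assumes every probe in the mapping is also a row of the expression matrix; filter the mapping first if that is not the case.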

Another solution is to use Brainarray's custom CDFs (I prefer this one).

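# Save the source tarball under its real file name, then install it from the local file
cdf_tarball <- "hgu133plus2hsensgcdf_21.0.0.tar.gz"
download.file("http://mbni.org/customcdf/21.0.0/ensg.download/hgu133plus2hsensgcdf_21.0.0.tar.gz", cdf_tarball, mode="wb")
install.packages(cdf_tarball, repos=NULL, type="source")
library(hgu133plus2hsensgcdf)

library(affy)
# celfilepath and celfilenames are your own CEL directory and file names
RawData <- ReadAffy(verbose=TRUE, celfile.path=celfilepath, cdfname="hgu133plus2hsensgcdf", filenames=celfilenames)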

How would you do this if you had already gotten the normalized gene expression?
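
One way this could be done on an already normalized matrix is to look up a symbol for each row name and then keep a single probe per symbol. The sketch below uses toy data: `norm_expr` and `symbol_of` are hypothetical names, and in practice the lookup vector might come from something like `mapIds(hgu133a2.db, keys = rownames(norm_expr), column = "SYMBOL", keytype = "PROBEID")`. Keeping the probe with the highest mean expression is just one common convention, not the only option.

```r
# Toy normalized matrix (probes x samples) and a probe-to-symbol lookup; all made up.
norm_expr <- matrix(
  c(2, 4,
    8, 6,
    1, 1),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3"), c("s1", "s2"))
)
symbol_of <- c(p1 = "GENE_A", p2 = "GENE_A", p3 = "GENE_B")

# Attach symbols by row name and drop probes without one.
symbols <- symbol_of[rownames(norm_expr)]
keep <- !is.na(symbols)
norm_expr <- norm_expr[keep, , drop = FALSE]
symbols <- symbols[keep]

# For each symbol, keep the probe with the highest mean expression.
probe_mean <- rowMeans(norm_expr)
best <- vapply(
  split(names(probe_mean), symbols),
  function(p) p[which.max(probe_mean[p])],
  character(1)
)
by_symbol <- norm_expr[best, , drop = FALSE]
rownames(by_symbol) <- names(best)

print(by_symbol)
```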

9.8 years ago
macmath ▴ 170

Another easy way to annotate Affymetrix probes with gene IDs is to use this link.

Upload your probe list and it will give you all the information you need.

It also helps with finding cross-platform orthologs among probes.

7.9 years ago
jananir1803 ▴ 20
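library(Biobase)         # ExpressionSet(), featureNames()
library(AnnotationDbi)   # mapIds()
library(hgu133a.db)
eset <- ExpressionSet(assayData=dat)   # dat: expression matrix with probe IDs as row names

ID <- featureNames(eset)

# mapIds() returns one symbol per probe (the first match by default)
out <- mapIds(hgu133a.db, keys=as.character(ID), column="SYMBOL", keytype="PROBEID")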