Hello.
I'm doing some microarray analysis, and I noticed that some probe IDs don't have a gene symbol. Am I supposed to delete those probes or keep them?
This is how I add annotation to an Affymetrix HGU133plus2 array after applying RMA:
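library(hgu133plus2.db)  # provides hgu133plus2SYMBOL and hgu133plus2ENTREZID
probes <- rownames(expressions)
# Probe sets with no mapping come back as NA
Symbols <- unlist(mget(probes, hgu133plus2SYMBOL, ifnotfound = NA))
Entrez_IDs <- unlist(mget(probes, hgu133plus2ENTREZID, ifnotfound = NA))
# data.frame() keeps the expression columns numeric; cbind() would coerce everything to character
expressions <- data.frame(probes, Symbols, Entrez_IDs, expressions, stringsAsFactors = FALSE)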
Is something wrong with my code, or is this behavior expected? What do you do with these NA genes?
It would help a lot if you could explain the procedure in more detail, because I'm not very experienced with this kind of task. Thank you very much.
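For anyone who wants to see how many probe sets are affected before deciding, here is a minimal sketch in base R. The `annotated` table and its values are made up for illustration; it only stands in for a table with a `Symbols` column like the one built in the question. Whether to drop the unannotated rows depends on the downstream analysis, so the sketch keeps the full table and makes a filtered copy.

```r
# Toy stand-in for an annotated expression table (all values are made up)
annotated <- data.frame(
  probes  = c("p1", "p2", "p3", "p4"),
  Symbols = c("GENE_A", NA, "GENE_B", NA),
  expr    = c(7.2, 5.1, 9.8, 4.4),
  stringsAsFactors = FALSE
)

# Count the probe sets that have no gene symbol
n_missing <- sum(is.na(annotated$Symbols))

# Filtered copy with annotated probe sets only; the full table is left untouched
annotated_only <- annotated[!is.na(annotated$Symbols), ]
```

Keeping both tables means the unannotated probe sets are still available if they later turn out to be interesting.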
You could get the probe sequences from the Affymetrix annotation files and then paste them into a BLAT tool, such as the one at UCSC, to find where the probes map in the latest genome build.
I think this paper is doing exactly what you want.
https://www.biorxiv.org/content/biorxiv/early/2017/04/11/126573.full.pdf
About 12,000 probe sets did not map to a gene symbol, so I needed an automatic way to retrieve symbols for them rather than looking them up one by one. I found this site, where you set the input to Affy ID and the output to gene symbol. Then I wrote these lines in R to retrieve the results (for anyone else who needs this):
The problem now is that many probe sets have more than one gene symbol. :s Maybe I have to pick one at random to keep :-p
EDIT: I also found these sites, which do the same job.
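Instead of picking one at random, two common alternatives are to keep every symbol for a probe set or to keep only the probe sets that map to a single symbol. This is a rough sketch in base R, not the code mentioned above. The `mapping` table, its column names, and the " /// " separator are assumptions chosen for illustration.

```r
# Hypothetical converter output: one row per probe/symbol pair
mapping <- data.frame(
  probe  = c("p1", "p2", "p2", "p3", "p3", "p3"),
  symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_D", "GENE_E"),
  stringsAsFactors = FALSE
)

# Option 1: keep every distinct symbol, joined into one string per probe
collapsed <- aggregate(
  symbol ~ probe, data = mapping,
  FUN = function(s) paste(unique(s), collapse = " /// ")
)

# Option 2: keep only probes that map to exactly one distinct symbol
n_symbols <- tapply(mapping$symbol, mapping$probe, function(s) length(unique(s)))
unambiguous <- names(n_symbols)[n_symbols == 1]
```

Option 1 loses nothing but leaves multi-gene rows to deal with later; option 2 is stricter and discards ambiguous probe sets.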
http://biit.cs.ut.ee/gprofiler/gconvert.cgi
http://idmap.genestimuli.org/