Question

Annotation of Affymetrix Human Exon 1.0 ST


14.2 years ago

Some

Dear community,

This is my first time working with the Affy Human Exon 1.0 ST array. I followed the instructions for the Bioconductor 'oligo' package, running RMA with the 'core' parameter. This gives me an ExpressionSet, and I get the expression matrix using exprs(exonCore).

Next, I wanted to map the Affy probe IDs using BioMart, so I retrieved a lookup table containing all IDs from the HuEx 1.0 ST array along with, e.g., RefSeq IDs.

When I do a simple lookup, only 1365 of the 22011 IDs have a RefSeq ID.

1) What am I doing wrong? Why is that?

2) What is the best way to retrieve annotation?

3) My biologists want to see the expression of several genes as one value per gene, not just per probe ID. What is the best way to do this?

Thank you very much in advance.


Ram · Answer 1 · 2011-03-11

The best way to retrieve annotation is definitely BioMart. If you are not using it already, try the Bioconductor biomaRt package. Search this site and you will find some examples of how to use it. I also wrote a couple of short tutorials: here and here.
It is possible that some of your probeset IDs do not map to a RefSeq ID, but 1365/22011 seems like a very small proportion. Note, however, that there are ~1.4 million exon probeset IDs, of which only 531,333 are currently represented in the BioMart database. This issue came up in a question on the aroma.affymetrix list, but take care with the code example in the answer (it contains incorrect spaces).
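The biomaRt package builds and sends BioMart queries for you from R. For readers outside R, here is a minimal Python sketch of the same kind of query against BioMart's REST ("martservice") interface. Assumptions, not taken from this thread: the Ensembl endpoint URL, the dataset name `hsapiens_gene_ensembl`, and the attribute names `affy_huex_1_0_st_v2`, `ensembl_gene_id` and `refseq_mrna`; check them against the current BioMart attribute list before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint: the Ensembl BioMart REST service (verify before use).
MARTSERVICE_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(dataset, attributes, filter_name=None, filter_values=()):
    """Return a BioMart XML query string requesting `attributes` from `dataset`."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<!DOCTYPE Query>",
        '<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1">',
        f'  <Dataset name="{dataset}" interface="default">',
    ]
    # Optional filter, e.g. to restrict the query to a list of probeset IDs.
    if filter_name and filter_values:
        lines.append(f'    <Filter name="{filter_name}" value="{",".join(filter_values)}"/>')
    for attr in attributes:
        lines.append(f'    <Attribute name="{attr}"/>')
    lines += ["  </Dataset>", "</Query>"]
    return "\n".join(lines)


def build_biomart_url(query_xml):
    """URL-encode the XML query as the `query` parameter of a GET request."""
    return MARTSERVICE_URL + "?" + urlencode({"query": query_xml})


# Assumed dataset and attribute names (HuEx probeset -> Ensembl gene -> RefSeq mRNA).
query = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    attributes=["affy_huex_1_0_st_v2", "ensembl_gene_id", "refseq_mrna"],
)
url = build_biomart_url(query)
# Fetching `url` (e.g. with urllib.request.urlopen) should return a TSV table.
print(url[:80])
```

Only the query is built here; the network call is left out so the sketch runs offline.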
If your biologists want to see expression values for single genes, not exons, perhaps they should not have used a platform designed specifically for exon expression! People deal with this in a couple of ways. One is to combine exon values for each gene (e.g. by taking the median). Another is to generate plots for each gene showing the exon expression values: these can be very informative. The packages GenomeGraphs and exonmap are good tools for this task.
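A minimal Python sketch of the first approach (median of the exon-level values for each gene), shown for a single sample. The probeset IDs, gene symbols and values are hypothetical. In practice the probeset-to-gene mapping would come from an annotation table such as a BioMart export, and for a full matrix you would apply the function to each sample column.

```python
from collections import defaultdict
from statistics import median


def collapse_to_genes(exon_values, probeset_to_gene):
    """Summarise exon-level values into one value per gene by taking the median."""
    per_gene = defaultdict(list)
    for probeset_id, value in exon_values.items():
        gene = probeset_to_gene.get(probeset_id)
        if gene is not None:  # skip probesets without a gene annotation
            per_gene[gene].append(value)
    return {gene: median(values) for gene, values in per_gene.items()}


# Hypothetical probeset IDs, gene symbols and log2 expression values for one sample.
exon_values = {"ps1": 7.2, "ps2": 8.0, "ps3": 6.5, "ps4": 9.1, "ps5": 5.0}
probeset_to_gene = {"ps1": "GENE_A", "ps2": "GENE_A", "ps3": "GENE_A", "ps4": "GENE_B"}

print(collapse_to_genes(exon_values, probeset_to_gene))  # → {'GENE_A': 7.2, 'GENE_B': 9.1}
```

The median is less sensitive than the mean to a single exon that is skipped or poorly measured, which is one reason it is a common choice here.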

score 2 · Answer 2 · 2011-03-11


Bert Overduin

Hi,

Your BioMart results indeed don't make sense to me:

If I use BioMart v61, it turns out that, of the 53630 annotated human genes, 21450 have a RefSeq DNA ID. Of these, 21104 have a probe from the Affy Human Exon 1.0 ST mapped to them.

If I only look at the protein-coding genes, 18976 of the 21244 Ensembl genes have a RefSeq DNA ID. Of these, 18932 have a probe from the Affy Human Exon 1.0 ST mapped to them.
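Turning these counts into proportions makes the contrast with the question explicit: 98.4% to 99.8% of RefSeq-annotated genes have a HuEx probe, versus the ~6% match rate reported in the question. Gene counts and probe-ID counts are not strictly comparable, so this is only a rough sanity check. A short Python calculation using only the numbers quoted in this thread:

```python
# Counts quoted in the answer (BioMart v61) and in the question.
counts = {
    "genes with a RefSeq DNA ID": (21450, 53630),
    "RefSeq genes with a HuEx probe": (21104, 21450),
    "protein-coding genes with a RefSeq DNA ID": (18976, 21244),
    "protein-coding RefSeq genes with a HuEx probe": (18932, 18976),
    "question: IDs with a RefSeq match": (1365, 22011),
}

# Percentage for each subset, rounded to one decimal place.
percentages = {label: round(100 * hits / total, 1) for label, (hits, total) in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
# → 40.0%, 98.4%, 89.3%, 99.8% and 6.2% respectively
```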


Nice stats, thanks for doing that.

Reply by Some

score 1 · Answer 3 · 2011-03-11


Michael Imbeault

Use the stand-alone annotation functions from oneChannelGUI in BioC. You don't have to use the actual GUI; just look in the docs for those functions.


Thank you, I will check that out.

Reply by Some

score 1 · Answer 4 · 2018-06-05

Your specific problem was most likely that, at the "core" summarization level, RMA produces output annotated with transcript cluster IDs, while BioMart only knows probeset IDs. So you can't use BioMart to translate transcript cluster IDs to gene names.
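One quick way to check for this kind of mismatch is to count how many row IDs of the expression matrix actually occur in the lookup table. A minimal Python sketch; the IDs below are hypothetical placeholders. In practice `matrix_ids` would be the row names of the RMA output and `lookup_ids` the ID column of the BioMart table.

```python
def id_overlap(matrix_ids, lookup_ids):
    """Return (number of matrix IDs found in the lookup table, number of matrix IDs)."""
    matrix_set, lookup_set = set(matrix_ids), set(lookup_ids)
    shared = matrix_set & lookup_set
    return len(shared), len(matrix_set)


# Hypothetical IDs: rows of a core-level matrix vs. a probeset-keyed lookup table.
matrix_ids = ["1000001", "1000002", "1000003"]
lookup_ids = ["2000001", "2000002", "1000003"]

shared, total = id_overlap(matrix_ids, lookup_ids)
print(f"{shared} of {total} matrix IDs found in the lookup table")  # → 1 of 3 ...
```

A very low overlap, like the 1365 of 22011 in the question, suggests that the two tables are keyed by different kinds of IDs.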

I've produced a CSV file that maps transcript cluster IDs to HGNC gene names (based on the most recent GENCODE annotations).
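Assuming such a CSV has one column of transcript cluster IDs and one of gene symbols, attaching gene names to a core-level expression matrix is a plain dictionary lookup. A minimal Python sketch; the column names `transcript_cluster_id` and `hgnc_symbol`, the IDs and the values are all hypothetical, so adjust them to the actual file header.

```python
import csv
import io


def load_cluster_to_gene(csv_text, id_col="transcript_cluster_id", gene_col="hgnc_symbol"):
    """Read a transcript-cluster-to-gene CSV into a dict (column names are assumptions)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Rows with an empty gene symbol are left out of the mapping.
    return {row[id_col]: row[gene_col] for row in reader if row[gene_col]}


def annotate_rows(expression, cluster_to_gene):
    """Attach a gene symbol to each (cluster ID, values) row; unmapped rows get None."""
    return [(cid, cluster_to_gene.get(cid), values) for cid, values in expression.items()]


# Hypothetical CSV content and a two-sample expression matrix keyed by cluster ID.
csv_text = "transcript_cluster_id,hgnc_symbol\n1000001,GENE_A\n1000002,GENE_B\n1000003,\n"
expression = {"1000001": [7.1, 7.4], "1000002": [9.0, 8.8], "1000003": [5.2, 5.0]}

for row in annotate_rows(expression, load_cluster_to_gene(csv_text)):
    print(row)
```

With a real file, read its contents via `open(path, newline="")` and pass the text to `load_cluster_to_gene`.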