Annotation Of Affymetrix Human Exon St. 1.0
13.8 years ago
Some ▴ 50

Dear community,

This is my first time working with the Affy Human Exon St. 1.0 array. I followed the instructions for the 'oligo' package in Bioconductor and ran RMA at the 'core' level. This gives me an ExpressionSet, from which I get the expression matrix using exprs(exonCore).

Next I wanted to map the Affy probe IDs using BioMart, so I built a lookup table containing all IDs from the HuEx St. 1.0 together with, e.g., RefSeq IDs.

When I do a simple lookup, only 1365 out of 22011 have RefSeq IDs.

1) What am I doing wrong? Why is that?

2) What is the best way to retrieve annotation?

3) My biologists want to see the expression of several genes as one value per gene, not just values per probe ID. What is the best way to do this?

Thank you very much in advance

microarray annotation affymetrix biomart exon • 9.5k views
13.8 years ago
Neilfws 49k
  1. The best way to retrieve annotation is definitely BioMart. If you are not using it already, try the Bioconductor biomaRt package. Search this site and you will find some examples of how to use it. I also wrote a couple of short tutorials: here and here.
  2. It is possible that some of your probeset IDs do not map to a RefSeq ID, but 1365/22011 seems like a very small proportion. However, note that there are ~1.4 million exon probeset IDs, of which only 531 333 are currently represented in the BioMart database. This issue came up in a question on the aroma.affymetrix list, but take care with the code example in the answer (it contains incorrect spaces).
  3. If your biologists want to see expression values for single genes, not exons, perhaps they should not have used a platform designed specifically for exon expression! People deal with this in a couple of ways. One is to combine exon values for each gene (e.g. by taking the median). Another is to generate plots for each gene showing the exon expression values: these can be very informative. The packages GenomeGraphs and exonmap are good tools for this task.
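
To make the "combine exon values for each gene" option in point 3 concrete, here is a minimal sketch in plain Python rather than R. It assumes you already have one expression value per exon probeset and a probeset-to-gene lookup (e.g. exported from BioMart). The function name `collapse_to_gene_level`, the probeset IDs, the gene names and the values are all made up for illustration.

```python
from collections import defaultdict
from statistics import median


def collapse_to_gene_level(exon_values, probeset_to_gene):
    """Return one value per gene: the median over its mapped exon probesets."""
    per_gene = defaultdict(list)
    for probeset_id, value in exon_values.items():
        gene = probeset_to_gene.get(probeset_id)
        if gene is not None:  # skip probesets that have no gene annotation
            per_gene[gene].append(value)
    return {gene: median(values) for gene, values in per_gene.items()}


# Made-up probeset IDs, gene names and log2 values for a single sample.
exon_values = {"ps1": 7.0, "ps2": 8.0, "ps3": 10.0, "ps4": 5.0, "ps5": 6.5}
probeset_to_gene = {"ps1": "GENE_A", "ps2": "GENE_A", "ps3": "GENE_A", "ps4": "GENE_B"}

gene_values = collapse_to_gene_level(exon_values, probeset_to_gene)
print(gene_values)
```

The median is used here only because it is less sensitive than the mean to a single skipped or outlying exon. Unannotated probesets (like `ps5` above) are simply dropped.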

your tutorials look great, thanks

13.8 years ago
Bert Overduin ★ 3.7k

Hi,

Your BioMart data indeed don't make sense to me:

If I use BioMart v61, it turns out that of the 53630 annotated human genes, 21450 have a RefSeq DNA ID. Of these, 21104 have a probe from the Affy Human Exon St. 1.0 mapped to them.

If I only look at the protein-coding genes, out of 21244 Ensembl genes, 18976 have a RefSeq DNA ID. Of these, 18932 have a probe from the Affy Human Exon St. 1.0 mapped to them.


nice stats, thanks for doing that.

13.8 years ago

Use the standaloneannotation functions from OneChannelGUI in BioC - you don't have to use the actual GUI, just look in the docs for those functions.


thank you, i will check that out

6.6 years ago
popantrop ▴ 50

Your specific problem was most likely that, at the "core" annotation level, RMA produces a file annotated with transcript cluster IDs, whereas BioMart only speaks probeset IDs. So with BioMart alone you can't translate from transcript cluster IDs to gene names.

I've produced a CSV file that maps transcript cluster IDs to HGNC gene names (per the most recent GENCODE annotations).
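
As a rough illustration of how such a mapping can be applied, and how to check how many of your row IDs it actually covers, here is a small Python sketch. The column names `transcript_cluster_id` and `gene_symbol`, the helper names and every ID below are assumptions made up for the example; they are not taken from the actual file.

```python
import csv
import io

# Stand-in for the mapping file; the column names and all IDs are invented.
MAPPING_CSV = """transcript_cluster_id,gene_symbol
1000001,GENE_A
1000002,GENE_B
"""


def load_mapping(handle):
    """Read a two-column CSV into a dict of transcript cluster ID -> gene symbol."""
    return {row["transcript_cluster_id"]: row["gene_symbol"]
            for row in csv.DictReader(handle)}


def annotate(row_ids, mapping):
    """Return the annotated IDs and the fraction of row IDs that were found."""
    annotated = {rid: mapping[rid] for rid in row_ids if rid in mapping}
    coverage = len(annotated) / len(row_ids) if row_ids else 0.0
    return annotated, coverage


mapping = load_mapping(io.StringIO(MAPPING_CSV))
row_ids = ["1000001", "1000002", "1000003"]  # e.g. row names of the expression matrix
annotated, coverage = annotate(row_ids, mapping)
print(f"{len(annotated)} of {len(row_ids)} row IDs annotated ({coverage:.0%})")
```

A very low coverage, like the 1365 of 22011 in the question, is a hint that the row IDs and the lookup table use different ID types.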
