Question

Human Exon 1.0 ST Probesets with multiple gene symbols associated with them


8.6 years ago

lcordeiro ▴ 40

Hello everyone,

I'm learning to analyse data from Human Exon arrays and found something curious, which I don't know how to handle. I searched BioStar and couldn't find anything closely related to this issue.

I've done all the processing up to generating a list of "differentially expressed probe sets" (DEPS) with RMA/limma without any problems. I ran RMA at the probeset level and used biomaRt to get the gene annotation for the DEPS. (I also tried the getNetAffx function, to no avail; I still didn't know which gene symbol to choose for some probesets.)

When I looked at the annotated results I noticed that more than 600 probesets were annotated to more than one gene symbol (or Entrez ID, or Ensembl ID; it didn't matter which). I know that the converse is absolutely fine (2 or more probesets annotating to the same gene), but I wasn't expecting it to be the other way around.
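One way to count such cases is to group the annotation rows by probeset and collect the distinct symbols for each. This is a minimal sketch in Python, assuming the annotation has been exported as (probeset ID, gene symbol) pairs. The IDs, symbols, and function names below are made up for illustration and do not come from any real annotation release.

```python
from collections import defaultdict

# Hypothetical (probeset_id, gene_symbol) pairs, shaped like the rows an
# annotation query might return. IDs and symbols are invented.
annotation_rows = [
    ("2315101", "GENE_A"),
    ("2315102", "GENE_A"),   # two probesets, one gene: the expected case
    ("2315103", "GENE_B"),
    ("2315103", "GENE_C"),   # one probeset, two genes: the ambiguous case
    ("2315103", "GENE_B"),   # duplicate row, should not be counted twice
    ("2315104", ""),         # probeset with no gene symbol at all
]


def symbols_per_probeset(rows):
    """Map each probeset ID to its set of distinct, non-empty gene symbols."""
    mapping = defaultdict(set)
    for probeset_id, symbol in rows:
        symbols = mapping[probeset_id]  # registers unannotated probesets too
        if symbol:
            symbols.add(symbol)
    return dict(mapping)


def ambiguous_probesets(mapping):
    """Return only the probesets annotated to more than one gene symbol."""
    return {p: sorted(s) for p, s in mapping.items() if len(s) > 1}


mapping = symbols_per_probeset(annotation_rows)
ambiguous = ambiguous_probesets(mapping)

for probeset_id, symbols in ambiguous.items():
    print(probeset_id, " /// ".join(symbols))
```

Keeping every symbol for an ambiguous probeset, rather than picking one, at least makes the ambiguity visible in the results table.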

I then batch-searched for annotation information directly on the NetAffx website and, still, got more than 1 gene symbol for some of the probesets.

My question is: how should I choose the appropriate gene symbol for a given probeset when there are multiple hits? I'm leaning towards picking the first gene symbol returned by the NetAffx query, but this seems too crude...

Perhaps a related question would be: should I forget about analyzing data at the probeset level and simply do it at the transcript cluster (gene) level instead?

Cheers,

Leonardo

microarray annotation affymetrix • 2.3k views

updated 8.6 years ago by mastal511 ★ 2.1k • written 8.6 years ago by lcordeiro ▴ 40

score 0 · Answer 1 · 2016-04-22

I don't know that much about the exon arrays, but with the 3' arrays, some probesets were designed against regions where genes on opposite strands overlap at their 3' or 5' ends, so it's difficult to know which gene to assign the probeset to. Sometimes probesets were designed from UniGene clusters, and the annotation of those clusters might have changed over the years, so that a cluster is now associated with more than one gene. You can try aligning some of the probes in question to the genome with BLAT to see where they align, and whether they align in more than one place.
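If you already have BLAT results for the probe sequences in PSL format, the "more than one place" check can be scripted. The following is a rough sketch in Python, not a definitive pipeline: it assumes the standard 21-column tab-separated PSL layout and treats a hit as perfect when the number of matching bases equals the query size. The probe names, coordinates, and the `psl_line` helper are invented for the example.

```python
from collections import defaultdict


def psl_line(matches, strand, q_name, q_size, t_name, t_start):
    """Build one made-up PSL record (21 tab-separated columns) for the demo."""
    fields = [
        matches, q_size - matches, 0, 0,   # matches, misMatches, repMatches, nCount
        0, 0, 0, 0,                        # query/target insert counts and bases
        strand, q_name, q_size, 0, q_size, # strand, qName, qSize, qStart, qEnd
        t_name, 1000000, t_start, t_start + q_size,  # tName, tSize, tStart, tEnd
        1, f"{q_size},", "0,", f"{t_start},",        # blockCount, blockSizes, qStarts, tStarts
    ]
    return "\t".join(str(f) for f in fields)


def perfect_hits(psl_lines):
    """Collect (chrom, strand, start, end) for full-length perfect hits per probe."""
    hits = defaultdict(list)
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue  # skip header lines and blanks
        matches, q_size = int(fields[0]), int(fields[10])
        if matches == q_size:
            hits[fields[9]].append(
                (fields[13], fields[8], int(fields[15]), int(fields[16]))
            )
    return dict(hits)


def multi_locus_probes(hits):
    """Keep only probes with perfect hits at more than one distinct location."""
    return {p: locs for p, locs in hits.items() if len(set(locs)) > 1}


example = [
    "psLayout version 3",                        # header line, ignored
    psl_line(25, "+", "probe_1", 25, "chr1", 100),
    psl_line(25, "+", "probe_2", 25, "chr1", 500),
    psl_line(25, "-", "probe_2", 25, "chr7", 900),   # second perfect hit
    psl_line(25, "+", "probe_3", 25, "chr2", 300),
    psl_line(23, "+", "probe_3", 25, "chr9", 700),   # partial hit, ignored
]

hits = perfect_hits(example)
multi = multi_locus_probes(hits)
for probe, locations in multi.items():
    print(probe, locations)
```

Probes flagged this way are candidates for cross-hybridisation; probes with a single perfect hit that still carry two gene symbols are more likely sitting in a region where annotated genes overlap.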