Question

Genes With Multiple Probe Ids

3

12.5 years ago

ftp ▴ 140

Why do some genes have more than one probe_id on U133-A Affymetrix arrays? For example, CREB1 has 4 probe IDs:

'204312_x_at'
'204313_s_at'
'204314_s_at'
'214513_s_at'

Which one of these should I use in my analysis as CREB1? Thanks!

id • 8.1k views

ADD COMMENT • link updated 12.5 years ago by Obi Griffith 20k • written 12.5 years ago by ftp ▴ 140

score 7 · Answer 1 · 2012-10-10

12.5 years ago

VS ▴ 740

Probeset composition

I think the accompanying image will make it clear to you. Here, S is an individual gene sequence belonging to gene family G. Your CREB1 would be similar to the S5 gene sequence represented in this figure.

For a more detailed description, read here
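
Since the figure is not visible here, a rough sketch of how the probe set ID suffixes can be read may help. The suffix meanings below are my understanding of the Affymetrix naming convention, not something stated in this thread, so please verify them against the array documentation. The function name `probeset_suffix` and the `SUFFIX_NOTES` table are made up for this example.

```python
import re

# ASSUMPTION: these notes paraphrase the commonly described Affymetrix
# suffix convention; check the array's design documentation before relying on them.
SUFFIX_NOTES = {
    "_at": "probes intended to be unique to a single transcript",
    "_a_at": "probes match alternative transcripts of the same gene",
    "_s_at": "probes are shared by several transcripts, possibly from related genes",
    "_x_at": "no unique or shared set could be designed; cross-hybridization is more likely",
}


def probeset_suffix(probeset_id):
    """Return the trailing suffix (_at, _a_at, _s_at, _x_at) or None if unrecognized."""
    match = re.search(r"(_[asx])?_at$", probeset_id)
    return match.group(0) if match else None


# The four CREB1 probe set IDs quoted in the question.
creb1_probesets = ["204312_x_at", "204313_s_at", "204314_s_at", "214513_s_at"]

for ps in creb1_probesets:
    suffix = probeset_suffix(ps)
    print(ps, suffix, SUFFIX_NOTES.get(suffix, "unrecognized suffix"))
```

Read this way, none of the four CREB1 probe sets carries the plain `_at` suffix, which is one reason to inspect them individually rather than pick one blindly.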

score 3 · Answer 2 · 2012-10-10

As Istvan and VS have explained, there is some redundancy on these Affy arrays at the gene locus level and often at the transcript level as well. Sometimes this is useful for distinguishing one transcript isoform from another. In other cases the probe sets apparently measure the same transcript and gene, but one probe set works better than the others. For these reasons, I typically analyze Affy array data at the probe set level and only map to transcripts or genes at a late stage of the analysis (i.e., after filtering, statistics, etc.). This lets you see where multiple probe sets produce the same result (perhaps increasing confidence) and where they do not (indicating probe set quality issues or measurement of different isoforms). Before really believing in a probe set, I often manually align its probe sequences to the reference genome to verify that they map unambiguously to the expected gene locus. Finally, I recommend that you check out the custom CDFs provided by UMichigan. They have done a generally good job of remapping probes to new probe sets at the gene level.
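
To make the "probe set level first, gene level late" idea concrete, here is a minimal sketch. It assumes you already have one log2 fold change per probe set from your own statistics. The numbers below are invented purely for illustration (only the probe set IDs come from the question), and `summarize_by_gene` and the `min_abs_fc` cutoff are hypothetical names and choices, not part of any package.

```python
from collections import defaultdict

# HYPOTHETICAL values: made-up log2 fold changes, one per probe set.
probeset_log2fc = {
    "204312_x_at": 0.1,
    "204313_s_at": 1.2,
    "204314_s_at": 1.0,
    "214513_s_at": -0.9,
}
# Probe set to gene mapping, applied only at this late stage.
probeset_to_gene = {ps: "CREB1" for ps in probeset_log2fc}


def summarize_by_gene(log2fc, mapping, min_abs_fc=0.5):
    """Group probe-set results by gene and flag agreement between probe sets."""
    by_gene = defaultdict(dict)
    for ps, fc in log2fc.items():
        by_gene[mapping[ps]][ps] = fc

    report = {}
    for gene, sets in by_gene.items():
        # Probe sets whose change passes the (arbitrary) cutoff.
        called = {ps: fc for ps, fc in sets.items() if abs(fc) >= min_abs_fc}
        directions = {fc > 0 for fc in called.values()}
        if not called:
            status = "no change"
        elif len(directions) == 1:
            status = "concordant"
        else:
            status = "discordant"
        report[gene] = {"status": status, "called": sorted(called)}
    return report


report = summarize_by_gene(probeset_log2fc, probeset_to_gene)
print(report["CREB1"])
```

A "discordant" gene is not an error in itself; it is a prompt to look at probe set quality or at isoform-specific behavior before reporting a single gene-level value.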

score 1 · Answer 3 · 2012-10-10

12.5 years ago

Istvan Albert 102k

Each of them represents a different region of your gene.

Depending on the platform and the gene, they could correspond to the same transcript or to different isoforms. The documentation that comes with the array describes the location that corresponds to each probe.

score 1 · Answer 4 · 2012-10-10

The Bioconductor affy package (and its relatives) for R handles this kind of information very well. The data for your particular array (I think it's a best-seller) is available for analysis in the R framework. I'm not up to date, as I haven't used these packages for at least 5 years, but I remember that there were wrappers that handle a large part of the analysis, such as expresso(), rma() or gcrma(), including the summarization from "probes" to "probesets".
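
To give a feel for what a "probes" to "probesets" summary does, here is a small sketch of Tukey's median polish, the robust summarization approach usually associated with RMA. It is written in plain Python for illustration only: it is not the affy package's code, it skips background correction and normalization, and the function names and toy intensities are invented for this example.

```python
from statistics import median


def median_polish(matrix, max_iter=10, tol=1e-6):
    """Fit value = overall + row effect + column effect + residual using medians."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    resid = [list(row) for row in matrix]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep out row (probe) medians.
        row_meds = [median(row) for row in resid]
        for i in range(n_rows):
            row_eff[i] += row_meds[i]
            resid[i] = [v - row_meds[i] for v in resid[i]]
        shift = median(col_eff)
        overall += shift
        col_eff = [c - shift for c in col_eff]
        # Sweep out column (array) medians.
        col_meds = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_meds[j]
            for i in range(n_rows):
                resid[i][j] -= col_meds[j]
        shift = median(row_eff)
        overall += shift
        row_eff = [r - shift for r in row_eff]
        if max(abs(m) for m in row_meds + col_meds) < tol:
            break
    return overall, row_eff, col_eff, resid


def summarize_probeset(log2_intensities):
    """Rows are probes, columns are arrays; returns one value per array."""
    overall, _, col_eff, _ = median_polish(log2_intensities)
    return [overall + c for c in col_eff]


# Toy log2 intensities for one probe set: 3 probes x 3 arrays,
# with one outlying probe measurement (20) on the third array.
toy = [
    [8, 9, 10],
    [9, 10, 20],
    [7, 8, 9],
]
print(summarize_probeset(toy))
```

Because medians are used instead of means, the single outlying probe value has little influence on the per-array summary, which is the main reason this style of summarization is preferred over simple averaging of probes.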