Genes With Multiple Probe Ids
12.1 years ago
ftp ▴ 140

Why do some genes have more than one probe_id on U133-A Affymetrix arrays? For example, CREB1 has 4 probe IDs:

'204312_x_at'
'204313_s_at'
'204314_s_at'
'214513_s_at'

Which one of these should I use in my analysis as CREB1? Thanks!

id • 7.9k views
12.1 years ago
VS ▴ 740

Probeset composition

I think the accompanying image will make it clear. Here, S is an individual gene sequence belonging to gene family G. Your CREB1 would be similar to the S5 gene sequence shown in the figure.

For a more detailed description, read here

12.1 years ago

As Istvan and VS have explained, there is some redundancy on these Affy arrays at the gene locus level and often at the transcript level as well. Sometimes this is useful for distinguishing one transcript isoform from another. In other cases the probe sets appear to measure the same transcript and gene, but one probe set works better than the others. For these reasons, I typically analyze Affy array data at the probe set level and only map to transcripts or genes at a late stage of the analysis (i.e., after filtering, statistics, etc.). This lets you see where multiple probe sets produce the same results (perhaps increasing confidence) or different results (indicating probe set quality issues or measurement of different isoforms). Before really believing in a probe set, I often manually align the probe set sequences to the reference genome to verify that they map unambiguously to the expected gene locus. Finally, I recommend that you check out the custom CDFs provided by UMichigan. They have generally done a good job of remapping probes to new probe sets at the gene level.
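The probe-set-level workflow described above can be sketched in Python. This is only an illustration: the four probe set IDs are the CREB1 ones from the question, but the log2 fold changes, the `min_abs_change` threshold, and the function name `summarize_by_gene` are all made up for the example.

```python
from collections import defaultdict

# Hypothetical probe-set-level results: these log2 fold changes are
# invented for illustration, not measured values.
probeset_log2fc = {
    "204312_x_at": 0.10,
    "204313_s_at": 1.20,
    "204314_s_at": 1.05,
    "214513_s_at": -0.90,
}

# Probe set to gene mapping, applied only at this late stage.
# The four IDs are the CREB1 probe sets listed in the question.
probeset_to_gene = {probeset: "CREB1" for probeset in probeset_log2fc}


def summarize_by_gene(log2fc, mapping, min_abs_change=0.5):
    """Group probe-set results by gene and check direction agreement.

    min_abs_change is an arbitrary illustrative cutoff, not a recommendation.
    """
    by_gene = defaultdict(dict)
    for probeset, value in log2fc.items():
        by_gene[mapping[probeset]][probeset] = value

    report = {}
    for gene, values in by_gene.items():
        up = sorted(ps for ps, v in values.items() if v >= min_abs_change)
        down = sorted(ps for ps, v in values.items() if v <= -min_abs_change)
        report[gene] = {
            "up": up,
            "down": down,
            "unchanged": sorted(set(values) - set(up) - set(down)),
            # Concordant means no probe sets changed in opposite directions.
            "concordant": not (up and down),
        }
    return report


report = summarize_by_gene(probeset_log2fc, probeset_to_gene)
print(report["CREB1"])
```

With these invented numbers, two probe sets go up, one goes down, and one is unchanged, so the gene is flagged as discordant. That is exactly the kind of case where aligning the probe sequences or checking isoform coverage is worthwhile before reporting a single value for CREB1.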


Didn't know about the custom CDFs from UMichigan. Thanks for pointing out this great resource!

12.1 years ago

Each of them represents a different region of your gene.

Depending on the platform and the gene, they could correspond to the same transcript or to different isoforms. The documentation that comes with the array describes the location that corresponds to each probe.

12.1 years ago

The BioConductor affy (and related) packages for R handle that kind of information very well. The data for your particular array (I think it's a best-seller) is available for analysis in the R framework. I'm not up to date, as I haven't used them for at least 5 years, but I remember there were some wrappers that could handle a large part of the analysis, such as expresso(), rma() or gcrma(), including the "probes" to "probesets" summaries.
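To give a feel for what a "probes" to "probesets" summary means, here is a deliberately simplified Python sketch. It is not what rma() does: RMA also background-corrects, quantile-normalizes, and summarizes with median polish across arrays. The sketch just takes the median of log2 probe intensities for one probe set on one array, and the intensities and the function name `summarize_probeset` are invented for the example.

```python
import math
from statistics import median

# Hypothetical raw intensities for the probes in one probe set on a single
# array. The values are invented; a real probe set has its own probe count.
probe_intensities = {
    "204313_s_at": [256.0, 512.0, 128.0, 1024.0, 512.0],
}


def summarize_probeset(intensities):
    """Collapse probe-level intensities into one log2 value per probe set.

    Simplified stand-in for a real summarization step: a plain median of
    log2 intensities, with no background correction or normalization.
    """
    return median(math.log2(value) for value in intensities)


probeset_summary = {
    probeset: summarize_probeset(values)
    for probeset, values in probe_intensities.items()
}
print(probeset_summary)
```

The median is used here because it is less sensitive than the mean to a single misbehaving probe, which is the same motivation behind the robust summaries in the R packages.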
