Why do some genes have more than one probe_id on U133-A Affymetrix arrays? For example, CREB1 has 4 probe IDs:
'204312_x_at'
'204313_s_at'
'204314_s_at'
'214513_s_at'
Which one of these should I use in my analysis as CREB1? Thanks!
As Istvan and VS have explained there is some amount of redundancy on these Affy arrays at the gene locus level and often at the transcript level as well. Sometimes this can be useful for distinguishing one transcript isoform from another. In other cases you will find that the probe sets are apparently measuring the same transcript and gene but that one probe set works better than others. For these reasons, I typically analyze Affy array data at the probe set level and then only map to transcripts or genes at a late stage in analysis (i.e., after filtering, statistics, etc). This allows you to see where multiple probe sets produce the same results (perhaps increasing confidence) or do not produce the same results (indicating probe set quality issues or measurement of different isoforms). Before really believing in a probe set I often manually align the probe set sequences to the reference genome to verify that they unambiguously map to the expected gene locus. Finally, I recommend that you check out the custom CDFs provided by UMichigan. They have done a generally good job of remapping probes to new probe sets at the gene level.
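As a rough illustration of the "analyze at the probe set level, map to genes late" workflow described above, here is a minimal Python sketch. The fold-change values, the `tolerance` threshold, and the `summarize_by_gene` helper are all hypothetical and made up for illustration; only the four probe set IDs come from the question. In practice the per-probe-set values would come from your own normalized and filtered data.

```python
from collections import defaultdict

# Hypothetical per-probe-set results (e.g. log2 fold changes).
# These numbers are invented for illustration, not real measurements.
probeset_log2fc = {
    "204312_x_at": 0.9,
    "204313_s_at": 1.1,
    "204314_s_at": 1.0,
    "214513_s_at": -0.2,
}

# Probe set -> gene symbol annotation, applied only AFTER the
# probe-set-level analysis (filtering, statistics) is done.
probeset_to_gene = {ps: "CREB1" for ps in probeset_log2fc}


def summarize_by_gene(results, annotation, tolerance=0.5):
    """Group probe-set results by gene and flag genes whose probe sets disagree.

    `tolerance` is an arbitrary, assumed cutoff on the spread between the
    highest and lowest probe-set value for a gene.
    """
    by_gene = defaultdict(dict)
    for probeset, value in results.items():
        by_gene[annotation[probeset]][probeset] = value

    summary = {}
    for gene, values in by_gene.items():
        spread = max(values.values()) - min(values.values())
        summary[gene] = {
            "probesets": values,
            "spread": spread,
            "concordant": spread <= tolerance,
        }
    return summary


summary = summarize_by_gene(probeset_log2fc, probeset_to_gene)
```

With these made-up numbers, three probe sets agree and one does not, so CREB1 would be flagged as discordant. That is the cue to look more closely at the outlying probe set (probe quality, or a different isoform) rather than averaging all four blindly.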
Each of them represents a different region of your gene.
Depending on the platform and the gene, they could correspond to the same transcript or to different isoforms. The documentation that comes with the array describes the location that each probe corresponds to.
The Bioconductor affy package (and its relatives) for R handles that kind of information very well. The data for your particular array (I think it's a best-seller) is available for analysis in the R framework. I'm not up to date, as I haven't used them for at least 5 years, but I remember that there were wrappers that could handle a large part of the analysis, such as expresso(), rma() or gcrma(), including summarizing "probes" into "probesets".
Didn't know about the custom CDFs from UMichigan. Thanks for pointing out this great resource!