I have a differential gene expression experiment that gives me microarray probe IDs with p-values and fold changes.
I can map them to the Entrez Gene IDs using the standard annotation platforms or the Ensembl ID mapper. However, consider the following situation:
- gene gA has associated probes pA and pB
- gene gB has associated probes pB and pC
- probes pA and pB show significant upregulation, while pC shows no change
In this case, it is likely that gA is upregulated and gB is not. But if I take the smallest p-value per gene, as is often done, I would conclude that both gA and gB are upregulated.
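To make the scenario concrete, here is a minimal Python sketch of the problem. The p-values, the `ALPHA` cutoff, and the helper names (`min_p_per_gene`, `drop_shared_probes`) are made up for illustration. It compares the smallest-p-value rule on all probes with the same rule after discarding probes that map to more than one gene. Dropping shared probes is only one possible workaround, not an established fix.

```python
from collections import defaultdict

# Hypothetical toy data mirroring the gA/gB example; the p-values are invented.
gene_to_probes = {"gA": ["pA", "pB"], "gB": ["pB", "pC"]}
probe_pvalue = {"pA": 0.001, "pB": 0.002, "pC": 0.60}
ALPHA = 0.05  # assumed significance cutoff


def min_p_per_gene(gene_to_probes, probe_pvalue):
    """Summarize each gene by the smallest p-value among its probes."""
    return {
        gene: min(probe_pvalue[p] for p in probes)
        for gene, probes in gene_to_probes.items()
        if probes  # genes left without any probe are skipped
    }


def drop_shared_probes(gene_to_probes):
    """Keep only probes that map to exactly one gene."""
    probe_to_genes = defaultdict(set)
    for gene, probes in gene_to_probes.items():
        for probe in probes:
            probe_to_genes[probe].add(gene)
    return {
        gene: [p for p in probes if len(probe_to_genes[p]) == 1]
        for gene, probes in gene_to_probes.items()
    }


naive = min_p_per_gene(gene_to_probes, probe_pvalue)
unique_only = min_p_per_gene(drop_shared_probes(gene_to_probes), probe_pvalue)

naive_calls = {gene: p < ALPHA for gene, p in naive.items()}
unique_calls = {gene: p < ALPHA for gene, p in unique_only.items()}

print("all probes:         ", naive_calls)
print("unique probes only: ", unique_calls)
```

With these toy numbers, the shared probe pB makes gB look significant under the naive rule, while the unique-probe version calls only gA. The cost is that a gene whose probes are all shared drops out of the result entirely.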
So, my questions are:
- how likely is it that such a situation occurs in a given DE experiment (vs., e.g. splice variants)?
- are there tools that address this?
Maybe this mapping issue can be resolved by calculating the sequence similarity between the genes and probes, i.e. gA with pA and pB, and gB with pB and pC, and then mapping the best-matching probe to each gene (perhaps with a reciprocal BLAST).
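As a rough sketch of that idea, assuming you already have a similarity score for every probe/gene pair (for example, bit scores parsed from BLAST output), a reciprocal best match could be computed as below. The scores and the function names are hypothetical, and ties are broken arbitrarily by whichever pair is seen first.

```python
# Hypothetical similarity scores for (probe, gene) pairs, e.g. alignment bit scores.
scores = {
    ("pA", "gA"): 100.0,
    ("pB", "gA"): 98.0,
    ("pB", "gB"): 80.0,
    ("pC", "gB"): 100.0,
}


def best_partner(scores, key_index):
    """For each probe (key_index=0) or gene (key_index=1), find its top-scoring partner."""
    best = {}
    for pair, score in scores.items():
        key, partner = pair[key_index], pair[1 - key_index]
        if key not in best or score > best[key][1]:
            best[key] = (partner, score)
    return {key: partner for key, (partner, _) in best.items()}


def reciprocal_best_matches(scores):
    """Keep (probe, gene) pairs where each is the other's top-scoring match."""
    best_gene_for_probe = best_partner(scores, 0)
    best_probe_for_gene = best_partner(scores, 1)
    return sorted(
        (probe, gene)
        for probe, gene in best_gene_for_probe.items()
        if best_probe_for_gene.get(gene) == probe
    )


print(best_partner(scores, 0))          # each probe's best-matching gene
print(reciprocal_best_matches(scores))  # pairs that agree in both directions
```

Note that a strict reciprocal match keeps at most one probe per gene, so with these toy scores pB is assigned to gA in the one-directional mapping but is not part of any reciprocal pair. Whether that is too conservative depends on how many probes per gene you want to retain.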
Which array is this? Some of the Affymetrix arrays have multiple summarization levels. So, you can choose to summarize over genes vs. exons vs. probes. If your array is one of those, then that might be the easiest route.
They're different but fairly standard arrays (including HG-U133A). Do you have a link describing the different levels you are talking about?
The xps package in R/Bioconductor does that for some arrays. I've done that with Mouse Gene 1.0 ST Arrays, for example, and presume that the HG-U133A would be similarish (though checking would be a good idea!). Some of the commands have an "exonlevel" parameter, which toggles which subset of probes to actually pay attention to (and likely other parameters; I've never looked under the hood to see how it works). The downside to xps is that it's somewhat annoying to use if you only do so rarely. Hopefully someone else knows of a nicer solution!