Hi all,
I'm analyzing some Affy data and not sure how to interpret what appears to be a weird MA plot. In essence, more highly expressed genes tend to have positive foldChange and vice versa. I'm not super familiar with array analysis (just using ComBat + limma). Has anyone run across something like this before? Am I doing something wrong..?
EDIT: in response to @Devon and @ddiez comments below:
To explain a bit more, I have data from 3 timepoints (0D, 1D, 7D). The problematic comparison is 1D-0D, which corresponds to the first MA plot in the thumbnails along the left side of the attached image. The second and third thumbnails are 7D-0D and 7D-1D, respectively, and these seem to confirm that there is an enrichment of high expression values in the 1D samples. Looking at a density plot of the expression values, I see that indeed 1D is depleted for values ~ 6 (green peak is lower than others in main image). Is that really the root cause of this? My off-the-cuff guess would have been that this difference is negligible!
Thanks
Yikes, my only guess is that something went seriously wrong when performing between-library normalization. First try this without combat just to make sure that nothing is going wonky there, then (presuming the problem persists) try to find some diagnostic plots to look at. Hopefully it'll be apparent from that what went wrong.
There is a clear correlation between the abundance (average log-expression) and the fold-change. This suggests to me some problem at the code level. Maybe you are not plotting what you think you are plotting (?).
Besides Devon's advice, and given that you said you are not familiar with microarray analysis, I would post the code used to obtain that plot. If you used public data also give the details. If the plot is reproducible it might be a problem with the experiment. Or, it might be a problem with the code you used to obtain the plot. But for now it is impossible to say.
@Devon @ddiez thanks for advice and comments. I tried without ComBat - same result. I then tried to look at some diagnostic plots - see attached images.
EDIT: moved to main question, as it appears I cannot link images in comments.
You should be able to. Nothing special about comments.
"and these seem to confirm that there is an enrichment of high expression values in the 1D samples" This is not the case, because you see both values with negative log2ratio and values with positive log2ratio. The issue (as @ddiez said) is that when the average expression is low, you tend to have engative log2ratio, and when the average expression is high, you tend to have positive log2ratio. Which is weird. More than this, I cannot say. Maybe you can share a subset of your data to see if other people produce the same MA-plot (in that case, the problem is before the plot) or if they obtain a "normal" MA-plot (and in this case they can probably share the MA-plot code with you).
Given that all other comparisons (that don't include the 1D timepoint) produce normal looking MA-plots, I have a hard time understanding why code would be an issue. In any case, something like below. Note that I'm using the
plotMA
function from thelimma
package, nothing home rolled.I should add that I unfortunately can't share the data as it belongs to a collaborator.