I am reading an old paper by Bolstad, 'Probe Level Quantile Normalization of High Density Oligonucleotide Array Data'.
It says: "One problem with this method is that in the tails in particular, where we might expect greater differentiation between chips, the normalized values are going to be identical."
I could not find an explanation for this: why, from plotting the density of the intensities, would we expect greater differentiation in the tails? I do understand that in the tails of the density plot the number of corresponding intensities is smaller, so those values are close to unique, aren't they? Why would we expect to find greater differentiation there?
Thanks in advance
So, just to be sure that I understood: the density plot is made at probe level (not at probeset level), most probes have average intensities, and some probes have much higher intensities, which results in the greater variance at the high end of the extrema. Is that right? But since the probeset expression value is computed from the values of multiple probes, the quantile normalization cannot be that poor, can it?
It can, because you only average the probes after normalization. Quantile normalization in this case ends up setting the average intensity value for a given probeset: if all probes in probeset x are set to 4, my probeset average is 4. So yes, you really can lose important information. I think there were other issues with quantile normalization, but it's been a while since I read up on it (all I remember is that I came to like VSN the best).
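To make that concrete, here is a minimal sketch of probe-level quantile normalization in base R. The intensity matrix and the two-probe probeset are made up for illustration, and tie handling is simplified compared to Bolstad's actual method:

```r
# Toy data: 4 probes (rows) x 3 chips (columns).
x <- matrix(c(2, 3, 4, 9,
              1, 4, 6, 8,
              3, 5, 5, 7),
            nrow = 4)

# Basic quantile normalization: the reference distribution is the mean
# of the sorted columns; each probe then gets the reference value at its
# own rank. (Tie handling is simplified relative to the published method.)
ref <- rowMeans(apply(x, 2, sort))
xn  <- apply(x, 2, function(col) ref[rank(col, ties.method = "min")])

# Summarization happens only AFTER normalization: pretend probes 1 and 2
# form a probeset and average them per chip.
colMeans(xn[1:2, ])
# All three chips now report the same probeset average, because the
# normalization fixed the probe values before any averaging took place.
```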
However, quantile normalization seems to be more commonplace, though this might be a case of "the last lab published like this, so we will too". I doubt a reviewer will give you flak about normalization, and if they do, it should be trivial to change methods. If it is a concern, I would just try them both; at least with limma it was trivial to just run all of the methods (I did it in parallel with snow).
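For what it's worth, a rough sketch of what that looked like. normalizeBetweenArrays() and its method names are real limma API, and makeCluster()/parLapply() are snow's, but the matrix eset here is just simulated stand-in data:

```r
library(limma)
library(snow)

# Stand-in data: 1000 probes x 4 arrays of log2 intensities.
eset <- matrix(rnorm(4000, mean = 8, sd = 2), nrow = 1000)

# Methods accepted by normalizeBetweenArrays() for single-channel data.
methods <- c("none", "scale", "quantile", "cyclicloess")

cl <- makeCluster(2, type = "SOCK")
clusterEvalQ(cl, library(limma))  # load limma on every worker

# One normalized matrix per method, computed in parallel.
normed <- parLapply(cl, methods,
                    function(m, y) normalizeBetweenArrays(y, method = m),
                    y = eset)
names(normed) <- methods

stopCluster(cl)
```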
Just one small question: the probability that all probes in a probeset x are set to 4 is very small, isn't it?
Suppose the bottom quantile holds all values between 0 and 0.01; you could end up with a huge number of probes there that all get set to the same value. This is less of a problem in the lower tail, as those values are already likely to be close to zero.
I think the effect is stronger at the upper limit: if you look at the difference between PM Normalized and PM Tail adjusted normalized in the paper you reference, you can see how all of the chips get pushed to the maximum value in the normalized version. When you look at the tail-corrected normalization, you can see that it behaves much better at the upper extreme.
Think of it as a form of lossy compression: in the middle you have a nice dynamic range and the probes normalize nicely, but at the ends the normalization can effectively round to the nearest limit. Values near the bottom are set to the bottom, and values near the top are set to the top. The further an intensity sits from the middle, the larger the adjustment applied to it.
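If you want to see that rounding effect concretely, here is a toy example in base R (same simplified quantile normalization as in the sketch above; the numbers are invented so that one chip has a far more extreme top value than the other):

```r
# Two chips whose upper tails disagree by an order of magnitude.
x <- cbind(chipA = c(1, 2, 3, 100),
           chipB = c(1, 2, 3, 10))

ref <- rowMeans(apply(x, 2, sort))   # reference: (1, 2, 3, 55)
xn  <- apply(x, 2, function(col) ref[rank(col, ties.method = "min")])

xn
# The top probe on BOTH chips is set to 55 = (100 + 10) / 2: the
# ten-fold difference between the chips' upper extremes is gone.
```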