I am reading an old paper by Bolstad, 'Probe Level Quantile Normalization of High Density Oligonucleotide Array Data'.
It says: "One problem with this method is that in the tails in particular, where we might expect greater differentiation between chips, the normalized values are going to be identical."
I could not find an explanation for this: why, from plotting the density of the intensities, would we expect greater differentiation in the tails? I do understand that in the tails of the density plot the number of corresponding intensities is smaller, so those values are close to unique, aren't they? Why would we expect to find greater differentiation there?
Thanks in advance
So, just to be sure that I understood: the density plot is made at probe level (not at probeset level), most probes have average intensities, and some probes have much higher intensities, which results in the greater variance at the high end of the extrema. Is that right? But since the probeset expression value is computed from the values of multiple probes, the quantile normalization cannot be that poor, can it?
It can, because you only average the probes after normalization. Quantile normalization in this case ends up setting the average intensity value for a given probeset: if all probes in probeset x are set to 4, my probeset average is 4. So yes, you really can lose important information. I think there were other issues with quantile normalization, but it's been a while since I read up on it (all I remember is that I came to like VSN the best).
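To make that concrete, here is a minimal sketch of probe-level quantile normalization in base R. The intensity matrix and the two-probe probeset are made up for illustration, and tie handling is simplified compared to Bolstad's actual method:

```r
# Toy data: 4 probes (rows) x 3 chips (columns).
x <- matrix(c(2, 3, 4, 9,
              1, 4, 6, 8,
              3, 5, 5, 7),
            nrow = 4)

# Basic quantile normalization: the reference distribution is the mean
# of the sorted columns; each probe then gets the reference value at its
# own rank. (Tie handling is simplified relative to the published method.)
ref <- rowMeans(apply(x, 2, sort))
xn  <- apply(x, 2, function(col) ref[rank(col, ties.method = "min")])

# Summarization happens only AFTER normalization: pretend probes 1 and 2
# form a probeset and average them per chip.
colMeans(xn[1:2, ])
# All three chips now report the same probeset average, because the
# normalization fixed the probe values before any averaging took place.
```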
However, quantile normalization seems to be more commonplace, though this might be a case of "the last lab published like this, so we will too". I doubt a reviewer will give you flak about normalization, and if they do, it should be trivial to change methods. If it is a concern, I would just try them both; at least with limma it was trivial to just run all of the methods (I did it in parallel with snow).
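For what it's worth, a rough sketch of what that looked like. normalizeBetweenArrays() and its method names are real limma API, and makeCluster()/parLapply() are snow's, but the matrix eset here is just simulated stand-in data:

```r
library(limma)
library(snow)

# Stand-in data: 1000 probes x 4 arrays of log2 intensities.
eset <- matrix(rnorm(4000, mean = 8, sd = 2), nrow = 1000)

# Methods accepted by normalizeBetweenArrays() for single-channel data.
methods <- c("none", "scale", "quantile", "cyclicloess")

cl <- makeCluster(2, type = "SOCK")
clusterEvalQ(cl, library(limma))  # load limma on every worker

# One normalized matrix per method, computed in parallel.
normed <- parLapply(cl, methods,
                    function(m, y) normalizeBetweenArrays(y, method = m),
                    y = eset)
names(normed) <- methods

stopCluster(cl)
```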
Just one small question: the probability that all probes in a probeset x are set to 4 is very small, isn't it?
Suppose the bottom quantile holds all values between 0 and 0.01; you could end up with a huge number of probes there that all get set to the same value. This is less of a problem in the lower tail, as those values are already likely to be close to zero.
I think the effect is stronger at the upper limit: if you look at the difference between PM Normalized and PM Tail adjusted normalized in the paper you reference, you can see how all of the chips get pushed to the maximum value in the normalized version. When you look at the tail-corrected normalization, you can see that it behaves much better at the upper extreme.
Think of it as a form of lossy compression: in the middle you have a nice dynamic range and the probes normalize nicely, but at the ends the normalization can effectively round to the nearest limit. Values near the bottom are set to the bottom, and values near the top are set to the top. The further an intensity sits from the middle, the larger the adjustment applied to it.
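If you want to see that rounding effect concretely, here is a toy example in base R (same simplified quantile normalization as in the sketch above; the numbers are invented so that one chip has a far more extreme top value than the other):

```r
# Two chips whose upper tails disagree by an order of magnitude.
x <- cbind(chipA = c(1, 2, 3, 100),
           chipB = c(1, 2, 3, 10))

ref <- rowMeans(apply(x, 2, sort))   # reference: (1, 2, 3, 55)
xn  <- apply(x, 2, function(col) ref[rank(col, ties.method = "min")])

xn
# The top probe on BOTH chips is set to 55 = (100 + 10) / 2: the
# ten-fold difference between the chips' upper extremes is gone.
```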