Quantile normalization and variability questions
1
0
Entering edit mode
7.8 years ago
arronar ▴ 290

Hi, I am new at biostatistics and I am interested to understand some stuff on the quantile normalization technique.

Let's assume that I have a microarray dataset that contains a control set (x4 repeats) , a treated condition (x4 repeats) and i want to look for diff expressed genes.

So the most common first step i saw on the net is to normalize data with quantile normalization technique. Before normalization data density functions looks like this :

enter image description here

By assuming now that there are no differentially expressed genes, we can apply the quantile normalization method to all of our data and make them to have the same distribution.

enter image description here

So here is where I'am getting a little bit confused. Till now i knew that if I'am going to run a t-test for a specific gene between these two conditions (control VS treated) I'm going to get a t-value on the null hypothesis distribution and then calculate the p-value. If that p-value is enough smaller from the a=5% threshold, then i could say that there is such a probability to get a difference like this or more extreme between these two conditions if my null hypothesis is true. In many cases in such situation we say that we can reject the null hypothesis and to support that this difference is due to the treatment that actually has its own (different) distribution compared to the control's.

So how is this going to work if we turn the two distributions into one with quantile normalization ?

Is right the way I'm thinking of it or am I missing something ?

If I am missing something then another way to represent variability except the density function of the conditions will probably prove that variability still exists between those two conditions. So, is there any other way to represent such information (variability) ?

Thank you.

variability quantile normalization microarray • 4.0k views
ADD COMMENT
2
Entering edit mode

This paper will really help your understanding of when to use quantile normalisation: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0679-0

ADD REPLY
0
Entering edit mode

Thanks. I'm gonna read it.

ADD REPLY
0
Entering edit mode

I am not sure, but I think that you apply the quantile normalization to each distribution, and the you get in output as many distribution as you had in the input. See e.g. https://en.wikipedia.org/wiki/Quantile_normalization

ADD REPLY
0
Entering edit mode

You mean that you apply the quantile normalization on control samples and treatment samples separately ?

ADD REPLY
0
Entering edit mode

I think this is incorrect. Quantile normalisation creates a reference distribution from all your samples which you use to normalise each of them separately. This is appropriate when there is small variability across groups. When there are large differences in the distributions between your groups (i.e. samples from different tissues), then quantile normalisation can mask real differences. Instead you can use smooth quantile normalisation.

ADD REPLY
0
Entering edit mode
7.8 years ago
Fabio Marroni ★ 3.0k

No, you apply the normalization to each sample you have. If you read the link I sent you, you will understand, there is an example (Sorry, this was intended to be a comment) .

ADD COMMENT
0
Entering edit mode

yes but in wikipedia's example does not explain what each column stands for. As i said i have 4 technical replicates for control and 4 for the treated. In total lets say that i have 8 columns. Should i normalize them together or separately (controls on their own and treatment on their own)?

ADD REPLY
0
Entering edit mode

You should normalize them together. However, I read the comments by James Ashmore, and I would suggest to take them into consideration.

ADD REPLY

Login before adding your answer.

Traffic: 1950 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6