Gene expression normalization: sample-wise or feature-wise? Which one is the recommended way?
tyasird, 21 months ago

Dear Biostars users,

I would like to ask a question about z-score normalization (standardization) of gene-expression data.
As you can see from the title, I would like to know which is the better way to normalize gene-expression data.

When I look at gene-expression examples on the internet, people usually use sample-wise normalization. However, in machine-learning examples (and most other examples), people usually use feature-wise normalization.

What exactly is the difference between these two methods?

So let's say we have a DataFrame like this:

         sample_0  sample_1  sample_2  sample_3
gene0         5.1       3.5       1.4       0.2
gene1         4.9       3.0       1.4       0.2
gene2         4.7       3.2       1.3       0.2
gene3         4.6       3.1       1.5       0.2
gene4         5.0       3.6       1.4       0.2
...           ...       ...       ...       ...
gene145       6.7       3.0       5.2       2.3
gene146       6.3       2.5       5.0       1.9
gene147       6.5       3.0       5.2       2.0
gene148       6.2       3.4       5.4       2.3
gene149       5.9       3.0       5.1       1.8

This is the sample-wise z-score normalization (subtract each sample's mean from its values and divide by that sample's standard deviation):

           sample_0    sample_1    sample_2    sample_3
gene0     -0.900681    1.019004   -1.340227   -1.315444
gene1     -1.143017   -0.131979   -1.340227   -1.315444
gene2     -1.385353    0.328414   -1.397064   -1.315444
gene3     -1.506521    0.098217   -1.283389   -1.315444
gene4     -1.021849    1.249201   -1.340227   -1.315444
...             ...         ...         ...         ...
gene145    1.038005   -0.131979    0.819596    1.448832
gene146    0.553333   -1.282963    0.705921    0.922303
gene147    0.795669   -0.131979    0.819596    1.053935
gene148    0.432165    0.788808    0.933271    1.448832
gene149    0.068662   -0.131979    0.762758    0.790670
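As a reference point, here is a minimal pandas sketch of the sample-wise version. It assumes the genes × samples layout above and the population standard deviation (`ddof=0`, the same convention scikit-learn's `StandardScaler` uses); the names `expr` and `zscore_samplewise` are hypothetical. The DataFrame holds only the ten rows printed in the question, so its z-scores will not match the table (which was computed over all 150 genes). The point is the axis: the mean and standard deviation are taken down each column.

```python
import pandas as pd

# Hypothetical input: only the ten rows printed above (the full table has 150 genes).
expr = pd.DataFrame(
    {
        "sample_0": [5.1, 4.9, 4.7, 4.6, 5.0, 6.7, 6.3, 6.5, 6.2, 5.9],
        "sample_1": [3.5, 3.0, 3.2, 3.1, 3.6, 3.0, 2.5, 3.0, 3.4, 3.0],
        "sample_2": [1.4, 1.4, 1.3, 1.5, 1.4, 5.2, 5.0, 5.2, 5.4, 5.1],
        "sample_3": [0.2, 0.2, 0.2, 0.2, 0.2, 2.3, 1.9, 2.0, 2.3, 1.8],
    },
    index=["gene0", "gene1", "gene2", "gene3", "gene4",
           "gene145", "gene146", "gene147", "gene148", "gene149"],
)


def zscore_samplewise(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize each column (sample) across genes: (x - column mean) / column std."""
    # mean() and std() reduce over rows (axis=0), giving one value per sample;
    # the resulting Series broadcasts across columns by label.
    return (df - df.mean()) / df.std(ddof=0)


z_samples = zscore_samplewise(expr)
print(z_samples.round(3))

# Every sample now has mean 0 and standard deviation 1 across genes.
print(z_samples.mean().round(6))
print(z_samples.std(ddof=0).round(6))
```

After this transform, values within a column are comparable (how high a gene sits relative to the other genes in that sample), but the same gene in different samples is now on each sample's own scale.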

And this is the feature-wise z-score normalization (subtract each feature's (gene's) mean from its values and divide by that gene's standard deviation):

           sample_0    sample_1    sample_2    sample_3
gene0      1.351023    0.503322   -0.609285   -1.245060
gene1      1.431365    0.354298   -0.552705   -1.232958
gene2      1.358472    0.491362   -0.606977   -1.242858
gene3      1.358655    0.452885   -0.513270   -1.298270
gene4      1.311925    0.562254   -0.615801   -1.258377
...             ...         ...         ...         ...
gene145    1.370869   -0.742554    0.514076   -1.142391
gene146    1.321102   -0.792661    0.597972   -1.126413
gene147    1.311682   -0.662893    0.578268   -1.227057
gene148    1.208577   -0.596232    0.692918   -1.305264
gene149    1.195060   -0.582209    0.704779   -1.317631

150 rows × 4 columns
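The feature-wise version can be sketched the same way (hypothetical helper `zscore_featurewise`, population standard deviation assumed). Because each gene's z-scores depend only on that gene's own row, the printed rows can be reproduced exactly from the input table. For example, gene0 gives 1.351023, 0.503322, -0.609285, -1.245060.

```python
import pandas as pd

# A few rows from the input table above (genes x samples).
expr = pd.DataFrame(
    [[5.1, 3.5, 1.4, 0.2],
     [4.9, 3.0, 1.4, 0.2],
     [6.7, 3.0, 5.2, 2.3]],
    index=["gene0", "gene1", "gene145"],
    columns=["sample_0", "sample_1", "sample_2", "sample_3"],
)


def zscore_featurewise(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize each row (gene) across samples: (x - row mean) / row std."""
    mean = df.mean(axis=1)          # one mean per gene
    std = df.std(axis=1, ddof=0)    # population standard deviation per gene
    # sub/div with axis=0 align the per-gene Series with the row index
    return df.sub(mean, axis=0).div(std, axis=0)


z_genes = zscore_featurewise(expr)
print(z_genes.round(6))
```

In scikit-learn terms, `StandardScaler` standardizes columns, because it expects samples as rows and features as columns. On a genes × samples table like this one, it therefore gives the feature-wise result only after transposing (e.g. `StandardScaler().fit_transform(expr.T).T`). Without the transpose, it gives the sample-wise result. This difference in axis conventions may be part of why gene-expression and machine-learning examples seem to disagree.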

I think you need to be clear on the difference between normalisation and standardisation. A z-score is not a good way to normalise gene-expression data. However, in some circumstances it can be useful to standardise data that have already been normalised. There is no single recommended way (nor even agreement on whether to standardise at all); it depends on the purpose of your analysis.
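The two-step workflow described in this reply (normalise first, then optionally standardise) can be sketched as follows. This is a minimal sketch under stated assumptions. It assumes raw RNA-seq read counts in a genes × samples table, and it uses log2 counts-per-million (CPM) with a pseudocount of 1 as the normalisation step. That is one common choice among several (others include TMM or DESeq2-style size factors), not necessarily the right one for a given dataset. The counts and the helpers `log_cpm` and `zscore_rows` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical raw read counts (genes x samples), made up for illustration only.
raw = pd.DataFrame(
    [[100, 200, 150],
     [10, 25, 5],
     [500, 1100, 700],
     [0, 3, 1]],
    index=["geneA", "geneB", "geneC", "geneD"],
    columns=["s1", "s2", "s3"],
)


def log_cpm(counts: pd.DataFrame, pseudocount: float = 1.0) -> pd.DataFrame:
    """Normalise for sequencing depth: log2 counts-per-million (assumed method)."""
    lib_size = counts.sum(axis=0)              # total counts per sample (column)
    cpm = counts.div(lib_size, axis=1) * 1e6   # rescale each sample to one million reads
    return np.log2(cpm + pseudocount)          # pseudocount avoids log2(0)


def zscore_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Standardise each gene (row) across samples, e.g. before drawing a heatmap."""
    return df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1, ddof=0), axis=0)


normalised = log_cpm(raw)                  # step 1: normalisation (depth-corrected)
standardised = zscore_rows(normalised)     # step 2: optional standardisation
print(standardised.round(3))
```

Note that step 2 discards each gene's absolute expression level, which is why it suits visualising patterns across samples but not comparing expression levels between genes.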

Side note: the word is subtract, not substract; there's no s in the middle. I've corrected the word in your post.
