Question

correlation between data

5.8 years ago

star ▴ 350

I have ChIP-seq data from several different studies. I would like to normalize them with the TMM and upper-quartile methods from the edgeR package and then see which method works better for my data.

As you can see in the tables below, the normalized values differ between the two methods, but when I compute the correlation between samples and draw a heatmap, the values are exactly the same for both methods.

Is computing the correlation a good way to compare the methods, and why are the values returned by cor() identical?

Is it correct to draw a heatmap of the correlation matrix?

> library(edgeR)
> dge <- DGEList(counts = data)   # data: count matrix, one column per sample
> data_upperquartile <- calcNormFactors(dge, method = "upperquartile")
> data_upperquartile <- data.frame(cpm(data_upperquartile, normalized.lib.sizes = TRUE))
> data_upperquartile[100:105, 1:3]

        A         B          C
0.1007585 0.1230328 0.01741683
0.1151526 0.1730148 0.03483366
0.1439407 0.2268417 0.04644487
0.1727289 0.2768238 0.05225048
0.1631328 0.2460656 0.04644487
0.1103546 0.1461014 0.02902805


> data_TMM <- calcNormFactors(dge, method = "TMM")
> data_TMM <- data.frame(cpm(data_TMM, normalized.lib.sizes = TRUE))
> data_TMM[100:105, 1:3]

         A         B          C
0.09484844 0.1153246 0.01901974
0.10839821 0.1621753 0.03803947
0.13549776 0.2126298 0.05071930
0.16259732 0.2594804 0.05705921
0.15356413 0.2306493 0.05071930
0.10388162 0.1369480 0.03169956


> cor_data_upperquartile <- cor(data_upperquartile)
> cor_data_upperquartile
          A         B         C
A 1.0000000 0.9878731 0.9383675
B 0.9878731 1.0000000 0.9739410
C 0.9383675 0.9739410 1.0000000


> cor_data_TMM <- cor(data_TMM)
> cor_data_TMM
          A         B         C
A 1.0000000 0.9878731 0.9383675
B 0.9878731 1.0000000 0.9739410
C 0.9383675 0.9739410 1.0000000
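The six rows printed above already hint at the reason. Dividing each TMM value by the matching upper-quartile value gives one constant ratio per column, so the two methods differ only by a per-sample scaling factor. A minimal base-R check, using only those printed values (rounded to seven significant digits, so agreement holds up to printing precision); the object names `uq`, `tmm`, and `ratio` are illustrative:

```r
# Values copied from the printed upper-quartile and TMM tables (rows 100:105)
uq <- data.frame(
  A = c(0.1007585, 0.1151526, 0.1439407, 0.1727289, 0.1631328, 0.1103546),
  B = c(0.1230328, 0.1730148, 0.2268417, 0.2768238, 0.2460656, 0.1461014),
  C = c(0.01741683, 0.03483366, 0.04644487, 0.05225048, 0.04644487, 0.02902805)
)
tmm <- data.frame(
  A = c(0.09484844, 0.10839821, 0.13549776, 0.16259732, 0.15356413, 0.10388162),
  B = c(0.1153246, 0.1621753, 0.2126298, 0.2594804, 0.2306493, 0.1369480),
  C = c(0.01901974, 0.03803947, 0.05071930, 0.05705921, 0.05071930, 0.03169956)
)

# Element-wise ratio TMM / upper-quartile: nearly constant within each column,
# i.e. the two normalizations differ by one scaling factor per sample
ratio <- as.matrix(tmm) / as.matrix(uq)
print(round(ratio, 5))

# Relative spread of the ratio within each column (tiny = constant)
print(apply(ratio, 2, function(r) diff(range(r)) / mean(r)))

# Correlations from the two tables agree up to rounding of the printed values
print(all.equal(cor(tmm), cor(uq), tolerance = 1e-4))
```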


written 5.8 years ago by star ▴ 350 • updated 5.8 years ago by Dinara

Answer 1 · 2019-04-17

5.8 years ago

Dinara • 0

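Normalization doesn't change the correlation. It is simply a mathematical fact that cor(x, y) = cor(ax, by) for any positive scalars a and b. Both TMM and upper-quartile normalization only rescale each sample (column) by a single factor, so the sample-to-sample correlations are identical for both methods, and also identical to those of the raw counts.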
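Why this holds for the Pearson correlation that cor() computes by default: with positive constants a and b (here, one normalization scaling per sample), the covariance and the two standard deviations pick up the same factors, which cancel.

```latex
\operatorname{cor}(ax, by)
  = \frac{\operatorname{cov}(ax, by)}{\operatorname{sd}(ax)\,\operatorname{sd}(by)}
  = \frac{ab\,\operatorname{cov}(x, y)}{a\,\operatorname{sd}(x)\; b\,\operatorname{sd}(y)}
  = \operatorname{cor}(x, y), \qquad a, b > 0.
```

If exactly one of a or b were negative, only the sign would flip.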


As a remark, that is only true if the normalization uses one linear scaling factor per sample, as TMM or the geometric-mean approach of DESeq2 do. With non-linear methods such as quantile normalization or loess regression, the correlation can change dramatically.
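A base-R sketch of this point on simulated counts: per-sample scaling (a plain library-size CPM here, standing in for TMM or upper-quartile) leaves the correlation matrix untouched, while quantile normalization changes it. The simulated data and the `quantile_normalize()` helper are hypothetical and simplified (ties are broken by order rather than averaged); for real data an established implementation such as limma's `normalizeQuantiles()` would be the usual choice.

```r
set.seed(2)
n  <- 500
mu <- rgamma(n, shape = 2, rate = 0.05)        # shared underlying signal
counts <- cbind(A = rpois(n, mu),
                B = rpois(n, 2 * mu),           # deeper sequencing
                C = rpois(n, 10 * sqrt(mu)))    # non-linear relationship

# Hypothetical, simplified quantile normalization: every column is mapped onto
# the mean of the sorted columns; ties are broken by order of appearance
quantile_normalize <- function(x) {
  ref   <- rowMeans(apply(x, 2, sort))
  ranks <- apply(x, 2, rank, ties.method = "first")
  out   <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(x)
  out
}

# Linear scaling: one factor per column (library-size CPM)
scaled <- sweep(counts, 2, colSums(counts), "/") * 1e6
qn     <- quantile_normalize(counts)

print(all.equal(cor(scaled), cor(counts)))  # TRUE: scaling keeps correlations
print(round(cor(counts), 3))
print(round(cor(qn), 3))                    # quantile-normalized: values shift
```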

5.7 years ago by ATpoint 86k

Thanks for your reply. So how can I determine which method is better for my data?

5.8 years ago by star ▴ 350

I recommend reading the csaw manual on ChIP-seq normalization. It explains the concepts nicely and contains code for MA plots to check visually how well the normalization works.
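For orientation, an MA plot compares two samples by plotting the log-ratio M against the average log-abundance A; after an adequate normalization, the bulk of non-differential regions should sit around M = 0. The sketch below uses base R and simulated counts, not csaw's own functions (which are not reproduced here); the pseudo-count of 0.5 and the sample names `S1`/`S2` are arbitrary, hypothetical choices.

```r
set.seed(3)
n  <- 2000
mu <- rgamma(n, shape = 2, rate = 0.05)
counts <- cbind(S1 = rpois(n, mu), S2 = rpois(n, 1.5 * mu))  # S2 sequenced deeper

# log2 CPM with a small pseudo-count (arbitrary choice) to avoid log(0)
log_cpm <- function(x, prior = 0.5) {
  log2(sweep(x + prior, 2, colSums(x) + 2 * prior, "/") * 1e6)
}

lc <- log_cpm(counts)
M  <- lc[, "S2"] - lc[, "S1"]          # log-fold change between the samples
A  <- (lc[, "S2"] + lc[, "S1"]) / 2    # average log abundance

plot(A, M, pch = 16, cex = 0.4,
     xlab = "A (average log2 CPM)", ylab = "M (log2 S2 / S1)")
abline(h = 0, col = "red")                    # reference: no difference
abline(h = median(M), col = "blue", lty = 2)  # observed centre of the cloud
print(median(M))                              # near 0 when scaling is adequate
```

Comparing where the point cloud is centred under each set of normalization factors is one visual way to judge them.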

5.7 years ago by ATpoint 86k