correlation between data
5.6 years ago
star ▴ 350

I have ChIP-seq data from several studies, and I would like to normalise them with the TMM and upper-quartile methods from the edgeR package and then see which method works better for my data.

As you can see in the tables below, the normalized values differ between the two methods, but when I compute the correlation and draw a heatmap, all the values are the same.

  • Is computing the correlation a good way to compare the methods, and why are all the values from cor() the same?
  • Is it correct to draw a heatmap of the correlation result?

    > dge <- DGEList(counts=data)
    
    > data_upperquartile <- calcNormFactors(dge, method="upperquartile")
    
    > data_upperquartile<- data.frame(cpm(data_upperquartile,normalized.lib.sizes = TRUE))
    
    > data_upperquartile[c(100:105),c(1:3)]
    
     A                                B                               C
    0.1007585                        0.1230328                       0.01741683
    0.1151526                        0.1730148                       0.03483366
    0.1439407                        0.2268417                       0.04644487
    0.1727289                        0.2768238                       0.05225048
    0.1631328                        0.2460656                       0.04644487
    0.1103546                        0.1461014                       0.02902805
    
    
    > data_TMM <- calcNormFactors(dge, method="TMM")
    
    > data_TMM <- data.frame(cpm(data_TMM, normalized.lib.sizes = TRUE))
    
    > data_TMM[c(100:105),c(1:3)]
    
    
    A                                 B                               C
    0.09484844                        0.1153246                       0.01901974
    0.10839821                        0.1621753                       0.03803947
    0.13549776                        0.2126298                       0.05071930
    0.16259732                        0.2594804                       0.05705921
    0.15356413                        0.2306493                       0.05071930
    0.10388162                        0.1369480                       0.03169956
    
    
    > cor_data_upperquartile <- cor(data_upperquartile)
    
                        A              B                        C
     A             1.0000000          0.9878731            0.9383675
     B             0.9878731          1.0000000            0.9739410
     C             0.9383675          0.9739410            1.0000000
    
    
    > cor_data_TMM <- cor(data_TMM)
    
                        A              B                        C
     A             1.0000000          0.9878731            0.9383675
     B             0.9878731          1.0000000            0.9739410
     C             0.9383675          0.9739410            1.0000000
    
R dataframe correlation ggplot edgeR • 1.2k views
5.6 years ago
Dinara • 0

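Normalization doesn't change the correlation here. It is simply a mathematical fact that cor(x, y) = cor(ax, by), where a and b are positive scalars. Both TMM and upper-quartile normalization, as applied through cpm(), multiply each sample (column) by its own positive constant, so the Pearson correlation between samples stays exactly the same.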
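The invariance can be checked with a small sketch in base R. Everything here is an assumption for illustration: the count matrix is simulated (it is not the poster's data), and `scale_1` and `scale_2` are arbitrary positive per-sample factors standing in for the factors that two different scaling normalizations might produce.

```r
set.seed(1)

# Simulated counts: 200 regions x 3 samples sharing a region-level mean,
# so the samples are correlated (hypothetical data, for illustration only).
region_mean <- rgamma(200, shape = 2, scale = 10)
counts <- matrix(rpois(600, lambda = rep(region_mean, 3)), ncol = 3,
                 dimnames = list(NULL, c("A", "B", "C")))

# Two arbitrary sets of positive per-sample scaling factors.
scale_1 <- c(0.8, 1.1, 1.3)
scale_2 <- c(1.5, 0.7, 0.9)

# Divide each column by its own factor, as a scaling normalization does.
norm_1 <- sweep(counts, 2, scale_1, "/")
norm_2 <- sweep(counts, 2, scale_2, "/")

# The normalized values differ, but the correlation matrices do not.
same_values <- isTRUE(all.equal(norm_1, norm_2))
same_cor <- isTRUE(all.equal(cor(norm_1), cor(norm_2)))
print(c(same_values = same_values, same_cor = same_cor))
```

Because any choice of positive factors gives the same correlation matrix, a heatmap of `cor()` cannot distinguish between scaling-based normalization methods.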


As a remark, that is only true if the normalization applies a linear scaling factor per sample, as TMM or the geometric-mean approach of DESeq2 do. If you use a non-linear method such as quantile normalization or loess regression, the values returned by cor can change substantially.
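A minimal sketch of the contrast, again on simulated data. The helper `quantile_normalize()` below is a hypothetical, simplified base-R implementation written for this illustration (it is not a function from edgeR or any other package), and the three samples are constructed so that one of them relates non-linearly to the others.

```r
set.seed(2)
n <- 200
base <- rlnorm(n)

# Hypothetical signal matrix; sample C is a non-linear function of the
# shared signal, which is where non-linear normalization matters most.
x <- cbind(A = base * exp(rnorm(n, sd = 0.3)),
           B = 2 * base * exp(rnorm(n, sd = 0.3)),
           C = base^2 * exp(rnorm(n, sd = 0.3)))

# Simplified quantile normalization: every column is forced onto the same
# distribution, namely the mean of the sorted columns.
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  sorted_means <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) sorted_means[r])
  dimnames(out) <- dimnames(x)
  out
}

qn <- quantile_normalize(x)

# Unlike per-sample scaling, this changes the Pearson correlations.
print(round(cor(x), 3))
print(round(cor(qn), 3))
```

Comparing the two printed matrices shows that the correlations are no longer identical once the transformation is not a simple per-sample rescaling.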


Thanks for your reply. So how can I find out which method is better?


I recommend reading the csaw manual on ChIP-seq normalization. It explains the concepts quite nicely and contains code for drawing MA plots, which let you visually check how well a normalization worked.
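For orientation, here is a rough base-R sketch of what an MA plot between two samples shows. It is not the code from the csaw manual: the helper `ma_values()`, the simulated counts, and the factors in `nf` are all assumptions made for this example. With real data, the normalization factors would come from `calcNormFactors()`.

```r
set.seed(3)

# Simulated counts for 500 regions in two samples; sample B is sequenced
# about twice as deeply as sample A (hypothetical data).
region_mean <- rgamma(500, shape = 2, scale = 10)
counts <- cbind(A = rpois(500, region_mean),
                B = rpois(500, 2 * region_mean))

# M (log-ratio) and A (average log-abundance) for samples i and j, using
# log2 CPM computed from effective library sizes.
ma_values <- function(counts, i, j, norm_factors = rep(1, ncol(counts))) {
  eff_lib <- colSums(counts) * norm_factors
  logcpm <- log2(t(t(counts + 0.5) / eff_lib) * 1e6)
  data.frame(A = (logcpm[, i] + logcpm[, j]) / 2,
             M = logcpm[, i] - logcpm[, j])
}

nf <- c(0.9, 1.1)  # hypothetical normalization factors
ma_raw <- ma_values(counts, 1, 2)
ma_norm <- ma_values(counts, 1, 2, norm_factors = nf)

# After a good normalization, the bulk of the points should be centred on
# M = 0 across the whole range of A.
plot(ma_norm$A, ma_norm$M, pch = 16, cex = 0.5,
     xlab = "A (mean log2 CPM)", ylab = "M (log2 ratio)")
abline(h = 0, col = "red")
```

A scaling normalization only shifts the cloud of points up or down by a constant, so the methods can be compared by checking which one centres the non-differential bulk of regions on the red line.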
