Multiple Correlations Or Anova
11.3 years ago
robjohn7000 ▴ 110

Hi,

I have the following data set and I'm not sure which statistical tool is appropriate: multiple correlation or ANOVA? The values are occurrences (%) of a certain gene component associated with different characteristics (A to K) in bacteria under different experimental conditions. I would like to find out whether any of the conditions leading to the observations (A to K) are correlated with each other, and how to implement this in R. My problem is choosing a statistic that convincingly shows that the observations in bacteria were possible because the gene components in two or more of the conditions are well correlated with each other.

         Cond1  Cond2  Cond3  Cond4  Cond5  Cond6
    A        0      1      2     16     17     18
    B        1      3      9     23     24     25
    C        0      1     16     30     31     32
    D        0      0     23     19     20     21
    E        0      0     30     26     27     28
    F       15     16      1     33     34     35
    G        0      0      8      1      2      3
    H        0      1     15      8      9     10
    I        0      0     22     15     16     17
    J        1      2     29     22     23     24
    K        0      1      4      5      6      7

Please help!

Rob

statistics r genetics • 3.2k views
11.3 years ago

Hello- If you want to show that there is a correlation between any two conditions, you could calculate all the pairwise correlations and correct the p-values for multiple testing. Since your data are percentages, I would either apply an arcsine (square-root) transformation, which stabilizes the variance of proportions, or use a non-parametric correlation test (e.g. Spearman).

Just a thought... Here's some sample R code:

arcsine <- function(x){
    return(asin(sign(x) * sqrt(abs(x))))
}

dat<- read.table('dat.txt', header= TRUE, row.names= 1, sep= '\t')
dat
  Cond1 Cond2 Cond3 Cond4 Cond5 Cond6
A     0     1     2    16    17    18
B     1     3     9    23    24    25
C     0     1    16    30    31    32
D     0     0    23    19    20    21
E     0     0    30    26    27    28
F    15    16     1    33    34    35
G     0     0     8     1     2     3
H     0     1    15     8     9    10
I     0     0    22    15    16    17
J     1     2    29    22    23    24
K     0     1     4     5     6     7

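# Number of unique condition pairs: choose(6, 2) = 15
nr<- choose(ncol(dat), 2)
dat.cor<- data.frame(condA= rep(NA, nr), condB= rep(NA, nr), cor= rep(NA, nr), pval= rep(NA, nr))

# Fill one row of dat.cor per pair of conditions
n<- 1
for(i in 1:(ncol(dat)-1)){
    for(j in (i+1):ncol(dat)){
        dat.cor$condA[n]<- colnames(dat)[i]
        dat.cor$condB[n]<- colnames(dat)[j]
        # Pearson correlation test on the arcsine-transformed proportions
        p<- cor.test(arcsine(dat[,i]/100), arcsine(dat[,j]/100), method= 'pearson')
        dat.cor$cor[n]<- p$estimate
        dat.cor$pval[n]<- p$p.value
        n<- n+1
    }
}
# Adjust the p-values for multiple testing (Holm's method)
dat.cor$padj<- p.adjust(dat.cor$pval, method= 'holm')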

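# Note: the values printed below match Pearson correlations on the raw
# percentages, i.e. without arcsine(). With the transformation applied as in
# the loop above, the estimates differ slightly (e.g. Cond4 vs Cond5 is no
# longer exactly 1).
dat.cor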
   condA condB        cor         pval         padj
1  Cond1 Cond2  0.9906504 4.267460e-09 5.120952e-08
2  Cond1 Cond3 -0.4096434 2.108681e-01 1.000000e+00
3  Cond1 Cond4  0.5106776 1.084494e-01 1.000000e+00
4  Cond1 Cond5  0.5106776 1.084494e-01 1.000000e+00
5  Cond1 Cond6  0.5106776 1.084494e-01 1.000000e+00
6  Cond2 Cond3 -0.4574765 1.571276e-01 1.000000e+00
7  Cond2 Cond4  0.5257322 9.671666e-02 1.000000e+00
8  Cond2 Cond5  0.5257322 9.671666e-02 1.000000e+00
9  Cond2 Cond6  0.5257322 9.671666e-02 1.000000e+00
10 Cond3 Cond4  0.2076371 5.401187e-01 1.000000e+00
11 Cond3 Cond5  0.2076371 5.401187e-01 1.000000e+00
12 Cond3 Cond6  0.2076371 5.401187e-01 1.000000e+00
13 Cond4 Cond5  1.0000000 0.000000e+00 0.000000e+00
14 Cond4 Cond6  1.0000000 0.000000e+00 0.000000e+00
15 Cond5 Cond6  1.0000000 0.000000e+00 0.000000e+00
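
For the non-parametric route mentioned above, here is a minimal sketch of the same pairwise analysis with Spearman's rank correlation. It assumes the table from the question is typed in directly rather than read from dat.txt. The names `pairs`, `tests` and `spearman.cor` are made up for this example, and `exact = FALSE` is my choice because the data contain ties (several conditions have repeated zeros), so an exact p-value cannot be computed.

```r
# Data from the question, typed in so the example runs without dat.txt
dat <- data.frame(
    Cond1 = c(0, 1, 0, 0, 0, 15, 0, 0, 0, 1, 0),
    Cond2 = c(1, 3, 1, 0, 0, 16, 0, 1, 0, 2, 1),
    Cond3 = c(2, 9, 16, 23, 30, 1, 8, 15, 22, 29, 4),
    Cond4 = c(16, 23, 30, 19, 26, 33, 1, 8, 15, 22, 5),
    Cond5 = c(17, 24, 31, 20, 27, 34, 2, 9, 16, 23, 6),
    Cond6 = c(18, 25, 32, 21, 28, 35, 3, 10, 17, 24, 7),
    row.names = LETTERS[1:11]
)

# All unique pairs of conditions, as a list of length-2 character vectors
pairs <- combn(colnames(dat), 2, simplify = FALSE)

# One Spearman test per pair; exact = FALSE because of ties in the data
tests <- lapply(pairs, function(cn) {
    cor.test(dat[[cn[1]]], dat[[cn[2]]], method = "spearman", exact = FALSE)
})

# Collect estimates and p-values, then adjust for multiple testing (Holm)
spearman.cor <- data.frame(
    condA = sapply(pairs, `[`, 1),
    condB = sapply(pairs, `[`, 2),
    rho   = sapply(tests, function(x) unname(x$estimate)),
    pval  = sapply(tests, function(x) x$p.value)
)
spearman.cor$padj <- p.adjust(spearman.cor$pval, method = "holm")
spearman.cor
```

Because Spearman works on ranks, no arcsine transformation is needed here, and conditions that differ only by a constant shift (Cond4, Cond5, Cond6) get identical ranks and therefore a rho of 1.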

Many thanks for your help Dario. Just a couple of questions:

Are $estimate and $p.value in these two lines components of the result returned by a built-in function?

dat.cor$cor[n]<- p$estimate

dat.cor$pval[n]<- p$p.value

I'm not sure whether my interpretation of the output is right. For example:

  condA condB       cor         pval         padj
1 Cond1 Cond2 0.9906504 4.267460e-09 5.120952e-08

Considering condA and CondB, the correlation between Cond1 and Cond2 is 0.9906504 with adjusted p-value = 5.120952e-08.

Thanks


Hi- Sorry I couldn't reply sooner. I guess you've figured it out by now... Anyway, yes: p$estimate and p$p.value are components of the object returned by cor.test. And yes, your interpretation of the columns in dat.cor is correct.
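
To make that concrete, here is a small sketch of what cor.test returns. It uses Cond1 and Cond2 from the question, typed in by hand and left untransformed for brevity (that simplification is mine); `cond1` and `cond2` are just names chosen for this example.

```r
# Two of the conditions from the question, as proportions
cond1 <- c(0, 1, 0, 0, 0, 15, 0, 0, 0, 1, 0) / 100
cond2 <- c(1, 3, 1, 0, 0, 16, 0, 1, 0, 2, 1) / 100

# cor.test() returns a list-like object of class "htest"
p <- cor.test(cond1, cond2, method = "pearson")

class(p)      # the class of the returned object
names(p)      # components accessible with $, e.g. "estimate" and "p.value"
p$estimate    # the correlation coefficient (a named numeric, name "cor")
p$p.value     # the unadjusted p-value of the test
```

So `$estimate` and `$p.value` are not separate functions, just named elements of the result, and running `names(p)` or `str(p)` shows everything else that can be pulled out the same way.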
