I have the following data set and I'm not sure which statistical tool is appropriate: multiple correlation or ANOVA? The values are occurrences (%) of a certain gene component associated with different characteristics (A to K) in bacteria under six experimental conditions. I would like to show whether any of the conditions are correlated with each other across the observations (A to K), and how to implement this in R. My problem is which statistic to use to show convincingly that the observations in bacteria can be attributed to the gene components in two or more conditions that are well correlated with each other.
Cond1 Cond2 Cond3 Cond4 Cond5 Cond6
A 0 1 2 16 17 18
B 1 3 9 23 24 25
C 0 1 16 30 31 32
D 0 0 23 19 20 21
E 0 0 30 26 27 28
F 15 16 1 33 34 35
G 0 0 8 1 2 3
H 0 1 15 8 9 10
I 0 0 22 15 16 17
J 1 2 29 22 23 24
K 0 1 4 5 6 7
Hello- If you want to show that there is a correlation between any two conditions, you could calculate all the pairwise correlations and correct the p-values for multiple testing. Since your data are percentages, I would either apply an arcsine square-root transformation to stabilize the variance or use a non-parametric correlation test (e.g. Spearman). Here's some sample R code.
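A minimal sketch of that approach, assuming the table from the question is read into a data frame. The names `dat` and `cond.pairs` and the row-name column label `char` are made up for illustration; `dat.cor`, `p`, `n` and the column names follow the ones quoted later in this thread. Spearman and the Benjamini-Hochberg correction are assumptions here, one reasonable choice each, not necessarily what produced the numbers quoted below.

```r
# Data from the question: rows are characteristics, columns are conditions (%).
dat <- read.table(header = TRUE, row.names = 1, text = "
char Cond1 Cond2 Cond3 Cond4 Cond5 Cond6
A 0 1 2 16 17 18
B 1 3 9 23 24 25
C 0 1 16 30 31 32
D 0 0 23 19 20 21
E 0 0 30 26 27 28
F 15 16 1 33 34 35
G 0 0 8 1 2 3
H 0 1 15 8 9 10
I 0 0 22 15 16 17
J 1 2 29 22 23 24
K 0 1 4 5 6 7
")

# All unordered pairs of conditions (15 pairs for 6 conditions).
cond.pairs <- t(combn(colnames(dat), 2))

# One row per pair; cor and pval are filled in by the loop below.
dat.cor <- data.frame(condA = cond.pairs[, 1], condB = cond.pairs[, 2],
                      cor = NA_real_, pval = NA_real_,
                      stringsAsFactors = FALSE)

for (n in seq_len(nrow(dat.cor))) {
  # exact = FALSE: the data contain ties, so use the asymptotic p-value.
  p <- cor.test(dat[[dat.cor$condA[n]]], dat[[dat.cor$condB[n]]],
                method = "spearman", exact = FALSE)
  dat.cor$cor[n]  <- p$estimate   # correlation coefficient (rho)
  dat.cor$pval[n] <- p$p.value    # unadjusted p-value
}

# Correct the 15 p-values for multiple testing.
dat.cor$padj <- p.adjust(dat.cor$pval, method = "BH")

# Most significant pairs first.
dat.cor[order(dat.cor$padj), ]
```

To try the other option instead, transform first with `asin(sqrt(dat / 100))` and call `cor.test` with its default Pearson method.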
Many thanks for your help Dario. Just a couple of questions:
In these two lines, are $estimate and $p.value components of the value returned by a built-in function?
dat.cor$cor[n]<- p$estimate
dat.cor$pval[n]<- p$p.value
I'm not sure whether my interpretation of the output is right. For example:
  condA condB       cor         pval         padj
1 Cond1 Cond2 0.9906504 4.267460e-09 5.120952e-08
Here condA and condB name the two conditions being compared: the correlation between Cond1 and Cond2 is 0.9906504, with p-value 4.267460e-09 and adjusted p-value 5.120952e-08.
Thanks
Hi- Sorry I couldn't reply sooner. I guess you've figured it out by now... Anyway, yes: p$estimate and p$p.value are components of the object returned by cor.test. And yes again, your interpretation of the columns in dat.cor is correct.
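To see where those two components come from, here is a small sketch using the Cond1 and Cond2 columns from the question. The vectors `x` and `y` are just illustrative names. It uses `cor.test` with its default Pearson method on the raw percentages, which appears to reproduce the coefficient in the row quoted above; that match is an observation, not a statement about the code originally posted.

```r
x <- c(0, 1, 0, 0, 0, 15, 0, 0, 0, 1, 0)   # Cond1, rows A to K
y <- c(1, 3, 1, 0, 0, 16, 0, 1, 0, 2, 1)   # Cond2, rows A to K

p <- cor.test(x, y)   # Pearson is the default method

class(p)      # "htest": a list-like object holding the test results
names(p)      # lists the available components, including estimate and p.value
p$estimate    # the correlation coefficient, about 0.99 for these two columns
p$p.value     # the unadjusted p-value for the test of zero correlation
```

Any component shown by `names(p)` can be pulled out the same way, for example `p$conf.int` for the confidence interval of a Pearson correlation.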