You should first aim to merge your datasets. To do this, they'll require common colnames. Here's a reproducible example using random-generated data:
data1 <- matrix(rexp(200, rate=.1), ncol=20)
data2 <- matrix(rexp(400, rate=.1), ncol=20)
data3 <- matrix(rexp(100, rate=.1), ncol=20)
colnames(data1) <- paste("gene", 1:ncol(data1), sep="")
colnames(data2) <- paste("gene", 1:ncol(data2), sep="")
colnames(data3) <- paste("gene", 1:ncol(data3), sep="")
data1 <- data.frame(rep("Black", nrow(data1)), data1)
colnames(data1)[1] <- "ethnicity"
data2 <- data.frame(rep("Asian", nrow(data2)), data2)
colnames(data2)[1] <- "ethnicity"
data3 <- data.frame(rep("Hispanic", nrow(data3)), data3)
colnames(data3)[1] <- "ethnicity"
merge <- rbind(data1, data2, data3)
merge[,1:5]
ethnicity gene1 gene2 gene3 gene4
1 Black 1.16514388 12.30338069 3.82688868 11.36645599
2 Black 24.38089444 9.03542219 6.80817973 0.10700084
3 Black 17.43543444 12.92760579 35.47927735 5.04846676
...
9 Black 12.98195603 21.59992268 0.71956284 3.55260945
10 Black 16.53245930 3.64460684 18.28704018 13.95026225
11 Asian 6.50373870 1.76874482 41.11393353 22.17943616
12 Asian 6.58345454 13.25700767 2.04473476 8.48304042
13 Asian 22.83291833 0.06681459 2.58220889 7.63248332
...
27 Asian 7.97365692 7.67312685 4.41263392 10.72628836
28 Asian 1.87894546 1.47503110 9.30714773 16.05473038
29 Asian 9.10527813 11.70059070 26.77912256 2.67517565
30 Asian 2.11775040 0.62188180 15.13412334 7.83838828
31 Hispanic 27.56932722 1.56425509 9.30974219 6.38043914
32 Hispanic 3.40418082 19.97802677 8.17777047 2.53841623
33 Hispanic 31.19901835 5.39049237 0.43565871 6.44909321
34 Hispanic 5.12689485 11.20531531 17.32246742 5.86373778
35 Hispanic 12.60268618 9.75769548 12.34081183 27.50402493
Then, set the ethnicity variable as categorical:
merge$ethnicity <- factor(merge$ethnicity, levels=c("Asian","Black","Hispanic"))
merge$ethnicity
[1] Black Black Black Black Black Black Black Black
[9] Black Black Asian Asian Asian Asian Asian Asian
[17] Asian Asian Asian Asian Asian Asian Asian Asian
[25] Asian Asian Asian Asian Asian Asian Hispanic Hispanic
[33] Hispanic Hispanic Hispanic
Levels: Asian Black Hispanic
Then. we can do either parametric or non-parametric ANOVA:
parametric ANOVA (slow; simple for
loop):
baseformula <- " ~ ethnicity"
for (i in 2:ncol(merge)) {
formula <- paste(colnames(merge)[i], baseformula, sep="")
p <- summary(aov(as.formula(formula), data=merge))[[1]][["Pr(>F)"]][1]
print(paste(formula, ": p=", p, sep=""))
}
[1] "gene1 ~ ethnicity: p=0.622349829393588"
[1] "gene2 ~ ethnicity: p=0.63661393769293"
[1] "gene3 ~ ethnicity: p=0.948436180231233"
[1] "gene4 ~ ethnicity: p=0.875863416358478"
[1] "gene5 ~ ethnicity: p=0.306303439037517"
[1] "gene6 ~ ethnicity: p=0.622549894820414"
...
[1] "gene15 ~ ethnicity: p=0.848376444462962"
[1] "gene16 ~ ethnicity: p=0.802850599451517"
[1] "gene17 ~ ethnicity: p=0.756847943886486"
[1] "gene18 ~ ethnicity: p=0.431368157733262"
[1] "gene19 ~ ethnicity: p=0.780660298718924"
[1] "gene20 ~ ethnicity: p=0.819316198089084"
non-parmetric ANOVA (Kruskal-Wallis test) (slow; simple for
loop):
baseformula <- " ~ ethnicity"
for (i in 2:ncol(merge)) {
formula <- paste(colnames(merge)[i], baseformula, sep="")
p <- kruskal.test(as.formula(formula), data=merge)$p.value
print(paste(formula, ": p=", p, sep=""))
}
[1] "gene1 ~ ethnicity: p=0.329166862417036"
[1] "gene2 ~ ethnicity: p=0.761201522936653"
[1] "gene3 ~ ethnicity: p=0.812129687344178"
[1] "gene4 ~ ethnicity: p=0.553535953984283"
[1] "gene5 ~ ethnicity: p=0.692051268905425"
[1] "gene6 ~ ethnicity: p=0.77824469542416"
...
[1] "gene15 ~ ethnicity: p=0.729788874269058"
[1] "gene16 ~ ethnicity: p=0.894555285927434"
[1] "gene17 ~ ethnicity: p=0.759029765430655"
[1] "gene18 ~ ethnicity: p=0.178257916287501"
[1] "gene19 ~ ethnicity: p=0.634599044972864"
[1] "gene20 ~ ethnicity: p=0.852143788966208"
---------------------------------------------------------------
You have Batch
and Gender
in your data, too. If you want to adjust for those, you may instead consider multinomial logistic regression and including them as covariates, i.e., same as above but:
glm(ethnicity ~ gene1 + Batch + Gender, data=merge)
Kevin
Hello. This was the best tutorial on how to apply the command in the analysis of gene expression. It worked perfectly with my data. Please, if I want to store all the values shown by the print command, is it possible?
Take a look at
write.table()