I have a dataset of protein marker abundance measurements for two groups of individuals, Treatment and Control. Each group contains several subjects, and each subject has three replicates.
The numbers of subjects in the Treatment and Control groups are unequal: 4 individuals in Treatment and 10 in Control. So, overall:
- Control and Treatment groups (denoted by 1 and 2 respectively)
- 10 individuals in Control (A to J), 4 individuals in Treatment (K to N)
- 3 replicates for each individual in Control, 3 replicates for each individual in Treatment
I would like to perform a nested ANOVA on this data. However, I'm new to this kind of data and would like to know the following:
Is the 'replicate' column redundant here, and can I simply drop/ignore it?
Can I use the following ANOVA formula in R? Is there anything special I need to take care of because of the unequal numbers of control and treated individuals?
protein.aov <- aov(abundance ~ treatgroup + Error(treatgroup %in% individual), data = protein.df)
Can I do a post hoc pairwise comparison such as TukeyHSD?
The data is as follows:
treatgroup individual replicate abundance
1 A R1 -0.709258936
1 A R2 -0.54767131
1 A R3 -0.907607661
1 B R1 3.729646649
1 B R2 -0.650402382
1 B R3 0.884222978
1 C R1 -4.443417184
1 C R2 3.709624162
1 C R3 3.076829126
1 D R1 2.109771383
1 D R2 4.950350294
1 D R3 -1.162741304
1 E R1 0.799105402
1 E R2 2.929226412
1 E R3 2.95692962
1 F R1 2.011646397
1 F R2 -3.4757793
1 F R3 4.615843439
1 G R1 7.324129703
1 G R2 -6.56365647
1 G R3 5.848340873
1 H R1 7.375089916
1 H R2 -0.709544581
1 H R3 -1.715528803
1 I R1 6.70860394
1 I R2 -4.325520039
1 I R3 7.623999717
1 J R1 1.959268861
1 J R2 3.794791979
1 J R3 -0.443267523
2 K R1 4.489545974
2 K R2 -0.93677524
2 K R3 8.030252255
2 L R1 -0.133320899
2 L R2 3.802649555
2 L R3 1.118932954
2 M R1 2.054437925
2 M R2 -3.872643548
2 M R3 5.695342112
2 N R1 -4.913796298
2 N R2 4.647048982
2 N R3 6.729868259
(a) You are probably more likely to get a helpful answer to a purely statistical question like this on stats.se. (b) Your description doesn't sound like a nested design (unless you are interested in estimating a variance component for your measuring process), so you might just average your technical replicates and do a t-test.
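A minimal sketch of the "average the replicates, then t-test" suggestion might look like the following. To keep it self-contained, it reads only a subset of the posted data (individuals A, B, K and L); the object names `protein.df`, `subject.means`, `tt` and `protein.aov` are illustrative, and in practice the full data frame would be used instead. The `aov()` call at the end uses `Error(individual)` rather than the formula in the question, as one common way to write the subject-level error term; treat it as an assumption, not a verified answer to the question.

```r
# Subset of the posted data (individuals A, B, K, L only), for illustration.
protein.df <- read.table(header = TRUE, text = "
treatgroup individual replicate abundance
1 A R1 -0.709258936
1 A R2 -0.54767131
1 A R3 -0.907607661
1 B R1 3.729646649
1 B R2 -0.650402382
1 B R3 0.884222978
2 K R1 4.489545974
2 K R2 -0.93677524
2 K R3 8.030252255
2 L R1 -0.133320899
2 L R2 3.802649555
2 L R3 1.118932954
")

# Treat the group code and the subject label as factors, not numbers/strings.
protein.df$treatgroup <- factor(protein.df$treatgroup)
protein.df$individual <- factor(protein.df$individual)

# Average the three technical replicates so each individual contributes one value.
subject.means <- aggregate(abundance ~ treatgroup + individual,
                           data = protein.df, FUN = mean)

# Two-sample t-test on the per-individual means (Welch's test is R's default,
# which does not assume equal variances in the two groups).
tt <- t.test(abundance ~ treatgroup, data = subject.means)
print(tt)

# Alternative on the un-averaged data: individuals as the error stratum,
# so treatgroup is tested against between-individual variation.
protein.aov <- aov(abundance ~ treatgroup + Error(individual), data = protein.df)
print(summary(protein.aov))
```

With the same number of replicates for every individual, the `treatgroup` F statistic in the `individual` stratum should equal the square of the t statistic from `t.test(..., var.equal = TRUE)` on the averaged data, so the two approaches test the same thing. With only two groups there is a single pairwise comparison, so a Tukey-style post hoc step would add nothing beyond this test.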