Question

How to run several one-way ANOVAs in R on different categories?

5.7 years ago

Longshotx ▴ 70

Thanks for reading my post. I understand how to run non-parametric and parametric tests on three or more groups. What I haven't figured out is a simple way of repeating this for each of the categories I am interested in. In my dataset, I have gene copies for various antibiotic classes, and I want to see if there are significant differences between samples within each antibiotic class. My data is currently in long format and looks like this:

Sample  Class      Value
A       Macrolide  0.22
A       Macrolide  0.45
A       Macrolide  0.63
B       Macrolide  0.25
B       Macrolide  0.28
B       Macrolide  0.47
C       Macrolide  0.22
C       Macrolide  0.26
C       Macrolide  0.29
A       Ceph       0.32
A       Ceph       0.42
A       Ceph       0.62
B       Ceph       0.42
B       Ceph       0.20
B       Ceph       0.91
C       Ceph       0.82
C       MCeph      0.92

So essentially I want to run one one-way ANOVA for Macrolide, another for Ceph, and so on. Can someone help me?

Thanks

R gene statistics anova • 15k views

updated 3.1 years ago by osiriska • 0 • written 5.7 years ago by Longshotx ▴ 70

score 10 · Answer 1 · 2019-07-24

5.6 years ago

Neilfws 49k

One approach is to nest the data into a data frame with a list column. You can then use purrr::map to fit a model for each group, tidy the results with broom::tidy and unnest them.

Using the same df1 as in the answer from @zx8754:

library(dplyr)
library(tidyr)
library(purrr)
library(broom)   # provides tidy() for anova objects

df1 %>% 
  nest(-Class) %>% 
  mutate(model = map(data, ~anova(lm(Value ~ Sample, .))), 
         tidy = map(model, tidy)) %>% 
  select(Class, tidy) %>% 
  unnest()

Result:

# A tibble: 6 x 7
  Class     term         df  sumsq meansq statistic p.value
  <chr>     <chr>     <int>  <dbl>  <dbl>     <dbl>   <dbl>
1 Macrolide Sample        2 0.0471 0.0235     1.22    0.358
2 Macrolide Residuals     6 0.115  0.0192    NA      NA    
3 Ceph      Sample        2 0.103  0.0515     0.662   0.564
4 Ceph      Residuals     4 0.311  0.0777    NA      NA    
5 MCeph     Sample        2 0.161  0.0804     0.447   0.727
6 MCeph     Residuals     1 0.18   0.18      NA      NA

More details: Running a model on separate groups.

5.6 years ago by Neilfws 49k

I made some edits to your code to avoid error and warning messages:

df1 %>% 
  nest(data = c(Sample, Value)) %>% 
  mutate(model = map(data, ~anova(lm(Value ~ Sample, .))), 
         tidy = map(model, broom::tidy)) %>% 
  select(Class, tidy) %>% 
  unnest(tidy)

5.0 years ago by edwardsmolina ▴ 10

Thanks this was extremely helpful!

5.5 years ago by Longshotx ▴ 70

Hi infenit101,

A small educational note: if an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted (you can accept multiple answers if need be).

5.5 years ago by lieven.sterck 15k

Thank you for your help. May I ask how to run a Type III ANOVA specifically for each of these categories?

3.1 years ago by osiriska • 0

score 4 · Answer 2 · 2019-06-05

5.7 years ago

zx8754 12k

Split the data by Class, then apply your favourite function to each piece, for example:

# example data
df1 <- read.table(text = "Sample    Class   Value
A   Macrolide   0.22
A   Macrolide   0.45
A   Macrolide   0.63
B   Macrolide   0.25
B   Macrolide   0.28
B   Macrolide   0.47
C   Macrolide   0.22
C   Macrolide   0.26
C   Macrolide   0.29
A   Ceph    0.32
A   Ceph    0.42
A   Ceph    0.62
B   Ceph    0.42
B   Ceph    0.2
B   Ceph    0.91
C   Ceph    0.82
C   MCeph   0.92
A   MCeph   0.2
B   MCeph   0.72
C   MCeph   0.32
", header = TRUE, stringsAsFactors = FALSE)

# split and apply function, result is a named list:
lapply(split(df1, df1$Class), function(i){
  anova(lm(Value ~ Sample, data = i))
})

$Ceph
Analysis of Variance Table

Response: Value
          Df  Sum Sq  Mean Sq F value Pr(>F)
Sample     2 0.10293 0.051467  0.6622 0.5644
Residuals  4 0.31087 0.077717               

$Macrolide
Analysis of Variance Table

Response: Value
          Df   Sum Sq  Mean Sq F value Pr(>F)
Sample     2 0.047089 0.023544  1.2241 0.3582
Residuals  6 0.115400 0.019233               

$MCeph
Analysis of Variance Table

Response: Value
          Df Sum Sq Mean Sq F value Pr(>F)
Sample     2 0.1608  0.0804  0.4467 0.7268
Residuals  1 0.1800  0.1800
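If one table is more convenient than a named list, the Sample row of each ANOVA table can be stacked into a single data frame. This is a minimal base-R sketch, assuming the same df1 as above (rebuilt here so the block runs on its own); fits and summary_tab are illustrative names, not part of the original answer:

```r
# Same data as df1 above, rebuilt so this block runs on its own
df1 <- data.frame(
  Sample = c(rep(c("A", "B", "C"), each = 3),       # Macrolide
             "A", "A", "A", "B", "B", "B", "C",      # Ceph
             "C", "A", "B", "C"),                    # MCeph
  Class  = c(rep("Macrolide", 9), rep("Ceph", 7), rep("MCeph", 4)),
  Value  = c(0.22, 0.45, 0.63, 0.25, 0.28, 0.47, 0.22, 0.26, 0.29,
             0.32, 0.42, 0.62, 0.42, 0.20, 0.91, 0.82,
             0.92, 0.20, 0.72, 0.32)
)

# One one-way ANOVA per class, as in the lapply() call above
fits <- lapply(split(df1, df1$Class), function(i) {
  anova(lm(Value ~ Sample, data = i))
})

# Take the "Sample" row of each ANOVA table and stack them into one data frame
summary_tab <- do.call(rbind, lapply(names(fits), function(cls) {
  a <- fits[[cls]]
  data.frame(Class   = cls,
             df      = a["Sample", "Df"],
             F_value = a["Sample", "F value"],
             p_value = a["Sample", "Pr(>F)"])
}))
summary_tab
```

The rows come out in the order of names(fits), i.e. the sorted class names produced by split(), which can depend on your locale.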

5.7 years ago by zx8754 12k

Hello, I was looking to do a similar analysis on my data, so THANK you for this information; it was very useful to me as well. I do have a question, and I realize this is an old post, so I hope you are still part of this community. What R code would you use to produce the four diagnostic plots to check the assumptions of each of these ANOVAs?

4.9 years ago by lisa.sims • 0

Please ask as a new question

4.9 years ago by zx8754 12k

lisa.sims: did you figure this out? I have the same issue.

4.8 years ago by nicole • 0

Hi! How can I get Tukey's Honestly Significant Difference (HSD) p-values for each class here? It would be a great help.

4.8 years ago by tahsinferdousuofc ▴ 10

score 3 · Answer 3 · 2019-06-04

5.7 years ago

manuel.belmadani ★ 1.4k

Couldn't you just subset your data frame by Class? For example:

> library(lmPerm)
> data.df = read.csv("data.txt")

> aov.macro <- lmPerm::aovp(Value ~ Sample, subset(data.df,data.df$Class == "Macrolide")  )
[1] "Settings:  unique SS "
> summary(aov.macro)
Component 1 :
            Df R Sum Sq R Mean Sq Pr(Exact)
Sample       2 0.047089  0.023544         1
Residuals    6 0.115400  0.019233          


> ceph.macro <- lmPerm::aovp(Value ~ Sample, subset(data.df,data.df$Class == "Ceph")  )
[1] "Settings:  unique SS "
> summary(ceph.macro)
Component 1 :
            Df R Sum Sq R Mean Sq Pr(Exact)
Sample       2  0.10293  0.051467         1
Residuals    4  0.31087  0.077717          

> mceph.macro <- lmPerm::aovp(Value ~ Sample, subset(data.df,data.df$Class == "MCeph")  )
Error in ctrfn(levels(x), contrasts = contrasts) : 
  not enough degrees of freedom to define contrasts

Not sure if that was intentional, but you have one row labelled MCeph. Is that a typo, or is the data truncated? If not, you don't have enough degrees of freedom to run an ANOVA on that class: the function needs observations from at least two different Sample groups even to define the contrasts, and a usable F test also needs more observations than groups.
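To keep a loop over all classes from stopping on a class like MCeph, each subset can be checked before fitting. This is a minimal sketch, assuming the question's original data (with the single MCeph row) and using base lm()/anova() rather than lmPerm::aovp; safe_anova() is a hypothetical helper, and its two checks (at least two Sample groups, at least one residual degree of freedom) are the minimum a one-way F test needs:

```r
# The question's original data, including the single "MCeph" row
df_q <- data.frame(
  Sample = c(rep(c("A", "B", "C"), each = 3),
             "A", "A", "A", "B", "B", "B", "C", "C"),
  Class  = c(rep("Macrolide", 9), rep("Ceph", 7), "MCeph"),
  Value  = c(0.22, 0.45, 0.63, 0.25, 0.28, 0.47, 0.22, 0.26, 0.29,
             0.32, 0.42, 0.62, 0.42, 0.20, 0.91, 0.82, 0.92)
)

# Hypothetical helper: fit the one-way ANOVA only if the subset has
# at least two Sample groups (needed to define contrasts) and at least
# one residual degree of freedom (needed for an F test); else return NULL
safe_anova <- function(d) {
  n_groups <- length(unique(d$Sample))
  if (n_groups < 2 || nrow(d) - n_groups < 1) {
    return(NULL)
  }
  anova(lm(Value ~ Sample, data = d))
}

res <- lapply(split(df_q, df_q$Class), safe_anova)
res <- Filter(Negate(is.null), res)  # drop the classes that were skipped
names(res)
```

Classes that fail the checks are simply dropped, so it is worth comparing names(res) with unique(df_q$Class) to see which ones were skipped.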

5.7 years ago by manuel.belmadani ★ 1.4k

A small comment: you don't need to write lmPerm::aovp, since you already loaded the library.

5.7 years ago by H.Hasani ▴ 990

Thanks so much! Yes, that was a typo.

Can I use the same subsetted data frame for TukeyHSD?

5.7 years ago by Longshotx ▴ 70

Most likely yes. I'm not familiar with the function, but from some examples it looks like it takes a model fitted by aov, so the lmPerm::aovp result might not work. Try it, and if it doesn't work, use aov as shown here (with the subsetted data frames):

https://www.rdocumentation.org/packages/stats/versions/3.6.0/topics/TukeyHSD
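Following that suggestion, here is a minimal sketch that fits aov() on each class subset and passes it to TukeyHSD(). It assumes the same df1 layout as in zx8754's answer (rebuilt here so the block is self-contained); Sample is converted to a factor explicitly before fitting, and tukey_by_class is an illustrative name:

```r
# Same data as df1 in the other answer, rebuilt so this block runs on its own
df1 <- data.frame(
  Sample = c(rep(c("A", "B", "C"), each = 3),       # Macrolide
             "A", "A", "A", "B", "B", "B", "C",      # Ceph
             "C", "A", "B", "C"),                    # MCeph
  Class  = c(rep("Macrolide", 9), rep("Ceph", 7), rep("MCeph", 4)),
  Value  = c(0.22, 0.45, 0.63, 0.25, 0.28, 0.47, 0.22, 0.26, 0.29,
             0.32, 0.42, 0.62, 0.42, 0.20, 0.91, 0.82,
             0.92, 0.20, 0.72, 0.32)
)

# For each class: make Sample a factor, fit aov(), then run TukeyHSD() on it
tukey_by_class <- lapply(split(df1, df1$Class), function(d) {
  d$Sample <- factor(d$Sample)
  TukeyHSD(aov(Value ~ Sample, data = d), which = "Sample")
})

# Pairwise differences, confidence limits and adjusted p-values for one class
tukey_by_class$Macrolide
```

Each element holds a matrix with columns diff, lwr, upr and p adj for the pairwise Sample comparisons in that class. A class with very few residual degrees of freedom (MCeph here has one) will give very wide intervals.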

5.7 years ago by manuel.belmadani ★ 1.4k