Group by column, summarise other columns, mean
5.3 years ago
bgraphit ▴ 20

Hi, I need some advice on how to go from the TEST_play data frame to a data frame that contains each individual name and the average of counts.1 and counts.2 for that peak.

Example:

 >head(TEST_play)
     name counts.1 counts.2
1 Peak160       97      487
2 Peak160      425      371
3 Peak328        0      104
4 Peak328       13       20
5 Peak344        2       39
6 Peak344        7       63

Desired output

>head(average_TEST_play)
     name counts.1 counts.2
1 Peak160    261.0      429
2 Peak328      6.5       62
etc.

> sapply(TEST_play, class)
     name  counts.1  counts.2
 "factor" "numeric" "numeric"
R • 1.1k views

What have you tried? Look at dplyr's group_by() and summarise() functions. There are ways to do it in base R too, but this might be easier.

5.3 years ago
Ram 44k

Here's a base R way to do this:

dummy_df <- read.table(text = '"name" "counts.1" "counts.2"
"Peak160" 97 487
"Peak160" 425 371
"Peak328" 0 104
"Peak328" 13 20
"Peak344" 2 39
"Peak344" 7 63', header = TRUE, stringsAsFactors = TRUE)

aggregate(cbind(counts.1, counts.2) ~ name, data=dummy_df, FUN = mean)
     name counts.1 counts.2
1 Peak160    261.0      429
2 Peak328      6.5       62
3 Peak344      4.5       51
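
As a side note, when every column other than the grouping column should be averaged, the formula interface of aggregate() also accepts a dot on the left-hand side, which avoids listing the columns by hand. A minimal sketch, assuming the same six rows as above; the names test_df and avg are only illustrative:

```r
# Rebuild the example data (same values as in the question).
test_df <- data.frame(
  name     = c("Peak160", "Peak160", "Peak328", "Peak328", "Peak344", "Peak344"),
  counts.1 = c(97, 425, 0, 13, 2, 7),
  counts.2 = c(487, 371, 104, 20, 39, 63)
)

# "." on the left-hand side means "all columns not used on the right-hand side".
avg <- aggregate(. ~ name, data = test_df, FUN = mean)
print(avg)
```

Note that the formula method drops rows containing NA by default (na.action = na.omit), so this may behave differently from a per-column mean on data with missing values.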

Using dplyr, that'd be:

library(dplyr)
dummy_df %>% group_by(name) %>% summarise(counts.1 = mean(counts.1), counts.2 = mean(counts.2))

# A tibble: 3 x 3
  name    counts.1 counts.2
  <fct>      <dbl>    <dbl>
1 Peak160    261        429
2 Peak328      6.5       62
3 Peak344      4.5       51
> library(dplyr)
> test %>%
+   group_by(name) %>%
+   summarise_all(mean)
# A tibble: 3 x 3
  name    counts.1 counts.2
  <chr>      <dbl>    <dbl>
1 Peak160    261        429
2 Peak328      6.5       62
3 Peak344      4.5       51
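
For readers on newer dplyr releases: summarise_all() still works, but since dplyr 1.0.0 the scoped verbs are marked as superseded in favour of across(). A rough equivalent, assuming dplyr >= 1.0.0 is installed and using an illustrative data frame named test_df with the same values:

```r
library(dplyr)

# Same example data as in the question (character name column).
test_df <- data.frame(
  name     = c("Peak160", "Peak160", "Peak328", "Peak328", "Peak344", "Peak344"),
  counts.1 = c(97, 425, 0, 13, 2, 7),
  counts.2 = c(487, 371, 104, 20, 39, 63)
)

# Average every numeric column within each name; the grouping column is
# excluded from across() automatically.
avg <- test_df %>%
  group_by(name) %>%
  summarise(across(where(is.numeric), mean), .groups = "drop")

print(avg)
```

Selecting with where(is.numeric) keeps the call safe if the data frame later gains non-numeric columns, which summarise_all(mean) would try to average as well.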

Thank you, TIL summarise_all.

5.3 years ago

With the doBy package:

> test <- read.csv("test.txt", sep = "\t", stringsAsFactors = FALSE, header = TRUE)
> test
     name counts.1 counts.2
1 Peak160       97      487
2 Peak160      425      371
3 Peak328        0      104
4 Peak328       13       20
5 Peak344        2       39
6 Peak344        7       63
> library(doBy)
> summaryBy(test[,-1] ~ name, test, FUN = mean, keep.names = TRUE)
     name counts.1 counts.2
1 Peak160    261.0      429
2 Peak328      6.5       62
3 Peak344      4.5       51