How do i calculate the mean of triplicates in a data.frame based on pattern?
3
1
Entering edit mode
5 months ago
Assa Yeroslaviz ★ 1.9k

I have a data.frame with six samples in two triplicates. I would like to calculates for each group the mean of the this group into a new column. What is the best way to do this? My data looks like that

head(p767)
                 sgrna p767.AM_1 p767.AM_2 p767.AM_3 p767.MM_1 p767.MM_2 p767.MM_3
1  Control-104-KO-64-F       758       682       713      1867      1840      1251
2  Control-110-KO-29-F       674       561       642      1706      1737      1141
3  Control-112-KO-65-F       584       518       495      1328      1448      1012
4  Control-125-KO-31-F       565       462       569      1345      1258       856
5 Control-128-KO-136-F       463       452       424      1253      1258       721
6  Control-132-KO-57-F       854       698       755      1974      1977      1326

I would like to a new data.frame with the mean values calculated as such:

head(p767.m)
                     p767.AM p767.MM
Control-104-KO-64-F  717.6667 1652.667
Control-110-KO-29-F  625.6667 1528.000
Control-112-KO-65-F  532.3333 1262.667
Control-125-KO-31-F  532.0000 1153.000
Control-128-KO-136-F 446.3333 1077.333
Control-132-KO-57-F  769.0000 1759.000

Below is the way to read it into R. I would appreciate the help.

Thanks

structure(list(sgrna = c("Control-104-KO-64-F", "Control-110-KO-29-F", 
"Control-112-KO-65-F", "Control-125-KO-31-F", "Control-128-KO-136-F", 
"Control-132-KO-57-F"), p767.AM_1 = c(758L, 674L, 584L, 565L, 
463L, 854L), p767.AM_2 = c(682L, 561L, 518L, 462L, 452L, 698L
), p767.AM_3 = c(713L, 642L, 495L, 569L, 424L, 755L), p767.MM_1 = c(1867L, 
1706L, 1328L, 1345L, 1253L, 1974L), p767.MM_2 = c(1840L, 1737L, 
1448L, 1258L, 1258L, 1977L), p767.MM_3 = c(1251L, 1141L, 1012L, 
856L, 721L, 1326L)), row.names = c(NA, 6L), class = "data.frame")
tidyverse r • 1.0k views
ADD COMMENT
0
Entering edit mode

I know it can be done with something like that:

cbind(p767.AM = (rowMeans(p767[,2:4])),
                p767.MM = (rowMeans(p767[,5:7])))

but it would be nice to have a tidyer way of doing it.

ADD REPLY
4
Entering edit mode
5 months ago

I'm a happy user of data.table:

library(data.table)

dat <- structure(list(sgrna = c("Control-104-KO-64-F", "Control-110-KO-29-F", 
"Control-112-KO-65-F", "Control-125-KO-31-F", "Control-128-KO-136-F", 
"Control-132-KO-57-F"), p767.AM_1 = c(758L, 674L, 584L, 565L, 
463L, 854L), p767.AM_2 = c(682L, 561L, 518L, 462L, 452L, 698L
), p767.AM_3 = c(713L, 642L, 495L, 569L, 424L, 755L), p767.MM_1 = c(1867L, 
1706L, 1328L, 1345L, 1253L, 1974L), p767.MM_2 = c(1840L, 1737L, 
1448L, 1258L, 1258L, 1977L), p767.MM_3 = c(1251L, 1141L, 1012L, 
856L, 721L, 1326L)), row.names = c(NA, 6L), class = "data.frame")

setDT(dat)

ldat <- melt(dat, id.vars='sgrna', variable.name='library_id', value.name='count')
ldat[, sample_id := sub('_\\d+$', '', library_id)]

# Averages in long format:
cntAvg <- ldat[, list(avg=mean(count)), by=list(sgrna, sample_id)]

# If you want to get back to wide format
dcast(cntAvg, sgrna ~ sample_id)

A couple of notes:

  • It is almost always better to work with data in long format (or tidy or normalised)

  • I would prefer to get the sample names from a sample sheet rather than parsing the library names with some dodgy regex.

ADD COMMENT
1
Entering edit mode

I would prefer to get the sample names from a sample sheet rather than parsing the library names with some dodgy regex.

+1 !!!

Anecdote: I analyzed data last week where they indeed tried to put metadata into the filename, resulting in a 160 character string that needed deparsing (demessification). Please, people, unless it's a really short and simple encoding, please use sample/annotation sheets.

ADD REPLY
0
Entering edit mode

I would prefer to get the sample names from a sample sheet rather than parsing the library names with some dodgy regex.

Not really sure, what you mean here. I read the whole count matrix from a data file. Do you mean a separate file with a list of sample names( one name per row)?

ADD REPLY
0
Entering edit mode

Thanks to your help I managed to do it also in tidyverse, by using first pivot_longer creating a new column with group, calculating the mean by grouping the sgrna and group. Then using pivot_wider to convert it to the format i need removing all unnecessary rows.

melted_df <- pivot_longer(dat, cols = starts_with("p"), names_to = "project", values_to = "counts") |>
  mutate(group = gsub(pattern = "\\_[1|2|3]", replacement = "", x = project)) |>
  group_by(sgrna, group) |> mutate(avg = mean(counts)) |> 
  distinct(sgrna,group,avg) |> 
  pivot_wider(id_cols = sgrna, names_from = group, values_from = avg) |> as.data.frame()

thanks again

Still not sure, how you meant it, when you mentioned reading the sample names from a text file.

ADD REPLY
1
Entering edit mode
5 months ago
gglim ▴ 210
library(dplyr)
df %>% rowwise() %>% mutate(p767.AM = mean(c_across(contains("AM"))))

Maybe this is what you're looking for?

ADD COMMENT
0
Entering edit mode

I guess the OP doesn't want to hard-code p767.AM and "AM"?

ADD REPLY
0
Entering edit mode

I guess I can live with that. :-)

Can one create a vectors of patterns, e.g. c("AM, "MM") and use it as input to the mutate command?

ADD REPLY
1
Entering edit mode
meanofcol <- function(df, col) {
  df %>% rowwise() %>% mutate("Mean of {{col}}" := mean(c_across(contains({{col}}))))
}
meanofcol(df, col = "AM")

This seems good for me, reference here

ADD REPLY
0
Entering edit mode
5 months ago
MolGeek ▴ 80

Or just use rowMeans and provide the desired columns (for dataframes with low number of columns):

p767.m <- data.frame(p767.AM = rowMeans(p767[,1:3]) , p767.MM = rowMeans(p767[,4:6])   )
ADD COMMENT
1
Entering edit mode

Works here, but it's not generic or tidy since you have to manually enter the correct numbers and enter the colnames of the output.

ADD REPLY
0
Entering edit mode

You are right, thats a quick and dirty answer to get the means.

ADD REPLY
0
Entering edit mode

exactly, this would be needed to be monitored closely, as the structure changes often.

ADD REPLY

Login before adding your answer.

Traffic: 2549 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6