Question

summarize consecutive columns over a data frame

0

Entering edit mode

19 months ago

Assa Yeroslaviz ★ 1.9k

I am looking for a way to summarise consecutive columns in a data frame, meaning summarise col2 + col1, col3+col2+col1, col4+col3+col2+col1, etc.

the output of each new column should be the sum of all previous columns.

I'm trying to combine the across command from tidyverse, but I can't get it right.

any help would be appreciated.

thanks

Assa

tidyverse r offtopic • 1.5k views

ADD COMMENT • link updated 19 months ago by Ram 45k • written 19 months ago by Assa Yeroslaviz ★ 1.9k

1

Entering edit mode

This post is more suitable at SO.

ADD REPLY • link 19 months ago by zx8754 12k

0

Entering edit mode

This post does not fit the theme of this forum.

ADD REPLY • link 19 months ago by Ram 45k

score 1 · Answer 1 · 2023-09-11

1

Entering edit mode

19 months ago

ATpoint 87k

Define starting vector i <- c(1,2) and then use a loop that in each iteration adds 1 to the largest element if the vector and appends it to the vector. Use colSums or whatever makes sense for the summarization after subsetting df[,i,drop=FALSE]. An experienced user like you will for sure get this going.

Edit: Here is one way to do this with a two-liner using base R:

# example data
ncols <- 6
nrows <- 3
set.seed(1)
data <- data.frame(matrix(rnorm(ncols*nrows), ncol=ncols))

# actual code
iter <- lapply(2:ncols, function(x) 1:x)
lapply(iter, function(x) rowSums(data[,x,drop=FALSE]))

ADD COMMENT • link 19 months ago by ATpoint 87k

0

Entering edit mode

Well, i have sort of a solution, but I find it be very "clunky" (dirty).

this is my solutiuon

test.df <- read.delim2(file = "test.df.txt")
i <- c(2:3)
tmp <- data.frame()
tmp2<- list()
tmp2[[1]]<- test.df[,1]
for (col_i in i) {
  print(col_i)
  tmp <- apply(test.df[1:col_i], 1, sum)
  tmp2[[col_i]] <- tmp

}

But I was hoping to find something more elegant and slick, if possible.

ADD REPLY • link 19 months ago by Assa Yeroslaviz ★ 1.9k

0

Entering edit mode

There are many ways of how this can be done.

# example data
ncols <- 6
nrows <- 3
set.seed(1)
data <- data.frame(matrix(rnorm(ncols*nrows), ncol=ncols))

iter <- lapply(2:ncols, function(x) 1:x)
lapply(iter, function(x) rowSums(data[,x,drop=FALSE]))

ADD REPLY • link 19 months ago by ATpoint 87k

Ram · Answer 2 · 2023-09-11

You can do like this-

df <- as.data.frame(matrix(round(runif(n=10, min=1, max=20), 0), nrow=5, ncol=4))
df
V1 V2 V3 V4
1 14  5 14  5
2 20  6 20  6
3 15 17 15 17
4 12 14 12 14
5 17  6 17  6

result=cbind(
  df,
  setNames(
    as.data.frame(Reduce(`+`, df, accumulate = TRUE)),
    Reduce(paste0, names(df), accumulate = TRUE)
  )
)
result
   V1 V2 V3 V4 V1 V1V2 V1V2V3 V1V2V3V4
1  2 15  2 15  2   17     19       34
2  4  1  4  1  4    5      9       10
3  5  8  5  8  5   13     18       26
4 10 11 10 11 10   21     31       42
5  5  1  5  1  5    6     11       12