0

so I have a CSV file that I imported into R studio. I ran my analysis and then was informed that there was a mistake with the data collection. Every 4 rows actually represents 1 month. But what I need to do is leave first row alone and combine (aggregate) the next 3 rows into one. The head of my data looks like this:

             X2   X3   X4   X5   X6   X7
1          3830    0    0 1184    0 5765
2             0    0  153    0  153  153
3             0    0    0   73   73    0
4             0    0  153    0  153    0
5          3815    0 3354  651  903 5174
6             0    0  138  101  273  558
7             0    0  322    0  272  322
8             0    0    0  101  273  420
9             0    0 3682 3302   83 6485
10          165    0   63    0  293  165
11            0    0   98   98    0    0
12            0    0    0    0  230    0

So basically I want to leave row 1 alone and combine rows 2,3,and 4 combined into one new row. Leave row 5 alone, combine 6,7, and 8 into another etc. So lets say for column 76, I would want a new row that says 5765 for row 1 then 13 for sum row 2 through 4. What is the easiest way to go about doing this in R?

Phil
  • 4,424
  • 3
  • 22
  • 55
Josh_PL
  • 25
  • 5

2 Answers2

2

Try this:

library(tidyverse)
#Create index
index <- seq(1,dim(df)[1],by=4)
num <- 1:length(index)
#Assign
df$var <- NA
df$var2 <- NA
df$var[index]<-num
df %>% fill(var) -> df
df$var2[index]<-rep(1,length(index))
df$var2 <- ifelse(is.na(df$var2),2,df$var2)
#Mutate aggregations
df %>% group_by(var,var2) %>% summarise_all(sum,na.rm=T)

# A tibble: 6 x 8
# Groups:   var [3]
    var  var2    X2    X3    X4    X5    X6    X7
  <int> <dbl> <int> <int> <int> <int> <int> <int>
1     1     1  3830     0     0  1184     0  5765
2     1     2     0     0   306    73   379   153
3     2     1  3815     0  3354   651   903  5174
4     2     2     0     0   460   202   818  1300
5     3     1     0     0  3682  3302    83  6485
6     3     2   165     0   161    98   523   165
Duck
  • 37,428
  • 12
  • 34
  • 70
2

Here's a slight variation of my answer to your previous question that uses a custom lambda function.

In Tidyverse non-standard evaluation, the ~ operator is like function(...). The . symbol later represents the first argument. Thus .[1] represents the first element, and .[2:4] represents the next 3. So when our lambda function is applied to the vector of each column, we can use rbind to return a 1 column, 2 row matrix. summarize helpfully merges the results of all the columns into two rows for each group.

library(dplyr)
data %>%
  mutate(Aggregate = rep(seq(1,ceiling(nrow(data)/4)),each = 4)) %>%
  group_by(Aggregate) %>%
  summarise(across(everything(), ~rbind(.[1], sum(.[2:4]))))
## A tibble: 6 x 7
## Groups:   Aggregate [3]
#  Aggregate X2[,1] X3[,1] X4[,1] X5[,1] X6[,1] X7[,1]
#      <int>  <int>  <int>  <int>  <int>  <int>  <int>
#1         1   3830      0      0   1184      0   5765
#2         1      0      0    306     73    379    153
#3         2   3815      0   3354    651    903   5174
#4         2      0      0    460    202    818   1300
#5         3      0      0   3682   3302     83   6485
#6         3    165      0    161     98    523    165
Ian Campbell
  • 19,690
  • 9
  • 20
  • 44
  • Thank you! You're amazing! Just so I understand how this is working, the mutate and aggregate breaks up the data into blocks of four, and the rbind .[1] leaves that row unchanged. While the sum function sums rows 2:4 in each individual block correct? – Josh_PL Jul 14 '20 at 14:59
  • Thanks man! I just wanted to make sure that I was understanding it correctly. You're a lifesaver! – Josh_PL Jul 14 '20 at 15:07