-2

I have a huge dataset of 1600 rows and 13 columns, every three rows present data of a single person, while the columns are characteristics of this person. I want to get the median of all columns for every person (3 rows).

For example, from this:

   C1 C2 C3 C4 C5...
R1  1  2  3  4  5
R2  6  7  8  9  1
R3  2  3  4  5  6  
R4  7  6  9  8  3
R5  3  4  6  2  5
R6  9  7  4  3  2
R7  2  5  9  7  6
R8  3  5  8  3  2
R9  9  4  6  5  8

I would like to get the following:

   C1 C2 C3 C4 C5  
R1  2  3  4  5  5  
R2  7  6  6  3  3  
R3  3  5  8  5  6  

. . .

Ronak Shah
  • 286,338
  • 16
  • 97
  • 143

3 Answers3

0

In R, we create a grouping variable using %/% and then get the median of all the columns with summarise_each

 library(dplyr)
 df1 %>% 
    group_by(grp = ((row_number()-1)%/%3)+1) %>%
    summarise_each(funs(median))
 #   grp    C1    C2    C3    C4    C5
 #  <dbl> <int> <int> <int> <int> <int>
 #1     1     2     3     4     5     5
 #2     2     7     6     6     3     3
 #3     3     3     5     8     5     6

Or using data.table

 library(data.table)
 setDT(df1)[, lapply(.SD, median) , .(grp =gl(nrow(df1), 3, nrow(df1)))]
#   grp C1 C2 C3 C4 C5
#1:  1  2  3  4  5  5
#2:  2  7  6  6  3  3
#3:  3  3  5  8  5  6

Or with aggregate from base R

aggregate(.~grp, transform(df1, grp = ((1:nrow(df1) -1) %/%3) + 1), median)
#  grp C1 C2 C3 C4 C5
#1   1  2  3  4  5  5
#2   2  7  6  6  3  3
#3   3  3  5  8  5  6
akrun
  • 674,427
  • 24
  • 381
  • 486
0

You can turn your data into an array and use apply:

DF <- read.table(text = "   C1 C2 C3 C4 C5
                 R1  1  2  3  4  5
                 R2  6  7  8  9  1
                 R3  2  3  4  5  6  
                 R4  7  6  9  8  3
                 R5  3  4  6  2  5
                 R6  9  7  4  3  2
                 R7  2  5  9  7  6
                 R8  3  5  8  3  2
                 R9  9  4  6  5  8", header = TRUE)

M <- as.matrix(DF)
dim(M) <- c(3, nrow(M)/3, ncol(M))
apply(M, c(2, 3), median)
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    2    3    4    5    5
#[2,]    7    6    6    3    3
#[3,]    3    5    8    5    6
Roland
  • 117,893
  • 9
  • 163
  • 255
0

Here's my attempt. It evaluates the median value for every three sublists. The median is taken doing the sum of the elements taken from a zip of the unpacked sublist.

t =[[1, 2, 3, 4, 5], [6, 7, 8, 9, 1], ......

median = lambda subl: list(map(lambda n: sum(n)/len(subl), zip(*subl)))

for i in range(0, len(t), 3):
    print(median(t[i:i+3]))