Arrange long-format dataframe according to multiple columns and taking into account changing sorting directions

Question

I would like to re-arrange my data according to a score variable value and an additional group variable. However, depending on the group the sorting should either be descending or ascending. The groups are consisting of test scores (higher is better) and processing time (lower is better).

df <- data.frame(id = rep(1:4, 4),
                 value = rnorm(16, 5), 
                 group = c(paste0("test", 1:3), "time0"))
df$value[seq(4,16, 4)] <- 1:4

> df %>% group_by(group) %>% arrange(group, desc(value))
# A tibble: 16 x 3
# Groups:   group [4]
      id value group
   <int> <dbl> <fct>
 1     3  6.06 test1
 2     4  4.69 test1
 3     1  4.32 test1
 4     2  3.56 test1
 5     4  5.96 test2
 6     1  5.96 test2
 7     3  4.43 test2
 8     2  3.86 test2
 9     3  6.28 test3
10     4  5.55 test3
11     2  4.59 test3
12     1  3.53 test3
13     4  4    time0
14     3  3    time0
15     2  2    time0
16     1  1    time0

The desired output looks like this:

      id value group
   <int> <dbl> <fct>
 1     3  6.06 test1
 2     4  4.69 test1
 3     1  4.32 test1
 4     2  3.56 test1
 5     4  5.96 test2
 6     1  5.96 test2
 7     3  4.43 test2
 8     2  3.86 test2
 9     3  6.28 test3
10     4  5.55 test3
11     2  4.59 test3
12     1  3.53 test3
13     4  1    time0
14     3  2    time0
15     2  3    time0
16     1  4    time0

I tried using arrange_if but could not figure it out. Any help is much appreciated.

Thanks for the answers so far, they are equally helpful!

EDIT for Clarification: This is different from this question since the sorting is not only based on multiple columns but also dependend on within-column characteristics.

IceCreamToucan · Accepted Answer · 2019-09-05T15:25:09.440

3

This makes the rows within test groups sort descending and the rows within time groups sort ascending. If you want the inverse, just reverse the -1 and 1.

df %>% 
  arrange(group, value*ifelse(grepl('time', group), 1, -1))

#    id    value group
# 1   1 6.358680 test1
# 2   1 6.100025 test1
# 3   1 4.844204 test1
# 4   1 3.622940 test1
# 5   2 5.763176 test2
# 6   2 4.897212 test2
# 7   2 4.585005 test2
# 8   2 3.529248 test2
# 9   3 5.387672 test3
# 10  3 4.835476 test3
# 11  3 4.605710 test3
# 12  3 4.521850 test3
# 13  4 1.000000 time0
# 14  4 2.000000 time0
# 15  4 3.000000 time0
# 16  4 4.000000 time0

Here is another option which works when value is character

df <- data.frame(id = rep(1:4, 4),
                 value = rnorm(16, 5), 
                 group = c(paste0("test", 1:3), "time0"))
set.seed(2019)
df$value <- sample(letters, nrow(df), T)

df %>% 
  arrange(group, rank(value)*ifelse(grepl('time', group), 1, -1))
#    id value group
# 1   1     u test1
# 2   1     f test1
# 3   1     c test1
# 4   1     b test1
# 5   2     s test2
# 6   2     p test2
# 7   2     f test2
# 8   2     b test2
# 9   3     v test3
# 10  3     u test3
# 11  3     s test3
# 12  3     h test3
# 13  4     a time0
# 14  4     q time0
# 15  4     q time0
# 16  4     r time0

edited Sep 05 '19 at 15:25

answered Sep 05 '19 at 14:27

IceCreamToucan

23,575
2
13
25

Do you think this will also work when there are 2 positive and 2 negative values for `time` ? – Ronak Shah Sep 05 '19 at 14:38
Yes. Of course, it will not sort them based on absolute value, but it will sort them ascending or descending, according to the group, properly. – IceCreamToucan Sep 05 '19 at 14:39
I am still trying to figure out how this works, but it does work. – Ronak Shah Sep 05 '19 at 14:43
1

For groups where the `ifelse` returns `-1`, this sorts by `value*(-1)` = `-value`, i.e. sorts by value descending (sorting by -x is equivalent to sorting by x descending). For groups where `ifelse` returns `1`, this sorts by `value*1` = `value` i.e. sorts by value ascending. – IceCreamToucan Sep 05 '19 at 14:44
Thank you very much! This solution is answering the question perfectly. Although, one potential issue might arise when `value` contains non-numerical data. – Ben Nutzer Sep 05 '19 at 15:21
1

Added a method which works for non-numeric `value` field – IceCreamToucan Sep 05 '19 at 15:25

akrun · Answer 2 · 2019-09-05T15:28:15.427

We can do a filter to exclude the 'time0' group, do the arrange on the rest of the dataset and bind_rows with the other set of group

library(dplyr)
df %>% 
   filter(group != 'time0') %>%
   arrange(group, desc(value)) %>%
   bind_rows(., df %>% 
                  filter(group == 'time0') %>% 
                  arrange(value))
#   id value group
#1   3  6.06 test1
#2   4  4.69 test1
#3   1  4.32 test1
#4   2  3.56 test1
#5   4  5.96 test2
#6   1  5.96 test2
#7   3  4.43 test2
#8   2  3.86 test2
#9   3  6.28 test3
#10  4  5.55 test3
#11  2  4.59 test3
#12  1  3.53 test3
#13  1  1.00 time0
#14  2  2.00 time0
#15  3  3.00 time0
#16  4  4.00 time0

Also, if the 'value' can be non-numeric'

df %>%
    arrange(group, desc(as.numeric(value)), is.na(as.numeric(value)))

data

df <- structure(list(id = c(3L, 4L, 1L, 2L, 4L, 1L, 3L, 2L, 3L, 4L, 
2L, 1L, 4L, 3L, 2L, 1L), value = c(6.06, 4.69, 4.32, 3.56, 5.96, 
5.96, 4.43, 3.86, 6.28, 5.55, 4.59, 3.53, 4, 3, 2, 1), group = c("test1", 
"test1", "test1", "test1", "test2", "test2", "test2", "test2", 
"test3", "test3", "test3", "test3", "time0", "time0", "time0", 
"time0")), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16"))

Thank you so much! This is neat solution and answers my question completely. Additionally, it will also work in the case when `df$value[seq(4,16, 4)] — Ben Nutzer, Sep 05 '19 at 15:22

Arrange long-format dataframe according to multiple columns and taking into account changing sorting directions

2 Answers2

data