dplyr lead/lag with group_by

Question

Do not understand why I the lead and lag functions ignore the group by. Here's a simple example (in reality I need to group by 5 columns).

# Dummy DataSet
df <- data.frame(group = c("a","a","a","a", "a", "b", "b", "b", "b", "b"),
                 order = c(3, 4, 2, 5, 1, 1, 3, 4, 2, 4),
                 value = c(15, 22, 43, 31, 25, 11, 37, 24, 18, 9))    

"group" "order" "value"
"a" 3   15
"a" 4   22
"a" 2   43
"a" 5   31
"a" 1   25
"b" 1   11
"b" 3   37
"b" 4   24
"b" 2   18
"b" 4   9

Tried this but even the order by doesn't work here

df %>%
    group_by(group) %>%
    mutate(previous = dplyr::lag(value, n=1, default=NA, order_by = order))

Then tried to arrange beforehand.

df %>% 
    arrange(group, order) %>%
    group_by(group) %>% 
    mutate(previous = dplyr::lag(value, n=1, default=NA))

"group" "order" "value" "previous"
"a" 1   25  NA
"a" 2   43  25
"a" 3   15  43
"a" 4   22  15
"a" 5   31  22
"b" 1   11  31
"b" 2   18  11
"b" 3   37  18
"b" 4   24  37
"b" 4   9   24

Which fixes the sorting but is still ignoring the group by as b 1 should be NA not 31. Am I missing something obvious or can lag/lead and group_by not be combined like this?

It would work in SQL with

LAG(value, 1, NULL) OVER (PARTITION BY group ORDER BY order)

Apologies if formatting is poor, not posted code questions before.

Oh, as simple as that, spent ages trying different things aha. Restarted R and updated dplyr from 0.7.4 to 0.7.6. Thanks Both — Quixotic22, Aug 29 '18 at 18:03
Actually, the issue persists in `plyr`. So if for some reason you import both `dplyr` and `plyr`, import `dplyr` last. Otherwise `plyr` will overwrite `dplyr`'s `lag` function leading to the issue described here. — 0range, Mar 24 '19 at 04:31
Does this answer your question? [dplyr: lead() and lag() wrong when used with group\_by()](https://stackoverflow.com/questions/28235074/dplyr-lead-and-lag-wrong-when-used-with-group-by) — Sherman, Apr 14 '21 at 16:33

dplyr lead/lag with group_by

0 Answers0