I do not understand why the lead and lag functions ignore the group_by. Here's a simple example (in reality I need to group by 5 columns).
library(dplyr)  # provides %>%, group_by, mutate, arrange, lag

# Dummy dataset
df <- data.frame(group = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b"),
                 order = c(3, 4, 2, 5, 1, 1, 3, 4, 2, 4),
                 value = c(15, 22, 43, 31, 25, 11, 37, 24, 18, 9))
"group" "order" "value"
"a" 3 15
"a" 4 22
"a" 2 43
"a" 5 31
"a" 1 25
"b" 1 11
"b" 3 37
"b" 4 24
"b" 2 18
"b" 4 9
I tried this, but even the order_by does not seem to work here:
df %>%
  group_by(group) %>%
  mutate(previous = dplyr::lag(value, n = 1, default = NA, order_by = order))
Then I tried arranging beforehand:
df %>%
  arrange(group, order) %>%
  group_by(group) %>%
  mutate(previous = dplyr::lag(value, n = 1, default = NA))
"group" "order" "value" "previous"
"a" 1 25 NA
"a" 2 43 25
"a" 3 15 43
"a" 4 22 15
"a" 5 31 22
"b" 1 11 31
"b" 2 18 11
"b" 3 37 18
"b" 4 24 37
"b" 4 9 24
This fixes the sorting but still ignores the group_by: the first row of group b (order 1) should have previous = NA, not 31. Am I missing something obvious, or can lag/lead and group_by not be combined like this?
In SQL this would work with:
LAG(value, 1, NULL) OVER (PARTITION BY group ORDER BY order)
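For reference, here is a minimal self-contained sketch of the dplyr counterpart to the SQL window above, using the same dummy data. The explicit `dplyr::` prefix on every verb is an assumption added here, not part of the original attempts. It is there to rule out a same-named function from another attached package masking the dplyr one (for example `plyr::mutate`, which is not group-aware). Whether that is what happens in the session above is not confirmed.

```r
library(dplyr)

# Same dummy data as in the question
df <- data.frame(
  group = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b"),
  order = c(3, 4, 2, 5, 1, 1, 3, 4, 2, 4),
  value = c(15, 22, 43, 31, 25, 11, 37, 24, 18, 9)
)

# Rough equivalent of: LAG(value, 1, NULL) OVER (PARTITION BY group ORDER BY order)
# Every verb is namespaced so that masking by another package cannot
# change which mutate()/lag() is actually called.
result <- df %>%
  dplyr::arrange(group, order) %>%
  dplyr::group_by(group) %>%
  dplyr::mutate(previous = dplyr::lag(value, n = 1, default = NA)) %>%
  dplyr::ungroup()

# With dplyr's group-aware mutate(), the first row of each group gets NA
print(result)
```

If this gives NA for the first row of group b while the un-namespaced version gives 31, checking `conflicts()` or `environmentName(environment(mutate))` may show which package's `mutate` is being picked up.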
Apologies if the formatting is poor; I have not posted code questions before.