Here is one way using data.table:
library(data.table)
setDT(d)
d[, out := pmin(cumsum(is.na(x)), rev(cumsum(is.na(x)))), by = rleid(is.na(x))]
d
#     x y out
# 1:  0 0   0
# 2: NA 1   1
# 3:  0 0   0
# 4: NA 1   1
# 5: NA 2   2
# 6: NA 1   1
# 7:  0 0   0
# 8: NA 1   1
# 9: NA 2   2
#10: NA 3   3
#11: NA 2   2
#12: NA 1   1
#13:  0 0   0
For each run of consecutive NAs, we calculate the parallel minimum of cumsum(is.na(x)) and its reverse. This also works for the runs of non-NA values: there is.na(x) is FALSE throughout, so the cumulative sum, and therefore the result, is 0. Call setDF(d) if you want to continue with a data.frame.
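To see why the parallel minimum gives the "distance to the nearest edge" pattern, here is a minimal base R sketch of what happens inside a single group. The vector run is a made-up example of one run of five NAs, not part of the original data.

```r
# Hypothetical single group: a run of five consecutive NAs
run <- rep(NA_real_, 5)

fwd <- cumsum(is.na(run))  # counts up from the left edge:  1 2 3 4 5
bwd <- rev(fwd)            # counts up from the right edge: 5 4 3 2 1

# The element-wise minimum rises towards the middle and falls again
out <- pmin(fwd, bwd)
out
# [1] 1 2 3 2 1
```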
Instead of calculating cumsum(is.na(x)) twice, we could also do:
d[, out := {
  tmp <- cumsum(is.na(x))
  pmin(tmp, rev(tmp))
}, by = rleid(is.na(x))]
This might give a performance gain, but I haven't benchmarked it.
Using dplyr syntax, this would read:
library(dplyr)
d %>%
  group_by(grp = data.table::rleid(is.na(x))) %>%
  mutate(out = pmin(cumsum(is.na(x)), rev(cumsum(is.na(x))))) %>%
  ungroup()
# A tibble: 13 x 4
#       x     y   grp   out
#   <dbl> <dbl> <int> <int>
# 1     0     0     1     0
# 2    NA     1     2     1
# 3     0     0     3     0
# 4    NA     1     4     1
# 5    NA     2     4     2
# 6    NA     1     4     1
# 7     0     0     5     0
# 8    NA     1     6     1
# 9    NA     2     6     2
#10    NA     3     6     3
#11    NA     2     6     2
#12    NA     1     6     1
#13     0     0     7     0
The same idea in base R:
rle_x <- rle(is.na(d$x))
grp <- rep(seq_along(rle_x$lengths), times = rle_x$lengths)
transform(d, out = ave(is.na(x), grp, FUN = function(i) pmin(cumsum(i), rev(cumsum(i)))))
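For a self-contained check of the base R version, the following sketch rebuilds the sample data and compares the result with the expected column y. The data frame here is an assumption, reconstructed from the printed output earlier in this answer; res is just an illustrative name.

```r
# Sample data reconstructed from the printed output (an assumption)
d <- data.frame(
  x = c(0, NA, 0, NA, NA, NA, 0, NA, NA, NA, NA, NA, 0),
  y = c(0, 1, 0, 1, 2, 1, 0, 1, 2, 3, 2, 1, 0)
)

# Run ids: one id per stretch of consecutive NA / non-NA values
rle_x <- rle(is.na(d$x))
grp <- rep(seq_along(rle_x$lengths), times = rle_x$lengths)

# Within each run, take the minimum of the forward and backward counts
res <- transform(d, out = ave(is.na(x), grp, FUN = function(i) {
  tmp <- cumsum(i)
  pmin(tmp, rev(tmp))
}))

# Compare the computed column with the expected one
all(res$out == res$y)
```

The anonymous function stores cumsum(i) in tmp so it is computed only once per run, mirroring the data.table variant above.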