0

I'm able to find a lot of info on SO regarding handling subsetting dates to a certain weekday (e.g. Get Dates of a Certain Weekday from a Year in R). However, I am unable to find any that implement a fallback logic that I'd like. Specifically, if a given weekday does not exist in a given week, I'd like to grab the next available date, excluding Saturday and Sunday.

For example, from a vector of dates, I want to select all dates corresponding to Thursdays. However, in weeks where Thursdays are missing, I should instead pick the date of the next working day. In the example below, this is the following day, the Friday.

library(lubridate)

# Create some dates
dates <- seq.Date(as.Date("2017-11-16"), as.Date("2017-11-24"), by = 1)

# Remove Thursday, November 23
dates <- dates[dates != as.Date("2017-11-23")]

# Get all Thursdays in dates
dates[wday(dates) == 5]
# [1] "2017-11-16"

# Desired Output:
# Because Thursday 2017-11-23 is missing in a week,
# we roll over and select Friday 2017-11-24 instead  
# [1] "2017-11-16" "2017-11-24"

Note 1: For a given week where Thursday is missing and Friday is missing as well, I'd want to roll over to Monday. Essentially, for weeks where a Thursday is not found, grab the next date among the available dates.

Note 2: I'd like to accomplish this without any external dependencies other than common R packages such as lubridate, etc. (e.g. no dependency on a c++ library).

I'm confident I could write something to do what I want, but I am having trouble finding of creating something short and elegant.

Henrik
  • 56,228
  • 12
  • 124
  • 139
jmuhlenkamp
  • 1,993
  • 1
  • 12
  • 33

3 Answers3

1

An alternative with findInterval.

Create a sequence of dates ('tmp'), from the focal weekday ('wd') in the week of min 'dates', to max 'dates'.

Select dates corresponding to the focal weekday ('wds').

Select working days from 'dates' ('dates_1_5').

Use findInterval to roll 'wds' to closest available working day in 'dates_1_5'.

f <- function(wd, dates){
  tmp <- seq(as.Date(paste(format(min(dates), "%Y-%W"), wd, sep = "-"),
                     format = "%Y-%W-%u"),
             max(dates), by = 1)

  wds <- tmp[as.integer(format(tmp, "%u")) == wd]

  dates_1_5 <- dates[as.integer(format(dates, "%u")) %in% 1:5]

  dates_1_5[findInterval(wds, dates_1_5, left.open = TRUE) + 1]
}

Some examples:

d <- seq.Date(as.Date("2017-11-16"), as.Date("2017-11-24"), by = 1)

dates <- d[d != as.Date("2017-11-23")]
f(wd = 4, dates)
# [1] "2017-11-16" "2017-11-24"

dates <- d[d != as.Date("2017-11-16")]
f(wd = 4, dates)
# [1] "2017-11-17" "2017-11-23"

dates <- d[!(d %in% as.Date(c("2017-11-16", "2017-11-17", "2017-11-21", "2017-11-23")))]
f(wd = 2, dates)
# [1] "2017-11-20" "2017-11-22"

Slightly more compact using a data.table rolling join:

library(data.table)

wd <- 2
# using 'dates' from above

d1 <- data.table(dates)
d2 <- data.table(dates = seq(as.Date(paste(format(min(dates), "%Y-%W"), wd, sep = "-"),
                                     format = "%Y-%W-%u"),
                             max(dates), by = 1))

d1[wday(dates) %in% 2:6][d2[wday(dates) == wd + 1],
                         on = "dates", .(x.dates), roll = -Inf]

...or a non-equi join:

d1[wday(dates) %in% 2:6][d2[wday(dates) == wd + 1],
                         on = .(dates >= dates), .(x.dates), mult = "first"]

If desired, just wrap in a function as above.

Henrik
  • 56,228
  • 12
  • 124
  • 139
0

might not be the most ellegant way, but i think it should work :)

library(lubridate)


dates <- seq.Date(as.Date("2017-11-16"), as.Date("2017-11-30"), by = 1) #your dates
dates <- dates[dates != as.Date("2017-11-23")] # thursday
dates <- dates[dates != as.Date("2017-11-24")] # friday
dates <- dates[dates != as.Date("2017-11-25")] # satureday
dates <- dates[dates != as.Date("2017-11-26")] # sunday
dates <- dates[dates != as.Date("2017-11-27")] # monday
dates <- dates[dates != as.Date("2017-11-28")] # tuesday
#dates <- dates[dates != as.Date("2017-11-29")] # wednesday

dates_shall_be <- seq.Date(min(dates)-wday(min(dates))+1, max(dates), by = 1) # create a shall-be list of days within your date-range
# min(dates)-wday(min(dates))+1 shiftback mindate to get missing thursdays in week one

thuesdays_shall = dates_shall_be[wday(dates_shall_be) == 5] # get all thuesdays that should be in there

for(i in 1:6) # run threw all possible followup days till wednesday next week 
{
  thuesdays_shall[!thuesdays_shall %in% dates] = thuesdays_shall[!thuesdays_shall %in% dates] + 1 # if date is not present in your data add another day to it
}

thuesdays_shall[!thuesdays_shall %in% dates] = NA # if date is still not present in the data after 6 shifts, this thursday + the whole followup days till next thursday are missing and NA is taken
thuesdays_shall
TinglTanglBob
  • 597
  • 3
  • 12
0

I'm breaking your condition of "no external dependencies", but as you already use lubridate (which is a dependency ;-) ), I'll provide you a solution that utilizes lead and lag from dplyr. You could write write those yourself though, looking at the source, if it really is a hard condition.

What I'm doing is figuring out where the "skips" are in the sequence by computing a kind of running diff of days. Once we know where the skip is, we just roll over to the next data in the sequence, whatever that is. Now, it might well be that this isn't a Friday, but a Saturday. In that case you're going to have to figure out if you still want the next Friday, even if there is a Thurday in between.

library(dplyr)

rollover_to_next <- function(dateseq, the_day = 5) {
  day_diffs <- lead(wday(dateseq) - lag(wday(dateseq))) %% 7
  skips <- which(day_diffs > 1) 

  sort(c(dateseq[wday(dateseq) == the_day], dateseq[skips + 1]))
}

dates <- seq.Date(as.Date("2017-11-16"), as.Date("2017-11-24"), by = 1)
dates <- dates[dates != as.Date("2017-11-23")]

rollover_to_next(dates)

Output:

[1] "2017-11-16" "2017-11-24"

You might have to account for the edge case where the idx + 1 element doesn't exist, but I'll leave that up to you to handle.

Oliver Baumann
  • 1,905
  • 1
  • 7
  • 23
  • Thanks this is helpful. Basic question, it does appear to handle multiple skips correctly. For example, if I add `dates – jmuhlenkamp Oct 24 '18 at 13:40
  • @jmuhlenkamp, yeah, that's a bug you found, it doesn't handle missing dates at the beginning, because the diff will always be 1... – Oliver Baumann Oct 24 '18 at 13:55
  • @jmuhlenkamp, I'm really sorry, but right now I can't come up with any concise and general solution. Your best bet might be to spell this out in a for-loop, as your logic branches depending on next/previous values, e.g. "If the current thing is a Friday, and it is the first element, return it", "if the thing is not a Thursday and the next thing is not a Friday, find the next thing that is a Monday". Are you sure this is what you need though? This could very well be an XY-problem... – Oliver Baumann Oct 24 '18 at 16:15
  • Thanks, your answer is useful nonetheless. I went ahead without handling in my case, though still intrigued as I would have expected this to already be answered on SO, and may return to this question at some point in the future. Pardon my ignorance, what do you mean by an XY-problem? – jmuhlenkamp Oct 25 '18 at 22:53
  • @jmuhlenkamp, be sure to check out the other answers, too! Henrik provided a solution and is already curious if it helped. An XY-problem describes the situation where you ask about a solution Y to problem X you assume to be correct, so you ask about Y instead of your problem, X; it is [better described on Meta](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) – Oliver Baumann Oct 26 '18 at 08:57