9

I am working on the transformation of week based dates to month based dates.

When checking my work, I found the following problem in my data which is the result of a simple call to as.Date()

as.Date("2016-50-4", format = "%Y-%U-%u")
as.Date("2016-50-5", format = "%Y-%U-%u")
as.Date("2016-50-6", format = "%Y-%U-%u")
as.Date("2016-50-7", format = "%Y-%U-%u") # this is the problem

The previous code yields correct date for the first 3 lines:

"2016-12-15"
"2016-12-16"
"2016-12-17"  

The last line of code however, goes back 1 week:

 "2016-12-11"

Can anybody explain what is happening here?

Aramis7d
  • 2,261
  • 18
  • 23
KoenV
  • 3,428
  • 2
  • 16
  • 27
  • 3
    The issue is that %u starts with Monday, and %U starts with Sunday, so `as.Date("2016-50-7", format = "%Y-%U-%u")` is interpreted as the first day (Sunday) of week 50. See `?strptime` to verify. – lmo Jan 18 '17 at 16:33
  • following @Imo's comment, `as.Date("2016-50-7", format = "%Y-%V-%u")` seems to work, but would produce different results than your example for some years - how important this is may depend on your application – Miff Jan 18 '17 at 16:43
  • @Miff on my computer that results in `"2016-01-18"` which isn't the correct date – Jaap Jan 18 '17 at 17:18
  • @Miff on my computer I get the same result Jaap mentions "2016-01-18", for all lines of input i.e.: "2016-50-4" and so on – KoenV Jan 18 '17 at 17:32
  • @lmo, I'd say that `as.Date("2016-50-7", format = "%Y-%U-%u")` is interpreted as the **seventh** day (Sunday) of week 50 – d.b Jan 18 '17 at 17:36
  • @DarshanBaral Take a closer look at the OP's output, or copy and paste each line of code and you will see what I mean. – lmo Jan 18 '17 at 18:29

3 Answers3

9

Working with week of the year can become very tricky. You may try to convert the dates using the ISOweek package:

# create date strings in the format given by the OP
wd <- c("2016-50-4","2016-50-5","2016-50-6","2016-50-7", "2016-51-1", "2016-52-7")
# convert to "normal" dates
ISOweek::ISOweek2date(stringr::str_replace(wd, "-", "-W"))

The result

#[1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19" "2017-01-01"

is of class Date.

Note that the ISO week-based date format is yyyy-Www-d with a capital W preceeding the week number. This is required to distinguish it from the standard month-based date format yyyy-mm-dd.

So, in order to convert the date strings provided by the OP using ISOweek2date() it is necessary to insert a W after the first hyphen which is accomplished by replacing the first - by -W in each string.

Also note that ISO weeks start on Monday and the days of the week are numbered 1 to 7. The year which belongs to an ISO week may differ from the calendar year. This can be seen from the sample dates above where the week-based date 2016-W52-7 is converted to 2017-01-01.

About the ISOweek package

Back in 2011, the %G, %g, %u, and %V format specifications weren't available to strptime() in the Windows version of R. This was annoying as I had to prepare weekly reports including week-on-week comparisons. I spent hours to find a solution for dealing with ISO weeks, ISO weekdays, and ISO years. Finally, I ended up creating the ISOweek package and publishing it on CRAN. Today, the package still has its merits as the aforementioned formats are ignored on input (see ?strptime for details).

Uwe
  • 34,565
  • 10
  • 75
  • 109
  • Thank you for your contribution and time. I will work with Darshan Barans elaboration on your post. – KoenV Jan 19 '17 at 06:50
  • I did some manual testing on your proposal: i.e. one line of code using the ISOweek package. So far, this seems to work perfectly. I will now run this is my full dataset and report back. – KoenV Jan 19 '17 at 13:04
  • In all its simplicity, this continues to work without problems for my dataset with dates from 2014-2017(now). A big thank you ! – KoenV Jan 19 '17 at 13:34
  • I have tested Uwe Block's solution on a dataset with dates from 2004 up to halfway January 2017. This approach works perfectly. – KoenV Feb 01 '17 at 09:47
  • @KoenV Glad to hear that and thank for all the testing. – Uwe Feb 01 '17 at 09:50
5

As @lmo said in the comments, %u stands for the weekdays as a decimal number (1–7, with Monday as 1) and %U stands for the week of the year as decimal number (00–53) using Sunday as the first day. Thus, as.Date("2016-50-7", format = "%Y-%U-%u") will result in "2016-12-11".

However, if that should give "2016-12-18", then you should use a week format that has also Monday as starting day. According to the documentation of ?strptime you would expect that the format "%Y-%V-%u" thus gives the correct output, where %V stands for the week of the year as decimal number (01–53) with monday as the first day.

Unfortunately, it doesn't:

> as.Date("2016-50-7", format = "%Y-%V-%u")
[1] "2016-01-18"

However, at the end of the explanation of %V it sais "Accepted but ignored on input" meaning that it won't work.

You can circumvent this behavior as follows to get the correct dates:

# create a vector of dates
d <- c("2016-50-4","2016-50-5","2016-50-6","2016-50-7", "2016-51-1")

# convert to the correct dates
as.Date(paste0(substr(d,1,8), as.integer(substring(d,9))-1), "%Y-%U-%w") + 1

which gives:

[1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19"
Frank
  • 63,401
  • 8
  • 85
  • 161
Jaap
  • 71,900
  • 30
  • 164
  • 175
  • Thank you. This seems to work fine, with one exception, when the number of day in the week equals 1 like "2016-50-1", resulting in NA. – KoenV Jan 18 '17 at 17:59
  • 1
    @KoenV Fixed, I think; added an example with a `1` to demo. – Frank Jan 18 '17 at 18:08
  • @Frank. Thank you again. This seems to work perfectly. I will do some additional testing on larger datasets later on, and report back on this forum. – KoenV Jan 18 '17 at 18:12
  • @Jaap @Frank The correct format string would be `"%G-%V-%u"`. `%G` is the week-based year. As already mentioned this will work only for output as these format specifiers are _Accepted but ignored on input._ – Uwe Jan 19 '17 at 07:59
  • 1
    @Frank Your proposal works for most of the dates but not for some near the year end. The dates it is not working for are (not exhaustive): `"2014-52-6" "2015-53-6" "2015-53-4" "2015-53-1" "2015-53-2" ` This generates NAs. – KoenV Jan 19 '17 at 10:09
2

The issue is because for %u, 1 is Monday and 7 is Sunday of the week. The problem is further complicated by the fact that %U assumes week begins on Sunday.

For the given input and expected behavior of format = "%Y-%U-%u", the output of line 4 is consistent with the output of previous 3 lines.

That is, if you want to use format = "%Y-%U-%u", you should pre-process your input. In this case, the fourth line would have to be as.Date("2016-51-7", format = "%Y-%U-%u") as revealed by

format(as.Date("2016-12-18"), "%Y-%U-%u")
# "2016-51-7"

Instead, you are currently passing "2016-50-7".

Better way of doing it might be to use the approach suggested in Uwe Block's answer. Since you are happy with "2016-50-4" being transformed to "2016-12-15", I suspect in your raw data, Monday is counted as 1 too. You could also create a custom function that changes the value of %U to count the week number as if week begins on Monday so that the output is as you expected.

#Function to change value of %U so that the week begins on Monday
pre_process = function(x, delim = "-"){
    y = unlist(strsplit(x,delim))
    # If the last day of the year is 7 (Sunday for %u),
    # add 1 to the week to make it the week 00 of the next year
    # I think there might be a better solution for this
    if (y[2] == "53" & y[3] == "7"){
        x = paste(as.integer(y[1])+1,"00",y[3],sep = delim)
    } else if (y[3] == "7"){
    # If the day is 7 (Sunday for %u), add 1 to the week 
        x = paste(y[1],as.integer(y[2])+1,y[3],sep = delim)
    }
    return(x)
}

And usage would be

as.Date(pre_process("2016-50-7"), format = "%Y-%U-%u")
# [1] "2016-12-18"

I'm not quite sure how to handle when the year ends on a Sunday.

Community
  • 1
  • 1
d.b
  • 29,772
  • 5
  • 24
  • 63
  • 1
    Thank you. I will test your approach as well. In order to test for years ending on a Sunday, I need to include **2006**. I need time to get these raw data before I can run such a test. Currently, I am doing a first EDA with years 2014-2017. – KoenV Jan 19 '17 at 06:27
  • 1
    @KoenV @Darshan The correct format string would be `"%G-%V-%u"`. `%G` is the week-based year. As already mentioned this will work only for output as these format specifiers are _Accepted but ignored on input._ – Uwe Jan 19 '17 at 07:57