3

My data is in a dataframe which has a structure like this:

df2 <- data.frame(Year = c("2007"), Week = c(1:12), Measurement = c(rnorm(12, mean = 4, sd = 1)))

Unfortunately I do not have the complete date (e.g. days are missing) for each measurement, only the Year and the Weeks (these are ISO weeks).

Now I want to aggregate the weekly measurements into a monthly median for each month of the given year, stored in a new column, Months. I have not found a convenient way to do this without knowing the exact day of each measurement. Any input is much appreciated!

Roggan

4 Answers

5

When it is necessary to allocate a week to a single month, the rule for first week of the year might be applied, although ISO 8601 does not consider this case. (Wikipedia)

For example, the 5th week of 2007 belongs to February, because the Thursday of the 5th week was the 1st of February.
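The Thursday rule can also be checked without any packages. The following is a minimal base R sketch, assuming the ISO 8601 property that week 1 is the week containing 4 January; `iso_week_thursday` is a hypothetical helper name used only for illustration, not a function from any package.

```r
# Hypothetical helper (not from any package): date of the Thursday of an ISO week.
# Relies on the ISO 8601 rule that week 1 is the week containing 4 January.
iso_week_thursday <- function(year, week) {
  jan4 <- as.Date(sprintf("%d-01-04", as.integer(year)))
  # "%u" gives the weekday as 1 (Monday) to 7 (Sunday)
  weekday <- as.integer(format(jan4, "%u"))
  week1_monday <- jan4 - (weekday - 1)
  # Move forward to the requested week, then 3 days from Monday to Thursday
  week1_monday + 7 * (as.integer(week) - 1) + 3
}

iso_week_thursday("2007", 5)                 # "2007-02-01"
format(iso_week_thursday("2007", 5), "%m")   # "02", i.e. February
```

The helper is vectorised over `year` and `week`, so it could be applied to whole columns if a package-free route is preferred.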

I am using the data.table and ISOweek packages. The example below computes the month of each week; after that you can do any aggregation by month.

require(data.table)
require(ISOweek)

df2 <- data.table(Year = c("2007"), Week = c(1:12),
                  Measurement = c(rnorm(12, mean = 4, sd = 1)))

# Generate Thursday as year, week of the year, day of week according to ISO 8601
df2[, thursday_ISO := paste(Year, sprintf("W%02d", Week), 4, sep = "-")]

# Convert Thursday to date format
df2[, thursday_date := ISOweek2date(thursday_ISO)]

# Compute month
df2[, month := format(thursday_date, "%m")]
df2

Following a suggestion by Uwe, you can compute a year-month string instead, which is safer when the data spans several years.

# Compute year-month
df2[, yr_mon := format(ISOweek2date(sprintf("%s-W%02d-4", Year, Week)), "%Y-%m")]
df2

Finally, you can aggregate into a new table or add the median as a column.

df2[, median(Measurement), by = yr_mon]

df2[, median := median(Measurement), by = yr_mon]
df2
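To see why the year-month string matters once more than one year is present, here is a small base R sketch. The two-year data set is made up for illustration, and the Thursday of each week is derived by hand (assuming ISO week 1 is the week containing 4 January) rather than with `ISOweek2date`.

```r
# Made-up two-year data set, for illustration only
df <- data.frame(Year = rep(c("2007", "2008"), each = 4),
                 Week = rep(1:4, times = 2),
                 Measurement = c(1, 2, 3, 4, 10, 20, 30, 40))

# Thursday of each ISO week: week 1 is the week that contains 4 January
jan4 <- as.Date(paste0(df$Year, "-01-04"))
week1_monday <- jan4 - (as.integer(format(jan4, "%u")) - 1)
thursday <- week1_monday + 7 * (df$Week - 1) + 3

df$month  <- format(thursday, "%m")      # month only: pools both years
df$yr_mon <- format(thursday, "%Y-%m")   # year-month: keeps the years apart

aggregate(Measurement ~ month,  data = df, FUN = median)
aggregate(Measurement ~ yr_mon, data = df, FUN = median)
```

Grouping by `month` alone yields a single January median across both years, whereas grouping by `yr_mon` yields one median for 2007-01 and one for 2008-01.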
Uwe
djhurio
  • 1
    Thanks, this worked for me! I chose your answer because it is better suited for later use with the other years in my data. – Roggan Sep 20 '17 at 12:27
  • 3
    Great idea to take the month of the Thursday of each week. This ensures that the whole week is assigned to the month to which the majority of its days belong. However, to be on the safe side, I suggest creating a year-month string and streamlining the code, i.e., `yr_mon := format(ISOweek2date(sprintf("%i-W%02i-4", Year, Week)))`. – Uwe Sep 20 '17 at 13:58
  • 2
    Just noticed that you have spared the aggregation part. So, a complete answer would be `library(data.table); setDT(df2)[, median(Measurement), by = .(Yr_Mon = format(ISOweek::ISOweek2date(sprintf("%s-W%02d-4", Year, Week)), "%Y-%m"))]` – Uwe Sep 20 '17 at 14:33
  • @Uwe, thanks! Just noticed you are the author of the `ISOweek`. Thanks a lot for the package! – djhurio Sep 20 '17 at 14:48
  • Sorry for taking so long to accept the answer. Thanks a thousand times for your explanations!! – Roggan Oct 06 '17 at 08:25
2

If I understand correctly, you don't know the exact day, only the week number and the year. My answer takes the first day of the year as a starting date and then computes one-week intervals from it. You can probably refine the answer.

Based on an answer by mnel, using the lubridate package.

library(lubridate)

# Prepare week, month, year information ready for the merge
# Make sure you have all the necessary dates
wmy <- data.frame(Day = seq(ymd('2007-01-01'),ymd('2007-04-01'), 
                            by = 'weeks')) 
wmy <- transform(wmy, 
                 Week = isoweek(Day),
                 Month = month(Day),
                 Year = isoyear(Day))

# Merge this information with your data
merge(df2, wmy, by = c("Year", "Week"))

   Year Week Measurement        Day Month
1  2007    1    3.704887 2007-01-01     1
2  2007   10    1.974533 2007-03-05     3
3  2007   11    4.797286 2007-03-12     3
4  2007   12    4.291169 2007-03-19     3
5  2007    2    4.305010 2007-01-08     1
6  2007    3    3.374982 2007-01-15     1
7  2007    4    3.600008 2007-01-22     1
8  2007    5    4.315184 2007-01-29     1
9  2007    6    4.887142 2007-02-05     2
10 2007    7    4.155411 2007-02-12     2
11 2007    8    4.711943 2007-02-19     2
12 2007    9    2.465862 2007-02-26     2
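One possible refinement, sketched here in base R only: the lookup above takes the month from the Monday of each week, which differs from the Thursday rule for weeks that straddle a month boundary. The column names `Month_monday` and `Month_thursday` are made up for this illustration, and the sketch assumes the start date of the sequence is a Monday (true for 2007-01-01).

```r
# Lookup table of ISO weeks built from the Monday of each week (base R only).
# Assumption: the first date of the sequence is a Monday.
monday <- seq(as.Date("2007-01-01"), as.Date("2007-04-01"), by = "week")

wmy <- data.frame(
  Year  = format(monday, "%G"),                        # ISO week-based year
  Week  = as.integer(format(monday, "%V")),            # ISO week number
  Month_monday   = as.integer(format(monday, "%m")),   # month of the Monday
  Month_thursday = as.integer(format(monday + 3, "%m"))  # month of the Thursday
)

# Weeks where the two conventions disagree
wmy[wmy$Month_monday != wmy$Month_thursday, ]
```

For this range the two conventions disagree for weeks 5 and 9, so merging on `Month_thursday` instead would move those two weeks into February and March respectively.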
Paul Rougieux
  • Thanks for this detailed answer! It worked really well, but I chose to accept another answer since it offers more flexibility with the `ISOweek` package later on in my code. – Roggan Sep 20 '17 at 12:28
  • Please note that the ISO week-based year may differ from the calendar year. For instance, `ISOweek::ISOweek("2010-01-01")` returns "2009-W53", `ISOweek::ISOweek("2011-01-01")` "2010-W52". See `%G` and `%g` in `?strptime`. The different conventions for defining the week of the year are discussed in [this answer](https://stackoverflow.com/a/45587644/3817004). – Uwe Sep 20 '17 at 14:28
  • 1
    @Uwe thank you. I see, `lubridate::isoweek("2010-01-01")` returns `53` and `lubridate::year("2010-01-01")` returns `2010`. I should have used `lubridate::isoyear("2010-01-01")` which does return `2009`. – Paul Rougieux Sep 20 '17 at 14:50
1

Using dplyr, you can try:

require(dplyr)

df2 %>% mutate(Date = as.Date(paste("1", Week, Year, sep = "-"), format = "%w-%W-%Y"),
            Year_Mon = format(Date,"%Y-%m")) %>% group_by(Year_Mon) %>%
            summarise(result = median(Measurement))

As @djhurio pointed out, the Thursday of a week determines which month the week belongs to. So simply switch `paste("1",` to `paste("4",` in the code above.
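One caveat worth checking before relying on `%W`: it is not the ISO week number. The base R sketch below compares `%W` with the ISO 8601 week number `%V` (an output-only format, see `?strptime`) for the Thursday of the first ISO week of 2007 and of 2008; the two dates are chosen purely for illustration.

```r
# Thursdays of the first ISO week of 2007 and of 2008
d <- as.Date(c("2007-01-04", "2008-01-03"))

data.frame(date = d,
           W = format(d, "%W"),   # week of year; the first Monday starts week 01
           V = format(d, "%V"))   # ISO 8601 week number
```

In 2007, where 1 January is a Monday, both numberings agree. In 2008 the same ISO week 01 is reported as week 00 by `%W`, so the week numbers are shifted by one for that year.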

count
  • 2
    The OP has pointed out that he is using ISO week. Therefore, an answer using the UK definition of the week of the year `%W` and a weekday numbering starting on Sunday `%w` (ISO weeks start on Monday) is blatantly wrong (see `?strptime`). Also note that the year belonging to an ISO week may differ from the calendar year. Please, see [this answer](https://stackoverflow.com/a/45587644/3817004) for a discussion of the different conventions. – Uwe Sep 20 '17 at 14:15
  • 3
    Be careful. This solution works for 2007, but will not work for 2008 and other years when 1 January is a Tuesday, Wednesday, or Thursday, because the numbering of weeks by `%W` does not follow the ISO standard. From the `strptime` help: "%W: Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1)." – djhurio Sep 20 '17 at 14:37
  • @Uwe and djhurio I had no idea. Thanks for the input! – count Sep 20 '17 at 15:46
0

This can be done relatively simply in dplyr.

library(dplyr)

df2 %>% 
  mutate(Month = rep(1:3, each = 4)) %>% 
  group_by(Month) %>% 
  summarise(MonthlyMedian = stats::median(Measurement))

Basically, add a new column to define your months. I'm presuming that, since you don't have days, you are going to allocate 4 weeks per month. Then you just group by your Month variable and calculate the median. Very simple.

Hope this helps

Quinn
  • 1
    This is problematic, as week 5 in 2007 is still in January. – count Sep 20 '17 at 10:35
  • 1
    @count, actually the 5th week of 2007 is in February, because the month of a week is defined by the Thursday of that week according to the ISO standard. The Thursday of 2007-W05 was 2007-02-01. – djhurio Sep 20 '17 at 13:05
  • @djhurio Interesting, didn't know that. Nevertheless, simply allocating 4 weeks per month is likely to produce inaccuracies in this scenario – count Sep 20 '17 at 13:16
  • What will you call the 13th month of the year you have just created with your simple method? :-) If you allocate 4 weeks per month and there are 52 or 53 weeks per year, you will end up with 13 four-week periods (plus 1 or 2 additional days). That's too much of a simplification, IMHO. – Uwe Sep 20 '17 at 14:05
  • @Uwe, the example contained 12 weeks, and there was no indication whether it went beyond one year, hence the super simplification – Quinn Sep 20 '17 at 14:07