I want to apply a function that requires two vector arguments over a rolling window. Here is an example (that doesn't work) using data.table:

library(data.table)
df <- as.data.table(cbind.data.frame(x=1:100, y=101:200))
my_sum <- function(x, y) {
  x <- log(x)
  y <- x * y
  return(x + y)
}
roll_df <- frollapply(df, 10, function(x, y) {
  my_sum(x, y)})

It doesn't recognize the y column. Of course, a solution using xts or some other package is fine too.

EDIT: This is the real function I want to apply:

library(dpseg)
dpseg_roll <- function(time, price) {
  p <- estimateP(x=time, y=price, plot=FALSE)
  segs <- dpseg(time, price, jumps=jumps, P=p, type=type, store.matrix=TRUE)
  slope_last <- segs$segments$slope[length(segs$segments$slope)]
  return(slope_last)
}
Mislav
  • 1
    just as a small comment, you can also initiate your data.table like this: data.table(x=1:100, y=101:200) – Solarion Sep 25 '20 at 11:48

3 Answers

2

With runner you can apply any function over a rolling window. The window can also run over the rows of a data.frame passed as the x argument. Let's focus on the simpler function my_sum. The f argument of runner accepts a function of a single object (data in this case). I recommend putting browser() in the function to debug it row by row before applying a more elaborate model to each subset (some algorithms require a minimum number of observations).

my_sum <- function(data) {
  # browser()
  x <- log(data$x)
  y <- x * data$y
  tail(x + y, 1) # return only one value
}

my_sum should return only one value, because runner computes a result for each row; if my_sum returned a vector, you would get a list. Because runner is an independent function, you need to pass the data.table object to x. The best way to do this is to use x = .SD (see here why).

library(runner)

df[, new_col := runner(
  x = .SD,     # the current data.table, handed to my_sum as `data`
  f = my_sum,
  k = 10       # window size in rows
)]
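
To make the "one value per row" idea concrete, here is a minimal base-R sketch of the same row-window computation. It is not how runner is implemented; it only illustrates the idea. It assumes a plain data.frame (so no packages are needed), that partial windows at the start are acceptable, and the names k and roll_manual are made up for this illustration.

```r
# Example data as a plain data.frame (assumption: no data.table needed here)
d <- data.frame(x = 1:100, y = 101:200)

# Single-argument version of my_sum: receives a window of rows,
# returns exactly one value (the last element of x + y)
my_sum <- function(data) {
  x <- log(data$x)
  y <- x * data$y
  tail(x + y, 1)
}

k <- 10  # hypothetical window size, in rows

# For row i the window covers rows max(1, i - k + 1) .. i,
# so the first k - 1 windows are shorter than k rows
roll_manual <- vapply(
  seq_len(nrow(d)),
  function(i) my_sum(d[max(1L, i - k + 1L):i, , drop = FALSE]),
  numeric(1)
)

d$new_col_manual <- roll_manual
```

vapply with numeric(1) enforces that the function returns a single number per window, which is the same constraint described above.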
GoGonzo
0

It is not clear which aggregate you want frollapply to compute (mean, sum, or something else).

Assuming you want a rolling sum, here is one example. I rewrote your function my_sum so that it applies to df directly.

my_sum <- function(...) {
  v <- c(...)
  x <- log(v[[1]])
  y <- Reduce(`*`,v)
  return(x + y)
}

roll_df <- frollapply(
  my_sum(df), 
  10,
  FUN = sum)
ThomasIsCoding
0

rollapply in zoo passes a zoo object to the function to be applied if coredata=FALSE is used. The zoo object is made up of a time and a value part so we can use the following if the x value represents ascending values (which I gather it does). Note that my_sum in the question returns a 10 element result if the two arguments are length 10 so out shown below is a 100 x 10 zoo object with the first 9 rows filled with NAs.

If you don't want the NAs, omit fill=NA. If you want to apply the function to partial inputs at the beginning instead, replace fill=NA with partial=TRUE. If you only want one of the 10 elements, such as the last one, then use function(x) my_sum(time(x), coredata(x))[10] in place of the function shown, or just use out[, 10].

fortify.zoo(out) can be used to turn a zoo object out to a data frame if you need the result in that form or use as.data.frame(out) if you want to drop the times. as.data.table(out) also works in a similar manner.

library(zoo)

z <- read.zoo(df)  # df$x becomes the time part and df$y the value part
out <- rollapplyr(z, 10, function(u) my_sum(time(u), coredata(u)), 
  coredata = FALSE, fill = NA)

dim(out)
## [1] 100  10
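
As a sketch of the single-value variant mentioned above, the following keeps only the last of the 10 elements for each window. It assumes the two-argument my_sum from the question and the same example data; the name last_only is made up for this illustration.

```r
library(zoo)

# Two-argument my_sum as defined in the question
my_sum <- function(x, y) {
  x <- log(x)
  y <- x * y
  x + y
}

df <- data.frame(x = 1:100, y = 101:200)
z <- read.zoo(df)  # df$x becomes the time part and df$y the value part

# Keep only the 10th (last) element of each window's result,
# so the output is a univariate series rather than a 100 x 10 object
last_only <- rollapplyr(z, 10,
  function(u) my_sum(time(u), coredata(u))[10],
  coredata = FALSE, fill = NA)

length(last_only)
```

The first 9 entries are NA because of fill = NA; the remaining entries each correspond to a full window of 10 rows.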

Note that jumps and type are not defined in dpseg_roll.

G. Grothendieck