
I have a simple data frame with 3 columns: name, goal, and actual. Because this is a simplification of a much larger data frame, I want to use dplyr to compute the number of times each person has met a goal.

df <- data.frame(name = c(rep('Fred', 3), rep('Sally', 4)),
                 goal = c(4,6,5,7,3,8,5), actual=c(4,5,5,3,3,6,4))


The result should look like this:

   name met_goal
1  Fred        2
2 Sally        1

I should be able to pass an anonymous function similar to what is shown below, but don't have the syntax quite right:

library(dplyr)
g <- group_by(df, name)
summ <- summarise(g, met_goal = sum((function(x,y) {
                                       if(x>y){return(0)}
                                       else{return(1)}
                                     })(goal, actual)
                                    )
                  )

When I run the code above, I see 3 of these warnings:

Warning messages:
1: In if (x == y) { :
  the condition has length > 1 and only the first element will be used

Rich Scriven
Michael Szczepaniak

3 Answers


We have equal length vectors in goal and actual, so the relational operators are appropriate to use here. However, when we use them in a simple if() statement we may get unexpected results because if() expects length 1 vectors. Since we have equal length vectors and we require a binary result, taking the sum of the logical vector is the best approach, as follows.

group_by(df, name) %>%
    summarise(met_goal = sum(goal <= actual))
# A tibble: 2 x 2
    name met_goal
  <fctr>    <int>
1   Fred        2
2  Sally        1

The operator is switched to <= because you want 0 for goal > actual and 1 otherwise.
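
To see why this works, here is a minimal base-R sketch using the goal and actual values from the question (no dplyr needed; the variable name met is only illustrative). The comparison runs element-wise, and sum() counts each TRUE as 1 and each FALSE as 0:

```r
# Values copied from the question's data frame
goal   <- c(4, 6, 5, 7, 3, 8, 5)
actual <- c(4, 5, 5, 3, 3, 6, 4)

# Element-wise comparison: one logical value per row
met <- goal <= actual
print(met)

# Summing a logical vector counts the TRUE entries
print(sum(met))
```

Inside summarise(), the same sum is simply taken once per group instead of over the whole column.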

Note that you can use an anonymous function. It was the if() statement that was throwing you off. For example, using

sum((function(x, y) x <= y)(goal, actual)) 

would work in the manner you are asking about.
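
If the anonymous function really does need a branch, one option is the vectorized ifelse() in place of if(). The sketch below assumes the question's df and uses base R's tapply() as a stand-in for group_by()/summarise() so that it runs without dplyr; scores and met_goal are illustrative names:

```r
df <- data.frame(name = c(rep("Fred", 3), rep("Sally", 4)),
                 goal = c(4, 6, 5, 7, 3, 8, 5),
                 actual = c(4, 5, 5, 3, 3, 6, 4))

# ifelse() evaluates the condition for every element,
# so the anonymous function accepts whole columns
scores <- (function(x, y) ifelse(x > y, 0, 1))(df$goal, df$actual)

# Sum the 0/1 scores separately for each name
met_goal <- tapply(scores, df$name, sum)
print(met_goal)
```

The same anonymous function should also drop straight into the dplyr call, as in summarise(met_goal = sum((function(x, y) ifelse(x > y, 0, 1))(goal, actual))).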

Rich Scriven
    This answers the question well. I did overcomplicate my attempt intentionally because I wanted to see how a more complex/general anonymous function could be passed. – Michael Szczepaniak Sep 22 '17 at 22:43
    @MichaelSzczepaniak - Note that you *can* use an anonymous function. It was the `if()` statement that was throwing you off. For example, `sum((function(x, y) x <= y)(goal, actual))` would work. – Rich Scriven Sep 22 '17 at 22:49
    That was EXACTLY what I was looking for. Thanks for explaining this (twice ;-). – Michael Szczepaniak Sep 22 '17 at 22:51

Solution using data.table:

You asked for a dplyr solution, but since the actual data is much larger, you could use data.table instead. Here foo is the function you want to apply.

foo <- function(x, y) {
    res <- 0
    if (x <= y) {
        res <- 1
    }
    return(res)
}

library(data.table)
setDT(df)
setkey(df, name)[, foo(goal, actual), .(name, 1:nrow(df))][, sum(V1), name]

If you prefer pipes then you can use this:

library(magrittr)
setDT(df) %>%
    setkey(name) %>%
    .[, foo(goal, actual), .(name, 1:nrow(.))] %>%
    .[, .(met_goal = sum(V1)), name]

    name met_goal
1:  Fred        2
2: Sally        1
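
As an aside, a scalar function like foo can also be wrapped with base R's Vectorize(), which would avoid grouping by row number. This is only a rough sketch: foo_vec is a hypothetical name, and whether it is any faster on large data has not been tested here.

```r
# Scalar function, equivalent to foo above: expects length-1 inputs
foo <- function(x, y) {
    if (x <= y) 1 else 0
}

# Vectorize() returns a wrapper that calls foo once per pair of elements
foo_vec <- Vectorize(foo)

# Fred's rows from the question: goal = 4, 6, 5 and actual = 4, 5, 5
res <- foo_vec(c(4, 6, 5), c(4, 5, 5))
print(res)
```

With data.table, something like df[, .(met_goal = sum(foo_vec(goal, actual))), by = name] should then give the per-name counts in a single grouped step.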
pogibas

Found myself needing to do something similar to this again (a year later) but with a more complex function than the simple one provided in the original question. The originally accepted answer took advantage of a specific feature of the problem, but the more general approach was touched on here. Using this approach, the answer I was ultimately after was something like this:

library(dplyr)

df <- data.frame(name = c(rep('Fred', 3), rep('Sally', 4)),
                 goal = c(4,6,5,7,3,8,5), actual=c(4,5,5,3,3,6,4))

my_func = function(act, goa) {
  if(act < goa) {
    return(0)
  } else {
    return(1)
  }
}

summ = df %>% group_by(name) %>%
  summarise(met_goal = sum(mapply(my_func, .data$actual, .data$goal)))

> summ
# A tibble: 2 x 2
  name  met_goal
  <fct>    <dbl>
1 Fred         2
2 Sally        1
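
What mapply() does here may be easier to see outside of summarise(). Below is a small base-R sketch using Fred's three rows from the question (per_row is an illustrative name). mapply() calls my_func once per pair of elements, so if() only ever sees length-1 values:

```r
my_func <- function(act, goa) {
  if (act < goa) 0 else 1
}

# Fred's rows from the question
actual <- c(4, 5, 5)
goal   <- c(4, 6, 5)

# One call per (actual, goal) pair, simplified to a numeric vector
per_row <- mapply(my_func, actual, goal)
print(per_row)

# Summing the 0/1 results gives the number of goals met
print(sum(per_row))
```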

The original question referred to using an anonymous function. In that spirit, the last part would look like this:

summ = df %>% group_by(name) %>%
  summarise(met_goal = sum(mapply(function(act, go) {
                                    if(act < go) {
                                      return(0)
                                    } else {
                                      return(1)
                                    }
                                  }, .data$actual, .data$goal)))
Michael Szczepaniak