I plan to use a loop to go through a few hundred columns and calculate a median and a count for each, grouped by a set of variables.

Code :

grp_var <- "Species" 
voi <- "Sepal.Length"
dmp <- iris%>%
      select_(grp_var,voi)%>% 
      group_by_(grp_var)%>%
      summarise_(Median_Value = median(voi),Count = n())

Error :

Error in n() : This function should not be called directly

I get this error when I use summarise_, but have no issues at all when I use summarise.

I know that the error message is about calling n() directly, but when called within a dplyr function it should return the row count. Am I just being dumb about this, or is this a bug?

Edit: I don't have any conflicts for the summarise function; plyr isn't loaded.

eli-k
Abhishek Vij
  • https://stackoverflow.com/questions/22801153/dplyr-error-in-n-function-should-not-be-called-directly – M-- Aug 11 '17 at 21:07
  • looks like you have the plyr package loaded. try `detach("package:plyr")` first – Richard Telford Aug 11 '17 at 21:08
  • I have checked for conflicts and there aren't any. I haven't loaded plyr package either. – Abhishek Vij Aug 11 '17 at 21:24
  • The problem is mixing standard and non-standard evaluation inside `summarise_`. The solution is covered in keiku's answer at the [question that keeps getting linked](https://stackoverflow.com/a/42716540/903061). – Gregor Thomas Aug 11 '17 at 21:46
  • Since `dplyr` is now at version >= 0.7, tidy eval is the new concept. All `*_` functions are now deprecated. The linked answer is outdated for current `dplyr`. – cderv Aug 11 '17 at 22:02
  • True, but the question is explicitly about a `*_` function. – Gregor Thomas Aug 11 '17 at 22:10
  • Yes true. However I think it shouldn't be used anymore as it will disappear. – cderv Aug 12 '17 at 06:10

1 Answer

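First, as explained in the comments, you mixed standard evaluation and non-standard evaluation. The *_ functions evaluate their arguments as ordinary R values, so n() is called directly, outside of any data context, which triggers the error; likewise, median(voi) would be applied to the string "Sepal.Length" rather than to the column. In dplyr before 0.7.0, you would use ~n() in summarise_.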

However things have changed in the tidyverse world.

Since version 0.7.0, dplyr uses a new system for programming with dplyr, called tidy evaluation, or tidy eval for short. All *_ functions are now deprecated and should not be used in new code, unless you need to keep a dependency on an old dplyr version. I'd advise using tidy eval now. I will not explain it here; see the Programming vignette.

For example, now you would do something like this with dplyr (>= 0.7.0):


library(dplyr)
# quo is a tidy eval concept for quoting
grp_var <- quo(Species)
voi <- quo(Sepal.Length)
# use !! another tidy eval concept to unquote
dmp <- iris %>%
  select(!! grp_var, !! voi) %>% 
  group_by(!! grp_var) %>%
  summarise(Median_Value = median( !! voi ), Count = n())
dmp
#> # A tibble: 3 x 3
#>      Species Median_Value Count
#>       <fctr>        <dbl> <int>
#> 1     setosa          5.0    50
#> 2 versicolor          5.9    50
#> 3  virginica          6.5    50
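
Since the question starts from column names stored as strings and wants to loop over many columns, here is a minimal sketch of how the same tidy eval idea could be applied to strings. It assumes dplyr >= 0.7.0 with rlang installed; the helper name `summarise_one`, the `Variable` column, and the two-element `vois` vector are made up for illustration and are not part of the original question.

```r
library(dplyr)

grp_var <- "Species"
# Hypothetical stand-in for the few hundred column names to loop over
vois <- c("Sepal.Length", "Sepal.Width")

# Hypothetical helper: rlang::sym() turns a string into a symbol,
# and !! unquotes that symbol inside the dplyr verbs.
summarise_one <- function(data, grp, voi) {
  data %>%
    group_by(!!rlang::sym(grp)) %>%
    summarise(
      Variable = voi,                              # record which column was summarised
      Median_Value = median(!!rlang::sym(voi)),
      Count = n()
    )
}

# One summary table per column, stacked into a single data frame
results <- bind_rows(lapply(vois, function(v) summarise_one(iris, grp_var, v)))
results
```

Here n() is only ever called inside summarise, so it never runs outside a data context.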
cderv