
I am creating a summary data frame for multiple columns with dplyr. It works well for the functions I want, except for the count, n(). I want to show the total number of observations.

 summ <- Test_data %>%            
   summarize(across(
     .cols = is.numeric,
     .fns = list(a=n(),mean=mean, stdev=sd, max=max,min=min), na.rm=TRUE,
     .names = "{col}_{fn}"
   ))

It gives this error:

 Error: Problem with summarise() input ..1.
 x Can't convert an integer vector to function
 i Input ..1 is across(...).

Why does only the count function fail? Please advise.

tjebo
  • related 1: https://stackoverflow.com/questions/45024158/using-n-at-the-same-time-as-calculating-other-summary-statistics – tjebo Jan 27 '21 at 18:13
  • related 2: https://stackoverflow.com/questions/58068522/summarize-all-with-n-function – tjebo Jan 27 '21 at 18:13
  • related 3: https://stackoverflow.com/questions/22801153/dplyr-error-in-n-function-should-not-be-called-directly?noredirect=1&lq=1 – tjebo Jan 27 '21 at 18:13
  • related 4: https://stackoverflow.com/questions/55259295/unused-argument-in-summarise-n-r – tjebo Jan 27 '21 at 18:15
  • related 5: https://stackoverflow.com/q/60249276/7941188 – tjebo Jan 27 '21 at 18:16
  • In short: using n() in summarise is tricky. There are workarounds. See those threads. Plenty of answers – tjebo Jan 27 '21 at 18:16
  • Thanks for pointing these out. I had seen a few of the other links you suggest, but not this one, which is what I needed: https://stackoverflow.com/questions/58068522/summarize-all-with-n-function – user7256821 Jan 27 '21 at 18:36

1 Answer

As @tjebo comments, n() is tricky inside of a summarize().

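This is discussed in the question he linked - summarize_all with "n()" function - where @akrun explains that "getting the n() for each column is not making much sense as it would be the same [for each summarized column]." n() gives you problems for two reasons. First, .fns expects functions, but n() written with parentheses is a call: it is evaluated while the list is being built and returns an integer, hence "Can't convert an integer vector to function". Second, passing the bare function n would not help either, because n() takes no arguments, whereas across() passes each selected column (plus na.rm = TRUE in your code) to every function.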
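To see where "Can't convert an integer vector to function" comes from, it can help to check what n() evaluates to inside summarize(). This is only a small sketch on the built-in iris data; the column names value and type are made up for illustration.

```r
library(dplyr)

# Inside summarize(), n() is evaluated straight away and yields a plain integer,
# so list(a = n(), ...) ends up holding a number where across() expects a function.
counts <- iris %>%
  summarize(value = n(), type = class(n()))

print(counts)
```

The type column shows that the result is an integer vector, which matches the wording of the error message.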

Two solutions are to replace it with the length function, or to add a leading ~ to turn n() into ~ n(). The ~ creates a one-sided formula, which across() treats as a purrr-style lambda: an anonymous function that receives the column as .x, ignores it, and calls n().
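The documentation for across() describes ~ formulas in .fns as purrr-style lambdas. As a rough illustration of that conversion in isolation, rlang::as_function() turns a one-sided formula into an ordinary function; whether across() calls exactly this helper internally depends on the dplyr version, so treat this as a sketch of the idea. The names add_one and always_ten are made up.

```r
library(rlang)

# A one-sided formula becomes a function whose first argument is available as .x
add_one <- as_function(~ .x + 1)
add_one(2)

# A lambda that never mentions .x simply ignores whatever it is given,
# which is why ~ n() tolerates the column that across() passes in
always_ten <- as_function(~ 10)
always_ten("ignored", na.rm = TRUE)
```

The second call also shows that extra arguments such as na.rm are silently accepted by a lambda, unlike by n() itself.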

Also, use where(is.numeric) in the column selection; passing a bare predicate such as is.numeric is deprecated in tidyselect.

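library(dplyr)

# Option 1: wrap n() in a lambda so across() receives a function
iris %>%
    summarize(across(.cols  = where(is.numeric),
                     .fns   = list(n = ~ n(), mean = mean, sd = sd),
                     .names = "{col}_{fn}"))

# Option 2: use length, which accepts the column as its argument
iris %>%
    summarize(across(.cols  = where(is.numeric),
                     .fns   = list(n = length, mean = mean, sd = sd),
                     .names = "{col}_{fn}"))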
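The question also passes na.rm = TRUE through across(), which hands it to every function in the list; length accepts only one argument, so it would fail in that setup. One possible variant, sketched here on iris, puts na.rm inside each lambda instead. The n_present entry is an illustrative addition, not part of the original question, for counting non-missing values per column.

```r
library(dplyr)

summ <- iris %>%
  summarize(across(
    .cols  = where(is.numeric),
    .fns   = list(
      n         = ~ n(),                    # number of rows, same for every column
      n_present = ~ sum(!is.na(.x)),        # non-missing values in this column
      mean      = ~ mean(.x, na.rm = TRUE),
      sd        = ~ sd(.x, na.rm = TRUE)
    ),
    .names = "{col}_{fn}"
  ))

print(summ)
```

Because every entry is a lambda, each one decides for itself whether na.rm applies, and nothing extra needs to be passed through across().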
M.Viking
  • Can you explain why n() is working with the leading tilde sign in this case? Thanks. – TarJae Jan 27 '21 at 18:33
  • The suggested solution works, but I can't understand why n() does not work in summarise. Is it related to group_by, which I am not using? – user7256821 Jan 27 '21 at 18:34