
I wanted to use a pipe to take the mean of a column from a data set that ships with a package, and was surprised when it did not work.

At first I thought the problem had to do with piping, but apparently it was because the column returned by select() needed to be unlisted before taking the mean. Is it possible to pipe a column from a data frame/tibble directly into mean() without unlisting it first?

install.packages("UsingR")
library(UsingR)
library(dplyr)

father.son %>% 
  filter(round(fheight) == 71) %>%
  select(sheight) %>% mean

[1] NA
Warning message:
In mean.default(.) : argument is not numeric or logical: returning NA
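To see why the first attempt returns NA: select() always hands back a data frame, even when only one column is selected, and a data frame is a list of columns, which mean() does not accept. A minimal base R sketch, using a small hypothetical data frame `df` with made-up values standing in for father.son (so it runs without UsingR):

```r
# Hypothetical stand-in for father.son: made-up heights, same column names
df <- data.frame(
  fheight = c(70.8, 71.2, 68.0, 71.4),
  sheight = c(69.5, 71.0, 67.2, 70.3)
)

one_col <- df["sheight"]    # like select(sheight): still a one-column data frame
is.data.frame(one_col)      # TRUE
is.list(one_col)            # TRUE: a data frame is a list of columns

# mean() has no data-frame method, so it falls through to mean.default(),
# which warns and returns NA for anything that is not numeric or logical
res_df <- suppressWarnings(mean(one_col))
res_df                      # NA

vec <- df[["sheight"]]      # [[ ]] returns the column itself, a numeric vector
mean(vec)                   # 69.5
```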

When I assign the result to a new object and unlist it, I can take the mean; can I do that directly from the pipe?

s <- father.son %>% 
  filter(round(fheight) == 71) %>%
  select(sheight)
mean(unlist(s))

> mean(unlist(s))
[1] 70.54082
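The unlist() workaround works because it flattens that one-column list into a plain (named) numeric vector. A sketch with the same hypothetical `df` of made-up values, using base R indexing in place of filter() and select(); note that if more than one column were selected, unlist() would merge all of them into a single vector:

```r
# Hypothetical stand-in for father.son: made-up heights, same column names
df <- data.frame(
  fheight = c(70.8, 71.2, 68.0, 71.4),
  sheight = c(69.5, 71.0, 67.2, 70.3)
)

# Base R analogue of filter(round(fheight) == 71) %>% select(sheight);
# drop = FALSE keeps the result a one-column data frame, as select() does
s <- df[round(df$fheight) == 71, "sheight", drop = FALSE]

flat <- unlist(s)   # flattens the list of columns into one named numeric vector
names(flat)         # "sheight1" "sheight2" "sheight3"
mean(flat)
```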
cumin
    `%>% unlist %>% mean`? Sure, you can do that. I think newer versions of `dplyr` have `pull`, which combines that functionality with selecting a column, so you can also do `mtcars %>% pull(mpg) %>% mean`. – Gregor Thomas Aug 31 '17 at 00:15
  • Suggested dupe: [extract dplyr tbl column as a vector](https://stackoverflow.com/q/21618423/903061) – Gregor Thomas Aug 31 '17 at 00:18
  • I thought one could just operate on the column directly; I did not know about `pull`. Thank you. – cumin Aug 31 '17 at 00:21
  • I mean, the point of dplyr is to keep everything in a data.frame, so why not just `father.son %>% filter(round(fheight) == 71) %>% summarise_all(mean)`? – alistaire Aug 31 '17 at 02:23
  • I guess you _could_ pipe directly into `colMeans`, though it returns a vector, breaking the dplyr paradigm. – alistaire Aug 31 '17 at 02:25
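The colMeans() route from the last comment can be sketched in base R as follows; `df` is again a hypothetical stand-in with made-up values. colMeans() returns a named numeric vector with one mean per column, so you index it by name to get a single number:

```r
# Hypothetical stand-in for father.son: made-up heights, same column names
df <- data.frame(
  fheight = c(70.8, 71.2, 68.0, 71.4),
  sheight = c(69.5, 71.0, 67.2, 70.3)
)

filtered <- df[round(df$fheight) == 71, ]  # keep rows where the father is ~71 in
m <- colMeans(filtered)                    # named vector: one mean per column
m
m[["sheight"]]                             # just the sons' mean height
```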

2 Answers


In addition to doing what Gregor suggests and piping everything through unlist, you can also stay within the dplyr framework and use summarize:

father.son %>% 
  filter(round(fheight) == 71) %>%
  summarize(mean(sheight))
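Note that summarize() returns a one-row data frame rather than a bare number. A sketch, assuming dplyr is installed and using a hypothetical `df` with made-up values; the column name `mean_sheight` is my own choice. pull() (or `[[1]]`) turns the result into a single numeric value:

```r
library(dplyr)

# Hypothetical stand-in for father.son: made-up heights, same column names
df <- data.frame(
  fheight = c(70.8, 71.2, 68.0, 71.4),
  sheight = c(69.5, 71.0, 67.2, 70.3)
)

res <- df %>%
  filter(round(fheight) == 71) %>%
  summarize(mean_sheight = mean(sheight))
res                                  # a 1 x 1 data frame

value <- res %>% pull(mean_sheight)  # a plain numeric scalar
value
```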

Or use $ to extract the column, in one of the following three ways:

father.son %>% 
  filter(round(fheight) == 71) %>%
  .$sheight %>% 
   mean

father.son %>% 
  filter(round(fheight) == 71) %>%
  `$`(sheight) %>% 
   mean

library(magrittr)
father.son %>% 
  filter(round(fheight) == 71) %>%
  use_series(sheight) %>% mean
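If you would rather not depend on dplyr or magrittr at all, a base R sketch along the same lines uses the native pipe (this assumes R 4.1 or later), subset() for the filter, and with() to evaluate mean(sheight) inside the filtered data. `df` is again a hypothetical stand-in with made-up values:

```r
# Hypothetical stand-in for father.son: made-up heights, same column names
df <- data.frame(
  fheight = c(70.8, 71.2, 68.0, 71.4),
  sheight = c(69.5, 71.0, 67.2, 70.3)
)

# Native pipe (R >= 4.1): subset() filters rows, with() evaluates the
# expression using the filtered data frame's columns as variables
m <- df |>
  subset(round(fheight) == 71) |>
  with(mean(sheight))
m
```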
Christoph Wolk

With recent additions to dplyr (version 0.7.0 and later) there is a function, pull, that grabs a column and returns it as a vector.

father.son %>% 
  filter(round(fheight) == 71) %>%
  dplyr::pull(sheight) %>% 
  mean
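pull() can also select the column by position, and with no argument it defaults to the last column (its `var` argument defaults to -1). A sketch assuming dplyr is installed, with a hypothetical `df` of made-up values whose last column happens to be sheight:

```r
library(dplyr)

# Hypothetical stand-in for father.son: made-up heights, same column names
df <- data.frame(
  fheight = c(70.8, 71.2, 68.0, 71.4),
  sheight = c(69.5, 71.0, 67.2, 70.3)
)

fathers71 <- df %>% filter(round(fheight) == 71)

by_name <- fathers71 %>% pull(sheight)  # by column name
by_pos  <- fathers71 %>% pull(2)        # by position, counting from the left
last    <- fathers71 %>% pull()         # default var = -1: the last column

mean(by_name)
```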
roarkz