30

I like to create a table that has the frequency of several columns in my data frame. I am copying part of my data frame below.

The table is supposed to have frequency (both n and %) of "red" in Color and "F" in Gender.

I think that the dplyr package could do this but I cannot figure it out.

Thank you-

    RespondentID    Color        Gender   
1     1503          Red           F      
2     1653          NA            M   
3     1982          Red           F   
4     4862          Red           NA   
15    4880          Blue          M  

Rooz
  • 339
  • 1
  • 3
  • 6
  • 1
    How many columns? In your case a simple `table` or `prop.table` will do. – David Arenburg Jan 18 '16 at 17:29
  • I have almost 10 columns for two different data set. I am hoping to get the numbers (count and %) for each variable (column) to create a barplot showing the difference between the trends in these two dataset. – Rooz Jan 18 '16 at 19:17

1 Answers1

44
library(dplyr)

df %>%
  count(Color, Gender) %>%
  group_by(Color) %>%          # now required with changes to dplyr::count()
  mutate(prop = prop.table(n))

# Source: local data frame [4 x 4]
# Groups: Color [3]
# 
#    Color Gender     n      prop
#   (fctr) (fctr) (int)     (dbl)
# 1   Blue      M     1 1.0000000
# 2    Red      F     2 0.6666667
# 3    Red     NA     1 0.3333333
# 4     NA      M     1 1.0000000

Updating per comment -- if you want to look at each variable separately, you will need to rearrange the dataframe first. You can accomplish this with tidyr:

library(tidyr)
library(dplyr)

gather(df, "var", "value", -RespondentID) %>%
  count(var, value) %>%
  group_by(var) %>%             # now required with changes to dplyr::count()
  mutate(prop = prop.table(n))

# Source: local data frame [6 x 4]
# Groups: var [2]
# 
#      var value     n  prop
#   (fctr) (chr) (int) (dbl)
# 1  Color  Blue     1   0.2
# 2  Color   Red     3   0.6
# 3  Color    NA     1   0.2
# 4 Gender     F     2   0.4
# 5 Gender     M     2   0.4
# 6 Gender    NA     1   0.2
JasonAizkalns
  • 18,131
  • 6
  • 47
  • 99
  • I want to calculate the frequency of red in column color and female in column gender separately. These column are independent variables. I have almost 10 columns for two different data set. I am hoping to get the numbers (count and %) for each variable (column) to create a barplot showing the difference between the trends in these two dataset. – Rooz Jan 18 '16 at 19:28
  • @Rooz updated my answer per your comment. Perhaps you should post a new question being more specific of plotting being the end game because there are likely better ways to attack that specific question and result. Moreover, it sounds like the sample data provided does not accurately represent the data in question -- perhaps you should try and work on creating a reproducible example and/or sample data set that best depicts the unique challenges of your dataset. – JasonAizkalns Jan 18 '16 at 19:36
  • I will follow your suggestion for the future questions that I post. Thank you so much for your help. – Rooz Jan 18 '16 at 20:27
  • I don't understand why but I can't replicate this output. I get proportions over the whole prop column, and not per group proportions as given in the output above. I had to add group_by(var) to get the specified output. I'm leaving this comment in case anyone else has a similar experience. – Chris Beeley Nov 13 '18 at 16:24
  • I added a reprex https://gist.github.com/ChrisBeeley/0576e73951506a84a1a5cfd15cbccdbe – Chris Beeley Nov 13 '18 at 16:30
  • @ChrisBeeley I believe this is related to `dplyr` changes to the `count()` verb -- see this issue: https://github.com/tidyverse/dplyr/issues/2021. – JasonAizkalns Nov 13 '18 at 18:16
  • note it seems it's recommended not to use gather https://tidyr.tidyverse.org/reference/gather.html – baxx Dec 09 '19 at 19:45