I have a dataframe that looks like this:

 COLA     COLB    COLC   COLD     COLE      
 Name1    yes      A      AB      uno
 Name2    yes      B      AC      dos
 Name3    no       C      AB      tres
 Name4    no       D      AC      cuatro

How do I create a single dataframe that shows, for each value in a set of selected columns, its percentage and its frequency? For example:

ATTRIBUTE   Percentages      Frequency       
*COLB*      *Percentage*     *Amount*
yes         50%              2
no          50%              2
*COLC*      *Percentage*     *Amount*
A           25%              1
B           25%              1
C           25%              1
D           25%              1
*COLD*      *Percentage*     *Amount*
AB          50%              2
AC          50%              2

It doesn't need to look exactly like this, but it does need to be a single dataframe that includes only the selected columns (here COLB, COLC and COLD).

Any help would be great, thanks!

nak5120
  • Have a look [here](http://stackoverflow.com/questions/24576515/relative-frequencies-proportions-with-dplyr). Melting the data first is probably required. – Haboryme Oct 10 '16 at 18:18

1 Answer

You can do the following:

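# Reduced example data (only COLA and COLB from the question)
dat <- data.frame(COLA = paste0("name", 1:4),
                  COLB = c("yes", "yes", "no", "no"))

library(purrr)

# Tabulate one column: each distinct value with its proportion and count
col_to_stat <- function(col) {
  tmp <- table(col)
  data.frame(ATTRIBUTE = names(tmp),
             Percentages = c(tmp / length(col)),
             Frequency = c(tmp),
             stringsAsFactors = FALSE)
}

# Apply to every column and row-bind; .id stores the column name in "col"
map_df(dat, col_to_stat, .id = "col")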

Which gives you:

   col ATTRIBUTE Percentages Frequency
1 COLA     name1        0.25         1
2 COLA     name2        0.25         1
3 COLA     name3        0.25         1
4 COLA     name4        0.25         1
5 COLB        no        0.50         2
6 COLB       yes        0.50         2
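
The question asks for only some of the columns (COLB, COLC and COLD), whereas the call above summarises every column of `dat`. As a rough sketch of one way to restrict it, here is a base R variant (no purrr) that loops over a vector of column names. The data frame reproduces the sample from the question, and the `selected` vector and `result` name are assumptions made for illustration:

```r
# Sample data from the question
dat <- data.frame(
  COLA = paste0("Name", 1:4),
  COLB = c("yes", "yes", "no", "no"),
  COLC = c("A", "B", "C", "D"),
  COLD = c("AB", "AC", "AB", "AC"),
  COLE = c("uno", "dos", "tres", "cuatro"),
  stringsAsFactors = FALSE
)

# Hypothetical choice of columns to summarise
selected <- c("COLB", "COLC", "COLD")

# Build one summary data frame for the column called `name`
col_to_stat <- function(name) {
  tmp <- table(dat[[name]])
  data.frame(
    col = name,
    ATTRIBUTE = names(tmp),
    Percentages = as.vector(tmp) / nrow(dat),
    Frequency = as.vector(tmp),
    stringsAsFactors = FALSE
  )
}

# Summarise each selected column and stack the pieces into one data frame
result <- do.call(rbind, lapply(selected, col_to_stat))
rownames(result) <- NULL
result
```

With purrr, the same restriction can probably be had by passing `dat[selected]` instead of `dat` to the `map_df` call above.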

If you want to print percentages instead of decimals, have a look at the question "How to format a number as percentage in R?".
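
For a quick idea of what that formatting might look like, here is a minimal base R sketch using `sprintf`. The `percentages` vector is made-up illustration data standing in for the `Percentages` column, and the linked question may well suggest other approaches:

```r
# Example proportions, standing in for the Percentages column
percentages <- c(0.25, 0.50, 1)

# Scale to 0-100 and append a literal percent sign ("%%" escapes "%")
formatted <- sprintf("%.0f%%", 100 * percentages)
formatted
```

Note that the result is a character vector, so it is best applied as a last step, after any arithmetic on the proportions is done.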

P.S.: If you use tibble instead of data.frame, the construction inside col_to_stat can be written more briefly:

tibble(ATTRIBUTE = names(tmp), Percentages = tmp/length(col), Frequency = tmp)
Rentrop