
A data set to work with:

df <- tibble::tribble(~person, ~age, ~height,  
                      "John", 1, 20,  
                      "Mike", 3, 50,  
                      "Maria", 3, 52,  
                      "Elena", 6, 90,  
                      "Biden", 9, 120)

I am trying to get a data frame that would have the following structure:

age  | height (cm) | number of people
0-5  | 0-50        | 2
0-5  | 50-100      | 1
0-5  | 100-200     | 0
5-10 | 0-50        | 0
5-10 | 50-100      | 1
5-10 | 100-200     | 1

Basically, I have a data set with a lot of information about a number of people. I want to group them first by age and then, within each age group, by height, and finally count how many people fall into each age/height combination (including combinations with nobody in them, shown as 0).

Any tips?

CroatiaHR

2 Answers


You can use cut() to generate bins from continuous variables, then summarise the new categories.

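library(dplyr)

df %>%
  mutate(
    # Bin ages into (-Inf, 5] and (5, 10]
    age_c = cut(
      age,
      breaks = c(-Inf, 5, 10),
      labels = c("0-5", "5-10"),
      right = TRUE  # closed on the right: an age of exactly 5 falls in "0-5"
    ),
    # Bin heights into (-Inf, 50], (50, 100] and (100, 200]
    height_c = cut(
      height,
      breaks = c(-Inf, 50, 100, 200),
      labels = c("0-50", "50-100", "100-200"),
      right = TRUE
    )
  ) %>%
  # .drop = FALSE also reports the empty combinations (n = 0)
  count(age_c, height_c, .drop = FALSE)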

# A tibble: 6 x 3
  age_c height_c     n
  <fct> <fct>    <int>
1 0-5   0-50         2
2 0-5   50-100       1
3 0-5   100-200      0
4 5-10  0-50         0
5 5-10  50-100       1
6 5-10  100-200      1
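To see what `.drop = FALSE` contributes, here is a minimal sketch that runs `count()` with and without it on the same binned data. The object names `binned`, `counts_dropped` and `counts_full` are illustrative only; the breaks and labels are the ones from the answer above.

```r
library(dplyr)

df <- tibble::tribble(~person, ~age, ~height,
                      "John",  1,  20,
                      "Mike",  3,  50,
                      "Maria", 3,  52,
                      "Elena", 6,  90,
                      "Biden", 9, 120)

binned <- df %>%
  mutate(
    # right = TRUE is cut()'s default: a height of exactly 50 goes to "0-50"
    age_c    = cut(age,    breaks = c(-Inf, 5, 10),        labels = c("0-5", "5-10")),
    height_c = cut(height, breaks = c(-Inf, 50, 100, 200), labels = c("0-50", "50-100", "100-200"))
  )

# Default (.drop = TRUE for ungrouped data): only combinations that occur, 4 rows
counts_dropped <- count(binned, age_c, height_c)

# .drop = FALSE: every combination of factor levels, empty ones with n = 0, 6 rows
counts_full <- count(binned, age_c, height_c, .drop = FALSE)
```

Because `age_c` and `height_c` are factors, `count()` knows about all their levels and can fill in the missing combinations; with plain character columns there would be no level information to fall back on.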
EJJ
  • Very nice, I wasn't familiar with `cut`, but it seems quite useful. – CroatiaHR Nov 10 '20 at 17:05
  • A follow-up question: is there an easy adjustment to make it count relative frequencies instead of absolute ones? – CroatiaHR Nov 12 '20 at 00:31
  • This [post](https://stackoverflow.com/questions/24576515/relative-frequencies-proportions-with-dplyr) has relevant info for that. – EJJ Nov 13 '20 at 15:59
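For the relative-frequency follow-up, one common approach (a sketch, not necessarily what the linked post recommends) is to divide each count by a total after `count()`. Dividing by `sum(n)` over the whole table gives each cell's share of all people; grouping by `age_c` first gives shares within each age bracket instead. The names `rel`, `prop`, `rel_within_age` and `prop_in_age` are illustrative.

```r
library(dplyr)

df <- tibble::tribble(~person, ~age, ~height,
                      "John",  1,  20,
                      "Mike",  3,  50,
                      "Maria", 3,  52,
                      "Elena", 6,  90,
                      "Biden", 9, 120)

rel <- df %>%
  mutate(
    age_c    = cut(age,    breaks = c(-Inf, 5, 10),        labels = c("0-5", "5-10")),
    height_c = cut(height, breaks = c(-Inf, 50, 100, 200), labels = c("0-50", "50-100", "100-200"))
  ) %>%
  count(age_c, height_c, .drop = FALSE) %>%
  # Share of all people that fall into each age/height cell
  mutate(prop = n / sum(n))

# Alternative: share within each age bracket (each bracket sums to 1)
rel_within_age <- rel %>%
  group_by(age_c) %>%
  mutate(prop_in_age = n / sum(n)) %>%
  ungroup()
```

Which denominator is right depends on the question being asked: "what fraction of everyone is 0-5 and 0-50 cm tall" versus "among the 0-5 year olds, what fraction is 0-50 cm tall".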

In base R you could do:

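# cut() turns each numeric column into interval factors, table() cross-tabulates
# them, and data.frame() converts the table to long format (one row per cell)
data.frame(with(df, table(
  age    = cut(age, c(0, 5, 10)),
  height = cut(height, c(0, 50, 100, 200))
)))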

     age    height Freq
1  (0,5]    (0,50]    2
2 (5,10]    (0,50]    0
3  (0,5]  (50,100]    1
4 (5,10]  (50,100]    1
5  (0,5] (100,200]    0
6 (5,10] (100,200]    1
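The `(0,5]`-style names are `cut()`'s default interval labels, and the lower bound of the first interval is open, so a value of exactly 0 would become `NA`. A sketch, assuming the same data rebuilt as a plain `data.frame`, that passes `labels` to get the `0-5` names from the question, sets `include.lowest = TRUE` so 0 is kept in the first bin, and sorts the rows to match the question's layout. The names `tab` and `counts` are illustrative.

```r
df <- data.frame(
  person = c("John", "Mike", "Maria", "Elena", "Biden"),
  age    = c(1, 3, 3, 6, 9),
  height = c(20, 50, 52, 90, 120)
)

tab <- with(df, table(
  age    = cut(age,    c(0, 5, 10),        labels = c("0-5", "5-10"),
               include.lowest = TRUE),  # first bin becomes [0, 5]
  height = cut(height, c(0, 50, 100, 200), labels = c("0-50", "50-100", "100-200"),
               include.lowest = TRUE)
))

counts <- as.data.frame(tab)  # columns: age, height, Freq

# Order by age bracket first, then height, as in the question's table
counts <- counts[order(counts$age, counts$height), ]
rownames(counts) <- NULL
```

`table()` always includes every combination of factor levels, so the zero-count cells appear without any extra argument.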
Onyambu