1

I have survey data where one respondent per household was asked about the age and education level of each member of the household. The data are wide, in the sense that there are indexed columns for the age and education of each household member.

Here's a simple example:

df <- 
  data.frame(
    HHID = 1:2,
    age_1 = c(4, 19),
    yrsedu_1 = c(8,12),    
    age_2 = c(7, 6),
    yrsedu_2 = c(14, 6)
  )

Thus age_1 is the age of one member of household 1 and edu_2 is their education level.

Essentially I want to stack the pairs of columns (2 columns per household member) to end up with a long dataset like this:

  data.frame(
    HHID = c(1,1,2,2),
    hh_child_number = c(1,2,1,2),
    age = c(4,7,19,6),    
    yrsedu = c(8,14,12,6))  

How can I do this with gather in tidyr? The pairs of columns are causing trouble.

lethalSinger
  • 445
  • 2
  • 6
  • Is this question not covered by [this](https://stackoverflow.com/q/12466493/5325862) post, particularly the `pivot_longer` answers? – camille Mar 12 '20 at 00:48

2 Answers2

3

As gather is retired you can use the new pivot_longer :

tidyr::pivot_longer(df, cols = -HHID, 
         names_to = c(".value", "hh_child_number"), 
         names_sep = "_") %>% type.convert(as.is = TRUE)


# A tibble: 4 x 4
#   HHID hh_child_number   age yrsedu
#  <int>           <int> <int>  <int>
#1     1               1     4      8
#2     1               2     7     14
#3     2               1    19     12
#4     2               2     6      6
Ronak Shah
  • 286,338
  • 16
  • 97
  • 143
  • We seem to have gone full circle and `pivot_longer` isn't that different to the old base R `reshape` - e.g.: `idv – thelatemail Mar 12 '20 at 00:52
  • I wouldn't disagree, in `tidyverse` they keep on experimenting with a lot of things and finally agree that the good old solutions were better. :P – Ronak Shah Mar 12 '20 at 01:03
0

It's not a tidyverse solution, but just to mention it, one alternative will be to use melt function from data.table:

library(data.table)
age <- grep("age",colnames(df))
yr <- grep("yrsedu", colnames(df))

setDT(df)
melt(df, measure = list(age,yr), value.name = c("Age", "Yrsedu"), variable.name = "hh_child_number")

   HHID hh_child_number Age Yrsedu
1:    1               1   4      8
2:    2               1  19     12
3:    1               2   7     14
4:    2               2   6      6
dc37
  • 14,370
  • 3
  • 11
  • 27