1

I have seen some possible discussion of my problem elsewhere but it either wasn't resolved or I could not fully understand if the answer applied, so I'm creating a new question.

The following question in particular touches on this subject but is not resolved. Gathering wide columns into multiple long columns using pivot_longer

Take the following sample data. As you can see there is a unique identifier variable, and then 8 other variables. Of the other 8, you can group them into two sets, gpa and percent_a. For each set there is a class, group, course, and dept value.

In my actual data I have about 20 different sets, all with the same structure, the same four descriptors in each set.

What I would like to do is perform a function similar to pivot_longer. Except instead of combining multiple columns into a set of key and value columns, each unique set in my data (class, group, course, dept) would be grouped into there own key/value columns.

set.seed(101)
df <- data.frame(
  id = 1:10,
  class_gpa = rnorm(10, 0, 1),
  course_gpa = rnorm(10, 0, 1),
  group_gpa = rnorm(10, 0, 1),
  dept_gpa = rnorm(10, 0, 1),
  class_percent_a = rnorm(10, 0, 1),
  course_percent_a = rnorm(10, 0, 1),
  group_percent_a = rnorm(10, 0, 1),
  dept_percent_a = rnorm(10, 0, 1)
)

So in this example, lets say I group all of the gpa values into two columns (gpa_type, and gpa_value) and the percent_a values into two columns (percent_a_type, percent_a_value), then I would end up at the end with only 5 columns:

id, gpa_type, gpa_value, percent_a_type, percent_a_value

Is there a way to do this? Either with pivot_longer or another method. Thanks.

iago
  • 1,663
  • 14
  • 19
ecanf
  • 11
  • 1
  • 2

1 Answers1

2

Honestly, I would rather simply do:

df %>% pivot_longer(-id, names_to = c("type", ".value"), names_pattern = "([^_]+)_(.*)")

And keep the data into a more practical format:

# A tibble: 40 x 4
      id type      gpa percent_a
   <int> <chr>   <dbl>     <dbl>
 1     1 class  -0.326     0.482
 2     1 course  0.526    -1.15 
 3     1 group  -0.164    -0.260
 4     1 dept    0.895     1.51 
 5     2 class   0.552     0.758
 6     2 course -0.795    -0.274
 7     2 group   0.709    -1.41 
 8     2 dept    0.279     1.62 
 9     3 class  -0.675    -2.32 
10     3 course  1.43      0.578
# … with 30 more rows

Why duplicate the "type" attribute for each "set"?

For your desired output:

# A tibble: 40 x 5
      id gpa_type gpa_value percent_a_type percent_a_value
   <int> <chr>        <dbl> <chr>                    <dbl>
 1     1 class       -0.326 class                    0.482
 2     1 course       0.526 course                  -1.15 
 3     1 group       -0.164 group                   -0.260
 4     1 dept         0.895 dept                     1.51 
 5     2 class        0.552 class                    0.758
 6     2 course      -0.795 course                  -0.274
 7     2 group        0.709 group                   -1.41 
 8     2 dept         0.279 dept                     1.62 
 9     3 class       -0.675 class                   -2.32 
10     3 course       1.43  course                   0.578
# … with 30 more rows

You could try:

lst_df <- df %>%
  gather(key, value, -id) %>%
  extract(key, into = c("var", "type"), "([^_]+)_(.*)") %>%
  split(.$type) 

names(lst_df) %>% 
  map_dfc(~ setNames( 
    lst_df[[.x]] %>% 
      select(-type), 
    c("id", paste0(.x, c("_type", "_value"))))) %>%
  select(-matches("id\\d+"))
Steven Beaupré
  • 20,095
  • 7
  • 52
  • 73