how do I gather 2 sets of columns in tidyr

Question

I have the following structure:

key | category_x | 2009 | category_y | 2010
test

example data as requested

set.seed(24)
df <- data.frame(
key = 1:10,
category_x = paste0("stock_", 0:9),
'2008' = rnorm(10, 0, 10),
category_y = paste0("stock_", 0:9),
'2009' = rnorm(10, 0, 10),
category_z = paste0("stock_", 0:9),
'2010' = rnorm(10, 0, 10),
check.names=FALSE
)

how do I change that into:

key | category | year

I know I can use:

library(magrittr)
library(dplyr)
library(tidyr)

df %>% gather(key, category, starts_with("category_"))

but that doesn't deal with the year. I looked at "Gather multiple sets of columns", but I don't understand the extract and spread commands.

how do I generate random strings for you to test then? – KillerSnail, Aug 26 '15 at 13:57
yup fixed the paste command – KillerSnail, Aug 26 '15 at 14:11

akrun · Accepted Answer · 2015-08-26T14:17:03.530

If we are using gather, we can do this in two steps. First, we reshape from 'wide' to 'long' format for the column names that start with 'category', and in the next step we do the same with the numeric column names by selecting them with matches. matches accepts regex patterns, so a pattern of ^[0-9]+$ means we match one or more digits ([0-9]+) from the start (^) to the end ($) of the string. We can remove the columns that are not needed with select.

library(tidyr)
library(dplyr) 
gather(df, key, category, starts_with('category_')) %>%
     gather(key2, year, matches('^[0-9]+$')) %>%
     select(-starts_with('key'))

Or, using the devel version of data.table, this is much easier because melt can take multiple patterns for the measure columns. We convert the 'data.frame' to a 'data.table' (setDT(df)), use melt, and specify the patterns in the measure argument. We can also change the names of the 'value' columns with value.name. The 'variable' column is set to NULL as it is not needed in the expected output.

library(data.table)#v1.9.5+
melt(setDT(df), measure=patterns(c('^category', '^[0-9]+$')), 
           value.name=c('category', 'year'))[, variable:=NULL][]
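
For the tidyr route, note that the second gather runs over every row produced by the first, so the chained calls pair each category column with each year column, not only with its neighbour. A minimal sketch of an alternative, assuming a tidyr version that provides pivot_longer (1.0.0 or later) and assuming each category_* column belongs with the numeric column that follows it (category_x with 2008, and so on): rename the columns so each pair shares a suffix, then let the ".value" sentinel split them. The names paired, long and value are illustrative and not from the question.

```r
library(tidyr)

set.seed(24)
df <- data.frame(
  key = 1:10,
  category_x = paste0("stock_", 0:9),
  `2008` = rnorm(10, 0, 10),
  category_y = paste0("stock_", 0:9),
  `2009` = rnorm(10, 0, 10),
  category_z = paste0("stock_", 0:9),
  `2010` = rnorm(10, 0, 10),
  check.names = FALSE
)

# Assumption: the i-th category column pairs with the i-th year column
cat_cols  <- grep("^category_", names(df), value = TRUE)
year_cols <- grep("^[0-9]+$", names(df), value = TRUE)

# Give each pair a common suffix: category_2008 / value_2008, ...
paired <- df
names(paired)[match(cat_cols, names(paired))]  <- paste0("category_", year_cols)
names(paired)[match(year_cols, names(paired))] <- paste0("value_", year_cols)

# ".value" turns the part before "_" into output columns (category, value);
# the part after "_" becomes the year column
long <- pivot_longer(
  paired,
  cols = -key,
  names_to = c(".value", "year"),
  names_sep = "_"
)
```

This gives one row per key and year, with columns key, year, category and value. Drop year with dplyr::select(-year) if only the category and the numeric value are wanted.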

tidy version yields incorrect result – Nettle, Oct 07 '18 at 15:10
