
I have a data.frame that looks like this:

library(tibble)

dfTall <- frame_data(
    ~id, ~x, ~y, ~z,
      1, "a", 4, 5,
      1, "b", 6, 5,
      2, "a", 5, 4,
      2, "b", 1, 9)

I want to turn it into this:

dfWide <- frame_data(
    ~id, ~y_a, ~y_b, ~z_a, ~z_b,
      1,    4,    6,    5,    5,
      2,    5,    1,    4,    9)

Currently, I'm doing this:

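library(dplyr)

dfTall %>%
    split(., .$x) %>%                           # one data frame per level of x
    mapply(function(df, name) {
        df$x <- NULL                            # drop the key column
        names(df) <- paste(names(df), name, sep = '_')  # y -> y_a, id -> id_a, ...
        df
    }, SIMPLIFY = FALSE, ., names(.)) %>%
    bind_cols() %>%
    select(-id_b) %>%                           # remove the duplicate id column
    rename(id = id_a)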

In practice, I will have many more numeric columns to expand (not just y and z). My current solution works, but it is clunky: for example, it creates a copy of the id column for every level of x (id_a, id_b), and those copies have to be removed by hand.
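If you want to keep the split-by-x idea, one way to avoid the duplicate id columns is to rename only the value columns and then join the pieces on id instead of binding them side by side. The following is a minimal sketch in base R; the names `pieces`, `renamed` and `value_cols` are illustrative, and the data is rebuilt with `data.frame()` so the block runs on its own.

```r
# Same input as dfTall, built with base R so no packages are needed
dfTall <- data.frame(
  id = c(1, 1, 2, 2),
  x  = c("a", "b", "a", "b"),
  y  = c(4, 6, 5, 1),
  z  = c(5, 5, 4, 9),
  stringsAsFactors = FALSE
)

# One data frame per level of x
pieces <- split(dfTall, dfTall$x)

# Drop x and suffix only the value columns with the level (y -> y_a, ...),
# leaving id untouched so it can serve as the join key
renamed <- Map(function(df, level) {
  df$x <- NULL
  value_cols <- setdiff(names(df), "id")
  names(df)[names(df) %in% value_cols] <- paste(value_cols, level, sep = "_")
  df
}, pieces, names(pieces))

# Join all pieces on id; merge() keeps a single id column
dfWide <- Reduce(function(a, b) merge(a, b, by = "id"), renamed)
dfWide
```

Because every column except id and x is suffixed, additional numeric columns are handled without changing the code.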

Can this expansion be done using a function from tidyr such as spread?

John Kleve
Side note: I think `tribble` is the more up-to-date name for `frame_data`. See the details in `?frame_data`: "frame_data() is an older name for tribble(). It will eventually be phased out." – markdly Aug 24 '17 at 23:16
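For reference, here is the question's input written with `tribble()`, which takes the same row-wise layout as `frame_data()`. This is a small sketch that assumes only that the tibble package is installed.

```r
library(tibble)

# Column names are given as ~formulas, then values are listed row by row
dfTall <- tribble(
  ~id, ~x,  ~y, ~z,
    1, "a",  4,  5,
    1, "b",  6,  5,
    2, "a",  5,  4,
    2, "b",  1,  9
)
dfTall
```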

1 Answer


It can be done with spread, but not in a single step, because there are multiple value columns. First gather the value columns into key/value pairs, then unite the column name with x to build the new headers, and finally spread:

library(dplyr)
library(tidyr)

dfTall %>% 
    gather(col, val, -id, -x) %>% 
    unite(key, col, x) %>% 
    spread(key, val)

# A tibble: 2 x 5
#     id   y_a   y_b   z_a   z_b
#* <dbl> <dbl> <dbl> <dbl> <dbl>
#1     1     4     6     5     5
#2     2     5     1     4     9
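In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`, and `pivot_wider()` accepts several value columns directly, so the gather/unite step is no longer needed. A minimal sketch, assuming a tidyr version that provides `pivot_wider()`:

```r
library(tibble)
library(tidyr)

dfTall <- tribble(
  ~id, ~x,  ~y, ~z,
    1, "a",  4,  5,
    1, "b",  6,  5,
    2, "a",  5,  4,
    2, "b",  1,  9
)

# New column names are built as "<value column>_<level of x>" (names_sep
# defaults to "_"); id is kept as the identifying column
dfWide <- pivot_wider(dfTall, names_from = x, values_from = c(y, z))
dfWide
```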

If you use data.table, its dcast supports casting multiple value columns at once:

library(data.table)
dcast(setDT(dfTall), id ~ x, value.var = c('y', 'z'))

#   id y_a y_b z_a z_b
#1:  1   4   6   5   5
#2:  2   5   1   4   9 
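With many value columns, `value.var` does not have to be typed out by hand: it can be computed as every column except the id and the key column. A minimal sketch, assuming that all remaining columns should be widened (`value_cols` is an illustrative name):

```r
library(data.table)

dt <- data.table(
  id = c(1, 1, 2, 2),
  x  = c("a", "b", "a", "b"),
  y  = c(4, 6, 5, 1),
  z  = c(5, 5, 4, 9)
)

# Every column other than id and x is treated as a value column,
# so new numeric columns are picked up without editing the call
value_cols <- setdiff(names(dt), c("id", "x"))

wide <- dcast(dt, id ~ x, value.var = value_cols)
wide
```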
Psidom
In base R, the equivalent is `reshape(as.data.frame(dfTall), idvar="id", timevar="x", direction="wide", sep="_")`, since `reshape` also handles multiple variables. – thelatemail Aug 24 '17 at 23:26
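A self-contained version of the base R approach is sketched below. When `v.names` is omitted, `reshape()` treats every column other than `idvar` and `timevar` as varying, which is why y and z are both widened; note that the columns come out grouped by level of x rather than by variable.

```r
# Base R only; the data is rebuilt with data.frame() so no packages are needed
dfTall <- data.frame(
  id = c(1, 1, 2, 2),
  x  = c("a", "b", "a", "b"),
  y  = c(4, 6, 5, 1),
  z  = c(5, 5, 4, 9)
)

# idvar identifies rows of the wide result; timevar supplies the suffixes
dfWide <- reshape(dfTall, idvar = "id", timevar = "x",
                  direction = "wide", sep = "_")
dfWide
```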