0

EDIT:

Upon further examination, this dataset is way more insane than I previously believed.

Values have been encapsulated in the column names!

My dataframe looks like this:

| ID | Year1_A | Year1_B | Year2_A | Year2_B |
|----|---------|---------|---------|---------|
| 1  | a       | b       | 2a      | 2b      |
| 2  | c       | d       | 2c      | 2d      |

I am searching for a way to reformat it as such:

| ID | Year | _A  | _B  |
|----|------|-----|-----|
| 1  | 1    | a   |  b  |
| 1  | 2    | 2a  |  2b |
| 2  | 1    | c   |  d  |
| 2  | 2    | 2c  |  2d | 

The answer below is great, and works perfectly, but the issue is that the dataframe needs more work -- somehow possibly be spread back out, so that each row has 3 columns.

My best idea was to do merge(df, df, by="ID") and then filter out the unwanted rows but this is quickly becoming unwieldy.

df <- data.frame(ID = 1:2, Year1_A = c('a', 'c'), Year1_B = c('b','d' ), Year2_A = c('2a', '2c'), Year2_B = c('2b', '2d'))
Monica Heddneck
  • 2,967
  • 5
  • 41
  • 77
  • Apply `separate` on the `Year` column from the answer below, then use `spread`, i.e. `df %>% separate(Year, c("Year", "Part")) %>% spread(Part, value)` – Weihuang Wong Jan 05 '17 at 22:08

1 Answers1

0
library(tidyr)

# your example data
df <- data.frame(ID = 1:2, Year1_A = c('a', 'c'), Year1_B = c('b','d' ), Year2_A = c('2a', '2c'), Year2_B = c('2b', '2d'))

# the solution
df <- gather(df, Year, value, -ID)

# cleaning up
df$Year <- gsub("Year", "", df$Year)

Result:

> df
  ID Year value
1  1  1_A     a
2  2  1_A     c
3  1  1_B     b
4  2  1_B     d
5  1  2_A    2a
6  2  2_A    2c
7  1  2_B    2b
8  2  2_B    2d
Fr.
  • 2,635
  • 2
  • 21
  • 42
  • 2
    Full tidyverse, this would look like `df %>% gather(Year, val, -ID) %>% mutate(Year = sub('Year', '', Year))` – alistaire Jan 05 '17 at 01:09