2

first of all, I am sorry, I know that my question overlaps with couple of other questions here, but I am really new to R and I could not find an answer that fills the task completely and that I could actually understand.

my data set is something like that:

ID … Exam_t_minus_3 Exam_t_minus_2 Exam_t_minus_1 Grade_2012 Grade_2013 Grade_2014
1       Math        Physics         Chemestry         98         70         76
2       English     French          Russian           77         85         59
3       English     Chemistry       Biology           65         77         69

and I want it to become something like that:

ID  …   i(as t_minus_i_)    Exam         Grade
1       3                   Math          98
1       2                   Physics       70
1       1                   Chemistry     76
2       3                   English       77
2       2                   French        85
2       1                   Russian       59

I have gone through swirl() course and thought I could do it with tidyr, but I could not figure out how could I gather more than one set of columns. I ended up with a result like:

ID  …   ................    Exam         Grade
1       .                   Math          98
1       .                   Math          70
1       .                   Math          76
1       .                   Physics       98
1       .                   Physics       70
1       .                   Physics       76

I looked into this: Gather multiple sets of columns and some others, but most of them lost me on half way.

I tried the answer for this question as well (I changed the earning columns to end with T_minus_* beforehand): Gather multiple columns with tidyr

i.e I tried

library(data.table)
res = melt(setDT(sample_df), 
measure.vars = patterns("^Exam", "^Grade"), 
variable.name = "i")
res[, i := factor(i, labels = c("3","2", "1"))]

It does the most of the thing I need, however, I need to the values in my i columns to be integers not categorical variables, because I need them for the future calculations.

I tried to do res$i <- as.numeric(res$i), but that changed the order, i.e "3" was evaluated to 1, "1" to 3. I tried to just leave it out, but that gives me 1,2,3 as well for the i column values.

As I changed the earnings columns to be Earnings_T_minus_* could I perhaps just get those * values in the i column somehow?

library(data.table)
res = melt(setDT(sample_df), 
measure.vars = patterns("^Exam_T_minus_*", "^Grade_T_minus_*"), 
variable.name = "i")

Sorry, the question became a bit long and perhaps confusing, but hope someone can lead me in right direction.

HelMel
  • 23
  • 1
  • 5
  • To overcome my issue, I just used select(), to re-arrange my columns, before melting. However, it is suitable just because I need numbers from 3 to 1. Hence, I will leave my question up, in case someone can offer me a better solution. – HelMel Jul 27 '17 at 15:48

2 Answers2

1

A solution using dplyr and tidyr.

It needs two times of gather process. The first time, we can gather the i(as t_minus_i_) and the Exam. The second time, we can gather the Year and the Grade.

starts_with is a helper function that can select columns with a beginning of string. sub is to replace the string with empty string "" to further format the columns. as.integer is to convert a character number to an integer. dt2 is the final output.

library(dplyr)
library(tidyr)

dt2 <- dt %>%
  # First gather process: gather the i(as t_minus_i_) and Exam
  gather(`i(as t_minus_i_)`, Exam, starts_with("Exam")) %>%
  mutate(`i(as t_minus_i_)` = as.integer(sub("Exam_t_minus_", "", `i(as t_minus_i_)`))) %>%
  # Second gather process: gather the Grade
  gather(Year, Grade, starts_with("Grade")) %>%
  mutate(Year = as.integer(sub("Grade_", "", Year))) %>%
  arrange(ID)

dt2
#    ID i(as t_minus_i_)      Exam Year Grade
# 1   1                3      Math 2012    98
# 2   1                2   Physics 2012    98
# 3   1                1 Chemestry 2012    98
# 4   1                3      Math 2013    70
# 5   1                2   Physics 2013    70
# 6   1                1 Chemestry 2013    70
# 7   1                3      Math 2014    76
# 8   1                2   Physics 2014    76
# 9   1                1 Chemestry 2014    76
# 10  2                3   English 2012    77
# 11  2                2    French 2012    77
# 12  2                1   Russian 2012    77
# 13  2                3   English 2013    85
# 14  2                2    French 2013    85
# 15  2                1   Russian 2013    85
# 16  2                3   English 2014    59
# 17  2                2    French 2014    59
# 18  2                1   Russian 2014    59
# 19  3                3   English 2012    65
# 20  3                2 Chemistry 2012    65
# 21  3                1   Biology 2012    65
# 22  3                3   English 2013    77
# 23  3                2 Chemistry 2013    77
# 24  3                1   Biology 2013    77
# 25  3                3   English 2014    69
# 26  3                2 Chemistry 2014    69
# 27  3                1   Biology 2014    69

DATA

dt <- read.table(text = "ID Exam_t_minus_3 Exam_t_minus_2 Exam_t_minus_1 Grade_2012 Grade_2013 Grade_2014
1       Math        Physics         Chemestry         98         70         76
                 2       English     French          Russian           77         85         59
                 3       English     Chemistry       Biology           65         77         69",
                 header = TRUE, stringsAsFactors = FALSE)
www
  • 35,154
  • 12
  • 33
  • 61
1

Your data.table approach was nearly spot on. Reshaping with multiple columns is the way to go.

library(data.table)
melt(setDT(sample_df), 
     measure.vars = patterns("^Exam", "^Grade"), value.name = c("Exam", "Grade"), 
     variable.name = "i", variable.factor = FALSE)[
       , i := 4L - as.integer(i)][order(ID)]
   ID i      Exam Grade
1:  1 3      Math    98
2:  1 2   Physics    70
3:  1 1 Chemistry    76
4:  2 3   English    77
5:  2 2    French    85
6:  2 1   Russian    59
7:  3 3   English    65
8:  3 2 Chemistry    77
9:  3 1   Biology    69

The only modifications are to pass variable.factor = FALSE as parameter to melt(), to do some arithmetic on i after coercion to integer and to order() the result appropriately.

Uwe
  • 34,565
  • 10
  • 75
  • 109