0

I have a dataset similar to this:

library(tidyverse)

df <- tibble(
  subjid = 1:5,
  event_1 = c("Watery eyes",         # Event number 1 
          "Sore throat",
          "Vomiting",
          "Gastroenteritis viral",
          "Dry Mouth"),
  start_date_1 = as.Date("2017-01-02") + 0:4,
  stop_date_1 = as.Date("2017-01-03") + 0:4,
  severity_1 = 1,
  related_to_drug_1 = 0,
  event_2 = c("Nausea",             # Event number 2
          "Dizziness",
          "Cough",
          "Disorientation",
          "Diarrhea"),
  start_date_2 = as.Date("2017-02-02") + 0:4,
  stop_date_2 = as.Date("2017-02-03") + 0:4,
  severity_2 = 2,
  related_to_drug_2 = 1,
  event_3 = c("Eczema",             # Event number 3
          "Sinusitis",
          "Abdominal discomfort",
          "Muscle spasms",
          "Nasopharyngitis"),
  start_date_3 = as.Date("2017-03-02") + 0:4,
  stop_date_3 = as.Date("2017-03-03") + 0:4,
  severity_3 = 2,
  related_to_drug_3 = 1
)
df

# A tibble: 5 × 16
  subjid               event_1 start_date_1 stop_date_1 severity_1 related_to_drug_1        event_2 start_date_2 stop_date_2 severity_2 related_to_drug_2              event_3
   <int>                 <chr>       <date>      <date>      <dbl>             <dbl>          <chr>       <date>      <date>      <dbl>             <dbl>                <chr>
1      1           Watery eyes   2017-01-02  2017-01-03          1                 0         Nausea   2017-02-02  2017-02-03          2                 1               Eczema
2      2           Sore throat   2017-01-03  2017-01-04          1                 0      Dizziness   2017-02-03  2017-02-04          2                 1            Sinusitis
3      3              Vomiting   2017-01-04  2017-01-05          1                 0          Cough   2017-02-04  2017-02-05          2                 1 Abdominal discomfort
4      4 Gastroenteritis viral   2017-01-05  2017-01-06          1                 0 Disorientation   2017-02-05  2017-02-06          2                 1        Muscle spasms
5      5             Dry Mouth   2017-01-06  2017-01-07          1                 0       Diarrhea   2017-02-06  2017-02-07          2                 1      Nasopharyngitis
# ... with 4 more variables: start_date_3 <date>, stop_date_3 <date>, severity_3 <dbl>, related_to_drug_3 <dbl>

However, there are many more rows of data and over 100 "events"/series of columns. The data frame consists of a row for each subject, with adverse events, and their associated attributes, listed in columns that are named with an underscore to indicate which event they belong to. I would like to use tidyr to gather these events into a tibble like such:

# A tibble: 15 × 7
   subjid event_number                 event start_date  stop_date severity related_to_drug
    <int>        <int>                 <chr>     <date>     <date>    <int>                <int>
1       1            1           Watery eyes 2017-01-02 2017-01-03        1                    0
2       2            1           Sore throat 2017-01-03 2017-01-04        1                    0
3       3            1              Vomiting 2017-01-04 2017-01-05        1                    0
4       4            1 Gastroenteritis viral 2017-01-05 2017-01-06        1                    0
5       5            1             Dry Mouth 2017-01-06 2017-01-07        1                    0
6       1            2                Nausea 2017-02-02 2017-02-03        2                    1
7       2            2             Dizziness 2017-02-03 2017-02-04        2                    1
8       3            2                 Cough 2017-02-04 2017-02-05        2                    1
9       4            2        Disorientation 2017-02-05 2017-02-06        2                    1
10      5            2              Diarrhea 2017-02-06 2017-02-07        2                    1
11      1            3                Eczema 2017-03-02 2017-03-03        3                    2
12      2            3             Sinusitis 2017-03-03 2017-03-04        3                    2
13      3            3  Abdominal discomfort 2017-03-04 2017-03-05        3                    2
14      4            3         Muscle spasms 2017-03-05 2017-03-06        3                    2
15      5            3       Nasopharyngitis 2017-03-06 2017-03-07        3                    2

This would have one row for each adverse event, and columns identifying attributes for that specific event.

jsly
  • 3
  • 2

2 Answers2

1

You can do this with the following code:

df %>%
  gather(Var,Val,-1) %>%
  mutate(Var = gsub('_(\\d+)','!!\\1',Var)) %>% 
  separate(Var,c('Var','Event'),sep = '!!') %>%
  spread(Var,Val)

Unfortunately this will destroy the class of the columns, and that will need to be fixed, which you can do with a call to mutate.

(Also note that the mutate line after gather is just because you have '_' in your col names and I want to split out the event numbers.)

Bishops_Guest
  • 1,222
  • 11
  • 13
  • Thanks; this is exactly what I needed! I added: %>% mutate(start_date = as_date(as.numeric(start_date))) %>% mutate(stop_date = as_date(as.numeric(stop_date))) and it works just like I needed! Thanks again!!!! – jsly Mar 20 '17 at 14:35
  • @jsly dplyr tip: you can do multiple changes with a single call to `mutate`. For example: `mutate(A = as_date(A), B=as_date(B))`. With proper indentation this can be less messy. (or a lot worse) – Bishops_Guest Mar 20 '17 at 16:42
1

A more convoluted way to do it, but that, quite importantly, preserves classes.
Start with column names, split them according to event number, then make one dataframe per event, and finally stack them vertically:

names(df) %>% 
  setdiff("subjid") %>% 
  split(sub(".*_(\\d+)$", "\\1", x = .)) %>% 
  map(~ select_(.data = df, .dots = c("subjid", .x))) %>% 
  map(~ setNames(.x, nm = sub("(.*)_\\d+$", "\\1", x = names(.x)))) %>%
  map2(names(.), ~ mutate(.x, event_number = .y)) %>% 
  bind_rows() %>% 
  select(subjid, event_number, everything())
# # A tibble: 15 × 7
#    subjid event_number                 event start_date  stop_date severity related_to_drug
#     <int>        <chr>                 <chr>     <date>     <date>    <dbl>           <dbl>
# 1       1            1           Watery eyes 2017-01-02 2017-01-03        1               0
# 2       2            1           Sore throat 2017-01-03 2017-01-04        1               0
# 3       3            1              Vomiting 2017-01-04 2017-01-05        1               0
# 4       4            1 Gastroenteritis viral 2017-01-05 2017-01-06        1               0
# 5       5            1             Dry Mouth 2017-01-06 2017-01-07        1               0
# 6       1            2                Nausea 2017-02-02 2017-02-03        2               1
# 7       2            2             Dizziness 2017-02-03 2017-02-04        2               1
# 8       3            2                 Cough 2017-02-04 2017-02-05        2               1
# 9       4            2        Disorientation 2017-02-05 2017-02-06        2               1
# 10      5            2              Diarrhea 2017-02-06 2017-02-07        2               1
# 11      1            3                Eczema 2017-03-02 2017-03-03        2               1
# 12      2            3             Sinusitis 2017-03-03 2017-03-04        2               1
# 13      3            3  Abdominal discomfort 2017-03-04 2017-03-05        2               1
# 14      4            3         Muscle spasms 2017-03-05 2017-03-06        2               1
# 15      5            3       Nasopharyngitis 2017-03-06 2017-03-07        2               1
Aurèle
  • 10,219
  • 1
  • 26
  • 43