I have a dataset similar to this:
library(tidyverse)
df <- tibble(
subjid = 1:5,
event_1 = c("Watery eyes", # Event number 1
"Sore throat",
"Vomiting",
"Gastroenteritis viral",
"Dry Mouth"),
start_date_1 = as.Date("2017-01-02") + 0:4,
stop_date_1 = as.Date("2017-01-03") + 0:4,
severity_1 = 1,
related_to_drug_1 = 0,
event_2 = c("Nausea", # Event number 2
"Dizziness",
"Cough",
"Disorientation",
"Diarrhea"),
start_date_2 = as.Date("2017-02-02") + 0:4,
stop_date_2 = as.Date("2017-02-03") + 0:4,
severity_2 = 2,
related_to_drug_2 = 1,
event_3 = c("Eczema", # Event number 3
"Sinusitis",
"Abdominal discomfort",
"Muscle spasms",
"Nasopharyngitis"),
start_date_3 = as.Date("2017-03-02") + 0:4,
stop_date_3 = as.Date("2017-03-03") + 0:4,
severity_3 = 2,
related_to_drug_3 = 1
)
df
# A tibble: 5 × 16
subjid event_1 start_date_1 stop_date_1 severity_1 related_to_drug_1 event_2 start_date_2 stop_date_2 severity_2 related_to_drug_2 event_3
<int> <chr> <date> <date> <dbl> <dbl> <chr> <date> <date> <dbl> <dbl> <chr>
1 1 Watery eyes 2017-01-02 2017-01-03 1 0 Nausea 2017-02-02 2017-02-03 2 1 Eczema
2 2 Sore throat 2017-01-03 2017-01-04 1 0 Dizziness 2017-02-03 2017-02-04 2 1 Sinusitis
3 3 Vomiting 2017-01-04 2017-01-05 1 0 Cough 2017-02-04 2017-02-05 2 1 Abdominal discomfort
4 4 Gastroenteritis viral 2017-01-05 2017-01-06 1 0 Disorientation 2017-02-05 2017-02-06 2 1 Muscle spasms
5 5 Dry Mouth 2017-01-06 2017-01-07 1 0 Diarrhea 2017-02-06 2017-02-07 2 1 Nasopharyngitis
# ... with 4 more variables: start_date_3 <date>, stop_date_3 <date>, severity_3 <dbl>, related_to_drug_3 <dbl>
However, there are many more rows of data and over 100 "events"/series of columns. The data frame consists of a row for each subject, with adverse events, and their associated attributes, listed in columns that are named with an underscore to indicate which event they belong to. I would like to use tidyr to gather these events into a tibble like such:
# A tibble: 15 × 7
subjid event_number event start_date stop_date severity related_to_drug
<int> <int> <chr> <date> <date> <int> <int>
1 1 1 Watery eyes 2017-01-02 2017-01-03 1 0
2 2 1 Sore throat 2017-01-03 2017-01-04 1 0
3 3 1 Vomiting 2017-01-04 2017-01-05 1 0
4 4 1 Gastroenteritis viral 2017-01-05 2017-01-06 1 0
5 5 1 Dry Mouth 2017-01-06 2017-01-07 1 0
6 1 2 Nausea 2017-02-02 2017-02-03 2 1
7 2 2 Dizziness 2017-02-03 2017-02-04 2 1
8 3 2 Cough 2017-02-04 2017-02-05 2 1
9 4 2 Disorientation 2017-02-05 2017-02-06 2 1
10 5 2 Diarrhea 2017-02-06 2017-02-07 2 1
11 1 3 Eczema 2017-03-02 2017-03-03 3 2
12 2 3 Sinusitis 2017-03-03 2017-03-04 3 2
13 3 3 Abdominal discomfort 2017-03-04 2017-03-05 3 2
14 4 3 Muscle spasms 2017-03-05 2017-03-06 3 2
15 5 3 Nasopharyngitis 2017-03-06 2017-03-07 3 2
This would have one row for each adverse event, and columns identifying attributes for that specific event.