0

I have a large dataset for which I want to create 50 new variables where the values are conditional on values in previous columns, and the name of the variables reflect this fact. To make it more intelligible, here is an example:

df <- tibble("a" = runif(10,1990,2000),
         "event" = 1995) %>%
      mutate("relative_event" = a - event)

Now with this dataset I would like to create dummy variables that code if the specific observation is one year prior to the event, 2 year prior, etc, as well as forward. One clumsy way to do this (which works) is:

df <- df %>%
  mutate("event_b1" = ifelse( (relative_event<=0) & (relative_event > -1),1,0)) %>%
  mutate("event_b2" = ifelse( (relative_event<=-1) & (relative_event > -2),1,0)) %>% #etc with more lagx
  mutate("event_f1" = ifelse( (relative_event>0) & (relative_event < 1),1,0)) %>%
  mutate("event_f2" = ifelse( (relative_event>1) & (relative_event < 2 ),1,0)) #etc with more forward

where b1 is for "one year before" and f2 is for "2 years forward". The result looks like this:

 A tibble: 10 x 7
       a event relative_event event_b1 event_b2 event_f1 event_f2
    <dbl> <dbl>          <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
  1 1993.  1995         -1.94         0        1        0        0
  2 1992.  1995         -2.59         0        0        0        0
  3 2000.  1995          4.75         0        0        0        0
  4 1998.  1995          3.25         0        0        0        0
  5 1991.  1995         -3.88         0        0        0        0
  6 1992.  1995         -3.02         0        0        0        0
  7 1996.  1995          1.08         0        0        0        1
  8 1994.  1995         -1.04         0        1        0        0
  9 1993.  1995         -2.22         0        0        0        0
 10 1995.  1995         -0.302        1        0        0        0

Since I have more than 50 columns to create I would like to know how to do it automatically so that I don't have to copy-paste 49 times and manually change the condition and the variable name. I spent time looking on SO on this thread, this one and on CV as well but I am still clueless. I tried the following code which does not work:

for (i in 0:10) {

  if (i<0) {

    event_bi  <- paste0("event_b",i)
    df <- df %>%
      mutate(get(event_bi) = ifelse((relative_event<=-(i-1)) & (relative_event>-i),1,0)) 

  } 
}

Ideally I'd like to learn how to do it with dplyr but if there is an obvious Base R solution I'm happy to learn it as well.

Thanks!

Jo_
  • 59
  • 5
  • when you run the index `i` from 0:10 - of course your code won't work when your condition is if `(i < 0)`. Moreover - is there a good reason why you want to get your 50 different columns ? To my understanding, you are basically assigning values to different conditions `relative event` - which could be all put in one single column. But I might not have understood your question correclty – tjebo Sep 04 '18 at 23:01

2 Answers2

1

I won't claim that this is the full answer but hopefully this stimulates some other users to comment/post

     # load packages
     pacman::p_load(tibble,dplyr,tidyr)

     # your dataframe
     df <- tibble("a" = runif(10,1990,2000),
                  "event" = 1995) %>%
       mutate("relative_event" = round(a - event),0)

     df$rel3 <- df$relative_event #initialize new column

     for(xx in 1:(length(df$relative_event))) {
       if (df$relative_event[xx] <=0) {
         df$rel3[xx] <- paste0('b',as.character(abs(df$relative_event[xx])))
         } else {
         #add preceding a for "after"
         df$rel3[xx] <- paste0('a',as.character(abs(df$relative_event[xx])))
       }
     }

img1

Then you could convert the values within rel3 into columns in df.

Patrick
  • 643
  • 1
  • 5
  • 16
  • I agree that it's somewhat suggestive to put the values in one column. I would not use a for loop here, but something like mutate with a conditional statement instead. But a solution which comes close to what the OP asked below – tjebo Sep 04 '18 at 23:43
1

Although I prefer a solution with all variables in one column as suggested by @Patrick (although I would use something like %>% mutate(new_col = case_when(etc...)), here a way with for-loop

# I changed your data a tiny bit
df <- tibble("a" = sample(1990:2000, size = 10),  # better to use 'sample' then 'runif' !
             "event" = 1995) %>% mutate("relative_event" = a - event)

Now the actual work

for (i in min(df$relative_event):max(df$relative_event)) {

# the indexing value is your difference in years. So you have to run the index from the lowest difference to the highest. 

if( i < 0 ) {
  df[[paste0('event_b', abs(i))]] <- ifelse(i == df$relative_event, 1, 0)
  } 
  if( i >= 0 ) {
    df[[paste0('event_f', abs(i))]] <- ifelse(i == df$relative_event, 1, 0)
    df
  } 
}  

# A tibble: 10 x 14
       a event relative_event event_b5 event_b4 event_b3 event_b2 event_b1
   <int> <dbl>          <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
 1  1990  1995             -5        1        0        0        0        0
 2  1992  1995             -3        0        0        1        0        0
 3  1991  1995             -4        0        1        0        0        0
 4  2000  1995              5        0        0        0        0        0
 5  1998  1995              3        0        0        0        0        0
 6  1993  1995             -2        0        0        0        1        0
 7  1996  1995              1        0        0        0        0        0
 8  1997  1995              2        0        0        0        0        0
 9  1994  1995             -1        0        0        0        0        1
10  1999  1995              4        0        0        0        0        0
# ... with 6 more variables: event_f0 <dbl>, event_f1 <dbl>, event_f2 <dbl>,
#   event_f3 <dbl>, event_f4 <dbl>, event_f5 <dbl>

If you don't want to run through every possible difference in years - (this will create 'empty' columns) - you could simply create a vector with unique(df$relative_event) and run i through this vector

tjebo
  • 12,885
  • 4
  • 34
  • 61
  • Thank you Tjebo, I like your method as although it is not fully tidy, it remains very readable – Jo_ Sep 05 '18 at 00:52