Using shift with mutate and case_when in dplyr in R: not working as expected

Question

My data shows changes in pupil size. When the value is -1 is means there is a blink. I have written some code to detect blink onsets and offsets but I'm having some issues using shift function.

Sample of my data:

library(dplyr)
DataFrame<-structure(list(Pupil_Avg = c(7.174, 6.6910005, 6.518, 2.461, 
                                    2.182, 1.942, 1.942, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
                                    -1, -1, -1, -1, -1, -1, -1, 1.487, -1, -1, -1, -1, 2.202, 2.202, 
                                    2.281, 2.344)), row.names = c(NA, -481L), class = c("tbl_df", 
                                                                                        "tbl", "data.frame"))

Operations I perform on that data:

DataFrame$BLINK_IDENTIFICATION <- ""

# set an arbitrary decrease value in pupil size
Pupil_Constriction = 3

DataFrame<-DataFrame %>%
  # removed the columns below as they're not necessary to work on the problem.
  #group_by(StimulusName, Name, StimuliBlock) %>%
  # if there is a reduction in pupil size of the value in Pupil_Constriction in the current row add "Blink Onset"
  mutate(BLINK_IDENTIFICATION = case_when((DataFrame$Pupil_Avg <= (shift(DataFrame$Pupil_Avg, 1L, type="lag")-Pupil_Constriction)) ~ "Blink Onset",
                                          # The next line of code is supposed to check the last row and if "Blink Onset" is contained in the previous row in BLINK_IDENTIFICATION AND the current Pupil_Avg value is NOT equal to -1, then the write "Blink Onset" to the current BLINK_IDENTIFICATION row 
                                          ( (shift(DataFrame$BLINK_IDENTIFICATION, 1L, type="lag")=="Blink Onset") & (DataFrame$Pupil_Avg != -1) ) ~ "Blink Onset",
# the next line of code write "Blink Offset" if previous row was -1, current is greater than -1, and the next row is NOT -1
                                          ( (shift(DataFrame$Pupil_Avg, 1L, type="lag")==-1) & (DataFrame$Pupil_Avg >-1) & (shift(DataFrame$Pupil_Avg, 1L, type="lead")!=-1)) ~ "Blink Offset",
# the next line write "Eye Closed" if current row equals -1
                                          (DataFrame$Pupil_Avg==-1) ~ "Eye Closed"))

I'm trying to detect blink onsets and offsets based on the changes in values in Pupil_Avg. My main issue is with the line of code ( (shift(DataFrame$BLINK_IDENTIFICATION, 1L, type="lag")=="Blink Onset") & (DataFrame$Pupil_Avg != -1) ) ~ "Blink Onset",

This line is supposed to check the previous row value of BLINK_IDENTIFICATION and if it is equal to "Blink Onset" AND the current value of Pupil_Avg is NOT equal to -1: write "Blink Onset" to BLINK_IDENTIFICATION in the current row.

When you run the code you'll see this isn't working. I guess I'm not understanding how to use shift function properly, as I think the logic is sound. Of course I might be mistaken.

Thank you for your time.

`shift` is from data.table. dplyr also has a `lag` and a `lead` function. Why are you mixing the two packages? — phiver, Sep 16 '18 at 11:40
@phiver I didn't know that. Can you suggest a version of the code that will work? — Docconcoct, Sep 16 '18 at 11:44
@mt1022 I create BLINK_IDENTIFICATION before I enter the pipe code. I run this line `DataFrame$BLINK_IDENTIFICATION — Docconcoct, Sep 16 '18 at 11:51
@Docconcoct, sorry. I missed that line. Will delete this message shortly. — mt1022, Sep 16 '18 at 13:34

score 2 · Accepted Answer · answered Sep 16 '18 at 13:32

3 options of your case statement can be done in with dplyr:

DataFrame <- DataFrame %>% 
  mutate(BLINK_IDENTIFICATION = case_when(Pupil_Avg == -1 ~ "Eye Closed",
                                          Pupil_Avg <= lag(Pupil_Avg) - Pupil_Constriction ~ "Blink Onset",
                                          lag(Pupil_Avg) == -1 & Pupil_Avg > -1 & lead(Pupil_Avg) != -1 ~ "Blink Offset",
                                          TRUE ~ ""))

# A tibble: 481 x 2
   Pupil_Avg BLINK_IDENTIFICATION
       <dbl> <chr>               
 1      7.17 ""                  
 2      6.69 ""                  
 3      6.52 ""                  
 4      2.46 Blink Onset         
 5      2.18 ""                  
 6      1.94 ""                  
 7      1.94 ""                  
 8     -1    Eye Closed          
 9     -1    Eye Closed          
10     -1    Eye Closed          
# ... with 471 more rows

But the condition lag(BLINK_IDENTIFICATION) == "Blink Onset" & Pupil_Avg != -1 is recursively dependent on the previous value (see rows 5, 6, 7). For this you need a loop.

for(i in 2:nrow(DataFrame)) {
  DataFrame$BLINK_IDENTIFICATION[i] = ifelse(DataFrame$BLINK_IDENTIFICATION[i-1]  == "Blink Onset" & DataFrame$Pupil_Avg[i] != -1, "Blink Onset", DataFrame$BLINK_IDENTIFICATION[i])
}

DataFrame
# A tibble: 481 x 2
   Pupil_Avg BLINK_IDENTIFICATION
       <dbl> <chr>               
 1      7.17 ""                  
 2      6.69 ""                  
 3      6.52 ""                  
 4      2.46 Blink Onset         
 5      2.18 Blink Onset         
 6      1.94 Blink Onset         
 7      1.94 Blink Onset         
 8     -1    Eye Closed          
 9     -1    Eye Closed          
10     -1    Eye Closed     

tail(DataFrame, 10)
# A tibble: 10 x 2
   Pupil_Avg BLINK_IDENTIFICATION
       <dbl> <chr>               
 1     -1    Eye Closed          
 2      1.49 ""                  
 3     -1    Eye Closed          
 4     -1    Eye Closed          
 5     -1    Eye Closed          
 6     -1    Eye Closed          
 7      2.20 Blink Offset        
 8      2.20 ""                  
 9      2.28 ""                  
10      2.34 ""

But you could also do everything in a for loop. As you can see at the end of the data are some not filled in gapes. There you need to define what you want to do with them. Leave them as is, or fill them in.

Thanks. I'll check out your solution in the morning. I have been avoiding using for loops since I'm working with a massive data set in the order of millions of rows. I figured %>% was quicker. Thanks for your help. — Docconcoct, Sep 16 '18 at 14:12

Using shift with mutate and case_when in dplyr in R: not working as expected

1 Answers1