Clustering rows by ID based on a column value condition multiple times

Question

Some time ago I opened a related question in this post

Suppose I have the following df:

data <- data.frame(ID = c(1,1,1,1,1,1,1,1,1,1,1, 1, 1,1,1,1,1,1,1,1,1,1),
               Obs1 = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1,1,1,1,0,1),
               Control = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48,24,20,21,10,10),
               ClusterObs1 = c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,5,5,5,5,6))

And I want to obtain:

data <- data.frame(ID = c(1,1,1,1,1,1,1,1,1,1,1, 1, 1,1,1,1,1,1,1,1,1,1),
               Obs1 = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1,1,1,1,0,1),
               Control = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48,24,20,21,10,10),
               ClusterObs1 = c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,5,5,5,5,6),
               DesiredResultClusterObs1 = c(1,1,1,2,2,3,3,3,4,4,4,4,5,6,6,6,7,8,9,10,10,11))

The conditions are: if the value of 'Control' is higher than 12, and the current 'Obs1' value is equal to 1 and also equal to the previous 'Obs1' value, then 'DesiredResultClusterObs1' should increase by 1. The main difference from the other question is that consecutive 'Control' values above 12 must each be counted.

Any idea how I can achieve the desired result?

score 1 · Accepted Answer · answered Oct 23 '18 at 17:35

I don't know much about how to use the with() and rle() functions, but I've found a solution to the problem using ifelse.

library(dplyr)
# aux is 1 where Control > 12 and both the current and the previous Obs1 equal 1
data <- data %>% mutate(aux = ifelse(Control > 12 & Obs1 == 1 & lag(Obs1) == 1, 1, 0),
                        DesiredResultClusterObs1 = ClusterObs1 + cumsum(aux))
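
To check which rows trigger an increment, the flag can be recomputed in base R on the question's vectors. This is only a rough sketch: `prev_obs1` is a helper name introduced here for illustration, built to mimic what `lag(Obs1)` returns (NA for the first row).

```r
# Vectors copied from the question's data frame
Obs1    <- c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1,1,1,1,0,1)
Control <- c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48,24,20,21,10,10)

# Previous Obs1 value; the first row has no predecessor, so it is NA
prev_obs1 <- c(NA, head(Obs1, -1))

# 1 where Control > 12 and both the current and previous Obs1 equal 1
aux <- ifelse(Control > 12 & Obs1 == 1 & prev_obs1 == 1, 1, 0)

which(aux == 1)  # rows that start a new cluster: 14 17 18 19 20
cumsum(aux)      # running offset that gets added to ClusterObs1
```

Row 9 (Control = 36) is not flagged because the previous 'Obs1' is 0, and row 10 (Control = 13) is not flagged because its own 'Obs1' is 0.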

The aux variable is not necessary; it just helps to see the result step by step. You can also do the following:

data <- data %>% mutate(DesiredResultClusterObs1 =
                          ClusterObs1 +
                          cumsum(ifelse(Control > 12 & Obs1 == 1 & lag(Obs1) == 1, 1, 0)))
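
The example data has a single ID, so the code above never crosses an ID boundary. If the real data holds several IDs, a grouped variant may be safer. The following is a minimal sketch, assuming the counter should restart within each ID (the question does not say so explicitly). The column name `new_cluster` and the vector `expected` are introduced here for illustration only. `lag(Obs1, default = 0)` treats the first row of each ID as having no previous observation, which avoids an NA reaching `cumsum()`.

```r
library(dplyr)

data <- data.frame(
  ID          = rep(1, 22),
  Obs1        = c(1,1,0,1,0,1,1,0,1,0,0,0,1,1,1,1,1,1,1,1,0,1),
  Control     = c(0,3,3,1,12,1,1,1,36,13,1,1,2,24,2,2,48,24,20,21,10,10),
  ClusterObs1 = c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,5,5,5,5,6)
)

# Desired result from the question, used only to check the output
expected <- c(1,1,1,2,2,3,3,3,4,4,4,4,5,6,6,6,7,8,9,10,10,11)

result <- data %>%
  group_by(ID) %>%                       # lag() and cumsum() now work per ID
  mutate(
    # TRUE where Control > 12 and both the current and previous Obs1 equal 1
    new_cluster = Control > 12 & Obs1 == 1 & lag(Obs1, default = 0) == 1,
    DesiredResultClusterObs1 = ClusterObs1 + cumsum(new_cluster)
  ) %>%
  ungroup()

all(result$DesiredResultClusterObs1 == expected)  # TRUE
```

On the question's data this gives the same column as the ungrouped version.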
