I am looking to collapse rows of data by group, based on a specified time difference (e.g. 60 mins) between timestamps and/or until a particular condition is met within the data. Here is a mock data frame of what I am working with:-
UserId<-c("2203af12ce3e", "2203af12ce3e", "2203af12ce3e", "2203af12ce3e",
"3b9c32d4c700", "3b9c32d4c700", "3b9c32d4c700", "3b9c32d4c700",
"3b9c32d4c700", "68b25fd3ca78", "68b25fd3ca78", "68b25fd3ca78",
"68b25fd3ca78", "68b25fd3ca78", "fbbd0e13e61b", "fbbd0e13e61b",
"fbbd0e13e61b", "808dcbe0cad2", "6f2020841f9e", "faf849c35400",
"02002044e512", "02002044e512", "02002044e512", "02002044e512",
"ff10b8560791", "ff10b8560791", "ff10b8560791", "ff10b8560791",
"ff10b8560791", "ff10b8560791", "ff10b8560791", "ff10b8560791",
"ff10b8560791", "ff10b8560791", "ff10b8560791", "ff10b8560791",
"ff10b8560791", "ff10b8560791", "ff10b8560791", "ff10b8560791")
OrigTime<-structure(c(1546313039, 1546313070, 1546313116, 1546344887, 1546366783,
1546371206, 1546378029, 1546380713, 1546381727, 1546317095, 1546317335,
1546319551, 1546347453, 1546355351, 1546381815, 1546381844, 1546381873,
1546355462, 1546370527, 1546354015, 1546310854, 1546311154, 1546311218,
1546311772, 1546308212, 1546308230, 1546308248, 1546308268, 1546308298,
1546308317, 1546330603, 1546330620, 1546330637, 1546330650, 1546330676,
1546334052, 1546334129, 1546334523, 1546334867, 1546334902), class = c("POSIXct",
"POSIXt"), tzone = "UTC")
LastTime<-structure(c(1546313039, 1546313070, 1546313448, 1546344887, 1546366783,
1546371206, 1546378029, 1546380713, 1546381727, 1546317095, 1546318123,
1546319551, 1546347453, 1546355351, 1546381815, 1546381844, 1546381873,
1546355462, 1546370527, 1546354015, 1546311063, 1546311154, 1546311746,
1546313128, 1546308212, 1546308230, 1546308248, 1546308268, 1546308298,
1546310346, 1546330603, 1546330620, 1546330637, 1546330650, 1546330676,
1546334052, 1546334129, 1546334523, 1546334867, 1546334902), class = c("POSIXct",
"POSIXt"), tzone = "UTC")
calls<-c(1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1,
4, 1, 3, 4, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
Status<-c("Engaged", "Engaged", "Abandoned", "Abandoned", "Answered",
"Answered", "Answered", "Answered", "Answered", "Engaged", "Engaged",
"Engaged", "Answered", "Answered", "Engaged", "Engaged", "Answered",
"Answered", "Answered", "Answered", "Engaged", "Engaged", "Engaged",
"Abandoned", "Engaged", "Engaged", "Engaged", "Engaged", "Engaged",
"Answered", "Engaged", "Engaged", "Engaged", "Engaged", "Answered",
"Answered", "Answered", "Answered", "Answered", "Answered")
Successful<-c("N", "N", "N", "N", "Y", "Y", "Y", "Y", "Y", "N", "N", "N",
"Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "N", "N", "N", "N", "N",
"N", "N", "N", "N", "Y", "N", "N", "N", "N", "Y", "Y", "Y", "Y",
"Y", "Y")
# data.frame() keeps each column's type (POSIXct in UTC, numeric calls);
# cbind() would coerce every column to character first.
df <- data.frame(UserId, OrigTime, LastTime, calls, Status, Successful,
                 stringsAsFactors = FALSE)
Here are a few examples of what I am trying to achieve:-
Example 1
Here is UserId = "2203af12ce3e", as it is in df:-
Before
library(dplyr)  # provides %>% and filter()
df %>% filter(UserId == "2203af12ce3e")
OrigTime LastTime calls Status Successful UserId
1 2019-01-01 03:23:59 2019-01-01 03:23:59 1 Engaged N 2203af12ce3e
2 2019-01-01 03:24:30 2019-01-01 03:24:30 1 Engaged N 2203af12ce3e
3 2019-01-01 03:25:16 2019-01-01 03:30:48 3 Abandoned N 2203af12ce3e
4 2019-01-01 12:14:47 2019-01-01 12:14:47 1 Abandoned N 2203af12ce3e
I want to collapse consecutive rows where the gap between the previous row's LastTime and the next row's OrigTime is within 60 mins, while tallying up the calls and showing the last Status. So in this case, I would collapse rows 1-3 into a single row, with the calls tallied up and the last status as "Abandoned", which would look like this:-
Outcome
OrigTime LastTime calls Status Successful UserId
1 2019-01-01 03:23:59 2019-01-01 03:30:48 5 Abandoned N 2203af12ce3e
2 2019-01-01 12:14:47 2019-01-01 12:14:47 1 Abandoned N 2203af12ce3e
Notice that the OrigTime of row 1 and the LastTime of row 3 in the Before segment are now in one row, the calls from rows 1-3 are tallied up, and the Status is taken from row 3 (the last one), because rows 1-3 all occur within a 60 minute window. Row 4 is left as it is because its OrigTime is >60 mins after the previous LastTime.
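To make the rule in Example 1 concrete, here is a rough dplyr sketch of the time-gap part only. It is not a finished solution: it assumes the 60 minute gap is measured from the previous row's LastTime to the current row's OrigTime, and the names ex1, out1, gap_mins and block are just placeholders I made up. It also assumes dplyr >= 1.0.0 for the .groups argument.

```r
library(dplyr)

# The four rows for UserId "2203af12ce3e", copied from the mock data above.
ex1 <- data.frame(
  UserId     = "2203af12ce3e",
  OrigTime   = as.POSIXct(c(1546313039, 1546313070, 1546313116, 1546344887),
                          origin = "1970-01-01", tz = "UTC"),
  LastTime   = as.POSIXct(c(1546313039, 1546313070, 1546313448, 1546344887),
                          origin = "1970-01-01", tz = "UTC"),
  calls      = c(1, 1, 3, 1),
  Status     = c("Engaged", "Engaged", "Abandoned", "Abandoned"),
  Successful = c("N", "N", "N", "N"),
  stringsAsFactors = FALSE
)

out1 <- ex1 %>%
  arrange(UserId, OrigTime) %>%
  group_by(UserId) %>%
  mutate(
    # Minutes between this row's OrigTime and the previous row's LastTime
    # (NA on each user's first row).
    gap_mins = as.numeric(difftime(OrigTime, lag(LastTime), units = "mins")),
    # Assumption: a new block starts on the first row or after a gap > 60 mins.
    block = cumsum(is.na(gap_mins) | gap_mins > 60)
  ) %>%
  group_by(UserId, block) %>%
  summarise(
    OrigTime   = first(OrigTime),   # start of the collapsed block
    LastTime   = last(LastTime),    # end of the collapsed block
    calls      = sum(calls),        # tally of calls in the block
    Status     = last(Status),      # status of the final row
    Successful = last(Successful),
    .groups = "drop"
  ) %>%
  select(OrigTime, LastTime, calls, Status, Successful, UserId)

print(out1)
```

For these four rows this should give the two rows shown in the Outcome above (5 calls ending 03:30:48, then the single 12:14:47 row). It does not yet handle the "Answered" condition described in Example 2.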
I want to give one more example just to be clear on what I am trying to do.
Example 2
If there are several "Abandoned" and "Engaged" statuses before an "Answered" status comes along, and they all occur within a 60 minute window, then I want these rows to be collapsed into one row with the final status of "Answered". Here is example 2:-
Before
df%>%filter(UserId=="ff10b8560791")
OrigTime LastTime calls Status Successful UserId
1 2019-01-01 02:03:32 2019-01-01 02:03:32 1 Engaged N ff10b8560791
2 2019-01-01 02:03:50 2019-01-01 02:03:50 1 Engaged N ff10b8560791
3 2019-01-01 02:04:08 2019-01-01 02:04:08 1 Engaged N ff10b8560791
4 2019-01-01 02:04:28 2019-01-01 02:04:28 1 Engaged N ff10b8560791
5 2019-01-01 02:04:58 2019-01-01 02:04:58 1 Engaged N ff10b8560791
6 2019-01-01 02:05:17 2019-01-01 02:39:06 3 Answered Y ff10b8560791
7 2019-01-01 08:16:43 2019-01-01 08:16:43 1 Engaged N ff10b8560791
8 2019-01-01 08:17:00 2019-01-01 08:17:00 1 Engaged N ff10b8560791
9 2019-01-01 08:17:17 2019-01-01 08:17:17 1 Engaged N ff10b8560791
10 2019-01-01 08:17:30 2019-01-01 08:17:30 1 Engaged N ff10b8560791
11 2019-01-01 08:17:56 2019-01-01 08:17:56 1 Answered Y ff10b8560791
12 2019-01-01 09:14:12 2019-01-01 09:14:12 1 Answered Y ff10b8560791
13 2019-01-01 09:15:29 2019-01-01 09:15:29 1 Answered Y ff10b8560791
14 2019-01-01 09:22:03 2019-01-01 09:22:03 1 Answered Y ff10b8560791
15 2019-01-01 09:27:47 2019-01-01 09:27:47 1 Answered Y ff10b8560791
16 2019-01-01 09:28:22 2019-01-01 09:28:22 1 Answered Y ff10b8560791
As in the previous example, I want to collapse rows where the OrigTime is within 60 mins of the previous LastTime, tally up the calls column and keep the Status of the last row. However, here is the extra condition: if a row has an "Answered" status, I want the collapsing to stop at that row, so that the preceding rows within the 60 minute window are tallied up together with it and the final status is "Answered". Here is the outcome for this example:-
Outcome
OrigTime LastTime calls Status Successful UserId
1 2019-01-01 02:03:32 2019-01-01 02:39:06 8 Answered Y ff10b8560791
2 2019-01-01 08:16:43 2019-01-01 08:17:56 5 Answered Y ff10b8560791
3 2019-01-01 09:14:12 2019-01-01 09:14:12 1 Answered Y ff10b8560791
4 2019-01-01 09:15:29 2019-01-01 09:15:29 1 Answered Y ff10b8560791
5 2019-01-01 09:22:03 2019-01-01 09:22:03 1 Answered Y ff10b8560791
6 2019-01-01 09:27:47 2019-01-01 09:27:47 1 Answered Y ff10b8560791
7 2019-01-01 09:28:22 2019-01-01 09:28:22 1 Answered Y ff10b8560791
So in this case, rows 1-6 in the Before segment have been collapsed because they all occur within 60 minutes, stopping at the row whose status was "Answered". The same happens again for rows 7-11, and the remaining rows, which each have the "Answered" status, stay as they are. I hope the Outcome segments that I have provided make it clear what I am trying to do. Any help will be greatly appreciated :)
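In case it helps to pin down the rule, this is a minimal sketch of how I would express both conditions together with dplyr. The helper name collapse_calls and its window_mins argument are hypothetical (they are not from any package), and it assumes that a block ends as soon as a row with Status "Answered" is reached, and that the gap is measured from the previous row's LastTime. It needs dplyr >= 1.0.0 for .groups.

```r
library(dplyr)

# Hypothetical helper: collapse rows per user into blocks.
collapse_calls <- function(data, window_mins = 60) {
  data %>%
    arrange(UserId, OrigTime) %>%
    group_by(UserId) %>%
    mutate(
      gap_mins = as.numeric(difftime(OrigTime, lag(LastTime), units = "mins")),
      # A new block starts on the user's first row, after a gap longer than
      # the window, or right after a row whose Status was "Answered".
      block = cumsum(is.na(gap_mins) | gap_mins > window_mins |
                       lag(Status, default = "") == "Answered")
    ) %>%
    group_by(UserId, block) %>%
    summarise(OrigTime = first(OrigTime), LastTime = last(LastTime),
              calls = sum(calls), Status = last(Status),
              Successful = last(Successful), .groups = "drop") %>%
    select(OrigTime, LastTime, calls, Status, Successful, UserId)
}

# The 16 rows for UserId "ff10b8560791", copied from the mock data above.
orig <- c(1546308212, 1546308230, 1546308248, 1546308268, 1546308298,
          1546308317, 1546330603, 1546330620, 1546330637, 1546330650,
          1546330676, 1546334052, 1546334129, 1546334523, 1546334867,
          1546334902)
last_t <- orig
last_t[6] <- 1546310346  # only row 6 has a LastTime later than its OrigTime

ex2 <- data.frame(
  UserId     = "ff10b8560791",
  OrigTime   = as.POSIXct(orig,   origin = "1970-01-01", tz = "UTC"),
  LastTime   = as.POSIXct(last_t, origin = "1970-01-01", tz = "UTC"),
  calls      = c(rep(1, 5), 3, rep(1, 10)),
  Status     = c(rep("Engaged", 5), "Answered",
                 rep("Engaged", 4), rep("Answered", 6)),
  Successful = c(rep("N", 5), "Y", rep("N", 4), rep("Y", 6)),
  stringsAsFactors = FALSE
)

out2 <- collapse_calls(ex2)
print(out2)
```

On the Example 2 rows this should produce seven rows with calls of 8, 5, 1, 1, 1, 1, 1, all with Status "Answered", which is the Outcome I am after. I am not sure whether this is the most idiomatic way to do it, or whether it holds up for edge cases in the full data.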