How to identify time overlap in R

Question

I have a data frame (df) with patients' admission and discharge dates, with 4 columns:

ID (Unit.Number), admitDate (Date), dcDate (Date), los (length of stay in days).

$ admitDate  : Date, format: "2009-09-19" "2010-01-24" "2010-09-30" ...
$ dcDate     : Date, format: "2009-09-23" "2010-01-27" "2010-10-04" ...
$ los        : num  4 3 4 25 6 3 6 2 2 3 ...
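The str output suggests that los is simply the number of days from admitDate to dcDate, so each stay can be read as the interval [admitDate, dcDate]. A quick check on the first three rows shown above (values copied from the str output):

```r
# First three stays, copied from the str() output in the question
admitDate <- as.Date(c("2009-09-19", "2010-01-24", "2010-09-30"))
dcDate    <- as.Date(c("2009-09-23", "2010-01-27", "2010-10-04"))
los       <- c(4, 3, 4)

# Subtracting two Dates gives a difftime in days
as.numeric(dcDate - admitDate)  # 4 3 4, same as los
```

The same relationship holds for every row of the dput sample further down.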

I need to be able to tell, at any given time, how many patients (and which patients) were in hospital. I think that means finding the overlap between the patients' stays. Here is how I'm defining overlap: (df$admitDate[x] <= df$dcDate[y]) & (df$admitDate[y] <= df$dcDate[x])
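The overlap test above can be evaluated for every pair of stays at once with base R's outer(). A minimal sketch, assuming the column names from the question; the three stays are made-up toy data, not the asker's:

```r
# Made-up toy data: stays 1 and 2 overlap, stay 3 does not
df <- data.frame(
  ID        = c(1L, 2L, 3L),
  admitDate = as.Date(c("2010-01-01", "2010-01-03", "2010-02-01")),
  dcDate    = as.Date(c("2010-01-05", "2010-01-04", "2010-02-03"))
)

# overlap[x, y] is TRUE when stays x and y share at least one day:
# admitDate[x] <= dcDate[y] & admitDate[y] <= dcDate[x]
i <- seq_len(nrow(df))
overlap <- outer(i, i, function(x, y)
  df$admitDate[x] <= df$dcDate[y] & df$admitDate[y] <= df$dcDate[x])
dimnames(overlap) <- list(df$ID, df$ID)
diag(overlap) <- FALSE  # ignore each stay's overlap with itself

overlap
# IDs of the stays that overlap stay 1
df$ID[overlap["1", ]]
```

Because it compares every pair, the matrix grows with the square of the number of stays, so this is only practical for modest data sets.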

Any help is much appreciated.

Here is the output of dput for the first 20 patients:

> dput(head(df,20))
structure(list(Unit.Number = c(2013459L, 2013459L, 2047815L, 
1362858L, 1331174L, 2068040L, 1363711L, 2175972L, 2036695L, 1426614L, 
1403126L, 2083126L, 1334063L, 1349385L, 1404482L, 2175545L, 1296600L, 
1293220L, 1336768L, 2148401L), admitDate = structure(c(14506, 
14633, 14882, 15172, 14945, 15632, 15482, 15601, 16096, 15843, 
16013, 15548, 15436, 15605, 16115, 15597, 15111, 15050, 15500, 
15896), class = "Date"), dcDate = structure(c(14510, 14636, 14886, 
15197, 14951, 15635, 15488, 15603, 16098, 15846, 16016, 15552, 
15438, 15606, 16118, 15598, 15113, 15058, 15501, 15915), class = "Date"), 
los = c(4, 3, 4, 25, 6, 3, 6, 2, 2, 3, 3, 4, 2, 1, 3, 1, 
2, 8, 1, 19)), .Names = c("Unit.Number", "admitDate", "dcDate", 
"los"), row.names = c(NA, 20L), class = "data.frame")

First, I tried the code suggested by G. Grothendieck:

days <- seq(min(df$admitDate), max(df$dcDate), "day")
no.patients <- data.frame(
  Date = days, 
  Num = sapply(days, function(d) sum(d >= df$admitDate & d <= df$dcDate)),
  Patients = sapply(days, function(d)
        toString(df$Unit.Number[d >= df$admitDate & d <= df$dcDate]))
)

And here is what happened:

> days <- seq(min(df$admitDate), max(df$dcDate), "day")
Error in seq.int(0, to0 - from, by) : 'to' cannot be NA, NaN or infinite
> no.patients <- data.frame(Date = d, 
+                           Num = sapply(days, function(d) sum(d >= df$admitDate & d <=         df$dcDate)))
Error in data.frame(Date = d, Num = sapply(days, function(d) sum(d >=  : 
object 'd' not found

Then I thought maybe I needed to get rid of the NAs, so here is what I did:

> df <- df[rowSums(is.na(df)) < 0, ]

And tried again. Here is what I got:

> days <- seq(min(df$admitDate), max(df$dcDate), "day")
Error in seq.int(0, to0 - from, by) : 'to' cannot be NA, NaN or infinite
In addition: Warning messages:
1: In min.default(numeric(0), na.rm = FALSE) :
no non-missing arguments to min; returning Inf
2: In max.default(numeric(0), na.rm = FALSE) :
no non-missing arguments to max; returning -Inf
> no.patients <- data.frame(Date = d, 
+                           Num = sapply(days, function(d) sum(d >= df$admitDate & d <=   df$dcDate)))
Error in data.frame(Date = d, Num = sapply(days, function(d) sum(d >=  : 
object 'd' not found
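The second attempt fails because that filter keeps no rows at all: rowSums(is.na(df)) counts the NAs in each row and can never be negative, so `< 0` selects nothing, and min()/max() then see numeric(0), exactly as the warnings say. A sketch of a filter that keeps only complete rows, on hypothetical toy data with two missing dates:

```r
# Hypothetical toy data: row 2 lacks a discharge date, row 3 an admission date
df <- data.frame(
  Unit.Number = c(1L, 2L, 3L),
  admitDate   = as.Date(c("2010-01-01", "2010-01-03", NA)),
  dcDate      = as.Date(c("2010-01-05", NA, "2010-02-03"))
)

rowSums(is.na(df))                      # NA count per row (0, 1, 1 here)
nrow(df[rowSums(is.na(df)) < 0, ])      # 0: the original filter drops every row

clean <- df[rowSums(is.na(df)) == 0, ]  # keep rows with no NA
nrow(clean)                             # 1
# complete.cases(df) is an equivalent, more readable condition
nrow(df[complete.cases(df), ])          # 1
```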

Please display enough data using `dput` to use as an example. — G. Grothendieck, Apr 14 '14 at 02:45
When I try to copy and paste from df, it looks unintelligible: everything runs together instead of being in rows and columns. As you can tell, I'm quite a novice at all of this. — user3399918, Apr 14 '14 at 03:23
The purpose of `dput` is so that those who answer can simply copy your output and paste it back into their session to exactly reproduce it. See: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — G. Grothendieck, Apr 14 '14 at 03:25
Thanks for the advice. I've added the dput output for the first 20 patients. — user3399918, Apr 14 '14 at 03:44

G. Grothendieck · Answer 1 · 2014-04-14T04:54:06.593


Try this:

days <- seq(min(df$admitDate), max(df$dcDate), "day")
no.patients <- data.frame(
      Date = days, 
      Num = sapply(days, function(d) sum(d >= df$admitDate & d <= df$dcDate)),
      Patients = sapply(days, function(d)
            toString(df$Unit.Number[d >= df$admitDate & d <= df$dcDate]))
)

giving:

> head(no.patients)
        Date Num Patients
1 2009-09-19   1  2013459
2 2009-09-20   1  2013459
3 2009-09-21   1  2013459
4 2009-09-22   1  2013459
5 2009-09-23   1  2013459
6 2009-09-24   0

ADDED patient list to each row. Fixed case of df.
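To ask the question for one particular date, the same inclusive test can be wrapped in a small function. A sketch; patients_on is a hypothetical helper name, and the two stays are the first two rows of the sample in the question:

```r
# First two stays from the question's sample data
df <- data.frame(
  Unit.Number = c(2013459L, 2013459L),
  admitDate   = as.Date(c("2009-09-19", "2010-01-24")),
  dcDate      = as.Date(c("2009-09-23", "2010-01-27"))
)

# Hypothetical helper: IDs of patients in hospital on date d, using the
# same inclusive test as the answer (admitDate <= d <= dcDate).
# which() skips NA comparisons, so rows with missing dates are ignored.
patients_on <- function(data, d) {
  d <- as.Date(d)
  data$Unit.Number[which(d >= data$admitDate & d <= data$dcDate)]
}

patients_on(df, "2009-09-21")  # 2013459
patients_on(df, "2009-09-24")  # integer(0): nobody in hospital
```

Using which() rather than a plain logical index means a row with a missing date is skipped instead of adding an NA to the result.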

edited Apr 14 '14 at 04:54

answered Apr 14 '14 at 04:10

G. Grothendieck


I've tried it. See above (the original question) to read the error messages I got. – user3399918 Apr 14 '14 at 04:28
I had updated the answer at one point and I think you grabbed it before my revisions. I have added the first few lines of output to show that it does work on the sample data provided. If you still get errors then you will need to provide a reproducible example that shows them. – G. Grothendieck Apr 14 '14 at 04:54
After removing the NAs, it works perfectly, and the resulting plot(no.patients$Date, no.patients$Num) is identical to the one from the suggestion below. Many thanks. – user3399918 Apr 15 '14 at 05:26

score 0 · Accepted Answer · answered Apr 14 '14 at 12:52

Here is another way. It treats the ward as a queue: each admission adds one patient and each discharge removes one, so a running total over the time-sorted entry/exit events gives the number of concurrent patients:

df <- structure(list(Unit.Number = c(2013459L, 2013459L, 2047815L, 
1362858L, 1331174L, 2068040L, 1363711L, 2175972L, 2036695L, 1426614L, 
1403126L, 2083126L, 1334063L, 1349385L, 1404482L, 2175545L, 1296600L, 
1293220L, 1336768L, 2148401L), admitDate = structure(c(14506, 
14633, 14882, 15172, 14945, 15632, 15482, 15601, 16096, 15843, 
16013, 15548, 15436, 15605, 16115, 15597, 15111, 15050, 15500, 
15896), class = "Date"), dcDate = structure(c(14510, 14636, 14886, 
15197, 14951, 15635, 15488, 15603, 16098, 15846, 16016, 15552, 
15438, 15606, 16118, 15598, 15113, 15058, 15501, 15915), class = "Date"), 
los = c(4, 3, 4, 25, 6, 3, 6, 2, 2, 3, 3, 4, 2, 1, 3, 1, 
2, 8, 1, 19)), .Names = c("Unit.Number", "admitDate", "dcDate", 
"los"), row.names = c(NA, 20L), class = "data.frame")

# create dataframe for computing the size of the queue (concurrent patients)
x <- data.frame(date = c(df$admitDate, df$dcDate)
            , op = c(rep(1, nrow(df)), rep(-1, nrow(df)))
            , Unit.Number = c(df$Unit.Number, df$Unit.Number)
            )
# sort and calculate concurrent patients
x <- x[order(x$date), ]  # sort in time order
x$cum <- cumsum(x$op)

# In 'x', 'cum' is the number of concurrent patients after each event.
# For rows with op == 1, Unit.Number is the patient just admitted and
# 'cum' is the number of patients at that time.

plot(x$date, x$cum, type = 's')

This is what the first part of 'x' looks like:

> head(x,10)
         date op Unit.Number cum
1  2009-09-19  1     2013459   1
21 2009-09-23 -1     2013459   0
2  2010-01-24  1     2013459   1
22 2010-01-27 -1     2013459   0
3  2010-09-30  1     2047815   1
23 2010-10-04 -1     2047815   0
5  2010-12-02  1     1331174   1
25 2010-12-08 -1     1331174   0
18 2011-03-17  1     1293220   1
38 2011-03-25 -1     1293220   0
>
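Two details of the queue version are worth noting. It drops a patient on the discharge date itself, whereas the first answer (and the overlap definition in the question) still counts the patient on that day; and when one patient leaves on the same date another arrives, the sort order decides the intermediate 'cum' values. A sketch that dates each exit at dcDate + 1 and reads off the count after the last event of each date, on hypothetical toy data:

```r
# Hypothetical toy stays: 102 arrives on the day 101 is discharged,
# and 103 arrives the day after 102 is discharged
df <- data.frame(
  Unit.Number = c(101L, 102L, 103L),
  admitDate = as.Date(c("2010-01-01", "2010-01-05", "2010-01-08")),
  dcDate    = as.Date(c("2010-01-05", "2010-01-07", "2010-01-12"))
)

# Events: +1 on admission, -1 on the day AFTER discharge, so a patient
# still counts on the discharge day (inclusive definition)
ev <- data.frame(
  date = c(df$admitDate, df$dcDate + 1),
  op   = c(rep(1, nrow(df)), rep(-1, nrow(df)))
)
# Sort by date; within a date, exits (-1) sort before admissions (+1)
ev <- ev[order(ev$date, ev$op), ]
ev$cum <- cumsum(ev$op)

# Count at the end of each event date: keep the last event per date
census <- ev[!duplicated(ev$date, fromLast = TRUE), c("date", "cum")]

# Expand the step function to one row per calendar day
days <- seq(min(df$admitDate), max(df$dcDate), "day")
daily <- data.frame(
  Date = days,
  Num  = census$cum[findInterval(as.numeric(days), as.numeric(census$date))]
)
daily
```

With this convention the daily counts match the sapply() approach from the first answer day for day, while still needing only one sort and one cumulative sum.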

This worked like a charm. Many thanks. Now I'm going to study the code, hopefully to understand it and learn from it. — user3399918, Apr 15 '14 at 04:17
