Get distinct consecutive date ranges from overlapping date ranges

Question

I need to get a list of date ranges that are NOT overlapping with each other from a list of overlapping dates and get the sum of coins during that overlap. I have tried googling for an example but no luck so far. I might not be using the right key words?

I have a list of overlapping dates

1.1.2018 - 31.1.2018 80
7.1.2018 - 10.1.2018 10
7.1.2018 - 31.1.2018 10
11.1.2018 - 31.1.2018 5
25.1.2018 - 27.1.2018 5
2.2.2018 - 23.2.2018 100

Desired outcome would be

1.1.2018 - 6.7.2018 80 coins
7.1.2018 - 10.1.2018 100 coins
11.1.2018 - 24.1.2018 95 coins
25.1.2018 - 27.1.2018 100 coins
28.1.2018 - 31.1.2018 95 coins
2.2.2018 - 23.2.2018 100 coins

Here is a figure how it should work

|------------------------------|
       |---|
       |-----------------------|
           |-------------------|
                      |---|
                                   |----------------------|
Outcome              
|------|---|----------|---|----|   |----------------------|
   80   100     95     100  95                100

This is my test data

drop table coinsonperiod2;
create table coinsonperiod2(
  id serial,
  startdate date,
  enddate date,
  coins integer,
  userid integer
);
insert into coinsonperiod2 (startdate, enddate, coins,userid) values
  ('2018-01-01','2018-01-31', 80,1)
, ('2018-01-07','2018-01-10', 10,1)
, ('2018-01-07','2018-01-31', 10,1)
, ('2018-01-11','2018-01-31', 5,1)
, ('2018-01-25','2018-01-27', 5,1)
, ('2018-02-02','2018-02-23', 100,2)
, ('2018-01-01','2018-01-31', 80,2)
, ('2018-01-07','2018-01-10', 10,2)
, ('2018-01-07','2018-01-31', 10,2)
, ('2018-01-11','2018-01-31', 5,2)
, ('2018-01-25','2018-01-27', 5,2)
, ('2018-02-02','2018-02-23', 100,3)
;

UPDATE: Actually StephenM's and joops answers do not meet my desired outcome. Both answers show enddate wrong.

When one period ends the next should start next day (or later if there is a gap). In my desired outcome 1.1.2018-6.1.2018 includes the 6th day. There is no gap between 6th and 7th because 7th is included in 7.1.2018-10.1.2018.

UPDATE2: Now I understood what is the difference between open, half open and closed intervals. In joops solution, calculation must be done against half open intervals, but my desired outcome is closed interval. That is why enddate must be reduced to make the outcome as closed interval. Correct me if I am wrong.

I also added userid in the sample data and modified joops solution some more. Here is the query that gives me my desired outcome.

with changes AS (
  SELECT
    userid,
    startdate AS tickdate,
    coins,
    1         AS cover
  FROM coinsonperiod2
  UNION ALL
  -- add 1 day to correct intervals into half open intervals, so the calculation is correct
  SELECT
    userid,
    1 + enddate AS tickdate,
    -1 * coins,
    -1          AS cover
  FROM coinsonperiod2
)
, sumchanges  AS (
    SELECT
      userid,
      tickdate,
      SUM(coins) AS change,
      SUM(cover) AS cover
    FROM changes
    GROUP BY tickdate, userid
)
, aggregated AS (
    SELECT
      userid   AS userid,
      tickdate AS startdate,
      lead(tickdate)
      over www AS enddate,
      sum(change)
      OVER www AS cash,
      sum(cover)
      OVER www AS cover
    FROM sumchanges
    WINDOW www AS (
      partition by userid
      ORDER BY tickdate )
)
-- reduce 1 day from the enddate to make closed interval
SELECT
userid
, startdate
, enddate-1 as enddate
, cash
, cover
FROM aggregated
WHERE cover > 0
ORDER BY userid, startdate
;

Outcome:

Very well presented question with sample data and a visualization. I wish all first-time posters would put that much effort into their questions. — a_horse_with_no_name, Oct 16 '18 at 07:38
I assume a typo for the end date of the first period in your desired outcome (should be 6.1.2018, if I'm interpreting your date formats correctly) — Damien_The_Unbeliever, Oct 16 '18 at 08:35
The trick is to transform the *level triggered* representation into *edge triggered*, accumulate them and transform them back. — joop, Oct 16 '18 at 08:35
The non-overlapping part is actually quite easy: https://rextester.com/JMZC97931 — a_horse_with_no_name, Oct 16 '18 at 09:00
Adding a key-field(`user_id`) does not *really* change the problem. And changing the question, and adding answers to the question-body should not be done. Instead, ask a new question, (and *maybe* answer your own question) — wildplasser, Oct 17 '18 at 12:31
Looking at the title, this should help https://stackoverflow.com/questions/52383964/determine-if-a-range-is-completely-covered-by-a-set-of-ranges — Salman A, Oct 21 '18 at 07:46

joop · Accepted Answer · 2018-10-16T11:21:15.863

The logic is:

at the beginning of an interval add its value to a cumulative sum
at the end of an interval substract its value from this sum
but in order to sweep the dateline, we'll have to collect al the (unique) date/time stamps, either start or stop.

So the point is: convert the data from a series of intervals to a series of (start/stop) events, and aggregate over these.

-- \i tmp.sql

create table coinsonperiod(
  id serial,
  startdate date,
  enddate date,
  coins integer
);
insert into coinsonperiod (startdate, enddate, coins) values
  ('2018-01-01','2018-01-31', 80)
, ('2018-01-07','2018-01-10', 10)
, ('2018-01-07','2018-01-31', 10)
, ('2018-01-11','2018-01-31', 5)
, ('2018-01-25','2018-01-27', 5)
, ('2018-02-02','2018-02-23', 100)
        ;

WITH changes AS (
    SELECT startdate AS tickdate , coins
            , 1 AS cover
    FROM coinsonperiod
    UNION ALL
    -- add 1 day to convert to half-open intervals
    SELECT 1+enddate AS tickdate, -1* coins
            , -1 AS cover
    FROM coinsonperiod
    )
, sumchanges  AS (
        SELECT tickdate, SUM(coins) AS change, SUM(cover) AS cover
        FROM changes
        GROUP BY tickdate
        )
, aggregated AS (
        SELECT
        tickdate AS startdate
        , lead(tickdate) over www AS enddate
        , sum(change) OVER www AS cash
          -- number of covered intervals
        , sum(cover) OVER www AS cover
        FROM sumchanges
        WINDOW www AS (ORDER BY tickdate)
        )
             -- substract one day from enddate to correct back to closed intervals
SELECT startdate, enddate-1 AS enddate, cash, cover
FROM aggregated
WHERE cover > 0
ORDER BY startdate
        ;

-- reduce 1 day from the final result to correct the enddate `SELECT startdate, enddate-1, cash, cover FROM aggregated WHERE cover > 0 ORDER BY startdate ` — Matti Ilvonen, Oct 16 '18 at 10:29

score 1 · Answer 2 · answered Oct 16 '18 at 08:36

1

Looks like I found an ugly one that works

select t1.dt, t1.enddt, sum(coins)
from (
    select distinct cp1.dt, min(cp2.dt) enddt
    from ( select startdate as dt from coinsonperiod union all select enddate as dt from coinsonperiod ) cp1, 
         ( select startdate as dt from coinsonperiod union all select enddate as dt from coinsonperiod ) cp2
    where cp2.dt > cp1.dt
    group by cp1.dt
    order by cp1.dt ) t1, coinsonperiod t2
where t1.dt between t2.startdate and t2.enddate
and t1.enddt between t2.startdate and t2.enddate
group by t1.dt, t1.enddt

Output:

dt         |enddt      |sum |
-----------|-----------|----|
2018-01-01 |2018-01-07 |80  |
2018-01-07 |2018-01-10 |100 |
2018-01-10 |2018-01-11 |90  |
2018-01-11 |2018-01-25 |95  |
2018-01-25 |2018-01-27 |100 |
2018-01-27 |2018-01-31 |95  |
2018-02-02 |2018-02-23 |100 |

Only difference with your output is that I suppose you forgot the interval between 01/10 and 01/11

answered Oct 16 '18 at 08:36

StephaneM

4,364
1
14
32

Yes. I forgot the interval between 01/10 and 01/11. Your solution is almost what I am looking for. enddt should be '-1 day' if there is a startdt that is equal with the enddt. For example: 2018-01-01 |2018-01-07 |80 | 2018-01-07 |2018-01-10 |100 | Should become 2018-01-01 |2018-01-06 |80 | 2018-01-07 |2018-01-10 |100 | – Matti Ilvonen Oct 16 '18 at 08:52
2

@MattiIlvonen - I'd especially recommend with `datetime`, but also often with `date`, it's better to model such intervals as semi-open intervals, with an *inclusive* start date and an *exclusive* end date. It's often easier to reason about them (e.g. all dates appearing in your original data also appear in your de-overlapped intervals and for each of those de-overlapped intervals, each end date corresponds with another interval's start date (except for the very starts/ends) – Damien_The_Unbeliever Oct 16 '18 at 09:00
@Damien_The_Unbeliever: Postgres' `range` is a perfect fit for this and if you convert the startdate/enddate values in the example to a `daterange` Postgres will actually make it an semi-open interval. – a_horse_with_no_name Oct 16 '18 at 09:01

score 0 · Answer 3 · edited Aug 26 '19 at 16:39

0

Okay, so I'm going to help you with the logic bit, the syntax you can find online.

What you can do, is create a temp table and move you data there, then select each row of data and for each column data store the value in a declared variable.

Then simply use a cursor, and select all the data from your source table and again use a normal greater then or less than operator and work your way to get the count.

Simple get row 1 column 1, compare against all other column 1 and column 2 data.

edited Aug 26 '19 at 16:39

marc_s

675,133
158
1,253
1,388

answered Oct 16 '18 at 08:33

Shalvin

188
1
1
12

3

RDBMSs strengths are dealing with sets of data and in being able to reorganize/reorder operations to achieve best results. Two concepts crippled by doing procedural logic with cursors. There are occasions where temp tables and/or cursors are warranted, but you'd usually want to have exhausted set-based approaches first. – Damien_The_Unbeliever Oct 16 '18 at 09:16

score -1 · Answer 4 · answered Oct 17 '18 at 03:59

Correct answer:

This is my test data

drop table coinsonperiod2;
create table coinsonperiod2(
  id serial,
  startdate date,
  enddate date,
  coins integer,
  userid integer
);
insert into coinsonperiod2 (startdate, enddate, coins,userid) values
  ('2018-01-01','2018-01-31', 80,1)
, ('2018-01-07','2018-01-10', 10,1)
, ('2018-01-07','2018-01-31', 10,1)
, ('2018-01-11','2018-01-31', 5,1)
, ('2018-01-25','2018-01-27', 5,1)
, ('2018-02-02','2018-02-23', 100,2)
, ('2018-01-01','2018-01-31', 80,2)
, ('2018-01-07','2018-01-10', 10,2)
, ('2018-01-07','2018-01-31', 10,2)
, ('2018-01-11','2018-01-31', 5,2)
, ('2018-01-25','2018-01-27', 5,2)
, ('2018-02-02','2018-02-23', 100,3)
;

UPDATE2: Now I understood what is the difference between open, half open and closed intervals. In joops solution, calculation must be done against half open intervals, but my desired outcome is closed interval. That is why enddate must be reduced to make the outcome as closed interval. Correct me if I am wrong.

I also added userid in the sample data and modified joops solution some more. Here is the query that gives me my desired outcome.

with changes AS (
  SELECT
    userid,
    startdate AS tickdate,
    coins,
    1         AS cover
  FROM coinsonperiod2
  UNION ALL
  -- add 1 day to correct intervals into half open intervals, so the calculation is correct
  SELECT
    userid,
    1 + enddate AS tickdate,
    -1 * coins,
    -1          AS cover
  FROM coinsonperiod2
)
, sumchanges  AS (
    SELECT
      userid,
      tickdate,
      SUM(coins) AS change,
      SUM(cover) AS cover
    FROM changes
    GROUP BY tickdate, userid
)
, aggregated AS (
    SELECT
      userid   AS userid,
      tickdate AS startdate,
      lead(tickdate)
      over www AS enddate,
      sum(change)
      OVER www AS cash,
      sum(cover)
      OVER www AS cover
    FROM sumchanges
    WINDOW www AS (
      partition by userid
      ORDER BY tickdate )
)
-- reduce 1 day from the enddate to make closed interval
SELECT
userid
, startdate
, enddate-1 as enddate
, cash
, cover
FROM aggregated
WHERE cover > 0
ORDER BY userid, startdate
;

Outcome:

Get distinct consecutive date ranges from overlapping date ranges

4 Answers4

Linked