Intervals to non-overlapping subintervals

Question

I'm trying to divide a list of intervals into non-overlapping subintervals. For example, if my input were

[1,5],[4,9],[6,12],[11,17]

I would want the output to be

[1,4], [4,5], [5,6], [6,9], [9,11], [11,12], [12,17]

I want the output to be a list of intervals that has the same union as the original list of intervals but where every overlapping sub-interval of multiple different subintervals is made into a different interval.

My first thought is that I should sort all intervals by their first element, and if there is an overlap I should create a new interval, but I've been having some trouble getting this to work. This seems different in nature than many interval problems, so any suggestion would be great!

So what's the question? What have you tried, what are you having difficulty with? — dcsohl, Jul 14 '14 at 19:45
And why doesn't [6,11] split up into [6, 9],[9,11]? Your "Basically" is not so basic — Ben, Jul 14 '14 at 19:48
To me, `[1,4]` and `[4,5]` overlaps. The "non-overlapping" interval set representing your intput would be the singleton `[1,17]`. — Rerito, Feb 13 '15 at 09:45

score 0 · Answer 1 · edited Jun 20 '20 at 09:12

0

I'm not sure, if I understand correctly, but as long as there are no holes in the list of intervals, the following should work:

make a list of all endpoints of the intervals

1, 5, 4, 9, 6, 12, 11, 17
sort that list

1, 4, 5, 6, 9, 11, 12, 17
create new intervals by always using adjacent numbers as endpoints for the new intervals

[1,4] [4,5] [5,6] [6,9] [9,11] [11,12] [12,17]

If your list of intervals has some holes, you would have to check, whether each of the newly created intervals overlaps with one of the source intervals.

edited Jun 20 '20 at 09:12

Community

1
1

answered Jul 14 '14 at 20:26

Sirko

65,767
19
135
167

Thanks a lot! Really clever response, but unfortunately there are holes. – Stefan Ivanovic Jul 14 '14 at 20:51

score 0 · Accepted Answer · edited Feb 13 '15 at 08:44

0

Sort your original interval end-points. Then every consecutive pair of end-points defines an interval in your output unless the interval is not contained in the union of your original intervals. To determine whether an interval is contained in the union of your original intervals, process end-points from left to right and maintain a count of how many original intervals are currently overlapping the interval between the two endpoints. When you encounter a left-end point of an original interval, you increase the count by 1 and when you encounter a right-end point of an original interval, you decrease the count by 1. If the count is greater than 0 then the current interval between the two end points should be included in your output, otherwise not.

edited Feb 13 '15 at 08:44

Adriaan Koster

14,226
4
41
56

answered Jul 14 '14 at 20:36

user2566092

4,473
12
19

This sounds promising but I think there is a flaw in your explanation somewhere. Can you elaborate with an example following your steps? – Adriaan Koster Feb 12 '15 at 11:18
I added an example myself! It does not follow your steps but I think it is correct and efficient. – Adriaan Koster Feb 12 '15 at 17:19
I moved my example into a separate answer because my method is not exactly the same. – Adriaan Koster Feb 13 '15 at 09:19

Adriaan Koster · Answer 3 · 2015-02-16T09:09:05.877

Inspired by @user2566092's answer I came up with a different method.

First, here is an example with a 'hole':

[1.3]
       [6..9]
         [8...12]

This should result in:

[1.3]
       [6,8]
         [8,9]
           [9..12]

To get this, first determine the set of unique endpoints and order it:

1, 3, 6, 8, 9, 12

Then iterate over all possible sub intervals A-B formed by consecutive pairs:

[1,3]
[3,6]
[6,8]
[8,9]
[9,12]

For each interval A-B try to find an original interval X-Y which intersects with A-B. This is the case if:

(B > X) && (Y > A)

For example:

[A=1, B=3] is included because of [X=1, Y=3] 
[A=3, B=6] is excluded, interval [X=1, Y=3] is excluded because Y=A, 
           the other original intervals are excluded because B<X

EDIT: Here is a Java implementation.

import java.util.*;

public class Main {

    public static void main(final String[] args) {
        determineSubIntervals(new Interval[] { new Interval(1, 5), new Interval(4, 9), new Interval(6, 12), new Interval(11, 17) });
        determineSubIntervals(new Interval[] { new Interval(1, 3), new Interval(6, 9), new Interval(8, 12) });
    }

    private static List<Interval> determineSubIntervals(final Interval[] originals) {
        System.out.println("Originals: " + Arrays.toString(originals));
        final Set<Integer> endpoints = extractOrderedUniqueEndpoints(originals);
        final List<Interval> candidates = createConsecutiveIntervals(endpoints);
        final List<Interval> subIntervals = removeDisjunct(candidates, originals);
        System.out.println("Sub intervals: " + subIntervals);
        return subIntervals;
    }

    private static Set<Integer> extractOrderedUniqueEndpoints(final Interval[] intervals) {
        final Set<Integer> endpoints = new TreeSet<Integer>();
        for (final Interval interval : intervals) {
            endpoints.add(interval.getStart());
            endpoints.add(interval.getEnd());
        }
        return endpoints;
    }

    private static List<Interval> createConsecutiveIntervals(final Set<Integer> endpoints) {
        final List<Interval> intervals = new ArrayList<Interval>();
        final Iterator<Integer> it = endpoints.iterator();
        Integer start = null;
        while (it.hasNext()) {
            final Integer end = it.next();
            if (start != null) {
                final Interval candidate = new Interval(start, end);
                intervals.add(candidate);
            }
            start = end;
        }
        return intervals;
    }

    private static List<Interval> removeDisjunct(final List<Interval> candidates, final Interval[] intervals) {
        final Iterator<Interval> it = candidates.iterator();
        while (it.hasNext()) {
            final Interval candidate = it.next();
            if (isDisjunct(candidate, intervals)) {
                it.remove();
            }
        }
        return candidates;
    }

    private static boolean isDisjunct(final Interval candidate, final Interval[] intervals) {
        final int a = candidate.getStart();
        final int b = candidate.getEnd();
        for (final Interval interval : intervals) {
            final int x = interval.getStart();
            final int y = interval.getEnd();
            if ((b > x) && (y > a)) {
                return false;
            }
        }
        return true;
    }
}

The Interval class:

public class Interval {
    private final int start;
    private final int end;

    public Interval(final int start, final int end) {
        this.start = start;
        this.end = end;
    }

    public int getStart() {
        return start;
    }

    public int getEnd() {
        return end;
    }

    @Override
    public String toString() {
        return String.format("[%s,%s]", start, end);
    }
}

Output:

Originals: [[1,5], [4,9], [6,12], [11,17]]
Sub intervals: [[1,4], [4,5], [5,6], [6,9], [9,11], [11,12], [12,17]]
Originals: [[1,3], [6,9], [8,12]]
Sub intervals: [[1,3], [6,8], [8,9], [9,12]]

score 0 · Answer 4 · answered Nov 29 '16 at 07:03

Intro

Below picture demonstrates our input and desired output. We have 5 intervals (I added one to the original, to cover the holes case).

We want to decompose these intervals (labelled 1..5) into non-overlapping sub-intervals (labelled A..I). So for example interval 1 [1,5] can be decomposed into sub-intervals A [1,4] and B [4,5].

R Solution

First we set up the data. Then we take the unique endpoints and sort them, creating new intervals from the adjacent numbers.

# setup data and start decomposition
input <- c(1,5,4,9,6,12,11,17, 18,20)
intervals <- data.table(matrix(input,ncol=2,byrow=TRUE))
endpoints <- sort(unique(input))
decomp <- data.table(matrix(c(head(endpoints, -1), endpoints[-1]), ncol=2))

Finally, we need to join these new sub-intervals to the originals. This can be used to exclude holes, and also to identify which sub-intervals are needed to construct which primary intervals. Here, a binary-search based interval join is used (foverlaps).

# align decomposition to segs
intervals[, intid := seq_len(length(input)/2)]
decomp[, subid := LETTERS[seq_len(length(endpoints)-1)]]
setkeyv(decomp, c('V1','V2'))
setkeyv(intervals, c('V1','V2'))
relations <- foverlaps(decomp, intervals, type='within', nomatch=0)

Results

From the many-to-many relations table we can see which interval ids (intid) match to which sub-interval ids (subid). For example, intid 1 matches to subid A and subid B. Note that sub-interval H is not present in this table, since it is a hole.

> relations
    V1 V2 intid i.V1 i.V2 subid
 1:  1  5     1    1    4     A
 2:  1  5     1    4    5     B
 3:  4  9     2    4    5     B
 4:  4  9     2    5    6     C
 5:  4  9     2    6    9     D
 6:  6 12     3    6    9     D
 7:  6 12     3    9   11     E
 8:  6 12     3   11   12     F
 9: 11 17     4   11   12     F
10: 11 17     4   12   17     G
11: 18 20     5   18   20     I

Code used for plotting

Note, I'm using the multiplot function.

# problem
p1 <- ggplot(intervals, aes(x=V1,xend=V2,y=intid,yend=intid))+geom_segment()+geom_vline(xintercept=endpoints, linetype=3)+xlab('')+ylab('')+ggtitle('Input')

# solution
p2 <- ggplot(relations)+
  geom_segment(aes(x=i.V1, xend=i.V2, y=intid,yend=intid, color=as.factor(subid)))+
  geom_vline(xintercept=endpoints, linetype=3)+
  geom_text(aes(x=(i.V1+i.V2)/2, y=intid+0.2, label=subid), color='black')+
  geom_segment(data=decomp, aes(x=V1, xend=V2, y=0, yend=0, color=as.factor(subid)))+
  geom_text(data=decomp, aes(x=(V1+V2)/2, y=0+0.2, label=subid), color='black')+
  geom_hline(yintercept=0.5)+guides(color='none')+xlab('')+ylab('')+ggtitle('Output')

multiplot(p1,p2)

Intervals to non-overlapping subintervals

4 Answers4

Intro

R Solution

Results

Code used for plotting