
We have a data set composed of Connectors and Segments. Each segment has exactly two connectors, but each connector can belong to zero or more segments (e.g. connector 'A' in the left image below has no segments, while connector 'M' has three: M-R, M-L and M-N).

It is understood that wherever any lines meet or intersect, there will be a connector so we don't have to worry about even/odd rules, overlapping or partially-enclosed polygons, etc. as they don't apply.

In short, we're trying to identify all of the created polygons (the colored shapes in the right image.) I believe this can be completed in two steps.

Polygons

Step 1: Removing superfluous items

Stand-alone connectors (connector 'A' here) can simply be removed since they can't be part of a shape's outline.

Floating end-points referencing a single segment (connectors 'B' and 'E') can also be removed as they too can't be part of a shape's outline. This will also remove their referenced segments (B-C and E-D).

Performing the above recursively will next identify 'C' as an endpoint (since 'B' and B-C were already removed), so it and its remaining segment C-D can also be removed. On the next recursive pass, connector 'D' and segment D-F will also be removed, and so on.
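The recursive removal described above can be sketched as a work queue. This is a minimal sketch rather than a definitive implementation: the function name `prune_dangling` and the representation of segments as pairs of connector names are assumptions made for illustration.

```python
from collections import defaultdict

def prune_dangling(segments):
    """Repeatedly drop connectors that have a single segment, plus that segment.

    `segments` is an iterable of (a, b) connector-name pairs (hypothetical
    representation). Returns the surviving segments as a set of frozensets.
    """
    adjacency = defaultdict(set)
    for a, b in segments:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Connectors that currently reference exactly one segment.
    pending = [c for c, nbrs in adjacency.items() if len(nbrs) == 1]
    while pending:
        c = pending.pop()
        if len(adjacency[c]) != 1:
            continue  # its last segment was already removed from the other end
        (n,) = adjacency[c]
        adjacency[c].clear()
        adjacency[n].discard(c)
        # Removing the segment may have turned the neighbour into an endpoint.
        if len(adjacency[n]) == 1:
            pending.append(n)

    return {frozenset((a, b)) for a, nbrs in adjacency.items() for b in nbrs}
```

Stand-alone connectors such as 'A' never appear in a segment, so they drop out without special handling. A segment like H-I survives this pass, because both of its connectors still reference other segments.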

However, I haven't found a good way to identify segment H-I. That said, I think that can be achieved during polygon detection since such segments would only be the result of compound paths and would be traced in both directions during one shape's detection. (More on that below.)

Step 2: Polygon Detection

Each segment can be traced in two directions. For instance, the segment connecting 'O' and 'P' can be either O-P or P-O. Picking a trace-direction of clockwise, O-P would belong to the polygon O-P-Q-N whereas P-O would belong to the polygon P-O-I.

The following logic assumes a trace-direction of clockwise.

Starting from any segment, trace around; if you get back to your starting point, you have identified a potential polygon. While tracing, keep a running delta of your heading's angle (this is how much your heading turns, not to be confused with simply adding the angles between segments). When done, if that delta is positive, you've detected a valid polygon. If it's negative, you've detected a 'containing' polygon, meaning one that contains one or more 'valid' polygons. The outer perimeters of the entire shape (or shapes) are all containing polygons.
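The running heading delta can be sketched as below. This is a sketch under assumptions: `total_turning` is a hypothetical helper, points are `(x, y)` tuples, and the sign convention depends on the axis orientation. With y growing downward (screen coordinates), a visually clockwise loop sums to about +2π; with y growing upward the signs flip.

```python
import math

def total_turning(points):
    """Sum of signed heading changes around a closed path, in radians.

    `points` lists the vertices in trace order, without repeating the first.
    For a simple polygon the result is close to +2*pi or -2*pi.
    """
    total = 0.0
    n = len(points)
    for i in range(n):
        ax, ay = points[i - 1]
        bx, by = points[i]
        cx, cy = points[(i + 1) % n]
        ux, uy = bx - ax, by - ay  # incoming heading
        vx, vy = cx - bx, cy - by  # outgoing heading
        # atan2(cross, dot) is the signed turn from u to v, in [-pi, pi].
        total += math.atan2(ux * vy - uy * vx, ux * vx + uy * vy)
    return total
```

An exact reversal of heading (tracing a dead-end segment out and back) is a turn of exactly π whose sign is ambiguous, so this sum is only reliable once such segments are excluded from the path.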

Consider the case of a square, diagonally divided into two triangles. Tracing each segment twice--once in each direction--you will end up with three potentially-valid polygons: a square and two triangles. The triangles will have a positive angle delta telling you they're valid, but the square's angle delta will be negative telling you that's the containing polygon.

Note: A containing polygon can be equal to a valid polygon too. It will just be 'wound' in the opposite direction.

Consider a simple triangle. The clockwise trace will yield the valid polygon. The second attempt to trace clockwise will actually yield a counter-clockwise trace which will give you a negative angle delta, telling you that's actually the outline of the shape.

Note: You also have to test for other polygons encountered along the way by checking each point against the points already encountered during that shape's detection. If you find you've revisited a point, save off the polygon created since the first encounter of that point and check its angle. If it's positive, it's a valid polygon (and you're actually currently tracing a containing polygon). If it's negative, you've detected a containing polygon (in which case you're currently tracing a valid polygon). Finally, remove all segments on your accumulation stack back to where that point was first encountered and continue on with your detection.

For instance, if you started at 'J' and traced around counter-clockwise, you would go through 'I', 'H', then 'G', then 'F' then you'd be back at 'H'. You just found a polygon H-G-F which has a negative angle so you know it's a containing polygon. Remove those three segments from your stack and continue on. Now you'll again hit 'I'. In this case, you already visited that same segment during this pass, but in the other direction, so simply remove that segment completely from your stack and continue on, next to 'O' then 'N', etc. You'll eventually be back at 'J'.

When a segment has been traced in both directions, it can be considered 'used' and no further processing of that segment is needed. Continue processing all non-used segments. Once all segments have been traced in both directions, you can be sure all polygons--valid and containing--have been found.
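For comparison, the idea of tracing every segment once in each direction can also be sketched with the standard planar face traversal technique, which is a different approach from the stack-based revisit detection described above. Everything here is an assumption for illustration: the name `trace_faces`, the `coords` mapping of connector names to `(x, y)` positions, and the choice to always leave a connector along the neighbour that precedes the arrival direction in angular order.

```python
import math

def trace_faces(coords, segments):
    """Return every face of a planar graph as a list of connector names.

    coords: {name: (x, y)}; segments: iterable of (a, b) name pairs.
    Each directed edge is consumed exactly once, so every segment ends up
    traced in both directions.
    """
    # Neighbours of each connector, sorted by angle around that connector.
    ring = {name: [] for name in coords}
    for a, b in segments:
        ring[a].append(b)
        ring[b].append(a)
    for v, nbrs in ring.items():
        vx, vy = coords[v]
        nbrs.sort(key=lambda n: math.atan2(coords[n][1] - vy, coords[n][0] - vx))

    unused = {(a, b) for a in ring for b in ring[a]}
    faces = []
    while unused:
        start = next(iter(unused))
        u, v = start
        face = []
        while True:
            unused.discard((u, v))
            face.append(u)
            # Arriving at v from u, leave along the neighbour just before u
            # in v's angular order.
            nbrs = ring[v]
            w = nbrs[nbrs.index(u) - 1]
            u, v = v, w
            if (u, v) == start:
                break
        faces.append(face)
    return faces
```

On a square split by one diagonal this yields two triangles and the outer square, the outer boundary being wound opposite to the inner faces. Which winding counts as 'valid' depends on the axis orientation. A bridge segment such as H-I appears twice within a single face, once in each direction, which matches the observation made in Step 1.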

Finally, check each containing polygon to see if it falls within any valid polygon. If so, exclude it from that valid polygon creating a compound path. In the example here, containing polygon H-G-F is contained by the valid cyan polygon so it should be excluded. Note there is also a valid H-F-G polygon which is marked in red here.
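The containment check can be sketched with an even-odd ray cast on a single vertex of the containing polygon. The helper names `point_in_polygon` and `holes_for` are hypothetical, polygons are assumed to be lists of `(x, y)` tuples, and testing a single vertex is assumed to be enough because polygons here never partially overlap.

```python
def point_in_polygon(point, polygon):
    """Even-odd ray cast to the right of `point`.

    Points lying exactly on the boundary are not handled reliably.
    """
    px, py = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i - 1]
        x2, y2 = polygon[i]
        # Only edges that straddle the ray's y can cross it.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def holes_for(valid, containing_polygons):
    """Containing polygons whose first vertex lies inside `valid`."""
    return [c for c in containing_polygons if point_in_polygon(c[0], valid)]
```

A containing polygon that shares connectors with the valid polygon (such as the outer perimeter) puts its test vertex on the boundary, so those cases would need to be filtered out separately, for example by comparing connector sets first.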

Anyway, that's what I've come up with, but I'm wondering if there's a better/simpler way. Thoughts?

Mark A. Donohoe
  • It took me some puzzling to decide I understood what you're asking for. To see if I'm right, let me put it in slightly mathematical terms: given a planar graph, is there an algorithm for choosing a maximal set of polygons (whose edges are drawn from the graph, of course) such that each point in the plane is either on a polygon boundary or else contained in exactly one polygon? Does that seem like a fair restatement of your question? – Daniel Wagner Oct 25 '15 at 23:54
  • After some Googling: it looks like boost has [planar_face_traversal](http://www.boost.org/doc/libs/1_49_0/libs/graph/doc/planar_face_traversal.html), which does something similar (though perhaps not exactly equal) to what you want. You might look at their implementation for some ideas -- or perhaps even just use it as is. – Daniel Wagner Oct 26 '15 at 00:16
  • The case of HI is unclear. If you remove it, the face KJINML gets a hole. Is this allowed ? – Yves Daoust Oct 26 '15 at 09:55
  • @DanielWagner's link gives an O(n+m)-time algorithm for a biconnected graph. All you have to do is first delete all pendant edges (easy to find -- one endpoint has degree 1) and bridges (edges whose deletion would disconnect the graph) -- there are also fairly simple, efficient algorithms for this. Then you can look for polygons that fully contain other polygons -- this should be fairly quick, as you can sort by decreasing area first, and then you only need to check whether an earlier poly contains a later poly, and testing a single point is enough. – j_random_hacker Oct 26 '15 at 13:14
  • @Yves, yes, that is allowed. See the caption over the right image. – Mark A. Donohoe Oct 26 '15 at 13:18
  • This is a really great question. I might suggest also posting on [math.stackexchange.com](http://math.stackexchange.com/) to see what help they can provide outside of strict programming. – Jake Bathman Nov 11 '15 at 14:37
  • Good idea about math.StackExchange.com. Didn't realize there was such a site. Then again, of course there is! :) – Mark A. Donohoe Nov 11 '15 at 14:48

1 Answer


Hint:

Your problem has a geometrical aspect (not pure connectivity) because the faces may not overlap and are known to be simple. I would recommend a sweepline approach.

First, clean up to discard all floating endpoints.

Then consider a horizontal line that moves from top to bottom, vertex by vertex. At every position of the sweepline, it includes or intersects a number of segments. Sorting all vertices/intersections from left to right, you get non-overlapping line segments.

The trick is to track the endpoints as the sweepline progresses in order to find the left and right boundaries of the regions.

In the given example, you will successively consider the points

R  K        J
RM KL G     JI
 M  L GF GH JI
 MN    F GH JI
 MN       H JI
  N  O       I
  NQ   P
   Q

(pairs denote intersections).
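The row-by-row listing above can be sketched as follows. This covers only the first half of the approach (what the sweepline touches at each vertex height), not the reconstruction of regions. The name `scanline_rows`, the `coords` mapping, and a y axis that grows upward are all assumptions made for illustration.

```python
def scanline_rows(coords, segments):
    """List what a horizontal line touches at each connector height.

    coords: {name: (x, y)} with y growing upward; segments: (a, b) pairs.
    Returns [(y, labels)] from top to bottom, labels sorted left to right.
    A single name is a connector on the line; two joined names denote a
    segment crossed between its endpoints.
    """
    rows = []
    for y in sorted({p[1] for p in coords.values()}, reverse=True):
        hits = [(x, name) for name, (x, cy) in coords.items() if cy == y]
        for a, b in segments:
            (ax, ay), (bx, by) = coords[a], coords[b]
            if min(ay, by) < y < max(ay, by):  # strictly between the endpoints
                x = ax + (y - ay) * (bx - ax) / (by - ay)
                hits.append((x, a + b))
        rows.append((y, [label for _, label in sorted(hits)]))
    return rows
```

Consecutive entries in a row bound the pieces of the line between neighbouring segments, which is the left-to-right ordering the region tracking relies on.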

From this, you should be able to reconstruct the left/right outlines from connectivity considerations:

R M | K L
K L M N | G F H | G H | J I (and embedded G F H | G H)
N Q | O P Q
O P | I P

enter image description here

Here is the graph that you obtain by linking the endpoints and intersections of existing edges from scanline to scanline.

enter image description here

And after cleanup, removing the intermediate vertices:

enter image description here

Yves Daoust
  • I followed the first half of that, but not the second 'reconstruction' part. Can you explain how you arrived at those items a little more clearly? Looks very promising though! – Mark A. Donohoe Oct 28 '15 at 00:52
  • I didn't expand that completely and I don't want to, sorry. The main idea is to consider the trapeziums (trapezia?) like K-J-JI-KL, KL-C-CL-L, C'-JI-JI-C'H... (cyan area) and merge them, just by using the label information. You do this while you scan. You should be able to manage this. The really important idea is to sort the segments left to right to avoid overlaps and determine the planar subdivision. – Yves Daoust Oct 28 '15 at 07:53
  • IMO during the scan it is better to split an up vertex like C in two representatives (CC') for uniformity: you always handle line segments with two endpoints, even though on occasion some may have the same position. (You may clean up later.) – Yves Daoust Oct 28 '15 at 07:57
  • Hi. You said 'C' but did you mean 'G'? Also, again, I'm not following adding 'G'' just to remove it later. If that's the case, why add it at all? I do thank you for adding to the explanation (especially since you said you don't want to) but I'm still not quite following this. – Mark A. Donohoe Nov 11 '15 at 14:47
  • There is no C, just overwritten G (your own notation !) Keeping an even number of points per row makes it easier. – Yves Daoust Nov 12 '15 at 07:44
  • I was referring to your prior comment where you said 'it is better to split an up vertex like C in two representatives (CC')' and my point being in my notation, c would have been removed in step 1 whereas you did split G. Also, obviously you feel even numbers make it easier or you wouldn't have suggested it, but that doesn't explain why. That's where I'm not following. – Mark A. Donohoe Nov 12 '15 at 07:49
  • @MarqueIV I trapped myself ! It is important to keep two representatives so that you can construct independent left and right outlines. – Yves Daoust Nov 12 '15 at 07:50