9

I currently have a list of connections stored in a list where each connection is a directed link that connects two points and no point ever links to more than one point or is linked to by more than one point. For example:

connections = [ (3, 7), (6, 5), (4, 6), (5, 3), (7, 8), (1, 2), (2, 1) ]

Should produce:

ordered = [ [ 4, 6, 5, 3, 7, 8 ], [ 1, 2, 1 ] ]

I have attempt to do this using an algorithm that takes an input point and a list of connections and recursively calls itself to find the next point and add it to the growing ordered list. However, my algorithm breaks down when I don't start with the correct point (though this should just be a matter of repeating the same algorithm in reverse), but also when there are multiple unconnected strands

What would be the best way of writing an efficient algorithm to order these connections?

Matt Dodge
  • 9,728
  • 6
  • 32
  • 53
Jonathon Vandezande
  • 2,258
  • 2
  • 20
  • 29

3 Answers3

19

Algorithm for a Solution

You're looking for a topological sort algorithm:

from collections import defaultdict

def topological_sort(dependency_pairs):
    'Sort values subject to dependency constraints'
    num_heads = defaultdict(int)   # num arrows pointing in
    tails = defaultdict(list)      # list of arrows going out
    for h, t in dependency_pairs:
        num_heads[t] += 1
        tails[h].append(t)

    ordered = [h for h in tails if h not in num_heads]
    for h in ordered:
        for t in tails[h]:
            num_heads[t] -= 1
            if not num_heads[t]:
                ordered.append(t)
    cyclic = [n for n, heads in num_heads.iteritems() if heads]
    return ordered, cyclic

if __name__ == '__main__':
    connections = [(3, 7), (6, 5), (4, 6), (5, 3), (7, 8), (1, 2), (2, 1)]
    print topological_sort(connections)

Here is the output for your sample data:

([4, 6, 5, 3, 7, 8], [1, 2])

The runtime is linearly proportional to the number of edges (dependency pairs).

HOW IT WORKS

The algorithm is organized around a lookup table called num_heads that keeps a count the number of predecessors (incoming arrows). Consider an example with the following connections: a->h b->g c->f c->h d->i e->d f->b f->g h->d h->e i->b, the counts are:

node  number of incoming edges
----  ------------------------
 a       0
 b       2
 c       0
 d       2
 e       1
 f       1  
 g       2
 h       2
 i       1

The algorithm works by "visting" nodes with no predecessors. For example, nodes a and c have no incoming edges, so they are visited first.

Visiting means that the nodes are output and removed from the graph. When a node is visited, we loop over its successors and decrement their incoming count by one.

For example, in visiting node a, we go to its successor h to decrement its incoming count by one (so that h 2 becomes h 1.

Likewise, when visiting node c, we loop over its successors f and h, decrementing their counts by one (so that f 1 becomes f 0 and h 1 becomes h 0).

The nodes f and h no longer have incoming edges, so we repeat the process of outputting them and removing them from the graph until all the nodes have been visited. In the example, the visitation order (the topological sort is):

a c f h e d i b g

If num_heads ever arrives at a state when there are no nodes without incoming edges, then it means there is a cycle that cannot be topologically sorted and the algorithm exits to show the requested results.

Raymond Hettinger
  • 182,864
  • 54
  • 321
  • 419
  • My program needs to be able to still function if there are cycles, thus this solution won't work for me. – Jonathon Vandezande May 07 '13 at 01:46
  • 4
    @Jonathon It gives the output you requested and functions even when there are cycles. It sequences the connections, it identifies the cyclic portion (as in your example), and it works even if there are "islands". IOW, I don't understand why you're rejecting a solution that exactly matches what you asked for. – Raymond Hettinger May 07 '13 at 02:36
  • 1
    Sorry, from how I understood what you wrote, it seemed as if it would not deal with cycles. Thanks for editing your answer to make it clearer. I just checked out your algorithm and it works great. Thanks. – Jonathon Vandezande May 07 '13 at 16:00
2

Something like this:

from collections import defaultdict
lis = [ (3, 7), (6, 5), (4, 6), (5, 3), (7, 8), (1, 2), (2, 1) ]
dic = defaultdict(list)

for k,v in lis:
    if v not in dic:
        dic[k].append(v)
    else:
        dic[k].extend([v]+dic[v])
        del dic[v]

for k,v in dic.items():
    for x in v:
        if x in dic and x!=k:
            dic[k].extend(dic[x])
            del dic[x]

print dic
print [[k]+v for k,v in dic.items()]

output:

defaultdict(<type 'list'>, {2: [1, 2], 4: [6, 5, 3, 7, 8]})
[[2, 1, 2], [4, 6, 5, 3, 7, 8]]
Ashwini Chaudhary
  • 217,951
  • 48
  • 415
  • 461
0

I think you can probably do it in O(n) with something like this:

ordered = {}

for each connection (s,t):
  if t exists in ordered :
     ordered[s] = [t] + ordered[t]
     del ordered[t]
  else:
     ordered[s] = [t]

# Now merge...
for each ordered (s : [t1, t2, ... tN]):
  ordered[s] = [t1, t2, ... tN] + ordered[tN]
  del ordered[tN]

At the end you will be left with something like this, but you can convert that rather easily

ordered = { 
  4 : [6, 5, 3, 7, 8],
  2 : [1, 2]
}
nutship
  • 3,804
  • 12
  • 43
  • 58
Matt Dodge
  • 9,728
  • 6
  • 32
  • 53