3

Given a list of tuples, I need to find all unique paths that can be built from it:

Input: [('a','b'),('b','c'),('c','d'),('g','i'),('d','e'),('e','f'),('f','g'),('c','g')]
Output: [['a','b','c','d','e','f','g','i'],['a','b','c','g','i']] (the 2 possible unique paths)

Two tuples can connect if the second element of one matches the first element of the other, i.e. one tuple is (_,a) and the other is (a,_).

This question has already been asked here: Getting Unique Paths from list of tuple, but the solution there is implemented in Haskell (and I know nothing about that language).

But do you know if there's an efficient way to do this in Python?
I know the itertools library has many efficient built-in functions for this kind of thing, but I'm not too familiar with it.

Valentino
Roulio

3 Answers

3

You want to find all simple paths in your graph.

Python has an excellent library for graph processing: networkx. You can solve your problem in just a few lines of code:

import networkx as nx

a = [('a','b'),('b','c'),('c','d'),('g','i'),('d','e'),('e','f'),('f','g'),('c','g')]

# Create a directed graph: the tuple (x, y) only connects x -> y
G = nx.DiGraph()
# Fill graph with data
G.add_edges_from(a)

# Get all simple paths from node 'a' to node 'i'
list(nx.all_simple_paths(G, 'a', 'i'))

This returns:

[['a', 'b', 'c', 'g', 'i'], ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'i']]


If you want ALL possible paths, replace the last line with:

for start in G.nodes:
    for end in G.nodes:
        if start != end:
            print(list(nx.all_simple_paths(G, start, end)))
vurmux
  • Thanks a lot :) But your solution assumes that 'a' and 'i' are necessarily the beginning and end of the possible paths (in my problem I make no assumptions about how the paths begin and end, I just know how they should be built). I guess I could generate all possible paths by generating all (begin / end) combinations and applying your method to each one (that might take very long though). – Roulio May 15 '19 at 13:52
  • Updated the answer. – vurmux May 15 '19 at 13:56
  • Note that the updated answer, which calls `nx.all_simple_paths` in a nested loop over all the nodes, turns the solution into *O(n ^ 3)* in average time complexity. – blhsing May 15 '19 at 16:05
1

You can build a dict that maps each parent to a list of connected children, so that you can recursively yield the paths from each parent node in an average time complexity of O(n):

def get_paths(parent, mapping):
    if parent not in mapping:
        yield [parent]
        return
    for child in mapping[parent]:
        for path in get_paths(child, mapping):
            yield [parent, *path]

edges = [('a','b'),('b','c'),('c','d'),('g','i'),('d','e'),('e','f'),('f','g'),('c','g')]
parents = set()
children = set()
mapping = {}
for a, b in edges:
    mapping.setdefault(a, []).append(b)
    parents.add(a)
    children.add(b)
print([path for parent in parents - children for path in get_paths(parent, mapping)])

This outputs:

[['a', 'b', 'c', 'd', 'e', 'f', 'g', 'i'], ['a', 'b', 'c', 'g', 'i']]
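
One caveat: the generator above recurses forever if the edges contain a cycle, for example if ('g','c') is added to the input. Below is a minimal sketch of a guard, assuming you only want simple paths (no node visited twice). The name `get_simple_paths` and the `seen` parameter are made up for illustration.

```python
def get_simple_paths(parent, mapping, seen=()):
    # seen holds the nodes already on the current path, in order
    seen = seen + (parent,)
    # only follow children that are not already on the path
    children = [c for c in mapping.get(parent, []) if c not in seen]
    if not children:
        # leaf, or every child would close a cycle: the path ends here
        yield list(seen)
        return
    for child in children:
        yield from get_simple_paths(child, mapping, seen)

# same edges as above, plus ('g','c') which introduces a cycle
edges = [('a','b'),('b','c'),('c','d'),('g','i'),('d','e'),('e','f'),('f','g'),('c','g'),('g','c')]
mapping = {}
for a, b in edges:
    mapping.setdefault(a, []).append(b)
print(list(get_simple_paths('a', mapping)))
```

On this input it should print the same two paths as before, since the extra edge only leads back to a node already on the path.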
blhsing
0

You can use recursion with a generator:

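d = [('a','b'),('b','c'),('c','d'),('g','i'),('d','e'),('e','f'),('f','g'),('c','g')]

def get_paths(start, c=None):
    # c is the path built so far; begin with the start node if none is given
    c = [start] if c is None else c
    # all nodes that start points to
    r = [b for a, b in d if a == start]
    if r:
        for i in r:
            yield from get_paths(i, c + [i])
    else:
        # no outgoing edge: the path is complete
        yield c

print(list(get_paths('a')))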

Output:

[['a', 'b', 'c', 'd', 'e', 'f', 'g', 'i'], ['a', 'b', 'c', 'g', 'i']]
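
If the chains get long, a recursive version can hit Python's recursion limit, and rescanning `d` on every call is avoidable. Here is a rough sketch of the same idea with an explicit stack and an adjacency dict built once. `iter_paths` is a hypothetical name, and like the code above it assumes the edges contain no cycles.

```python
from collections import defaultdict

def iter_paths(edges, start):
    # map each node to the nodes it points to, built once up front
    adjacency = defaultdict(list)
    for a, b in edges:
        adjacency[a].append(b)
    # each stack entry is a partial path beginning at start
    stack = [[start]]
    while stack:
        path = stack.pop()
        children = adjacency.get(path[-1], [])
        if not children:
            # no outgoing edge: the path is complete
            yield path
        # push in reverse so children are explored in input order
        for child in reversed(children):
            stack.append(path + [child])

d = [('a','b'),('b','c'),('c','d'),('g','i'),('d','e'),('e','f'),('f','g'),('c','g')]
print(list(iter_paths(d, 'a')))
```

The starting node still has to be supplied by the caller here.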
Ajax1234
  • Note that having to iterate through all the nodes in every recursive call makes this solution *O(n ^ 2)* in average time complexity. It also requires knowing the starting node in advance, which is not given as an input to this question. – blhsing May 15 '19 at 16:12
  • @blhsing How are you deriving `O(n^2)` and `O(n)` for our answers? Your dictionary lookup is `O(n)`, but with a recursive call in a `for` loop, the worst-case result is `>= O(2^n)` (for mine as well). – Ajax1234 May 15 '19 at 16:29
  • Dictionary lookup costs *O(1)* on average, not *O(n)*, whereas your linear search for a given parent node in all nodes with `[b for a, b in d if a == start]` costs *O(n)*. The recursive calls traverse all the nodes linearly, so my solution would cost *O(n)* while yours costs *O(n ^ 2)*. The other factor, however, is the number of diverging paths, which occurs when a parent node has more than one child node, and if that number is significant then my solution would cost *O(n x m)*, while yours would cost *O(n ^ 2 x m)*, where `n` is the number of nodes and `m` is the number of diverging paths. – blhsing May 15 '19 at 16:43
  • @blhsing My answer contains the additional pass over `d` but coupled with the linear time complexity for the recursive calls, would not the result be `O(n)+O(n) => O(n)`? The additional linear pass outside the loop does not increase the power of `n`. – Ajax1234 May 15 '19 at 17:18
  • No, you're iterating over the length of all nodes in each recursive call, and the recursive calls are done for at least the number of times of the number of nodes, so that results in `n` times `n` steps. Looking up a node in a dict eliminates the need to iterate over all nodes in each recursive call, and hence the *O(n)* complexity with my solution. – blhsing May 15 '19 at 17:31
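
To make the point about diverging paths concrete: the number of paths can itself grow exponentially with the number of nodes, so listing every path cannot be linear in general. Below is a small sketch, assuming a chain of k "diamonds" in which each stage offers two alternative routes. `diamond_chain` and `count_paths` are hypothetical helper names, not part of any answer above.

```python
from functools import lru_cache

def diamond_chain(k):
    # build k stages; stage i goes from (i, 's') to (i + 1, 's')
    # through either (i, 'top') or (i, 'bottom')
    mapping = {}
    for i in range(k):
        start, end = (i, 's'), (i + 1, 's')
        mapping[start] = [(i, 'top'), (i, 'bottom')]
        mapping[(i, 'top')] = [end]
        mapping[(i, 'bottom')] = [end]
    return mapping

def count_paths(mapping, start):
    # count root-to-leaf paths without building them (acyclic graphs only)
    @lru_cache(maxsize=None)
    def count(node):
        children = mapping.get(node, [])
        if not children:
            return 1
        return sum(count(child) for child in children)
    return count(start)

# columns: number of stages, number of nodes, number of paths
for k in (1, 5, 10):
    print(k, 3 * k + 1, count_paths(diamond_chain(k), (0, 's')))
```

With 3k + 1 nodes the graph has 2^k distinct paths, so for k = 10 there are 31 nodes and 1024 paths.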