I'm trying to optimize a graph-traversal problem, but can't figure out the best way to tackle it. It doesn't seem to be an A* search problem (because we want to maximize the path rather than minimize it), nor a traveling salesman problem (because we don't have to visit all cities). A simplified version of it is along these lines:
We have a set of nodes and connections/edges. Connections are arbitrary and nodes can have one or more of them. Connections also have an interface/type associated with them, and interfaces can't support more than a single connection. So for example, if node A can connect to nodes B or C via interface alpha, and we decide to connect it to node B, that interface on node A can no longer support other connections, so C can't be connected to A anymore. However, we could connect node C to node D, if it happens to have the same alpha interface.
I should also mention that these interfaces work like lock-and-key, so A can connect to either B or C, but B and C can't connect to each other (the interface is like a mirror). Also, while A can no longer connect to anything via the alpha interface because it's used by B, if it happens to have another interface (bravo) and something else can connect to bravo, then we can connect more than one node to A. The goal is to obtain the largest group of connected nodes (discarding all smaller groups).
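To make the setup concrete, here is a rough sketch of how the nodes and their allowed connections could be modeled. The names `mate` and `allowed_connections` are made up for illustration, and the convention that an interface `alpha` pairs only with `alpha'` is an assumption about the naming, not a given.

```python
from itertools import combinations

def mate(interface: str) -> str:
    """Return the complementary interface name (assumed: alpha <-> alpha')."""
    return interface[:-1] if interface.endswith("'") else interface + "'"

def allowed_connections(nodes: dict[str, set[str]]) -> list[tuple[str, str, str, str]]:
    """List every allowed (but not yet activated) connection.

    Each entry is (node_u, interface_on_u, node_v, interface_on_v).
    """
    edges = []
    for u, v in combinations(sorted(nodes), 2):
        for i in sorted(nodes[u]):
            if mate(i) in nodes[v]:
                edges.append((u, i, v, mate(i)))
    return edges

# A and D expose alpha; B and C expose the mirrored alpha'.
nodes = {
    "A": {"alpha", "bravo"},
    "B": {"alpha'"},
    "C": {"alpha'"},
    "D": {"alpha"},
}
edges = allowed_connections(nodes)
for edge in edges:
    print(edge)
```

Here B and C never appear as a pair (both hold the same side of the lock), while each of them can reach either A or D. Nothing can use bravo on A yet, because no node exposes `bravo'`.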
There are a few heuristics I'm considering:
- prefer nodes with more interfaces (I already discarded interfaces without pairs)
- prefer interfaces that are more popular
The above two rules can be useful for prioritizing which node to try connecting to next (for now I naively grouped them into one rank - total number of connectable nodes), but my gut is telling me I can do better. Moreover, I don't think this would favor an optimal solution.
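The combined rank I'm using now looks roughly like the sketch below. The function name `connectable_rank` and the edge tuple layout `(node_u, interface_on_u, node_v, interface_on_v)` are placeholders for illustration, and the alphabetical tie-break is an arbitrary choice.

```python
from collections import defaultdict

def connectable_rank(edges):
    """Rank each node by how many distinct nodes it could connect to."""
    partners = defaultdict(set)
    for u, _, v, _ in edges:
        partners[u].add(v)
        partners[v].add(u)
    return {node: len(p) for node, p in partners.items()}

# Hypothetical allowed connections.
edges = [
    ("A", "alpha", "B", "alpha'"),
    ("A", "alpha", "C", "alpha'"),
    ("A", "bravo", "E", "bravo'"),
    ("C", "charlie", "D", "charlie'"),
]
rank = connectable_rank(edges)
# Highest rank first, ties broken by name.
order = sorted(rank, key=lambda n: (-rank[n], n))
print(rank)
print(order)
```

Note that this rank counts B and C separately for A, even though both compete for the single alpha interface on A, so it overstates how many neighbors A can actually end up with.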
I was trying to figure out if I can invert the heuristic somehow to create a variation of A* Search such that the A* 'optimistic heuristic cost' rule still applies (i.e. heuristic cost = number of nodes discarded, however, this breaks the actual cost computation - since we'd be starting with all but one node discarded).
Another idea I had was computing the distance (number of intermediate nodes) to each node from the starting node and using the average of that as a heuristic, with the goal being all nodes connected. However, I'm not guaranteed that all nodes will connect.
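The distance idea could be sketched with a plain breadth-first search over the allowed connections. This is only an illustration: `hop_distances` and `average_hops` are made-up names, the adjacency map is a hypothetical example, and the sketch counts hops (intermediate nodes plus one) while ignoring the single-use rule on interfaces.

```python
from collections import deque

def hop_distances(adjacency, start):
    """Breadth-first search: hops from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_hops(dist):
    """Average hop count over reachable nodes, excluding the start itself."""
    others = [d for d in dist.values() if d > 0]
    return sum(others) / len(others) if others else 0.0

# Allowed connections as an adjacency map; E has no compatible partner.
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
    "E": [],
}
dist = hop_distances(adjacency, "A")
print(dist)                 # E is absent: it can never join A's group
print(average_hops(dist))
```

Nodes missing from `dist` show the "not all nodes will connect" problem directly. Because any group built from activated connections stays inside the set reachable over allowed connections, `len(dist)` is at least an optimistic upper bound on the group containing the start node.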
EDIT: Here is an example
- dashed lines represent allowed (but not activated/traveled) connections
- interfaces are not allowed to connect to the interface with identical name, but can connect to the ' version of itself
- interface can only be used once (if we connect A to B via α, we can no longer connect A to C because A no longer has interface α available)
- number of nodes is arbitrary (but constant during the algorithm's execution), and should be assumed to be very large
- number of interfaces per node is going to be at least one, we could assume an upper limit if it makes the problem easier, e.g. 3
- number of possible connections is simply a function of interface compatibility, interface defines what the node can connect to, whether/how you use that interface is up to you
- direction/order of activating the connections doesn't matter
- the goal is to generate the largest set of connected nodes (we don't care about number of connections or interfaces used)
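To pin down what "largest set" means under these rules, here is a brute-force reference sketch. It is exponential in the number of allowed connections, so it is only meant for checking heuristics on tiny instances, not for the very large inputs described above. The names `largest_group` and `brute_force`, the edge tuple layout, and the example data are all assumptions for illustration.

```python
from itertools import combinations

def largest_group(nodes, chosen):
    """Largest connected group given the activated connections (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, _, v, _ in chosen:
        parent[find(u)] = find(v)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return max(groups.values(), key=len)

def brute_force(nodes, edges):
    """Try every subset of allowed connections; keep the best valid one."""
    best = set()
    for r in range(len(edges) + 1):
        for chosen in combinations(edges, r):
            used = [(u, i) for u, i, _, _ in chosen]
            used += [(v, j) for _, _, v, j in chosen]
            if len(used) != len(set(used)):
                continue  # some interface would be used more than once
            group = largest_group(nodes, chosen)
            if len(group) > len(best):
                best = group
    return best

# Hypothetical instance: edges are (node_u, interface_u, node_v, interface_v).
nodes = ["A", "B", "C", "D", "E"]
edges = [
    ("A", "alpha", "B", "alpha'"),
    ("A", "alpha", "C", "alpha'"),
    ("A", "bravo", "E", "bravo'"),
    ("C", "charlie", "D", "charlie'"),
]
best = brute_force(nodes, edges)
print(sorted(best))
```

In this instance, spending alpha on A for B caps the group at three nodes (A, B, E), whereas spending it on C lets D join through charlie, which is exactly the kind of choice the heuristic has to get right.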