Algorithm to maximize number of traversed nodes

Question

I'm trying to optimize a graph-traversal problem, but can't figure out the best way to tackle it. It seems to be neither an A* search problem (because we want to maximize the path rather than minimize it) nor a traveling salesman problem (because we don't have to visit all cities). The simplified version of it is something along these lines:

We have a set of nodes and connections/edges. Connections are arbitrary and nodes can have one or more of them. Connections also have an interface/type associated with them, and interfaces can't support more than a single connection. So for example, if node A can connect to nodes B or C via interface alpha, and we decide to connect it to node B, that interface on node A can no longer support other connections, so C can't be connected to A anymore. However, we could connect node C to node D, if it happens to have the same alpha interface.

I should also mention that these interfaces work like lock-and-key, so A can connect to either B or C, but B and C can't connect to each other (the interface is like a mirror). Also, while A can no longer connect to anything via the alpha interface once it's used by B, if A happens to have another interface (bravo) and something else can connect to bravo, then we can connect more than one node to A. The goal is to obtain the largest group of connected nodes (discarding all smaller groups).
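To make the rules above concrete, here is a minimal sketch of the model in Python. The representation is an assumption for illustration only: each node is a list of interface names, the lock side of an interface is written with a trailing `'` (as in the example further down), and the helper names `mate` and `free_sockets` are hypothetical.

```python
from collections import Counter

def mate(interface):
    """Return the complementary interface: "alpha" <-> "alpha'"."""
    return interface[:-1] if interface.endswith("'") else interface + "'"

def free_sockets(nodes, edges):
    """Return the unused sockets per node after activating `edges`.

    nodes: {name: [interface, ...]} (repeated names mean several sockets)
    edges: [(u, interface_on_u, v), ...]; v uses mate(interface_on_u)
    Raises ValueError if an edge needs a socket that is already used up.
    """
    free = {name: Counter(ifaces) for name, ifaces in nodes.items()}
    for u, iface, v in edges:
        for node, socket in ((u, iface), (v, mate(iface))):
            if free[node][socket] == 0:
                raise ValueError(f"{node} has no free {socket} socket")
            free[node][socket] -= 1
    return free

# The situation described above: A has alpha and bravo, B and C have alpha'.
nodes = {
    "A": ["alpha", "bravo"],
    "B": ["alpha'"],
    "C": ["alpha'"],
    "D": ["alpha"],
}

# A-B and D-C can both be active; A keeps its bravo socket for another node.
remaining = free_sockets(nodes, [("A", "alpha", "B"), ("D", "alpha", "C")])
```

Activating both A-B and A-C would raise `ValueError`, because A has only one alpha socket. B and C can never be joined directly, because neither has a plain alpha socket to offer the other.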

There are a few heuristics I'm considering:

- prefer nodes with more interfaces (I already discarded interfaces without pairs)
- prefer interfaces that are more popular

The above two rules can be useful for prioritizing which node to try connecting to next (for now I naively grouped them into one rank - total number of connectable nodes), but my gut is telling me I can do better. Moreover, I don't think this would favor an optimal solution.
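The combined rank (total number of connectable nodes) could be computed along these lines. This is only a sketch under assumed conventions: nodes are a dict of interface lists, the lock side of an interface carries a trailing `'`, and `mate` and `connectable_rank` are hypothetical names.

```python
def mate(interface):
    """Return the complementary interface: "alpha" <-> "alpha'"."""
    return interface[:-1] if interface.endswith("'") else interface + "'"

def connectable_rank(nodes):
    """For each node, count the other nodes it could connect to directly."""
    rank = {}
    for name, ifaces in nodes.items():
        wanted = {mate(i) for i in ifaces}  # interfaces this node can plug into
        rank[name] = sum(
            1
            for other, other_ifaces in nodes.items()
            if other != name and wanted & set(other_ifaces)
        )
    return rank

nodes = {
    "A": ["alpha", "bravo"],
    "B": ["alpha'"],
    "C": ["alpha'"],
    "D": ["alpha"],
    "E": ["bravo'"],
}
rank = connectable_rank(nodes)
# rank == {"A": 3, "B": 2, "C": 2, "D": 2, "E": 1}
```

Note that this rank ignores the one-connection-per-interface limit: A scores 3 here although its single alpha socket can serve only one of B and C, which is one reason the rank alone may not lead to an optimal solution.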

I was trying to figure out if I can invert the heuristic somehow to create a variation of A* search such that the A* 'optimistic heuristic cost' rule still applies (i.e. heuristic cost = number of nodes discarded). However, this breaks the actual cost computation, since we'd be starting with all but one node discarded.

Another idea I had was computing the distance (number of intermediate nodes) from the starting node to each other node and using the average of those distances as a heuristic, with the goal being all nodes connected. However, I'm not guaranteed that all nodes will connect.

EDIT: Here is an example

- dashed lines represent allowed (but not activated/traveled) connections
- interfaces are not allowed to connect to the interface with identical name, but can connect to the ' version of itself
- interface can only be used once (if we connect A to B via α, we can no longer connect A to C because A no longer has interface α available)
- number of nodes is arbitrary (but constant during the algorithm's execution), and should be assumed to be very large
- number of interfaces per node is going to be at least one, we could assume an upper limit if it makes the problem easier - i.e. 3
- number of possible connections is simply a function of interface compatibility, interface defines what the node can connect to, whether/how you use that interface is up to you
- direction/order of activating the connections doesn't matter
- the goal is to generate the largest set of connected nodes (we don't care about number of connections or interfaces used)
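For reference, the rules listed above can be turned into an exhaustive baseline. The sketch below is not a proposed solution: it runs in exponential time and is only usable on tiny inputs, for example to check whether a heuristic finds the optimum. It relies on the observation that only the node count matters, so a connected group can always be described by a tree grown one node at a time. The data layout and the names `mate` and `largest_group` are assumptions for illustration.

```python
from collections import Counter

def mate(interface):
    """Return the complementary interface: "alpha" <-> "alpha'"."""
    return interface[:-1] if interface.endswith("'") else interface + "'"

def largest_group(nodes):
    """Brute-force search for the largest connectable set of nodes.

    nodes: {name: [interface, ...]}; repeated names mean several sockets.
    Tries every start node and every way of attaching one more node.
    """
    best = set()

    def grow(group, free):
        nonlocal best
        if len(group) > len(best):
            best = set(group)
        if len(best) == len(nodes):
            return  # every node is already connected, nothing to improve
        for u in list(group):
            open_sockets = [s for s, count in free[u].items() if count > 0]
            for socket in open_sockets:
                for v in nodes:
                    if v in group or free[v][mate(socket)] == 0:
                        continue
                    # Activate the connection u-v, recurse, then undo it.
                    free[u][socket] -= 1
                    free[v][mate(socket)] -= 1
                    group.add(v)
                    grow(group, free)
                    group.remove(v)
                    free[u][socket] += 1
                    free[v][mate(socket)] += 1

    for start in nodes:
        grow({start}, {n: Counter(ifaces) for n, ifaces in nodes.items()})
    return best

nodes = {
    "A": ["alpha", "bravo"],
    "B": ["alpha'"],
    "C": ["alpha'"],
    "D": ["alpha"],
    "E": ["bravo'"],
}
group = largest_group(nodes)
```

On this small input the best group has three nodes (A, E, and one of B or C), because A's single alpha socket cannot serve both B and C, and the node left over can only pair with D.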

Example

Can you give some sample valid input? I'm not sure I understand the question totally - are there 'Nodes', 'Connections', and 'Interfaces'? And an interface defines some set of connections, but only allows 1 to be used? Are connections bidirectional? How many nodes/connections/interfaces could there be? — Alex Anderson, Apr 01 '15 at 02:24
Added an example, nodes are like regular nodes in a graph, connections are like regular connections in the graph, with the exception that node defines an interface for a said connection. Interfaces are the gateways with a limit of one connection, so `α` interface can connect to **any** node with `α'` interface. — Alexander Tsepkov, Apr 01 '15 at 15:03
This looks an awful lot like the [longest path problem](http://en.wikipedia.org/wiki/Longest_path_problem). — Jim Mischel, Apr 01 '15 at 16:10
Additional questions: Can any node with interface A connect to any other node with interface A'? Or can nodes with interface A only connect to some node-specific subset of nodes with interface A'? (EDIT: this seems to be clarified as YES to first question) How many different types of interfaces are there? Will there be O(N) nodes, but only O(1) different types of interfaces? Can nodes have multiple sockets for the same interface? I.e., a node has interfaces A, A, A, B? Can a node have opposite interfaces, i.e., a node with A, A'? Thanks for answering these questions — Alex Anderson, Apr 03 '15 at 03:35
Yes, A can connect to any A'. Number of interfaces is arbitrary, just like number of nodes, a node will have a relatively small number of interfaces, but the cumulative number of interfaces may be large. And yes, node can have several copies of the same interface (in which case it can connect to more than one node of the opposite interface), or even opposite interfaces (but if they're opposite, there is no benefit in connecting the node to itself) — Alexander Tsepkov, Apr 03 '15 at 15:13
Needs more clear explanation about the nature/motivation/design/limitations of the "interface". It sounds like what you have is an undirected graph for which edges always have some restrictions of some sort. That's probably fine, but IMHO your formulation (including the diagram) feels ineffective because it looks like beta should be a *property* of the edge EA, and similarly there is some kind of shared (alpha) commonality between the edges AB, BD, CD, and CA. And then maybe it will work better to explain what that "lock and key" concept (and the others) means. — Steven Lu, Apr 05 '15 at 18:31
For instance, what would you label the alpha interface circles if node B didn't exist and A was connected to D? In this case you have a triangle, and A, C, D all "share a single instance of interface alpha" (please correct me if I'm wrong). So that breaks down the whole pairs-of-interfaces thing here. Unless the interfaces actually need to be represented as pairs due to their nature (which again is still unclear to me) — Steven Lu, Apr 05 '15 at 18:38
@StevenLu I think the explanation for interface is clear now - essentially, for an edge to exist, it needs to have one end be interface X, and the other end interface X'. This "uses up" those sockets, wherever the edge was placed. Hence, in your question, A can connect to C, or C can connect to D, but you can't connect both to C, because there is only one instance of interface A' on node C. — Alex Anderson, Apr 08 '15 at 04:29
