
I'm looking for an algorithm which can diff two Directed Acyclic Graphs (DAGs). That is, I'd like an algorithm that produces a sequence of deletions and insertions which transforms the first DAG into the second DAG.

I'm not a hundred percent sure, but I think a longest common subsequence (LCS) approach could be applied to the DAGs. I'm less concerned about the length of the resulting edit sequence (as long as it's short enough) and more concerned about the running time of the algorithm.
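For reference, the classic LCS dynamic programme works on sequences, not graphs, so applying it to a DAG would first require flattening each graph into a sequence (for example, a sorted list of its root-to-leaf label strings). That flattening is an assumption, not something the question settles. A minimal Python sketch of the LCS table and the insert/delete script it yields:

```python
from typing import List, Sequence, Tuple


def lcs_edit_script(a: Sequence[str], b: Sequence[str]) -> List[Tuple[str, str]]:
    """Return ("keep" | "delete" | "insert", item) ops that turn a into b."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the front, emitting one op per step.
    ops: List[Tuple[str, str]] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            ops.append(("keep", a[i]))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            ops.append(("delete", a[i]))
            i += 1
        else:
            ops.append(("insert", b[j]))
            j += 1
    # Whatever is left over on either side is deleted or inserted wholesale.
    ops.extend(("delete", x) for x in a[i:])
    ops.extend(("insert", y) for y in b[j:])
    return ops
```

The table costs O(n*m) time and memory for sequences of lengths n and m, which is worth weighing given the concern about running time.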

One complication is that none of my vertices are labelled except for a single root node, which is also the only node with zero in-edges. The graph's edges are labelled, and the 'data' in the graph is represented by the paths from the root to the leaves. This is similar to a trie, but with a directed graph instead of a tree; in fact, my graphs are quite similar to the directed acyclic word graph (DAWG) data structure.
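Since the 'data' is the set of root-to-leaf paths, one cheap baseline is to compare those path strings directly. A minimal sketch, assuming a hypothetical representation where each vertex is an arbitrary internal key (the keys carry no meaning and need not match between graphs) mapped to its (edge_label, child) pairs, and where the root's key is "root":

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: vertex key -> list of (edge_label, child_key).
# Vertices with no entry (or an empty list) are leaves.
Graph = Dict[str, List[Tuple[str, str]]]


def root_to_leaf_words(graph: Graph, root: str) -> Set[str]:
    """Collect the label strings spelled by every root-to-leaf path."""
    words: Set[str] = set()
    stack = [(root, "")]
    while stack:
        vertex, prefix = stack.pop()
        children = graph.get(vertex, [])
        if not children:  # a leaf: the path is complete
            words.add(prefix)
        for label, child in children:
            stack.append((child, prefix + label))
    return words


def path_diff(g1: Graph, g2: Graph, root: str = "root") -> Tuple[Set[str], Set[str]]:
    """Return (words only in g1, words only in g2)."""
    w1, w2 = root_to_leaf_words(g1, root), root_to_leaf_words(g2, root)
    return w1 - w2, w2 - w1
```

This diffs the data rather than the structure, so it does not by itself give vertex or edge insertions. It also revisits shared suffixes, so on DAWG-like graphs the number of paths (and the run time) can grow exponentially with the number of vertices.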

Here's an example.

[Image: DAG1]

[Image: DAG2]

To get DAG 2, you simply add an edge from the root to a new vertex with label 'b'. From that vertex there is an edge to the final 'ac' vertex in DAG 1 and an edge to another new vertex whose label is 'd'. From that last vertex there is another edge to the 'ac' vertex in DAG 1. I'd post a link to the diff in DAG form, but I can't post more than two links.

Thanks and hope this is legible enough.

Andrew Rose
Nomad010
  • Can a node have two edges leading out from it which are labelled identically? – borrible May 15 '13 at 09:54
  • @borrible: That's a good question, I don't think they can. Would it change it drastically if they did? – Nomad010 May 15 '13 at 12:30
  • Say you have a DAG with lots of vertices, and between two of them somewhere in the middle of the DAG you establish a new edge (without creating a cycle, of course). The task of finding that simple diff **if the vertices are not labelled** is daunting – and furthermore, how do you even describe it? – Walter Tross Mar 22 '19 at 22:49
  • @WalterTross My mistake, the use case I had in mind when I issued the bounty was with labeled vertices: every vertex would have multiple attributes. – phant0m Mar 25 '19 at 11:35

2 Answers


This might be a bit too late, but just for fun: both of your DAGs can be expressed as matrices, with the row index indicating the "from" vertex, the column index indicating the "to" vertex, and the corresponding cell labeled with the edge id. You can give each vertex a unique, random id.
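A minimal Python sketch of that matrix representation, assuming at most one edge per ordered pair of vertices (so each cell holds a single label) and using hypothetical string ids in place of random ones:

```python
from typing import List, Optional, Tuple


def to_label_matrix(
    edges: List[Tuple[str, str, str]],
) -> Tuple[List[str], List[List[Optional[str]]]]:
    """Build (vertex_order, matrix) where matrix[r][c] is the label of the
    edge from vertex_order[r] to vertex_order[c], or None if there is none.
    Vertices that touch no edge are not included."""
    vertices = sorted({v for src, dst, _ in edges for v in (src, dst)})
    index = {v: i for i, v in enumerate(vertices)}
    matrix: List[List[Optional[str]]] = [[None] * len(vertices) for _ in vertices]
    for src, dst, label in edges:
        matrix[index[src]][index[dst]] = label
    return vertices, matrix
```

Sorting the ids just fixes some row/column order; any order works, since reordering rows and columns together only relabels positions.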

The next part is a bit tricky, because only your edges have meaningful labels that map from DAG1 to DAG2. Suppose you have a set of edges E* that is the intersection of the labeled edges of DAG1 and DAG2. You will need to perform a series of row shifts (up or down) or column shifts (left or right) so that the positions of all edges in E* in DAG1 and DAG2 map to each other. Note that for a DAG represented as a matrix, shifting the position of an entire row or an entire column (together with its vertex id) still leaves the representation equivalent.
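Shifting rows and columns until the E* edges line up amounts to finding a correspondence between DAG1's and DAG2's vertex ids, so one way to sketch this step is to derive that correspondence directly from E*. This assumes, as the answer implies, that edge labels are unique within each DAG; `vertex_mapping` and the (from, to, label) tuples are hypothetical names:

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (from_vertex, to_vertex, edge_label)


def vertex_mapping(edges1: List[Edge], edges2: List[Edge]) -> Dict[str, str]:
    """Map DAG1 vertex ids onto DAG2 vertex ids using the edges whose labels
    appear in both graphs (E*). Assumes labels are unique per graph."""
    by_label2 = {label: (src, dst) for src, dst, label in edges2}
    mapping: Dict[str, str] = {}
    for src1, dst1, label in edges1:
        if label not in by_label2:
            continue  # this edge is not in E*
        src2, dst2 = by_label2[label]
        for v1, v2 in ((src1, src2), (dst1, dst2)):
            # setdefault keeps an earlier assignment; a mismatch is a conflict.
            if mapping.setdefault(v1, v2) != v2:
                raise ValueError(f"{v1} maps to both {mapping[v1]} and {v2}")
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("two DAG1 vertices map onto the same DAG2 vertex")
    return mapping
```

Vertices not touched by any E* edge stay unmapped; those are the candidates for removal (in DAG1) or insertion (in DAG2).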

The remaining operation would be to rename the vertices according to the mapped matrices, compare the two matrices, and identify the new edges and new vertices required (and the edges and vertices that can be removed).
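Once a vertex mapping exists, the rename-and-compare step can be a plain set comparison. A minimal sketch, assuming the mapping has already been found and that vertices are only known through the edges they touch; the "dag1:" prefix for unmapped DAG1 vertices is a hypothetical convention to keep them distinct from DAG2's ids:

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (from_vertex, to_vertex, edge_label)


def diff_after_mapping(edges1: List[Edge], edges2: List[Edge],
                       mapping: Dict[str, str]) -> Dict[str, set]:
    """Rename DAG1's vertices with `mapping`, then compare the two graphs."""
    def rename(v: str) -> str:
        # Unmapped DAG1 vertices get a prefix so they cannot match DAG2 ids.
        return mapping.get(v, "dag1:" + v)

    renamed1 = {(rename(s), rename(d), lbl) for s, d, lbl in edges1}
    set2 = set(edges2)
    verts1 = {v for s, d, _ in renamed1 for v in (s, d)}
    verts2 = {v for s, d, _ in set2 for v in (s, d)}
    return {
        "remove_edges": renamed1 - set2,
        "add_edges": set2 - renamed1,
        "remove_vertices": verts1 - verts2,
        "add_vertices": verts2 - verts1,
    }
```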

firemana

How would your specific data representation show that edges c and x in your DAG 2 example terminate in the same vertex?

If we assume Wikipedia's general definitions of "directed graph", "vertex", and "edge", there is no such thing as an "unlabeled vertex" because without labeling them, there would be no way to describe edges, according to the definition there.

As is, it seems to me your question is impossible to answer. Please provide (1) a simple example of the input provided to the algorithm — a data structure describing each graph as a collection of vertices and edges — and the expected output in a similar way, and (2) a consistent way to distinguish if an edge or vertex in the first DAG is equivalent to one in the second DAG, implying no difference in that aspect of the graphs.

Perhaps your question is actually mostly about how to determine the labels for the vertices in each DAG in the input and how to best correlate them. Or, alternatively, perhaps labels are just a convenience to describe each graph and the question is actually seeking the minimal set of changes to describe a transformation of the structure of one graph to another.

That said, edges and vertices in the traditional mathematical definition of a graph are atomic. Each vertex or edge either exists or does not exist in any one graph, making the concept of a diff somewhat meaningless, or else trivial to build, if we assume that an identical label for any specific vertex or edge represents the exact same vertex or edge in both graphs.

Such a trivial algorithm would basically just enumerate each vertex and edge in the two DAGs and add the appropriate operations to the diff, choosing only from the following operations:

add vertex v
remove vertex v
add edge e
remove edge e
switch direction for edge e
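Assuming vertex ids really are shared between the two graphs, that enumeration fits in a few lines of Python. This is a sketch, with edges modelled as ordered (from, to) pairs; it emits edge removals before the direction switches and edge additions after them, so every intermediate edge set is a subset of DAG1's or DAG2's edges and therefore stays acyclic:

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (from_vertex, to_vertex); ids assumed shared


def trivial_diff(v1: Set[str], e1: Set[Edge],
                 v2: Set[str], e2: Set[Edge]) -> List[str]:
    """List the operations that turn graph 1 into graph 2."""
    only1, only2 = e1 - e2, e2 - e1
    # An edge present as (u, v) in graph 1 and as (v, u) in graph 2 is a flip.
    flipped = {(u, v) for u, v in only1 if (v, u) in only2}
    reversed_flipped = {(v, u) for u, v in flipped}

    ops = [f"add vertex {v}" for v in sorted(v2 - v1)]
    ops += [f"remove edge {u}->{v}" for u, v in sorted(only1 - flipped)]
    ops += [f"switch direction for edge {u}->{v}" for u, v in sorted(flipped)]
    ops += [f"add edge {u}->{v}" for u, v in sorted(only2 - reversed_flipped)]
    ops += [f"remove vertex {v}" for v in sorted(v1 - v2)]
    return ops
```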
גלעד ברקן
  • You raise a good point. The use case I had in mind when I was issuing the bounty was with labeled vertices: Every vertex would have multiple attributes. – phant0m Mar 25 '19 at 11:30
  • @phant0m can you explain and give an example? What do you mean by "multiple attributes"? There could be different ways to determine if an edge or vertex in the first DAG is equivalent to one in the second DAG. Without clarifying how the OP would do it for *their* use case (or yours), I don't see how we can answer. – גלעד ברקן Mar 25 '19 at 12:29
  • Let's just say one of the nodes has an `id` attribute that serves as a key. Or, as a generalization, have some `0 <= similar(a, b) <= 1` measure where `1` would mean the vertex `a` from graph A is equivalent to vertex `b` in graph B. So in the case with `id`, the function evaluates to `1` iff `a` and `b` have the same `id`, and to `0` otherwise. In my case, there can be only one edge between each pair of nodes, so edges can be identified relative to their vertices. – phant0m Mar 28 '19 at 19:27
  • @phant0m thanks. So we can identify vertices by their unique `id`. As I understand, an "edge" in a DAG is normally defined as `(vertex_1, vertex_2, direction)`. Can you please help me understand why creating a diff is not trivial if we have vertex identity as unique ID? Either a vertex (and by extension an edge) exists in the DAG or it doesn't. Just enumerate them all and mark the operations - if it's in both, do nothing, otherwise: if it's in 1 and not 2, mark `remove`; if it's in 2 and not 1, mark `add`, if it's in 1 and 2 but changed direction, mark `change direction`. What am I missing? :) – גלעד ברקן Mar 28 '19 at 21:28
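A sketch of the id-keyed `similar(a, b)` from phant0m's comment, plus one simple way to turn a similarity measure into a vertex correspondence. The attribute-dictionary vertex model, `greedy_match`, and `threshold` are assumptions; greedy pairing is not guaranteed to be optimal for fractional scores, where an assignment method such as the Hungarian algorithm would be the more principled choice:

```python
from typing import Any, Dict, List, Tuple

Vertex = Dict[str, Any]  # assumption: a vertex is a bag of attributes


def similar(a: Vertex, b: Vertex) -> float:
    """The id-keyed case: 1 iff both vertices carry the same id, else 0."""
    return 1.0 if a.get("id") is not None and a.get("id") == b.get("id") else 0.0


def greedy_match(va: List[Vertex], vb: List[Vertex],
                 threshold: float = 1.0) -> List[Tuple[int, int]]:
    """Pair indices of A and B, highest similarity first, each used once."""
    scored = sorted(
        ((similar(a, b), i, j) for i, a in enumerate(va) for j, b in enumerate(vb)),
        reverse=True,
    )
    used_a, used_b = set(), set()
    pairs: List[Tuple[int, int]] = []
    for score, i, j in scored:
        if score < threshold:
            break  # everything after this scores lower still
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
    return sorted(pairs)
```

With the id-only `similar`, this reduces to matching by `id`; because there is at most one edge per pair of nodes, edges can then be compared as pairs of matched vertices.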