112

Can someone explain in simple terms to me what a directed acyclic graph is? I have looked on Wikipedia but it doesn't really make me see its use in programming.

Mark Amery
yazz.com
  • 26
    Wikipedia frequently contains overwhelming technical content that would take beginners a great deal of studying to comprehend. Many of the math help sites are superior in this regard, but they tend not to get into computation related subjects, unfortunately. – Jonathon Faust Feb 17 '10 at 19:41
  • 1
    Whoever uses git actually uses DAG without knowing it, https://ericsink.com/vcbe/html/directed_acyclic_graphs.html – Qiulang May 05 '20 at 03:02

13 Answers

177

graph = a structure consisting of nodes that are connected to each other by edges

directed = the connections between the nodes (edges) have a direction: A -> B is not the same as B -> A

acyclic = "non-circular" = moving from node to node by following the edges, you will never encounter the same node for the second time.

A good example of a directed acyclic graph is a directed tree, where every edge points from a parent to a child. Note, however, that not all directed acyclic graphs are trees.
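
To make the three words concrete, here is a minimal sketch. It assumes a graph is stored as a dictionary that maps each node to the nodes its edges point to; the function name `is_acyclic` and the two sample graphs are illustrative only. The check uses Kahn's algorithm: keep removing nodes that have no incoming edges, and if some nodes can never be removed, they sit on a cycle.

```python
from collections import deque

def is_acyclic(graph):
    """Return True if the directed graph (node -> list of successors) has no cycle."""
    # Count incoming edges for every node, including nodes that only appear as targets.
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for t in targets:
            indegree[t] = indegree.get(t, 0) + 1

    # Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
    ready = deque(n for n, d in indegree.items() if d == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for t in graph.get(node, []):
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)

    # If every node could be removed, no cycle exists.
    return removed == len(indegree)

dag = {"A": ["B", "C"], "B": ["C"], "C": []}   # directed and acyclic, but not a tree
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}  # A -> B -> C -> A

print(is_acyclic(dag))     # True
print(is_acyclic(cyclic))  # False
```

Note that `dag` is acyclic even though C has two parents, which is exactly the case where a DAG stops being a tree.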

Zoltán
Roland Bouman
  • I understand what nodes are. When you say "edge", do you mean an arrow pointing from Node A to Node B? – yazz.com Feb 17 '10 at 19:30
  • Better explanation. So what has this got to do with programming? Is it related to functional programming? – yazz.com Feb 17 '10 at 19:32
  • 2
    It's typically represented by an arrow, but it's really just that there is a relation between A and B. In your program this might be a true value in an adjacency matrix at the indices representing those two nodes. – tvanfosson Feb 17 '10 at 19:32
  • Yes, "edge" is the arrow. If you imagine a larger graph, one which IS cyclic, as a 3-dimensional object, the "edge" term makes more sense. – James Curran Feb 17 '10 at 19:33
  • Trees are what first came to mind when I thought of how a directed acyclic graph might apply in programming, but there's a minor technicality: trees don't have to be directed. – hbw Feb 17 '10 at 19:34
  • Zubair, the relation to programming is that many data structures take the form of a graph. So some components in your code play the role of node, some of edge. For example, a parse tree can be seen as a directed acyclic graph: the nodes are the operators and constants, and the edges represent association. For example, parsing 1+2*3 would give you a tree with + as root, 1 as leftmost child, * as rightmost child, and * would have 2 as leftmost child, and 3 as rightmost child – Roland Bouman Feb 17 '10 at 19:37
  • This I understand, thanks Roland. So it's really discussing a type of data structure, is that correct? Or is it program flow? – yazz.com Feb 17 '10 at 19:39
  • Not to mention that a directed acyclic graph doesn't have to be one directed tree, but could be many. Of course, a connected directed acyclic graph is a directed tree. – David Thornley Feb 17 '10 at 19:39
  • 43
    All directed trees are DAGs, but not all DAGs are trees. The DAG A->B, A->C, B->C cannot be represented as a tree since node C has more than one parent. – Jason S Feb 17 '10 at 19:39
  • Zubair, well, the term graph really is in the field of mathematics. The graph and terminology can sometimes be used to express the nature of a data structure, so it is a useful analogy. But I think the graph analogy could equally well be used to talk about program flow. Take again the parse tree example: the parse tree is a data structure, and to evaluate it (=program flow) you have to walk the graph, executing the expressions as you encounter them whilst walking the tree. – Roland Bouman Feb 17 '10 at 19:44
  • Actually, a tree is a UNdirected graph that is acyclic and connected. If you erase the arrows from a DAG, you won't necessarily get a tree. That's because a pair of nodes can have more than one path between them. – Derek Ledbetter Feb 17 '10 at 19:59
  • 2
    Directedness of edges is not the only feature separating DAGs from trees. A DAG can have more than |V|-1 edges, unlike a tree. For instance, A->B, A->C, B->D, C->D is a DAG but clearly not a tree since it has the same number of edges and nodes. – Anonym Mus Feb 24 '10 at 10:17
  • A DAG is not always a tree, but a tree is always a DAG. A node in a DAG can have more than one parent, thus the DAG is not always a tree. –  Mar 01 '13 at 15:29
90

dots with lines pointing to other dots

smartcaveman
  • 24
    This is one of the best answers because it is a simple way of describing what is a simple concept buried in complex terminology (if we're asking this question, we might not know graph theory... or even need to know). My variant would be something like "bar-hopping where you can never go to the same bar twice". Although the family-tree example from another answer is probably conceptually simpler, especially for those of us who aren't college students or alcoholics. – Tom Harrison Jul 23 '16 at 16:48
  • 28
    ... in one direction – Mark Robson Jul 04 '17 at 12:34
  • 3
    This is a good example of failing to express an inherently complex concept in less than possible terms. That's why Euclid's fifth postulate still exists. – Xaqron Jul 19 '18 at 14:04
  • 5
    You have to include "where the lines do not form cycles", otherwise you're just describing a directed graph, not a directed acyclic graph. – Pharap Mar 13 '19 at 13:42
  • "dots with lines point to other dots, with no loops" would be an improvement. – John DeRegnaucourt Jan 12 '20 at 15:45
  • Unfortunately I was compelled to downvote this answer since it is not 100% factually correct at this point. I tried proposing an edit but it seems to have been silently dropped for some reason. I just tried again and got a message that the "suggested edit queue is full". I hope we can get this answer corrected someday. – brodybits Jun 29 '20 at 17:03
  • This answer is incorrect as it only describes a node-based graph, not a directed acyclic one. – Tom Bowers Nov 12 '20 at 04:25
50

I see a lot of answers explaining the meaning of DAG (Directed Acyclic Graph) but none on its applications. Here is a very simple one:

Pre-requisite graph: during an engineering course, every student faces the task of choosing subjects while meeting requirements such as pre-requisites. It is clear that you cannot take a class on Artificial Intelligence [B] without the pre-requisite course on Algorithms [A]. Hence B depends on A or, in better terms, A has an edge directed to B. So in order to reach node B you have to visit node A. After adding all the subjects with their pre-requisites to a graph, it will turn out to be a directed acyclic graph.

If there was a cycle then you would never complete a course :p

A software system in the university that allows students to register for courses can model subjects as nodes to be sure that the student has taken a pre-requisite course before registering for the current course.
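
A minimal sketch of such a registration check follows. The course catalogue, the third course, and the function names are invented for illustration; a real system would read them from its own database.

```python
# Hypothetical catalogue: course -> list of direct pre-requisites.
PREREQUISITES = {
    "Algorithms": [],
    "Artificial Intelligence": ["Algorithms"],
    "Machine Learning": ["Artificial Intelligence"],
}

def can_register(course, completed):
    """A student may register only if every direct pre-requisite is completed."""
    return all(p in completed for p in PREREQUISITES[course])

def all_prerequisites(course):
    """Every course that must be taken first, directly or indirectly.

    The recursion terminates because the pre-requisite graph has no cycles.
    """
    needed = set()
    for p in PREREQUISITES[course]:
        needed.add(p)
        needed |= all_prerequisites(p)
    return needed

print(can_register("Artificial Intelligence", {"Algorithms"}))  # True
print(can_register("Machine Learning", {"Algorithms"}))         # False
print(sorted(all_prerequisites("Machine Learning")))
# ['Algorithms', 'Artificial Intelligence']
```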

My professor gave this analogy and it has best helped me understand DAG rather than using some complicated concept!

Another real-world example: how DAGs can be used in a version control system.

human.js
  • 4
    This should be the most highly ranked answer. Simple analogy and doesn't use the text book definition the OP isn't able to easily comprehend. – kimathie Aug 17 '17 at 12:43
25

Example uses of a directed acyclic graph in programming include more or less anything that represents connectivity and causality.

For example, suppose you have a computation pipeline that is configurable at runtime. As one example of this, suppose computations A,B,C,D,E,F, and G depend on each other: A depends on C, C depends on E and F, B depends on D and E, and D depends on F. This can be represented as a DAG. Once you have the DAG in memory, you can write algorithms to:

  • make sure the computations are evaluated in the correct order (topological sort)
  • if computations can run in parallel and each computation has a maximum execution time, calculate the maximum execution time of the entire set (the longest path through the DAG)

among many other things.
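
The two algorithms listed above can be sketched as follows. This uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). The dependencies are the ones from the example; the execution times are made up for illustration, and the parallel bound assumes there are enough workers to run every ready computation at once.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# node -> the computations it depends on (G has no dependencies in the example)
depends_on = {
    "A": {"C"}, "B": {"D", "E"}, "C": {"E", "F"},
    "D": {"F"}, "E": set(), "F": set(), "G": set(),
}

# 1) An order in which every computation runs after its dependencies.
order = list(TopologicalSorter(depends_on).static_order())

# 2) Hypothetical maximum execution times in seconds, invented for illustration.
max_time = {"A": 2, "B": 1, "C": 3, "D": 4, "E": 1, "F": 2, "G": 5}

# A node finishes at its own time plus the latest finish among its dependencies;
# the bound for the whole set is the largest finish time (the critical path).
finish = {}
for node in order:
    finish[node] = max_time[node] + max(
        (finish[d] for d in depends_on[node]), default=0
    )
total = max(finish.values())

print(order)  # one valid order; several exist
print(total)  # 7 (for example F -> D -> B: 2 + 4 + 1)
```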

Outside the realm of application programming, any decent automated build tool (make, ant, scons, etc.) will use DAGs to ensure proper build order of the components of a program.

Jason S
  • +1 for mention of causality. This comes up a lot when you need to represent complex systems where the output of one process is the input for one or more other processes. – Alex Feinman Feb 17 '10 at 20:03
14

Several answers have given examples of the use of graphs (e.g. network modeling) and you've asked "what does this have to do with programming?".

The answer to that sub-question is that it doesn't have much of anything to do with programming. It has to do with problem solving.

Just as linked lists are data structures used for certain classes of problems, graphs are useful for representing certain relationships. Linked lists, trees, graphs, and other abstract structures only have a connection to programming in that you can implement them in code. They exist at a higher level of abstraction. It's not about programming, it's about applying data structures in the solution of problems.

Jonathon Faust
  • Can be implemented in programming. Yes, I like that, as graphs exist in the real world independent of computers! – yazz.com Feb 18 '10 at 08:44
13

Directed Acyclic Graphs (DAG) have the following properties which distinguish them from other graphs:

  1. Their edges show direction.
  2. They don't have cycles.

Well, I can think of one use right now: wait-for graphs (more technical details) are handy in detecting deadlocks, as they illustrate the dependencies amongst a set of processes and resources (both are nodes in the graph). As long as the graph stays acyclic, nothing is stuck; a deadlock shows up as a cycle in the graph.
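
A minimal sketch of that cycle check, using a depth-first search. For brevity it assumes a simplified wait-for graph whose nodes are processes only (process -> processes it waits on); the names `find_cycle`, `P1`, `P2`, and `P3` are illustrative and not taken from any particular database engine.

```python
def find_cycle(waits_for):
    """Return a list of nodes forming a cycle, or None if the graph is acyclic.

    waits_for maps each process to the processes it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current path / finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(waits_for):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

ok = {"P1": ["P2"], "P2": ["P3"], "P3": []}
deadlocked = {"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}

print(find_cycle(ok))          # None
print(find_cycle(deadlocked))  # ['P1', 'P2', 'P3', 'P1']
```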

Mark Amery
Arnkrishn
  • 1
    Andriyev, +1 for the deadlock example. This is in fact used by MySQL's InnoDB engine, and they call it a "wait-for-graph", as in, "that row has to wait for the lock on that row to be released" – Roland Bouman Feb 17 '10 at 19:40
  • Yes, you are dead right with the name - Wait For Graph. Somehow missed that. Updated the response. :) – Arnkrishn Feb 17 '10 at 19:43
  • How do they know there is a dependency? Is it by checking to see if two nodes have a common ancestor? – yazz.com Feb 18 '10 at 08:38
  • This link - http://www.cis.temple.edu/~ingargio/cis307/readings/deadlock.html - has more technical details. – Arnkrishn Feb 18 '10 at 14:10
11

I assume you already know basic graph terminology; otherwise you should start from the article on graph theory.

Directed refers to the fact that the edges (connections) have directions. In the diagram, these directions are shown by the arrows. The opposite is an undirected graph, whose edges don't specify directions.

Acyclic means that, if you start from any node X and keep following edges in their direction, you can never return to X.

Several applications:

  • Spreadsheets; this is explained in the DAG article.
  • Revision control: if you have a look at the diagram in that page, you will see that the evolution of revision-controlled code is directed (it goes "down", in this diagram) and acyclic (it never goes back "up").
  • Family tree: it's directed (you are your parents' child, not the other way around) and acyclic (your ancestors can never be your descendants).
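
The revision-control case can be sketched in a few lines. The commit names and the `parents` table below are hypothetical; the point is only that walking parent edges always terminates and never comes back to where it started, because history is acyclic.

```python
# Hypothetical history: commit -> list of parent commits (a merge has two parents).
parents = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c1"],
    "c4": ["c2", "c3"],  # merge commit
}

def ancestors(commit):
    """All commits reachable by following parent edges from `commit`."""
    seen = set()
    stack = list(parents[commit])
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

print(sorted(ancestors("c4")))   # ['c1', 'c2', 'c3']
print("c4" in ancestors("c1"))   # False: history never goes back "up"
```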
Johannes Sasongko
5

A DAG is a graph where every edge has a direction and no node can lead back to itself, directly or through other nodes.

Think of ancestry trees; they are actually DAGs.

All DAGs have

  • Nodes (places to store data)
  • Directed edges (each pointing one way, from parent to child)
  • At least one ancestral node (a node without parents)
  • Leaves (nodes that have no children)

DAGs are different from trees. In a tree-like structure, there must be a unique path between every two nodes. In DAGs, a node can have two parent nodes.
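
As a rough illustration, the sketch below finds the ancestral nodes, the leaves, and the nodes with more than one parent from a list of `(parent, child)` edges. The function name `describe` is made up for this example, and the sample edges are the A->B, A->C, B->C graph mentioned in the comments on another answer.

```python
def describe(edges):
    """edges: list of (parent, child) pairs. Returns (roots, leaves, multi_parent)."""
    nodes, parents, children = set(), {}, {}
    for p, c in edges:
        nodes.update((p, c))
        parents.setdefault(c, set()).add(p)
        children.setdefault(p, set()).add(c)
    roots = sorted(n for n in nodes if n not in parents)      # no parents
    leaves = sorted(n for n in nodes if n not in children)    # no children
    multi = sorted(n for n in nodes if len(parents.get(n, ())) > 1)
    return roots, leaves, multi

print(describe([("A", "B"), ("A", "C"), ("B", "C")]))
# (['A'], ['C'], ['C'])  -> C has two parents, so this DAG is not a tree
```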

Here's a good article about DAGs. I hope that helps.

Mickey
4

Graphs, of all sorts, are used in programming to model various different real-world relationships. For example, a social network is often represented by a graph (cyclic in this case). Likewise, network topologies, family trees, airline routes, ...

tvanfosson
2

From a source code or even three-address code (TAC) perspective, you can visualize the problem really easily at this page:

http://cgm.cs.mcgill.ca/~hagha/topic30/topic30.html#Exptree

If you go to the expression tree section, and then page down a bit it shows the "topological sorting" of the tree, and the algorithm for how to evaluate the expression.

So in that case you can use the DAG to evaluate expressions, which is handy since evaluation is normally interpreted, and using such a DAG evaluator will make simple interpreters faster in principle, because it is not pushing and popping to a stack and also because it is eliminating common sub-expressions.

The basic algorithm to compute the DAG in non-ancient-Egyptian (i.e. English) is this:

1) Make your DAG object like so

You need a live list and this list holds all the current live DAG nodes and DAG sub-expressions. A DAG sub expression is a DAG Node, or you can also call it an internal node. What I mean by live DAG Node is that if you assign to a variable X then it becomes live. A common sub-expression that then uses X uses that instance. If X is assigned to again then a NEW DAG NODE is created and added to the live list and the old X is removed so the next sub-expression that uses X will refer to the new instance and thus will not conflict with sub-expressions that merely use the same variable name.

Once you assign to a variable X, then co-incidentally all the DAG sub-expression nodes that are live at the point of assignment become not-live, since the new assignment invalidates the meaning of sub expressions using the old value.

class Dag {
  TList LiveList;
  DagNode Root;
}

// In your DagNode you need a way to refer to the original things that
// the DAG is computed from. In this case I just assume an integer index
// into the list of variables and also an integer index for the operator for
// Nodes that refer to operators. Obviously you can create sub-classes for
// different kinds of Dag Nodes.
class DagNode {
  int Variable;
  int Operator;// You can also use a class
  DagNode Left;
  DagNode Right;
  DagNodeList Parents;
}

So what you do is walk through your tree in your own code, such as a tree of expressions in source code for example. Call the existing nodes XNodes for example.

So for each XNode you need to decide how to add it into the DAG, and there is the possibility that it is already in the DAG.

This is very simple pseudo code. Not intended for compilation.

DagNode XNode::GetDagNode(Dag dag) {
  if (XNode.IsAssignment) {
    // The assignment is a special case. A common sub expression is not
    // formed by the assignment since it creates a new value.

    // Evaluate the right hand side like normal
    XNode.RightXNode.GetDagNode(dag);

    // And now take the variable being assigned to out of the current live list
    dag.RemoveDagNodeForVariable(XNode.VariableBeingAssigned);

    // Also remove all DAG sub expressions using the variable - since the new value
    // makes them redundant
    dag.RemoveDagExpressionsUsingVariable(XNode.VariableBeingAssigned);

    // Then make a new variable in the live list in the dag, so that references to
    // the variable later on will see the new dag node instead.
    dag.AddDagNodeForVariable(XNode.VariableBeingAssigned);

    // Return the freshly added node so that every branch returns a DagNode
    XNode.DagNode = dag.GetDagNodeForVariable(XNode.VariableBeingAssigned);
    return XNode.DagNode;
  }
  else if (XNode.IsVariable) {
    // A variable node has no child nodes, so you can just process it directly
    DagNode n = dag.GetDagNodeForVariable(XNode.Variable);
    if (n) XNode.DagNode = n;
    else {
      XNode.DagNode = dag.CreateDagNodeForVariable(XNode.Variable);
    }
    return XNode.DagNode;
  }
  else if (XNode.IsOperator) {
    DagNode leftDagNode = XNode.LeftXNode.GetDagNode(dag);
    DagNode rightDagNode = XNode.RightXNode.GetDagNode(dag);

    // Given the operator id and both operands, the dag looks in its live list
    // to check whether this expression is already there. If it is, the existing
    // node is returned, and that is how a common sub-expression is formed.
    // This is called an internal node.
    XNode.DagNode =
      dag.GetOrCreateDagNodeForOperator(XNode.Operator, leftDagNode, rightDagNode);

    return XNode.DagNode;
  }
}

So that is one way of looking at it. A basic walk of the tree and just adding in and referring to the Dag nodes as it goes. The root of the dag is whatever DagNode the root of the tree returns for example.

Obviously the example procedure can be broken up into smaller parts or made as sub-classes with virtual functions.
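
For comparison, here is a small runnable sketch of the same idea. It is not a translation of the pseudocode above: it only covers variables and binary operators and leaves out assignments (and therefore the removal of stale live nodes). The class name `DagBuilder` and the tuple encoding of expressions are assumptions made for this example. The dictionary plays the role of the live list: an expression that has already been seen maps to the existing node instead of creating a new one.

```python
class DagBuilder:
    """Turns an expression tree into a DAG by sharing identical sub-expressions."""

    def __init__(self):
        self.nodes = {}  # key -> node id; stands in for the "live list"
        self.table = []  # node id -> key describing the node

    def _intern(self, key):
        # Reuse the node if this exact variable or (op, left, right) exists already.
        if key not in self.nodes:
            self.nodes[key] = len(self.table)
            self.table.append(key)
        return self.nodes[key]

    def build(self, expr):
        if isinstance(expr, str):        # leaf: a variable name
            return self._intern(("var", expr))
        op, left, right = expr           # internal node: (operator, lhs, rhs)
        return self._intern((op, self.build(left), self.build(right)))

# (a + b) * (a + b): the tree has 7 nodes, the DAG only 4.
tree = ("*", ("+", "a", "b"), ("+", "a", "b"))
builder = DagBuilder()
root = builder.build(tree)

print(len(builder.table))   # 4
print(builder.table[root])  # ('*', 2, 2): both operands are the same shared node
```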

As for sorting the Dag, you go through each DagNode from left to right. In other words follow the DagNodes left hand edge, and then the right hand side edge. The numbers are assigned in reverse. In other words when you reach a DagNode with no children, assign that Node the current sorting number and increment the sorting number, so as the recursion unwinds the numbers get assigned in increasing order.

This example only handles trees with nodes that have zero or two children. Obviously some trees have nodes with more than two children so the logic is still the same. Instead of computing left and right, compute from left to right etc...

// Most basic DAG topological ordering example.
void DagNode::OrderDAG(int* counter) {
  if (this->AlreadyCounted) return;

  // Count from left to right
  for (int x = 0; x < this->Children.Count; x++)
    this->Children[x].OrderDAG(counter);

  // And finally number the DAG Node here after all
  // the children have been numbered
  this->DAGOrder = *counter;

  // Increment the counter so the caller gets a higher number
  *counter = *counter + 1;

  // Mark as processed so it won't be counted again
  this->AlreadyCounted = TRUE;
}
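
The same numbering can be tried out with a short runnable sketch. It assumes the DAG is given as a dictionary from each node to its list of children; the function name `order_dag` and the sample graph are illustrative. Children are numbered before their parent, and a shared node is numbered only once.

```python
def order_dag(children, root):
    """Number nodes so that every node gets a higher number than all its children."""
    order = {}
    visited = set()

    def visit(node):
        if node in visited:          # shared node: already numbered
            return
        visited.add(node)
        for child in children.get(node, []):   # left to right
            visit(child)
        order[node] = len(order)     # number the node after all its children

    visit(root)
    return order

# DAG for (a + b) * (a + b): "*" points twice at the single shared "+" node.
children = {"*": ["+", "+"], "+": ["a", "b"]}
print(order_dag(children, "*"))  # {'a': 0, 'b': 1, '+': 2, '*': 3}
```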
1

If you know what trees are in programming, then DAGs in programming are similar, but they allow a node to have more than one parent. This can be handy when you want to let a node be clumped under more than just a single parent, yet not have the problem of a knotted mess of a general graph with cycles. You can still navigate a DAG easily, but there are multiple ways to get back to the root (because there can be more than one parent). A single DAG could in general have multiple roots, but in practice it may be better to just stick with one root, like a tree. If you understand single vs. multiple inheritance in OOP, then you know tree vs. DAG. I already answered this here.
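
The inheritance comparison can be seen directly in Python, which allows multiple inheritance. In this small sketch (class names are made up), `Bottom` has two parents, so the class hierarchy is a DAG rather than a tree. Python's method resolution order then flattens that DAG into a list in which every class appears before its parents.

```python
class Base: pass
class Left(Base): pass
class Right(Base): pass
class Bottom(Left, Right): pass  # two parents: the hierarchy is a DAG, not a tree

# Two different paths lead from Bottom to Base: via Left and via Right.
print([c.__name__ for c in Bottom.__mro__])
# ['Bottom', 'Left', 'Right', 'Base', 'object']
```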

Jamin
1

The name tells you most of what you need to know of its definition: It's a graph where every edge only flows in one direction and once you crawl down an edge your path will never return you to the vertex you just left.

I can't speak to all the uses (Wikipedia helps there), but for me DAGs are extremely useful when determining dependencies between resources. My game engine for instance represents all loaded resources (materials, textures, shaders, plaintext, parsed json etc) as a single DAG. Example:

A material is N GL programs that each need two shaders, and each shader needs a plaintext shader source. By representing these resources as a DAG, I can easily query the graph for existing resources to avoid duplicate loads. Say you want several materials to use vertex shaders with the same source code. It is wasteful to reload the source and recompile the shaders for every use when you can just establish a new edge to the existing resource. In this way you can also use the graph to determine if anything depends on a resource at all, and if not, delete it and free its memory. In fact, this happens pretty much automatically.
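
A rough sketch of that bookkeeping, not the engine's actual code: the class `ResourceGraph`, its methods, and the resource names are all hypothetical, and the load counter merely stands in for the expensive load or compile step.

```python
class ResourceGraph:
    def __init__(self):
        self.deps = {}   # resource -> set of resources it depends on
        self.loads = 0   # how many times something was actually loaded

    def require(self, name, depends_on=()):
        """Load `name` once; later requests just add edges to the existing node."""
        if name not in self.deps:
            self.loads += 1  # stands in for the expensive load/compile step
            self.deps[name] = set()
        for d in depends_on:
            self.require(d)
            self.deps[name].add(d)
        return name

    def unused(self):
        """Resources that no other resource depends on."""
        used = set().union(*self.deps.values())
        return sorted(n for n in self.deps if n not in used)

g = ResourceGraph()
g.require("shader_a", ["common.vert"])
g.require("shader_b", ["common.vert"])  # reuses the already loaded source

print(g.loads)     # 3, not 4: common.vert was loaded only once
print(g.unused())  # ['shader_a', 'shader_b']
```

In this sketch, top-level resources such as the two shaders show up as unused because nothing in the graph points at them; a real engine would also count references held by the application before freeing anything.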

By extension, DAGs are useful for expressing data processing pipelines. The acyclic nature means you can safely write contextual processing code that can follow pointers down the edges from a vertex without ever reencountering the same vertex. Visual programming languages like VVVV, Max MSP or Autodesk Maya's node-based interfaces all rely on DAGs.

-5

A directed acyclic graph is useful when you want to represent...a directed acyclic graph! The canonical example is a family tree or genealogy.

Jonathan Feinberg