79

What is a dataflow programming language? Why use it? And are there any benefits to it?

Peter Mortensen
Anton

10 Answers

107

In a control flow language, you have a stream of instructions which operate on external data. Conditional execution, jumps and procedure calls change the instruction stream to be executed. This could be seen as instructions flowing through data (for example, instructions operate on registers which are loaded with data by instructions - the data is static unless the instruction stream moves it). A control flow "if" statement jumps to the correct branch in the instruction stream, but the data does not get moved.

In a dataflow language, you have a stream of data which is passed from instruction to instruction to be processed. Conditional execution, jumps and procedure calls route the data to different instructions. This could be seen as data flowing through otherwise static instructions like how electrical signals flow through circuits or water flows through pipes. A dataflow "if" statement would route the data to the correct branch.
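
To make the contrast concrete, here is a minimal Python sketch, not a real dataflow runtime. The names `route`, `double`, `negate`, `control_flow_version` and `dataflow_version` are hypothetical and exist only for this illustration. In the first function the "if" picks which instructions run next; in the second, the "if" is a router that sends each token to one of two fixed downstream nodes.

```python
def double(x):
    # A fixed processing node.
    return x * 2


def negate(x):
    # Another fixed processing node.
    return -x


def control_flow_version(x):
    # Control flow: the branch changes which instructions execute next.
    if x >= 0:
        return x * 2
    return -x


def route(token, predicate, on_true, on_false):
    # Dataflow-style "if": the token is sent to one of two downstream nodes.
    target = on_true if predicate(token) else on_false
    return target(token)


def dataflow_version(stream):
    # The nodes stay put; tokens from the stream flow through them.
    for token in stream:
        yield route(token, lambda t: t >= 0, double, negate)


results = list(dataflow_version([3, -4, 0]))
```

Both versions compute the same values; only the point of view differs, which is why the two models are often described as two sides of the same coin.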

Some examples of dataflow features and languages:

Textual languages

Visual Languages

Products which embed a visual dataflow language:

Cœur
  • Adding to the lesser known list: – Beef Feb 25 '11 at 16:23
  • another one is called expecco, complete with GUI (google and download the demo) – blabla999 Apr 02 '11 at 11:59
  • What's an example of something that's NOT dataflow? – solvingPuzzles Sep 28 '12 at 04:40
  • Most languages not listed above: C++, Java, Python, COBOL and pretty much any other imperative language out there, most OO languages, most functional languages, logic programming languages like Prolog, etc. –  Dec 29 '12 at 01:18
  • 2
    Why does no one mention Unix pipelines as a common example of dataflow programming? Even Wikipedia doesn't mention it. Is there something that disqualifies it? – Sridhar Sarnobat Aug 13 '14 at 19:06
  • @Sridhar-Sarnobat just haven't thought of it, but you're right. Feel free to edit my answer. –  Aug 16 '14 at 18:32
  • I found many of these when researching into visual programming. The "dataflow" paradigm seems more suited to these than imperative programming models. – masterxilo Feb 28 '17 at 23:29
  • @masterxilo I think a big part of it is that it's easy to visualise the flow of data through static instructions, so the paradigm (or one of the many variants: dataflow is really a very loose term referring to many flavours) is well suited to visual representation and therefore is often presented that way. I'm personally interested in languages with both textual and visual interchangeable representations. For imperative languages, it's harder to visualise the flow of data, so the best you can do is visualise the relationship between instructions (like a flow chart). –  Jul 05 '17 at 11:40
  • Are high-level shading languages like GLSL/HLSL considered dataflow programming languages? – tigrou Dec 11 '18 at 21:54
  • @tigrou I've not thought about it. I'm by no means an expert and the definition is very subjective, but I'd say that the languages themselves are not, but their runtime model is. What I mean is that the GLSL/HLSL code itself is standard control flow: you have a stream of instructions that get executed in turn and you can jump around this instruction stream (conditionals, function calls and loops), however the runtime model seems like a dataflow model, as the buffer objects (vertex data/fragment data) get streamed through the shader pipeline. Data and control flow are two sides of the same coin –  Dec 20 '18 at 00:55
  • Do you know of any books on dataflow programming? I found some on Amazon, but few of them are relevant. – Webwoman Jan 29 '19 at 15:48
  • Small correction: Esterel is a control-flow-centric language. Both Lustre and Esterel belong to the synchronous languages. Lustre is a typical example of a dataflow language in this domain, whereas Esterel is an example of a control-flow language in this domain. – mihca May 13 '19 at 07:48
  • @mihca Ah, you are correct. I've removed it. Interestingly, I've actually developed a synchronous (control flow) language myself recently. It's a nice model (both to implement and work in), if you can get away with it. –  Jun 09 '19 at 17:12
28

Dataflow programming languages are ones that focus on the state of the program and cause operations to occur in response to any change in that state. They are inherently parallel, because an operation executes as soon as all of its inputs are available. This means that, unlike a conventional program where one operation is followed by the next, the operations in a dataflow program execute whenever their inputs are available, so there is no fixed order.

Often dataflow programming languages use a large hashtable where the keys are the data of the program and the values of the table are pointers to the operations of the program. This makes multicore programs easier to create in a dataflow programming language, since each core would only need the hashtable to work.
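
A minimal Python sketch of that table-driven idea follows. It is an assumption about how such a runtime might look, not a description of any particular language's implementation; the class `Graph` and its methods `add_op` and `put` are hypothetical names. The table maps each data name to the operations that read it, and an operation fires as soon as all of its inputs have values.

```python
from collections import defaultdict


class Graph:
    def __init__(self):
        self.values = {}                    # data name -> current value
        self.consumers = defaultdict(list)  # data name -> operations reading it
        self.fired = []                     # order in which operations ran

    def add_op(self, output, inputs, fn):
        # Register the operation under every data name it consumes.
        op = (output, tuple(inputs), fn)
        for name in inputs:
            self.consumers[name].append(op)

    def put(self, name, value):
        # Store the value, then fire every consumer whose inputs are all present.
        self.values[name] = value
        for output, inputs, fn in self.consumers[name]:
            if all(i in self.values for i in inputs):
                self.fired.append(output)
                self.put(output, fn(*(self.values[i] for i in inputs)))


g = Graph()
g.add_op("sum", ["a", "b"], lambda a, b: a + b)
g.add_op("scaled", ["sum", "k"], lambda s, k: s * k)

# Values arrive in an arbitrary order; nothing runs until inputs are complete.
g.put("k", 10)
g.put("a", 1)
g.put("b", 2)
```

The execution order here is decided by when data arrives, not by the order in which the operations were declared. This sketch is single-threaded; a parallel runtime would hand ready operations to different workers.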

A common example of a dataflow programming language is a spreadsheet program, which has columns of data that are affected by other columns of data. Should the data in one column change, data in the dependent columns will change with it. Although the spreadsheet is the most common example of a dataflow programming language, most dataflow languages tend to be graphical languages.

Anton
17

One kind of dataflow programming is reactive programming. When this style of programming is used in a functional language, it's called functional reactive programming. An example of a functional reactive programming language for the web is Flapjax.

Also, anic is a dataflow language recently discussed on Hacker News.

Another example is Martlet from Oxford.

Jeff Hammerbacher
  • 1
    +1 for mentioning reactive programming. – Jus12 Mar 21 '14 at 05:48
  • Interestingly, this answer resulted in a discussion on SO: "https://stackoverflow.com/questions/30685707/what-is-the-difference-between-dataflow-programming-and-reactive-programming". It might depend on the academic definition whether reactive programming and dataflow are the same. I can agree that it is "one kind of dataflow programming". – mihca May 13 '19 at 08:21
10

Dataflow programming languages propose to isolate local behaviors in so-called "actors", which are supposed to run in parallel and exchange data through point-to-point channels. There is no notion of central memory (for either code or data), unlike the von Neumann model of computers.

These actors consume data tokens on their inputs and produce new data on their outputs.

This definition does not impose the means to run this in practice. However, the production/consumption of data needs to be analyzed with care: for example, if an actor B does not consume at the same speed as the actor A that produces the data, then a potentially unbounded memory (FIFO) is required between them. Many other problems can arise, such as deadlocks.
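
The rate mismatch can be seen in a small Python simulation. This is only a sketch under simple assumptions (fixed token rates per step, a single channel); the function `simulate` and its parameters are hypothetical names chosen for the example.

```python
from collections import deque


def simulate(steps, produce_per_step, consume_per_step):
    # Actor A appends tokens to the channel; actor B removes them.
    channel = deque()
    depth_history = []
    next_token = 0
    for _ in range(steps):
        for _ in range(produce_per_step):
            channel.append(next_token)
            next_token += 1
        # B consumes at most consume_per_step tokens, if that many are queued.
        for _ in range(min(consume_per_step, len(channel))):
            channel.popleft()
        depth_history.append(len(channel))
    return depth_history


unbalanced = simulate(steps=5, produce_per_step=2, consume_per_step=1)
balanced = simulate(steps=5, produce_per_step=1, consume_per_step=1)
```

With matched rates the FIFO depth stays at zero after every step; with A producing twice as fast as B, the depth grows by one token per step and never stops growing.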

In many cases, this analysis will fail because the interleaving of the internal behaviors is intractable (beyond the reach of today's formal methods).

Despite this, dataflow programming languages remain attractive in many domains:

  • Defining reference models for video encoding: a pure C program won't do the job because it assumes that everything runs as a sequence of operations, which is not true of real hardware (pipelines, VLIW, multicores, and VLSI). Maybe you could have a look at this: recent PhD thesis. The CAL dataflow language is proposed as a unifying language for next-generation video encoder/decoder reference models.
  • Mission-critical systems where safety is required: if you add some strong assumptions about the production/consumption of data, then you get a language with strong potential in terms of code generation, proofs, etc. (see synchronous languages).
Peter Mortensen
JCLL
5

Excel (and other spreadsheets) are essentially dataflow languages. Dataflow languages are a lot like functional programming languages, except that the values at the leaves of the whole program graph are not values at all, but variables (or value streams), so that when they change, the changes ripple and flow up the graph.
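
As a rough sketch of that rippling behaviour (not how Excel is actually implemented), the following Python class recomputes dependent cells whenever a leaf value changes. The class `Sheet`, its methods and the cell names are hypothetical, and the sketch assumes the formulas contain no cycles.

```python
class Sheet:
    def __init__(self):
        self.values = {}    # cell name -> current value
        self.formulas = {}  # cell name -> (function, input cell names)

    def set_value(self, cell, value):
        # Changing a leaf value triggers recomputation of everything above it.
        self.values[cell] = value
        self._recompute_dependents(cell)

    def set_formula(self, cell, fn, inputs):
        self.formulas[cell] = (fn, inputs)
        self._evaluate(cell)

    def _evaluate(self, cell):
        fn, inputs = self.formulas[cell]
        if all(i in self.values for i in inputs):
            self.values[cell] = fn(*(self.values[i] for i in inputs))
            self._recompute_dependents(cell)

    def _recompute_dependents(self, changed):
        # Naive scan: re-evaluate every formula that reads the changed cell.
        for cell, (fn, inputs) in self.formulas.items():
            if changed in inputs:
                self._evaluate(cell)


sheet = Sheet()
sheet.set_value("A1", 2)
sheet.set_value("A2", 3)
sheet.set_formula("B1", lambda a, b: a + b, ["A1", "A2"])
sheet.set_formula("C1", lambda b: b * 10, ["B1"])
before = (sheet.values["B1"], sheet.values["C1"])

sheet.set_value("A1", 7)  # the change ripples up through B1 to C1
after = (sheet.values["B1"], sheet.values["C1"])
```

The formulas themselves are ordinary pure functions; what makes the whole thing dataflow is that the leaves are variables whose changes propagate.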

Barry Kelly
  • 1
    I don't agree; constraint-solvers generally work by discovering optima within search spaces by propagating constraints. Spreadsheets propagate values, not constraints. – Barry Kelly Jan 20 '09 at 16:19
  • 1
    Solve is a supplementary feature most people don't use. And trees are graphs too; moreover, if any two cells refer to the same third cell, they form a dag and are no longer a tree. – Barry Kelly Jan 20 '09 at 21:50
  • 4
    "Spreadsheets propagate values, not constraints." Yes, the flow of values is what makes it "dataflow". –  Jul 30 '09 at 10:15
5

Mozart has support for dataflow-like synchronization, and it does have some commercial applications. You could also argue that make is a dataflow programming language.

Peter Mortensen
JesperE
2

Many ETL tools are also in this realm. The dataflow tasks in MS SSIS are a good example; SSIS is a graphical tool.

Teun D
2

It is actually quite an old concept - in the 1970s, there was even a language + machine built for efficient dataflow programming and execution (Manchester Dataflow Machine).

The great thing about it is its duality with lazy functional languages like Haskell. Therefore, if your processing steps are purely functional, and you have enough processing units to evaluate them and pass results around, you get maximum parallelism for free: automatically and without any programming effort!
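
A small Python sketch of the idea, using the standard `concurrent.futures` module: two pure steps that do not depend on each other are submitted together, and only the step that needs both results waits for them. The functions `square` and `cube` are hypothetical placeholders. Note that this is an assumption-laden illustration: in CPython, threads do not give true CPU parallelism for pure Python code, so the sketch shows the dependency structure rather than a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    # Pure function: the result depends only on the argument.
    return x * x


def cube(x):
    # Pure function, independent of square().
    return x * x * x


with ThreadPoolExecutor() as pool:
    # Neither step needs the other's output, so both can be started at once.
    squared = pool.submit(square, 4)
    cubed = pool.submit(cube, 2)
    # The final step has both as inputs, so it waits for both results.
    total = squared.result() + cubed.result()
```

Because the steps have no side effects, the order in which they actually run cannot change the result.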

Peter Mortensen
blabla999
1

You could try Cameleon: www.shinoe.org/cameleon which seems to be simple to use. It's a graphical language for functional programming which has a data(work)-flow approach.

It's written in C++, but it can call any type of local or distant programs written in any programming language.

It has a multi-scale approach and seems to be Turing complete (this is a Petri net extension).

Peter Mortensen
Myosis.sh
  • a |> f = f a . Is this data flow in Haskell (it gets the average of a list of items from 1..200)? [1..200] |> map (*5) |> filter (> 66) |> dup ( sum, length) |> uncurry (div) where dup (f1, f2) v = (f1 v, f2 v) – aoeu256 Jul 20 '19 at 20:12
1

There are certain domains where dataflow programming just makes a lot more sense. Realtime media is one example, and two widely used graphical dataflow programming environments, Pure Data and Max/MSP, are both focused on realtime media programming. I suppose their visual nature also maps nicely onto dataflow programming.