Questions tagged [dataflow]

Dataflow programming is a programming paradigm in which computations are modeled through directed graphs: nodes are instructions and data flows through the connections between them.

Dataflow programming is a programming paradigm that models programs as directed graphs, with calculation proceeding much as signals propagate through an electrical circuit. More precisely:

  • nodes are instructions that take one or more inputs, perform a calculation on them and present the result(s) as output;
  • edges connect the inputs and outputs of instructions, so the output of one instruction can be fed directly into the input of another node to trigger another calculation;
  • data "travels" along the directed edges and triggers the instructions as it passes through the nodes.
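The three points above can be sketched in a few lines of Python. This is only an illustrative toy, not the API of any real dataflow library: the `Node` and `Graph` classes, the `send` method and the `demo` function are hypothetical names invented for this example. A node fires as soon as all of its input ports hold a value, and its output is pushed along every outgoing edge.

```python
from collections import defaultdict


class Node:
    """A hypothetical instruction that fires once every input port holds a value."""

    def __init__(self, name, func, n_inputs):
        self.name = name
        self.func = func
        self.n_inputs = n_inputs
        self.pending = {}    # port index -> value received so far
        self.result = None   # last output produced by this node


class Graph:
    """A hypothetical directed graph of nodes connected by edges."""

    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(list)  # source name -> [(target node, target port)]

    def add(self, name, func, n_inputs):
        self.nodes[name] = Node(name, func, n_inputs)

    def connect(self, src, dst, port):
        # Feed the output of `src` into input `port` of `dst`.
        self.edges[src].append((self.nodes[dst], port))

    def send(self, name, port, value):
        # Deliver a value to one input port; fire the node when all ports are filled.
        node = self.nodes[name]
        node.pending[port] = value
        if len(node.pending) == node.n_inputs:
            args = [node.pending[i] for i in range(node.n_inputs)]
            node.pending = {}
            node.result = node.func(*args)
            # The result travels along every outgoing edge and may trigger more nodes.
            for target, target_port in self.edges[name]:
                self.send(target.name, target_port, node.result)


def demo():
    g = Graph()
    g.add("add", lambda a, b: a + b, 2)
    g.add("double", lambda x: x * 2, 1)
    g.connect("add", "double", 0)
    g.send("add", 0, 3)  # "add" still waits for its second input
    g.send("add", 1, 4)  # "add" fires, and its output triggers "double"
    return g.nodes["double"].result
```

Calling `demo()` feeds 3 and 4 into the "add" node; nothing is computed until the second value arrives, at which point the sum flows on to "double" without any explicit call to it. Real dataflow systems add scheduling, buffering and parallel execution on top of this basic firing rule.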

Dataflow programming languages are often visual; the most prominent example is LabVIEW.

964 questions
10
votes
3 answers

More efficiently compute transitive closures of each dependents while incrementally building the directed graph

I need to answer the question: given a node in a dependency graph, group its dependents by their own transitive dependents which would be impacted by a particular start node. In other words, given a node in a dependency graph, find the set of sets…
10
votes
5 answers

How do I get SSIS Data Flow to put '0.00' in a flat file?

I have an SSIS package with a Data Flow that takes an ADO.NET data source (just a small table), executes a select * query, and outputs the query results to a flat file (I've also tried just pulling the whole table and not using a SQL select). The…
theog
  • 1,044
  • 2
  • 19
  • 32
9
votes
1 answer

Using Clojure DataFlow programming idioms

Can someone explain why and how I would use the Clojure Dataflow programming API? I can't seem to find much about it on the internet.
yazz.com
  • 52,748
  • 62
  • 227
  • 363
8
votes
3 answers

How to Monitor/inspect data/attribute flow in Java code

I have a use case where I need to capture the data flow from one API to another. For example, my code reads data from a database using Hibernate, and during the data processing I convert one POJO to another and perform some more processing and then…
M.J.
  • 14,866
  • 25
  • 70
  • 95
8
votes
1 answer

Dataflow processing

I have a class of computations that seems to naturally take a graph structure. The graph is far from linear, as there are multiple inputs as well as nodes that fan out and nodes that require the result of several other nodes. In all of these…
em70
  • 5,958
  • 4
  • 43
  • 79
8
votes
3 answers

TPL Dataflow block consumes all available memory

I have a TransformManyBlock with the following design. Input: path to a file. Output: IEnumerable of the file's contents, one line at a time. I am running this block on a huge file (61GB), which is too large to fit into RAM. In order to avoid…
brianberns
  • 6,160
  • 2
  • 29
  • 35
8
votes
4 answers

Dataflow Programming API for Java?

I am looking for a Dataflow / Concurrent Programming API for Java. I know there's DataRush, but it's not free. What I'm interested in specifically is multicore data processing, and not distributed, which rules out MapReduce or Hadoop. Any…
Rollo Tomazzi
  • 3,020
  • 3
  • 26
  • 21
7
votes
0 answers

grpc StatusRuntimeException on Dataflow

I have a Dataflow pipeline in which I consume Pub/Sub messages, treat them, and then publish to Pub/Sub. Whenever I have too many calculations (i.e., I increase the amount of treatment for each message), I get an exception:…
7
votes
3 answers

How to get apache beam for dataflow GCP on Python 3.x

I'm very new to GCP and Dataflow. However, I would like to start testing and deploying a few flows using Dataflow on GCP. According to the documentation and everything around Dataflow, it is imperative to use the Apache Beam project. Therefore and…
7
votes
2 answers

ClassNotFound exception when attempting to use DataflowRunner

I'm trying to launch a Dataflow job on GCP using Apache Beam 0.6.0. I am compiling an uber jar using the shade plugin because I cannot launch the job using "mvn:execjava". I'm including this dependency:
7
votes
2 answers

What's the crucial difference between Angular 2 Data Flow and Flux?

Hi, I am studying Angular 2 and React + Redux right now, and I have a question about the difference in data flow between those two choices. Angular 2 uses uni-directional data flow by default. Redux is a Flux implementation, which (also)…
sangyongjung
  • 151
  • 1
  • 5
7
votes
2 answers

How to skip last row in the SSIS data flow

I am using FlatFile Source Manager --> Script Component as Trans --> OLEDB destination in my data flow. The source reads all the rows from the flat file, and I want to skip the last row (trailer record) when updating the database. Since it contains the NULL…
VHK
  • 173
  • 4
  • 12
7
votes
6 answers

Reference manual for Apache Pig Latin

Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. Does anyone know of a good reference manual for Pig Latin? I'm looking for something that includes all the syntax and command descriptions…
Ori lahav
  • 125
  • 1
  • 2
7
votes
3 answers

MQ to process, aggregate and publish data asynchronously

Some background, before getting to the real question: I am working on a back-end application that consists of several different modules. Each module is currently a command-line Java application, which is run "on demand" (more details later). Each…
Lorenzo Dematté
  • 6,161
  • 2
  • 33
  • 71
6
votes
2 answers

spring data flow : IAM role assignment to pods using pod-annotations

We are currently in the process of deploying a new Spring Data Flow stream application in our AWS EKS cluster. As part of this, the pods launched by Skipper should have the IAM roles defined in the annotation so that they can access the required…