Can I make TPL DataFlow BatchBlock Singleton?

Question

All the examples I can find show that I have to call .Complete() to get messages pushed to the next node. I am wondering whether I can make the BatchBlock a singleton, have it receive messages continuously, and have it automatically push messages to the next block whenever they reach the batch size. Is there any downside to using BatchBlock this way?

No, you don't need to call `.Complete()`, unless you have **fewer** messages than the batch size. You call `Complete()` on the *root* block when you *complete* your work and want to shut down the pipeline. Completion will propagate to the block and make it push any remaining messages to the next step. — Panagiotis Kanavos, Sep 08 '17 at 15:11

Panagiotis Kanavos · Answer 1 · 2018-06-25T07:40:00.073

The question is a bit weird, since you don't have to create a new batch block each time. You create one instance and pump messages to it. That's what all tutorials and examples show.

Let's say you want to read files and send the contents to a database. You could use one block to read the file contents, another to batch records together, and a final one to send data to the database one batch at a time. It would look like this:

var readerBlock = new TransformManyBlock<string, string>(path => File.ReadLines(path));
var batchBlock = new BatchBlock<string>(500);
var dbBlock = new ActionBlock<string[]>(batch => MyBulkInsertMethod(batch));

var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };

readerBlock.LinkTo(batchBlock, linkOptions);
batchBlock.LinkTo(dbBlock, linkOptions);

// Start pumping files

foreach (var file in Directory.EnumerateFiles(someFolder))
{
    readerBlock.Post(file);
}

// Finished pumping, tell the reader block that no more files are coming
readerBlock.Complete();
// Wait until all messages reach the database block and get processed
await dbBlock.Completion;

You have to explicitly specify that when one block completes, its linked blocks will also complete. That's what PropagateCompletion does. While this may seem a strange choice for a simple pipeline, TPL Dataflow is used to create arbitrarily complex meshes of steps. In such a mesh you want explicit control over which blocks complete and when.
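As a rough illustration of why completion is not propagated automatically, here is a minimal sketch of a small mesh in which two sources feed one target. The block names (`sourceA`, `sourceB`, `sink`) are made up for this example, and it assumes the System.Threading.Tasks.Dataflow package and C# 7.1 or later for `async Main`. With PropagateCompletion on both links, the first source to finish would complete the sink while the other might still have data, so the links are left plain and the sink is completed by hand.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class Program
{
    public static async Task Main()
    {
        var received = 0;

        // Two hypothetical upstream blocks that feed the same target.
        var sourceA = new TransformBlock<int, int>(x => x);
        var sourceB = new TransformBlock<int, int>(x => x);

        // ActionBlock handles one message at a time by default,
        // so the counter needs no extra synchronization here.
        var sink = new ActionBlock<int>(_ => received++);

        // Deliberately no PropagateCompletion on these links.
        sourceA.LinkTo(sink);
        sourceB.LinkTo(sink);

        for (var i = 0; i < 10; i++)
        {
            sourceA.Post(i);
            sourceB.Post(i);
        }

        sourceA.Complete();
        sourceB.Complete();

        // Complete the sink only after BOTH sources have drained.
        await Task.WhenAll(sourceA.Completion, sourceB.Completion);
        sink.Complete();
        await sink.Completion;

        Console.WriteLine(received); // prints 20
    }
}
```

In a straight pipeline like the one in this answer there is only one upstream block per link, so setting PropagateCompletion is the simpler option.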

Once we pump all files to the first block, readerBlock, we tell it that we are finished. When that block finishes processing, it will signal the next block in the pipeline.

BatchBlock only sends messages when a batch is complete, in this case when it gathers 500 lines. The final batch will probably contain fewer lines. It would never be sent if Complete() wasn't called on the BatchBlock. With PropagateCompletion though, completion will propagate to the BatchBlock and make it send the leftovers to the next block.
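The leftover behaviour can be seen in a scaled-down sketch. The batch size of 3, the 7 posted items and the `sink` block are arbitrary choices for this example, not part of the pipeline above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class Program
{
    public static async Task Main()
    {
        var batchSizes = new List<int>();

        var batchBlock = new BatchBlock<int>(3);

        // Records the size of every batch it receives. ActionBlock runs
        // one message at a time by default, so the list is not shared
        // between concurrent callers.
        var sink = new ActionBlock<int[]>(batch => batchSizes.Add(batch.Length));

        batchBlock.LinkTo(sink, new DataflowLinkOptions { PropagateCompletion = true });

        // 7 items with a batch size of 3: two full batches and one leftover item.
        for (var i = 0; i < 7; i++)
        {
            batchBlock.Post(i);
        }

        // Completing the batch block makes it emit the partial batch.
        batchBlock.Complete();
        await sink.Completion;

        Console.WriteLine(string.Join(",", batchSizes)); // prints 3,3,1
    }
}
```

Without the call to `Complete()`, the seventh item would stay buffered in the batch block and the sink would only ever see the two full batches.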

Finally, we await the Completion task on the last block, to ensure all messages are written to the database.
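For the long-lived case the question asks about, a single BatchBlock instance can keep receiving messages without ever being completed. The sketch below is only an illustration under assumptions: the batch size of 5 is arbitrary, and the batches are read directly with `ReceiveAsync` instead of through a linked block. It also uses `TriggerBatch()`, a `BatchBlock<T>` method the answer above does not mention, to flush a partial batch on demand.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class Program
{
    public static async Task Main()
    {
        // One instance, created once and never completed in this sketch.
        var batchBlock = new BatchBlock<int>(5);

        // A full batch is emitted as soon as 5 items have arrived.
        for (var i = 0; i < 5; i++)
        {
            batchBlock.Post(i);
        }
        int[] full = await batchBlock.ReceiveAsync();
        Console.WriteLine(full.Length); // prints 5

        // Only 2 items are buffered now, so nothing is emitted on its own.
        batchBlock.Post(100);
        batchBlock.Post(101);

        // TriggerBatch asks the block to emit whatever it currently holds.
        batchBlock.TriggerBatch();
        int[] partial = await batchBlock.ReceiveAsync();
        Console.WriteLine(partial.Length); // prints 2
    }
}
```

One possible design for a long-running service is to call `TriggerBatch()` periodically, for example from a timer, so that a slow trickle of messages does not sit in the block indefinitely. Whether that is needed depends on how stale a partial batch is allowed to get.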

You don't actually use `linkOptions` anywhere. Where does that option go? — mınxomaτ, Jun 24 '18 at 14:34
