6

I have a producer/consumer pattern in my app, implemented using TPL Dataflow. The dataflow mesh is big, with about 40 blocks in it, and it has two main functional parts: a producer part and a consumer part. The producer is supposed to continuously provide a lot of work for the consumer, while the consumer sometimes handles the incoming work slowly. I want to pause the producer when the consumer is busy with some specified amount of work items; otherwise the app consumes a lot of memory/CPU and behaves unsustainably.

I made a demo app that demonstrates the issue:

(diagram of the mesh)

using System;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

namespace DataflowTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var options = new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 4,
                EnsureOrdered = false
            };

            var boundedOptions = new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 4,
                EnsureOrdered = false,
                BoundedCapacity = 5
            };

            var bufferBlock = new BufferBlock<int>(boundedOptions);
            var producerBlock = new TransformBlock<int, int>(x => x + 1, options);
            var broadcastBlock = new BroadcastBlock<int>(x => x, options);

            // deliberately slow consumer; items above 10 take even longer
            var consumerBlock = new ActionBlock<int>(async x =>
            {
                var delay = 1000;
                if (x > 10) delay = 5000;

                await Task.Delay(delay);

                Console.WriteLine(x);
            }, boundedOptions);

            // cycle: producer → buffer → broadcast, which feeds both the producer (loop)
            // and the consumer
            producerBlock.LinkTo(bufferBlock);
            bufferBlock.LinkTo(broadcastBlock);
            broadcastBlock.LinkTo(producerBlock);
            broadcastBlock.LinkTo(consumerBlock);

            bufferBlock.Post(1);

            consumerBlock.Completion.Wait();            
        }        
    }
}

The app prints something like this:

2
1
3
4
5
69055
69053
69054
69057
438028
438040
142303
438079

That means the producer keeps spinning and pushing messages to the consumer. I want it to pause and wait until the consumer has finished its current portion of work; then the producer should continue providing messages for the consumer.

My question is quite similar to another question, but it wasn't answered properly. I tried that solution and it doesn't work here: the producer still floods the consumer with messages. Setting BoundedCapacity doesn't work either.

The only solution I can think of so far is to make my own block that monitors the target block's queue and acts according to its length, but I hope that's overkill for this issue.
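
To illustrate the idea (just a rough sketch, nothing I have actually tried; the threshold and polling interval are arbitrary), such a block could be a pass-through that polls the consumer's InputCount and holds messages back while the consumer is saturated:

// rough sketch of the "monitor the target's queue" idea; the threshold (5)
// and the polling interval (100 ms) are arbitrary values for illustration
var gateBlock = new TransformBlock<int, int>(async x =>
{
    // hold the message back while the consumer's input queue is full
    while (consumerBlock.InputCount >= 5)
        await Task.Delay(100);

    return x;
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 });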

kseen
  • Have you considered using `Rx` instead? Take a look at this answer: http://stackoverflow.com/questions/2542764/tpl-vs-reactive-framework – Luc Morin Aug 13 '16 at 17:36
  • I hope there won't be a need for this, since a lot of time has been spent on the Dataflow version and it fits my needs well. – kseen Aug 13 '16 at 17:37
  • In your demo, the producer could produce all the messages on its own, without receiving messages from itself through the broadcast block. Is your real code like that too, or is that producer → producer cycle necessary? – svick Aug 14 '16 at 17:56
  • @svick But if I link the producer to itself, how will the consumer get the data coming from the producer? The real producer cycle consists of a bunch of blocks which load sequential comment pages, parse the pages and pass the comments to the consumer. – kseen Aug 14 '16 at 18:33
  • @kseen I was asking if the broadcast → producer link was necessary, or if it could be avoided. – svick Aug 14 '16 at 18:39
  • @svick Sorry for the misunderstanding. It would be great to leave it there, but if removing it is critical and would lead to more elegant code, we can remove it. – kseen Aug 14 '16 at 18:46
  • It seems to me that all the complexity introduced by the buffer → broadcast → producer cycle is unneeded. All you need is just two blocks, a producer and a consumer, both configured with the `BoundedCapacity` option. In the specific case that the producer is a `TransformManyBlock`, the `BoundedCapacity` has no effect on the output queue of this block, but there are solutions [here](https://stackoverflow.com/questions/30994544/tpl-dataflow-block-consumes-all-available-memory). – Theodor Zoulias Jun 26 '20 at 16:20

2 Answers

6

If you need to keep the producer → buffer → broadcast cycle intact, then you need to replace the broadcast block with some other block that still broadcasts messages it receives, but waits when one of its targets is full.

As long as you know the targets of that block when you're creating it, you can build it using ActionBlock (code copied from another answer of mine):

public static ITargetBlock<T> CreateGuaranteedBroadcastBlock<T>(
    DataflowBlockOptions options, params ITargetBlock<T>[] targets)
{
    var block = new ActionBlock<T>(
        async item =>
        {
            foreach (var target in targets)
            {
                // SendAsync waits when the target is full, which is what
                // gives this block the back-pressure that BroadcastBlock lacks
                await target.SendAsync(item);
            }
        }, new ExecutionDataflowBlockOptions
        {
            BoundedCapacity = options.BoundedCapacity,
            CancellationToken = options.CancellationToken
        });

    // propagate completion (or the fault) to all targets when this block finishes
    block.Completion.ContinueWith(task =>
    {
        foreach (var target in targets)
        {
            if (task.Exception != null)
                target.Fault(task.Exception);
            else
                target.Complete();
        }
    });

    return block;
}

Using this, you can declare the broadcast block:

var broadcastBlock = CreateGuaranteedBroadcastBlock(
    boundedOptions, producerBlock, consumerBlock);

(You will also need to remove the LinkTo lines that link from broadcastBlock.)
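
To make the change concrete, the wiring in the demo would then be reduced to the following (only these links remain, since the new block pushes to producerBlock and consumerBlock by itself):

// broadcastBlock is no longer linked to anything; it sends to its targets itself
producerBlock.LinkTo(bufferBlock);
bufferBlock.LinkTo(broadcastBlock);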

One issue with your original code that this does not fix is completion, but that's a hard problem in TPL Dataflow with cycles in general.
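
For reference only (this is not a fix, and source/target are placeholder names): in an acyclic mesh you would normally propagate completion through the links, but with a cycle no block ever runs out of input, so Complete() has to be triggered by some external condition instead.

// acyclic case: completion flows through the links automatically
source.LinkTo(target, new DataflowLinkOptions { PropagateCompletion = true });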

svick
  • Regarding completion, what if my network is going to be continuous? There is no completion coming in the future; it should keep working as long as the app is working. – kseen Aug 14 '16 at 19:27
  • I just tried this `GuaranteedBroadcastBlock` in my demo application and it works like a charm! Perfect! Thank you so much. – kseen Aug 14 '16 at 19:33
  • That is the best case scenario: you don't need completion, so it's fine that it does not work. – svick Aug 14 '16 at 19:43
0

It looks like your producer generates a sequence, so there is no need for the whole producer → buffer → broadcast cycle. Instead, all three blocks could be replaced by an async loop that generates the next item and then sends it to the consumer using await SendAsync():

Task.Run(async () =>
{
    int i = 1;
    while (true) // in real code, loop until the producer runs out of work
    {
        // waits when the consumer's bounded input queue is full
        await consumerBlock.SendAsync(i);
        i++;
    }

    // unreachable while the loop is infinite; call it once the loop actually ends
    consumerBlock.Complete();
});

This way, once the consumer reaches its capacity, await SendAsync() will ensure that the producer waits until the consumer consumes an item.

If you wanted to encapsulate such a producer into a dataflow block, so that you could e.g. link it to the consumer, you can.
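
For example, here is a sketch of one way to do it (the method name is made up): feed the same kind of async loop into a bounded BufferBlock and hand that buffer out as the source block, so that SendAsync into the buffer provides the back-pressure.

// sketch: wrap the async producing loop in a source block that can be linked
public static ISourceBlock<int> CreateProducerBlock(int boundedCapacity)
{
    // the bounded buffer is what provides the back-pressure;
    // SendAsync below waits whenever the buffer is full
    var output = new BufferBlock<int>(
        new DataflowBlockOptions { BoundedCapacity = boundedCapacity });

    Task.Run(async () =>
    {
        int i = 1;
        while (true) // a real producer would have a stopping condition
        {
            await output.SendAsync(i);
            i++;
        }
    });

    return output;
}

You could then link it to the consumer with CreateProducerBlock(5).LinkTo(consumerBlock);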

svick
  • My real "producer" is a set of blocks that load a comments page (which contains the link to the next comments page), parse the content of the current page, send the comments to the consumer and start the cycle again, passing the address of the next comments page to the first block in the producer cycle. So, unfortunately, this is not just a sequence. It is more like a linked sequence, where each element contains the address of the next element and the last element has no next address. Sorry that this question is so simple. – kseen Aug 14 '16 at 19:01
  • I just made a diagram that represents the real situation better. Here you go: http://imgur.com/iEklfeG – kseen Aug 14 '16 at 19:10