
I recently started using the TPL Dataflow library from .NET 4.5, and the whole concept of blocks is new to me. I'm implementing a producer-consumer queue in my application, and I need to prevent duplicate messages from being queued, so I need a way to check whether a message is already in the queue. I am using a BufferBlock<Message> (Message is a custom type). BufferBlock has a Count property, but that doesn't help here because the messages need to be uniquely identified.

Is there any way to check whether a BufferBlock contains an item, or to go through all the items and inspect them? Is it possible to cast BufferBlock to something that allows iteration over the items? I'm following an example I saw on MSDN that doesn't check whether the item is in the queue, but I would think that checking the contents of a queue is a fairly common operation. Any help is appreciated.

Anshul
  • You're lucky... I faced and fixed exactly this problem only a couple of days ago... – spender Dec 28 '13 at 00:45
  • @spender Glad I'm not the only one. I actually read one of the questions you posted while I was searching for an answer here: http://stackoverflow.com/questions/10068451/apparent-bufferblock-post-receive-receiveasync-race-bug – Anshul Dec 28 '13 at 00:56
  • Grr...never got to the bottom of that bug... I couldn't recreate a suitable testcase. I've re-adopted Dataflow recently and haven't run into any problems. – spender Dec 28 '13 at 01:05

1 Answer


Rather than breaking into the BufferBlock, why not insert a TransformManyBlock into the chain that does this for you? You can use a HashSet, whose Add method returns true only if the item hasn't already been added. It ends up being quite simple, but the HashSet keeps every distinct item it has seen, so storage requirements grow over time.

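using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks.Dataflow;

public static class Program
{
    public static void Main()
    {
        var bb = new BufferBlock<string>();
        var db = DataflowEx.CreateDistinctBlock<string>();
        var ab = new ActionBlock<string>(x => Console.WriteLine(x));
        //Propagate completion so the pipeline can be drained before Main returns
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        bb.LinkTo(db, linkOptions);
        db.LinkTo(ab, linkOptions);
        bb.Post("this");
        bb.Post("this");
        bb.Post("this");
        bb.Post("is");
        bb.Post("is");
        bb.Post("a");
        bb.Post("test");
        //Signal that no more items are coming, then wait for the last block
        bb.Complete();
        ab.Completion.Wait();
        //Prints each distinct string once: this, is, a, test
    }
}

public static class DataflowEx
{
    public static TransformManyBlock<T, T> CreateDistinctBlock<T>()
    {
        var hs = new HashSet<T>();
        //hs will be captured in the closure of the delegate
        //supplied to the TransformManyBlock below and therefore
        //will have the same lifespan as the returned block.
        //Look up the term "c# closure" for more info.
        //The block's default MaxDegreeOfParallelism is 1, so hs is
        //only touched by one invocation of the delegate at a time.
        return new TransformManyBlock<T, T>(
                         x => Enumerable.Repeat(x, hs.Add(x) ? 1 : 0));
    }
}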
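
The comments in CreateDistinctBlock rely on closure capture, which is easier to see without any Dataflow types involved. Here is a minimal sketch, assuming nothing beyond the base class library; the ClosureDemo and MakeUniqueFilter names are hypothetical and used only for illustration. It shows a captured local outliving the method that declared it:

```csharp
using System;
using System.Collections.Generic;

public static class ClosureDemo
{
    // Hypothetical helper: returns a delegate that reports whether it is
    // seeing a value for the first time.
    public static Func<T, bool> MakeUniqueFilter<T>()
    {
        var seen = new HashSet<T>();
        // 'seen' is captured by the lambda, so it lives as long as the
        // returned delegate does, not just for the duration of this call.
        return item => seen.Add(item);
    }

    public static void Main()
    {
        var isNew = MakeUniqueFilter<string>();
        Console.WriteLine(isNew("this")); // True
        Console.WriteLine(isNew("this")); // False
        Console.WriteLine(isNew("is"));   // True

        // A second call creates a separate HashSet with its own state.
        var other = MakeUniqueFilter<string>();
        Console.WriteLine(other("this")); // True
    }
}
```

Each call to the factory gets its own set, which is why two distinct blocks created this way would not share their de-duplication history.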

The reason this works is that, just like LINQ's SelectMany, the TransformManyBlock effectively flattens out lists of lists. So, the TransformManyBlock takes a delegate that returns an IEnumerable<T>, but offers the items in the returned IEnumerable<T> one at a time. By returning an IEnumerable<T> that has either 0 or 1 items in it, we can effectively create Where-like behaviour, either allowing an item through or preventing it from passing, depending on whether or not some predicate is satisfied. In this case, the predicate is whether or not we can add the item to the captured HashSet.
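
To make the SelectMany comparison concrete, the same delegate can be run over an ordinary sequence. This is only a sketch of the analogy, not part of the Dataflow pipeline; the SelectManyDemo and DistinctViaSelectMany names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectManyDemo
{
    // Hypothetical helper mirroring the block's delegate with plain LINQ.
    public static List<T> DistinctViaSelectMany<T>(IEnumerable<T> source)
    {
        var hs = new HashSet<T>();
        // Each input maps to a sequence of 0 or 1 items;
        // SelectMany flattens those sequences into one.
        return source
            .SelectMany(x => Enumerable.Repeat(x, hs.Add(x) ? 1 : 0))
            .ToList();
    }

    public static void Main()
    {
        var input = new[] { "this", "this", "this", "is", "is", "a", "test" };
        Console.WriteLine(string.Join(" ", DistinctViaSelectMany(input)));
        // prints "this is a test"
    }
}
```

In plain LINQ you would normally reach for Distinct() instead; the point here is only that a 0-or-1 item result per input behaves like a filter.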

spender
  • I'm trying to follow the code you provided. So as I understand, the messages go through the `TransformManyBlock`, and then to the `ActionBlock`? How does the `TransformManyBlock` keep track of all the items that have gone through it? The variable `hs` is local to that call, so wouldn't it get destroyed after the call? – Anshul Dec 28 '13 at 00:51
  • @Anshul Normally yes.. but not in the context of a lambda. – Simon Whitehead Dec 28 '13 at 00:53
  • @Anshul: Ok. I'll annotate my answer to address your additional questions. – spender Dec 28 '13 at 00:53
  • @SimonWhitehead That's interesting, so how long does the variable persist in the context of the lambda? – Anshul Dec 28 '13 at 00:57
  • It is promoted to a field of a compiler-generated class. Look up closures for more info. @spender This is quite nice. I will be saving this for future reference :) – Simon Whitehead Dec 28 '13 at 01:00
  • @spender Thanks for the annotation, I will try out your solution. – Anshul Dec 28 '13 at 01:03
  • @Anshul, I mention LINQ's `SelectMany` above because it's effectively the same operation as is performed by the TransformManyBlock. You could probably rewrite all of LINQ using only the SelectMany method (it wouldn't be so efficient though), so it's worth figuring out why it's so important. Bart de Smet has a good write-up here: http://dotnet.dzone.com/news/selectmany-probably-the-most-p – spender Dec 28 '13 at 01:03
  • @SimonWhitehead I see, thanks for that info, I'll have to look it up. – Anshul Dec 28 '13 at 01:03
  • @spender That makes sense. I'm assuming the HashSet itself enforces the uniqueness of the items so we don't have to manually do it? – Anshul Dec 28 '13 at 01:06
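
On that last point: HashSet<T> does enforce uniqueness, but it does so using the type's equality rules. For a custom class such as Message, the default is reference equality unless Equals and GetHashCode are overridden or a comparer is supplied. Below is a rough sketch, assuming Message has a property that identifies it; the Id and Body members and the MessageIdComparer name are assumptions for illustration, not taken from the question:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape for the question's Message type.
public sealed class Message
{
    public Message(int id, string body) { Id = id; Body = body; }
    public int Id { get; }
    public string Body { get; }
}

// Treats two messages as duplicates when their Ids match.
public sealed class MessageIdComparer : IEqualityComparer<Message>
{
    public bool Equals(Message x, Message y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Message m) => m.Id.GetHashCode();
}

public static class Program
{
    public static void Main()
    {
        var byReference = new HashSet<Message>();
        var byId = new HashSet<Message>(new MessageIdComparer());
        var first = new Message(1, "hello");
        var copy = new Message(1, "hello");

        byReference.Add(first);
        byId.Add(first);
        Console.WriteLine(byReference.Add(copy)); // True: a different object
        Console.WriteLine(byId.Add(copy));        // False: same Id
    }
}
```

A comparer like this could be passed to the HashSet<T> constructor inside CreateDistinctBlock so that the block filters on message identity rather than object identity.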