
I'm playing around with Storm, and I'm wondering where (if at all) Storm lets you specify the (tumbling/sliding) window size for an aggregation. For example, suppose we want to find the trending topics on Twitter for the previous hour. How do we specify that a bolt should return results every hour? Is this done programmatically inside each bolt, or is there some way to specify a "window"?

Matthias J. Sax
gronnbeck

2 Answers


Disclaimer: I wrote the Trending Topics with Storm article referenced by gakhov in his answer above.

I'd say the best practice is to use the so-called tick tuples in Storm 0.8+. With these you can configure your own spouts/bolts to be notified at certain time intervals (say, every ten seconds or every minute).

Here's a simple example that configures the component in question to receive tick tuples every ten seconds:

// in your spout/bolt
@Override
public Map<String, Object> getComponentConfiguration() {
    Config conf = new Config();
    int tickFrequencyInSeconds = 10;
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, tickFrequencyInSeconds);
    return conf;
}

You can then use a conditional in your bolt's execute() method to distinguish "normal" incoming tuples from the special tick tuples (spouts have no execute() method, so this part applies to bolts). For instance:

// in your bolt
@Override
public void execute(Tuple tuple) {
    if (isTickTuple(tuple)) {
        // now you can trigger e.g. a periodic activity
    }
    else {
        // do something with the normal tuple
    }
}

private static boolean isTickTuple(Tuple tuple) {
    return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
        && tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);
}
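
For the hourly trending topics in the question, the "periodic activity" in the tick branch could be to report the counts collected so far and then move the window forward. The following is only a minimal sketch in plain Java with no Storm dependency; the class SlidingWindowCounter and its methods are hypothetical names, not part of Storm. It assumes the bolt calls increment() for every normal tuple and advanceAndGetTotals() for every tick tuple.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: per-key counts over the last numSlots tick intervals. */
final class SlidingWindowCounter {
    private final int numSlots;
    private final Map<String, long[]> slotsByKey = new HashMap<>();
    private int head = 0; // index of the slot that receives new counts

    SlidingWindowCounter(int numSlots) {
        if (numSlots < 1) {
            throw new IllegalArgumentException("numSlots must be >= 1");
        }
        this.numSlots = numSlots;
    }

    /** Call once for every normal (non-tick) tuple. */
    void increment(String key) {
        slotsByKey.computeIfAbsent(key, k -> new long[numSlots])[head]++;
    }

    /**
     * Call once for every tick tuple. Returns the totals over the last
     * numSlots intervals, then advances the window by one interval.
     */
    Map<String, Long> advanceAndGetTotals() {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, long[]> entry : slotsByKey.entrySet()) {
            long sum = 0;
            for (long count : entry.getValue()) {
                sum += count;
            }
            totals.put(entry.getKey(), sum);
        }
        // Move to the oldest slot and wipe it so it can collect the next interval.
        head = (head + 1) % numSlots;
        for (long[] slots : slotsByKey.values()) {
            slots[head] = 0;
        }
        // Forget keys that no longer have any counts inside the window.
        slotsByKey.values().removeIf(SlidingWindowCounter::isAllZero);
        return totals;
    }

    private static boolean isAllZero(long[] slots) {
        for (long count : slots) {
            if (count != 0) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch the window length is the tick frequency multiplied by the number of slots. With a single slot the counter behaves like a tumbling window, so a tick frequency of 3600 seconds would report and reset once per hour. With the ten-second ticks configured above, 360 slots would give a one-hour window that slides every ten seconds.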

Again, I wrote a pretty detailed blog post about doing this in Storm a few days ago as gakhov pointed out (shameless plug!).

Michael G. Noll
  • We ended up using tick tuples for "triggering" an aggregating function (bolt). Thanks a bunch=) – gronnbeck Jun 19 '13 at 09:24
  • Hi Michael, I am wondering about this: while storm is running, can I somehow change the frequency of the tick tuples? If we can, we can change the frequency at which storm will write the log of the trending results, or it can change the window size at which storm is calculating the topic trend. Thanks! – user2895478 Sep 18 '14 at 02:55
  • AFAIK you can't change the tick frequency at runtime. – Michael G. Noll Sep 18 '14 at 07:22

Add a new spout with a parallelism of 1 and have it emit an empty signal tuple, then call Utils.sleep until the next signal is due (all of this inside nextTuple). Then subscribe all relevant bolts to that spout using all grouping, so that every instance of those bolts receives the same signal.
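
How long to sleep is left open here. One option, sketched below in plain Java with no Storm dependency, is to sleep until the next multiple of the interval instead of for a fixed duration, so that the signals stay aligned to clock boundaries (for example the top of each hour) and do not drift. The class SignalTiming and its method are hypothetical names used only for illustration.

```java
/** Hypothetical helper for a signal spout: how long to sleep before the next signal. */
final class SignalTiming {
    private SignalTiming() {
    }

    /**
     * Milliseconds from nowMs until the next multiple of intervalMs.
     * The result is always between 1 and intervalMs inclusive, so a call made
     * exactly on a boundary waits one full interval.
     */
    static long millisUntilNextBoundary(long nowMs, long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        return intervalMs - Math.floorMod(nowMs, intervalMs);
    }
}
```

Inside nextTuple, after emitting the signal, the spout could then pass SignalTiming.millisUntilNextBoundary(System.currentTimeMillis(), intervalMs) to Utils.sleep, where intervalMs is whatever window length you chose.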

Moshe Bixenshpaner