I'm trying to run a basic pipe from one topic to another using Kafka Streams 0.10.2.1:

    // config is a StreamsConfig whose default key/value serdes are set to
    // Serdes.ByteBuffer(), since no explicit serdes are passed below.
    KStreamBuilder builder = new KStreamBuilder();

    KStream<ByteBuffer, ByteBuffer> stream = builder
            .stream("transactions_load");

    stream.to("transactions_fact");

    KafkaStreams streams = new KafkaStreams(builder, config);
    streams.start();

If I watch the destination topic, I can see records being produced there. Records flow for about a minute, and then the process fails with the error below:

ERROR task [0_19] Error sending record to topic transactions_fact. No more offsets will be recorded for this task and the exception will eventually be thrown (org.apache.kafka.streams.processor.internals.RecordCollectorImpl:102) 
 [2017-10-02 16:30:54,516]org.apache.kafka.common.errors.TimeoutException: Expiring 24 record(s) for transactions_fact-5: 30012 ms has passed since last append
ERROR task [0_9] Error sending record to topic transactions_fact. No more offsets will be recorded for this task and the exception will eventually be thrown (org.apache.kafka.streams.processor.internals.RecordCollectorImpl:102) 
 [2017-10-02 16:30:54,519]org.apache.kafka.common.errors.TimeoutException: Expiring 24 record(s) for transactions_fact-5: 30012 ms has passed since last append
 ....
 [2017-10-02 16:30:54,650]org.apache.kafka.common.errors.TimeoutException: Expiring 24 record(s) for transactions_fact-14: 30068 ms has passed since last append
ERROR task [0_2] Error sending record to topic transactions_fact. No more offsets will be recorded for this task and the exception will eventually be thrown (org.apache.kafka.streams.processor.internals.RecordCollectorImpl:102) 
 [2017-10-02 16:30:54,650]org.apache.kafka.common.errors.TimeoutException: Expiring 24 record(s) for transactions_fact-14: 30061 ms has passed since last append
ERROR stream-thread [StreamThread-1] Failed to commit StreamTask 0_0 state:  (org.apache.kafka.streams.processor.internals.StreamThread:813) 
 [2017-10-02 16:31:02,355]org.apache.kafka.streams.errors.StreamsException: task [0_0] exception caught when producing
    at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:121)
    at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.flush(RecordCollectorImpl.java:129)
    at org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:76)
    at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188)
    at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:280)
    at org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:807)
    at org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:794)
    at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:769)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:647)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361)
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 24 record(s) for transactions_fact-5: 30012 ms has passed since last append
...
ERROR stream-thread [StreamThread-1] Failed while executing StreamTask 0_19 due to flush state:  (org.apache.kafka.streams.processor.internals.StreamThread:503) 
 [2017-10-02 16:31:02,378]org.apache.kafka.streams.errors.StreamsException: task [0_19] exception caught when producing
    at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:121)
    at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.flush(RecordCollectorImpl.java:129)
    at org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:422)
    at org.apache.kafka.streams.processor.internals.StreamThread$4.apply(StreamThread.java:555)
    at org.apache.kafka.streams.processor.internals.StreamThread.performOnTasks(StreamThread.java:501)
    at org.apache.kafka.streams.processor.internals.StreamThread.flushAllState(StreamThread.java:551)
    at org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:449)
    at org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:391)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:372)
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 24 record(s) for transactions_fact-5: 30012 ms has passed since last append
Exception in thread "StreamThread-1" org.apache.kafka.streams.errors.StreamsException: task [0_0] exception caught when producing
    at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.checkForException(RecordCollectorImpl.java:121)
    at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.flush(RecordCollectorImpl.java:129)
    at org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:76)
    at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188)
    at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:280)
    at org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:807)
    at org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:794)
    at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:769)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:647)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361)
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 24 record(s) for transactions_fact-5: 30012 ms has passed since last append

Some more info:

  • I am running one instance of the streams app (on my laptop).
  • I am writing about 400 records per second into the source topic.
  • The source topic has 20 partitions.
  • The target topic has 20 partitions.

The error suggests that the problem is with producing to the target topic. At about 400 records per second spread over 20 partitions (roughly 20 records per second per partition), the producer is failing to flush even a 24-record batch within the 30-second timeout, so raw write volume seems unlikely to be the cause. What is the next step in debugging this further?
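One mitigation I am considering is giving the internal producer more headroom via the Streams config. A rough sketch of what I mean (the values are placeholders I haven't verified, and I'm assuming Streams forwards plain producer settings to its internal producer in 0.10.2.x):

    import java.util.Properties;

    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "transactions-pipe");   // placeholder id
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder broker
    props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.ByteBuffer().getClass().getName());
    props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.ByteBuffer().getClass().getName());

    // Give the producer more time and retries before batches expire with
    // "Expiring N record(s) ... ms has passed since last append".
    props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 60000); // default 30000
    props.put(ProducerConfig.RETRIES_CONFIG, 10);               // default 0
    props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 1000);

    StreamsConfig config = new StreamsConfig(props);

I have no evidence yet that this is the right knob rather than a band-aid, which is why I'd like to understand how to debug the root cause first.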

  • I believe the error was due to network congestion. I have redeployed the app in the same data center and all appears ok so far. – Chris Snow Oct 04 '17 at 13:53
  • How could your application be made resilient to this error? Mine gets stalled and no longer processes data. – xmar Jun 05 '18 at 10:19
  • @xmar Same thing for me. The app turns into a zombie... even restarts won't fix it. – Lo-Tan Aug 09 '18 at 00:02
  • @Lo-Tan what's your infrastructure? – xmar Aug 09 '18 at 10:49
  • The problem ended up being a detached (lost) broker node in our DC/OS (Mesos) cluster. We have 3 nodes running, but apparently a Kafka Streams app couldn't function with even one node dropped out of the cluster. We're going to bump our instance count to 5 soon, but I have no idea what was wrong, as the topic the app was consuming from had a replication factor of 3. – Lo-Tan Aug 10 '18 at 15:28
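For the "zombie" behaviour described in the comments above, one pattern that may help (a minimal sketch, assuming it is acceptable to crash and let an external supervisor restart the process) is registering an uncaught exception handler on the KafkaStreams instance so the app fails fast instead of stalling:

    streams.setUncaughtExceptionHandler((thread, throwable) -> {
        // Log and exit; an external supervisor (systemd, Marathon,
        // Kubernetes, ...) is assumed to restart the process.
        System.err.println("Stream thread " + thread.getName() + " died: " + throwable);
        System.exit(1);
    });

Note that this only turns a hung app into a dead one that can be restarted; it does not address the underlying broker or network problem.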
