
I have a centralized topic to which all logs are pushed, but I would like to filter some of those records out to a separate topic and, if possible, a separate cluster.

Thanks

  • Possible duplicate of [Kafka Streams - connecting to multiple clusters](https://stackoverflow.com/questions/45847690/kafka-streams-connecting-to-multiple-clusters) – OneCricketeer Oct 28 '18 at 21:52

1 Answer


Kafka Streams does not allow you to create a stream whose source and output topics live in different Kafka clusters, so the following code will not work for you:

streamsBuilder.stream(sourceTopicName).filter(..).to(outputTopicName)

In this case, it expects outputTopicName to be in the same cluster as sourceTopicName.

As a workaround, to send messages to an output topic in another cluster, you can create an additional KafkaProducer whose bootstrap.servers property points to the external cluster, and call it from the KStream.foreach() method:

streamsBuilder.stream(sourceTopicName)
    .filter((key, value) -> ...)
    .foreach((key, value) ->
        sendMessage(kafkaProducerFromAnotherCluster, destinationTopicName, key, value));


public static void sendMessage(KafkaProducer<String, String> kafkaProducer,
                               String destinationTopicName, String key, String value) {
    try {
        // block until the write is acknowledged; a fire-and-forget send()
        // could silently lose the record if the application crashes
        kafkaProducer.send(new ProducerRecord<>(destinationTopicName, key, value)).get();
    } catch (InterruptedException | ExecutionException ex) {
        log.error("Failed to send record to topic " + destinationTopicName, ex);
    }
}
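As a rough sketch of how that extra producer could be configured (the broker address `external-cluster:9092` and the String serializers below are placeholder assumptions), the only essential difference from the Streams application is that bootstrap.servers points at the destination cluster:

```java
import java.util.Properties;

public class ExternalClusterProducerConfig {

    // Builds configuration for a producer that writes to the *external* cluster,
    // not the cluster the Kafka Streams application itself is connected to.
    public static Properties externalProducerConfig() {
        Properties props = new Properties();
        // bootstrap.servers of the destination cluster (placeholder address)
        props.put("bootstrap.servers", "external-cluster:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Create the producer once and reuse it from foreach(), e.g.:
        // KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        return props;
    }

    public static void main(String[] args) {
        Properties props = externalProducerConfig();
        System.out.println(props.getProperty("bootstrap.servers"));
    }
}
```

Creating one long-lived producer and sharing it across all foreach() calls avoids opening a new connection per record.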

Another option is to create an output topic in your own Kafka cluster that holds the filtered messages, and set up Kafka mirroring between the two clusters (so the filtered topic is copied from one cluster to the other).
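For the mirroring option, a minimal sketch using the MirrorMaker tool that ships with Kafka (the topic name `filtered-logs` and the two `.properties` file names are placeholder assumptions):

```shell
# consumer.properties points at the source cluster,
# producer.properties at the destination cluster
bin/kafka-mirror-maker.sh \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist "filtered-logs"
```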

Vasyl Sarzhynskyi
  • If you create a `KafkaProducer` and use it within `foreach()`, you will need to do a sync write to the target cluster. Otherwise, you can lose data if something crashes! This will have a performance impact, of course. Thus, the recommended approach is to write the result back into the first cluster and replicate the result topic into the target cluster. – Matthias J. Sax Oct 28 '18 at 19:27
  • @MatthiasJ.Sax do you mean replicate with `Kafka MirrorMaker`? – Vasyl Sarzhynskyi Oct 28 '18 at 19:29
  • `MirrorMaker` is one tool you can use. There are other open-source and proprietary cross-cluster replication tools, too. – Matthias J. Sax Oct 28 '18 at 19:33
  • So we cannot create two separate streams with separate connection properties and have one stream write to the other stream? Referring to this: https://docs.confluent.io/3.1.1/streams/concepts.html#processor-topology – user432024 Oct 29 '18 at 20:52
  • So, `KStream stream2 = stream1.toStream(); stream2.to("OUTPUT_TOPIC");`? Or basically, does that copy the stream connection properties over? – user432024 Oct 29 '18 at 21:08
  • @user432024 could you please be more specific? which option do you want to use: using KafkaProducer inside foreach or mirroring? – Vasyl Sarzhynskyi Oct 30 '18 at 06:41
  • Neither. I was thinking through stream chaining using toStream(). But I guess the first stream always holds the connection info regardless. – user432024 Oct 30 '18 at 09:35