
I'm stuck on this problem and can't figure out what's going on. I am trying to use Kafka Streams to write a log to a topic. On the other end, I have Kafka Connect entering each entry into MySQL. So, basically, what I need is a Kafka Streams program that reads a topic as strings, parses each message into Avro format, and then writes it to a different topic.

Here's the code I wrote:

        // Define the Avro schema
        String userSchema = "{"
                + "\"type\":\"record\","
                + "\"name\":\"myrecord\","
                + "\"fields\":["
                + "  { \"name\":\"ID\", \"type\":\"int\" },"
                + "  { \"name\":\"COL_NAME_1\", \"type\":\"string\" },"
                + "  { \"name\":\"COL_NAME_2\", \"type\":\"string\" }"
    + "]}";

        String key = "key1";
        Schema.Parser parser = new Schema.Parser();
        Schema schema = parser.parse(userSchema);

        System.out.println("Kafka Streams Demonstration");

        // Settings
        Properties settings = new Properties();
        // Set a few key parameters
        settings.put(StreamsConfig.APPLICATION_ID_CONFIG, APP_ID);
        // Kafka bootstrap server (broker to talk to); port 9092 is where the (single) broker listens
        settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Apache ZooKeeper instance keeping watch over the Kafka cluster; port 2181 is where ZooKeeper listens
        settings.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "localhost:2181");
        // default serdes for serializing and deserializing keys and values when no record-specific Serde is given
        settings.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        settings.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        settings.put(StreamsConfig.STATE_DIR_CONFIG ,"/tmp");
        // to work around exception Exception in thread "StreamThread-1" java.lang.IllegalArgumentException: Invalid timestamp -1
        // at org.apache.kafka.clients.producer.ProducerRecord.<init>(ProducerRecord.java:60)
        // see: https://groups.google.com/forum/#!topic/confluent-platform/5oT0GRztPBo

        // Create an instance of StreamsConfig from the Properties instance
        StreamsConfig config = new StreamsConfig(settings);
        final Serde<String> stringSerde = Serdes.String();
        final Serde<Long> longSerde = Serdes.Long();
        final Serde<byte[]> byteArraySerde = Serdes.ByteArray();

        // Build the Kafka Streams topology
        KStreamBuilder kStreamBuilder = new KStreamBuilder();
        // the source of the stream is the topic "sqlin", read with byte-array keys and String values
        KStream<byte[], String> instream =
            kStreamBuilder.stream(byteArraySerde, stringSerde, "sqlin");

        final KStream<byte[], GenericRecord> outstream = instream.mapValues(new ValueMapper<String, GenericRecord>() {
            @Override
            public GenericRecord apply(final String record) {
                System.out.println(record);
                GenericRecord avroRecord = new GenericData.Record(schema);
                String[] array = record.split(" ", -1);
                for (int i = 0; i < array.length; i = i + 1) {
                    if (i == 0)
                        avroRecord.put("ID", Integer.parseInt(array[0]));
                    if (i == 1)
                        avroRecord.put("COL_NAME_1", array[1]);
                    if (i == 2)
                        avroRecord.put("COL_NAME_2", array[2]);
                }
                System.out.println(avroRecord);
                return avroRecord;
            }
          });
        outstream.to("sqlout");

Here's the output, which ends with a NullPointerException:

java -cp streams-examples-3.2.1-standalone.jar io.confluent.examples.streams.sql
Kafka Streams Demonstration
Start
Now started CountriesStreams Example
5 this is
{"ID": 5, "COL_NAME_1": "this", "COL_NAME_2": "is"}
Exception in thread "StreamThread-1" java.lang.NullPointerException
    at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:81)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:83)
    at org.apache.kafka.streams.kstream.internals.KStreamMapValues$KStreamMapProcessor.process(KStreamMapValues.java:42)
    at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:48)
    at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:134)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:83)
    at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:70)
    at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:197)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:627)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:361)

The topic sqlin contains a few messages, each consisting of a digit followed by two words. Note the two print lines: the function receives one message and successfully parses it before hitting the NullPointerException. The problem is that I am new to Java, Kafka, and Avro, so I'm not sure where I'm going wrong. Did I set up the Avro schema right? Or am I using KStream wrong? Any help here would be greatly appreciated.

Nishant
  • Take a look at this [answer](https://stackoverflow.com/questions/218384/what-is-a-nullpointerexception-and-how-do-i-fix-it) – Fady Saad Jun 08 '17 at 04:34

1 Answer


I think the problem is the following line:

outstream.to("sqlout");

Your application is configured to, by default, use a String serde for record keys and record values:

    settings.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    settings.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

Since outstream has type KStream<byte[], GenericRecord>, you must provide explicit serdes when calling to():

    // something like:
    outstream.to(Serdes.ByteArray(), yourGenericAvroSerde, "sqlout");

FYI: the next version of Confluent Platform (ETA: this month, June 2017) will ship with ready-to-use generic and specific Avro serdes that integrate with Confluent Schema Registry. This should make your life easier.
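For illustration, wiring such an Avro serde into the call above might look roughly like the following. This is only a sketch under assumptions, not the released API: it presumes a GenericAvroSerde class implementing Serde<GenericRecord> and a Schema Registry reachable at http://localhost:8081.

    // Sketch only: GenericAvroSerde and the registry URL are assumptions, not confirmed API
    final Serde<GenericRecord> avroValueSerde = new GenericAvroSerde();
    avroValueSerde.configure(
        Collections.singletonMap("schema.registry.url", "http://localhost:8081"),
        false); // false = configure as a value serde, not a key serde

    // explicit serdes here override the String defaults from the application config
    outstream.to(Serdes.ByteArray(), avroValueSerde, "sqlout");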

See my answer at https://stackoverflow.com/a/44433098/1743580 for further details.

Michael G. Noll
  • Btw, you may also want to take a look at our new end-to-end demo for Kafka Connect + Kafka Streams integration: https://github.com/confluentinc/examples/tree/3.2.x/kafka-connect-streams – Michael G. Noll Jun 08 '17 at 10:41