Does Kafka python API support stream processing?

Question

I have used Kafka Streams in Java, but I could not find a similar API in Python. Does Apache Kafka support stream processing in Python?

There is https://github.com/wintoncode/winton-kafka-streams -- this is not part of Apache Kafka. I don't know how stable it is and if it's suitable for production yet. — Matthias J. Sax, Aug 19 '18 at 17:43

OneCricketeer · Accepted Answer · 2020-10-28T17:47:20.197

17

Kafka Streams is only available as a JVM library, but there are at least two Python implementations of it:

robinhood/faust
wintoncode/winton-kafka-streams (appears not to be maintained)

In theory, you could try using Jython or Py4J to call the JVM implementation from Python, but otherwise you're stuck with the plain consumer/producer APIs or invoking the KSQL REST interface.
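For the plain consumer/producer route, a rough sketch of a consume-transform-produce loop might look like the following. It assumes the third-party kafka-python client; the broker address, topic names, and group id are placeholders, not anything from the question. Unlike Kafka Streams, the running counts here live only in process memory, so they are lost on restart.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, Tuple


def word_counts(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (word, running_count) for each word seen so far."""
    counts = Counter()  # in-memory state only; not fault tolerant
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
            yield word, counts[word]


def run(bootstrap="localhost:9092", source="text-input", sink="word-counts"):
    """Wire the pure logic above to Kafka (all names are hypothetical)."""
    # Imported here so word_counts() works without the client installed.
    from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

    consumer = KafkaConsumer(
        source,
        bootstrap_servers=bootstrap,
        group_id="wordcount-sketch",
        value_deserializer=lambda b: b.decode("utf-8"),
    )
    producer = KafkaProducer(
        bootstrap_servers=bootstrap,
        key_serializer=lambda k: k.encode("utf-8"),
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    # Iterating the consumer blocks and yields records as they arrive.
    values = (record.value for record in consumer)
    for word, count in word_counts(values):
        producer.send(sink, key=word, value=count)
```

Keeping the transformation in a plain generator makes it testable without a broker; `run()` only adds the Kafka plumbing around it.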

Outside of those options, you can also try Apache Beam, Flink or Spark, but they each require an external cluster scheduler to scale out.

edited Oct 28 '20 at 17:47

answered Aug 19 '18 at 15:59


Are there any examples or tutorials for using https://docs.confluent.io/current/ksql/docs/tutorials/index.html#ksql-tutorials with Faust streaming? – Mahamutha M Apr 08 '19 at 06:47
KSQL is implemented in Java, so I'm not sure I understand the question – OneCricketeer Apr 08 '19 at 22:31
@circket_007, so KSQL is not available in Python. Is that what you mean? – Mahamutha M Apr 09 '19 at 04:09
@Maha KSQL server has a REST API, so you can submit queries from any language – OneCricketeer Apr 11 '19 at 00:58
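As an illustration of submitting a statement from Python, here is a minimal sketch using only the standard library. It assumes a KSQL server listening on http://localhost:8088 and posts to its `/ksql` statement endpoint; the helper names `build_ksql_request` and `submit` are made up for this example.

```python
import json
import urllib.request


def build_ksql_request(statement, server="http://localhost:8088"):
    """Build (but do not send) a POST request for the /ksql endpoint."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}})
    return urllib.request.Request(
        url=f"{server}/ksql",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/vnd.ksql.v1+json",
            "Accept": "application/vnd.ksql.v1+json",
        },
        method="POST",
    )


def submit(statement, server="http://localhost:8088"):
    """Send the statement and return the decoded JSON response."""
    request = build_ksql_request(statement, server)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (needs a running server, so it is left commented out):
# print(submit("SHOW STREAMS;"))
```

Statements such as `SHOW STREAMS;` or `CREATE STREAM ...` go to `/ksql`; streaming `SELECT` queries use a separate query endpoint and return results incrementally, which this sketch does not handle.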

TruthTeller · Answer 2 · 2021-03-21T14:34:46.013

If you are using Apache Spark, you can use Kafka as the producer and Spark Structured Streaming as the consumer. There is no need to rely on third-party libraries like Faust.

To consume Kafka data streams in Spark, use the Structured Streaming + Kafka Integration Guide.

Keep in mind that you will have to add the spark-sql-kafka package when using spark-submit:

spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1 StructuredStreaming.py

This solution has been tested with Spark 3.0.1 and Kafka 2.7.0 with PySpark.
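A minimal sketch of what a StructuredStreaming.py along these lines could contain is shown below. The broker address, topic name, and the helper `kafka_source_options` are assumptions for illustration, not the answerer's actual script; it simply prints decoded records to the console.

```python
def kafka_source_options(bootstrap_servers, topic, starting_offsets="latest"):
    """Options for Spark's Kafka source (values here are placeholders)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def main(bootstrap_servers="localhost:9092", topic="events"):
    # Imported here so the helper above is usable without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("StructuredStreaming").getOrCreate()

    # Each row from the Kafka source has binary key/value columns
    # plus topic, partition, offset and timestamp metadata.
    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
    decoded = raw.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
    )

    # Print each micro-batch to stdout; swap in another sink as needed.
    query = decoded.writeStream.outputMode("append").format("console").start()
    query.awaitTermination()


# main()  # uncomment when running the file with spark-submit
```

The console sink is only for checking that records arrive; a real job would typically write to a file, table, or another Kafka topic and set a checkpoint location.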

This resource can also be useful.
