
`spark.streaming.concurrentJobs` is not documented; it is used when we want to add parallelism to our system, so that multiple micro-batches from the same Kafka topic can be processed concurrently (if I understand correctly).

My question is whether this means there will be multiple threads running at the executor level. For example, we generally assume that everything inside `foreachPartition` runs in a single thread, so we do not use thread-safe locking. But if we set `spark.streaming.concurrentJobs > 1`, should we pay attention to thread safety, since multiple threads may operate on the same partition concurrently?
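To make the concern concrete, here is a plain-Python sketch (not Spark code; `SharedSink` and the thread-per-job setup are hypothetical stand-ins) of what can go wrong if concurrently scheduled jobs touch shared mutable state, such as a per-executor singleton or connection pool reached from inside `foreachPartition`:

```python
import threading
import time

# Hypothetical process-wide singleton, standing in for per-executor state
# (e.g. a connection pool or cache) shared across concurrently running tasks.
class SharedSink:
    def __init__(self):
        self.total = 0
        self.lock = threading.Lock()

    def add_unsafe(self, n):
        # Non-atomic read-modify-write; the sleep widens the race window.
        current = self.total
        time.sleep(0.001)
        self.total = current + n

    def add_safe(self, n):
        # Same operation, serialized by a lock.
        with self.lock:
            current = self.total
            time.sleep(0.001)
            self.total = current + n

def run_jobs(sink, method, jobs=4, records=5):
    # Each thread stands in for one concurrently scheduled micro-batch job.
    threads = [
        threading.Thread(target=lambda: [method(sink, 1) for _ in range(records)])
        for _ in range(jobs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

unsafe = SharedSink()
run_jobs(unsafe, SharedSink.add_unsafe)
print("unsafe total:", unsafe.total)  # usually well below 20: lost updates

safe = SharedSink()
run_jobs(safe, SharedSink.add_safe)
print("safe total:", safe.total)      # always 20
```

The point is that the hazard is not about two threads processing the same RDD partition; it is about shared state in one executor process being reached by tasks of different jobs at the same time.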

  • Please take a look at https://stackoverflow.com/questions/23528006/how-jobs-are-assigned-to-executors-in-spark-streaming – Natalia Jun 05 '20 at 06:29

1 Answer


Thanks. I am more interested in the `concurrentJobs > 1` case: do I need to worry about thread safety when processing a partition? Will multiple threads operate on the same partition, so that we need to ensure thread safety where needed, or can we assume each partition is processed by a single thread?
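One defensive pattern that sidesteps the question entirely is to construct any non-thread-safe resource inside the partition-processing function, rather than sharing a singleton across tasks. Below is a plain-Python sketch (not Spark API code; `Client` and `process_partition` are hypothetical names, and threads stand in for concurrently running tasks):

```python
import threading

# Hypothetical non-thread-safe client, standing in for e.g. a DB connection.
class Client:
    def __init__(self):
        self.owner = threading.get_ident()

    def write(self, record):
        # A real client might corrupt state if called from another thread;
        # here we just check the calling thread is the one that built it.
        assert threading.get_ident() == self.owner
        return record

def process_partition(records):
    # Build the client inside the partition function: even if several jobs'
    # tasks run concurrently in the same process, each task thread gets its
    # own instance and no locking is needed.
    client = Client()
    return [client.write(r) for r in records]

results = []
results_lock = threading.Lock()
threads = []
for part in ([1, 2], [3, 4]):
    def work(p=part):
        out = process_partition(p)
        with results_lock:
            results.extend(out)
    t = threading.Thread(target=work)
    threads.append(t)
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [1, 2, 3, 4]
```

With this per-partition-construction pattern, whether Spark schedules one job or several at a time, each invocation of the partition function only ever touches its own resources.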

Zhang Rui