
I have a use case with two topics:

Topic 1 (Units) -> P0 / Topic 2 (Reservations) -> P0

I have a single consumer that needs up-to-date data from both topics/partitions in order to make the correct decision (either delete a unit if it is not reserved, or reserve the unit if it exists).

I decided to keep them in two different topics to separate the concepts, and so that if a third service ever needs to interact with units only, it can do so easily.

But how should I handle the concurrent operations that could arise, or events from one topic being delayed relative to the other?

Thanks

Ahmed Alaa El-Din

1 Answer


Assuming each individual topic (or partition) is ordered to your liking, you could achieve this using the pause(), resume() and seek() calls.

Spin up a consumer for each topic, then pause() it if it gets too far "ahead" of the other, and resume() it when the other catches up.

This is basically merging two sorted lists, just over Kafka.
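To make the merge idea concrete, here is a minimal sketch that uses only the Python standard library and no real Kafka client. StubConsumer is a hypothetical stand-in for a consumer bound to one partition, records are assumed to be (timestamp, payload) tuples already ordered within each partition, and the logs are finite so the loop can terminate. A real consumer would never be "exhausted", so it would need its own rule for how long to wait on the slower side.

```python
from collections import deque


class StubConsumer:
    """Hypothetical stand-in for a consumer bound to a single partition."""

    def __init__(self, name, records):
        self.name = name
        self._log = deque(records)  # (timestamp, payload), ordered by timestamp
        self.paused = False

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def poll(self):
        """Return the next record, or None if paused or nothing is left."""
        if self.paused or not self._log:
            return None
        return self._log.popleft()


def merge_in_order(units, reservations):
    """Emit records from both consumers in global timestamp order."""
    heads = {units: None, reservations: None}  # one held record per consumer
    merged = []
    while True:
        for consumer in heads:
            if heads[consumer] is None:
                consumer.resume()
                heads[consumer] = consumer.poll()
                if heads[consumer] is not None:
                    # Hold this record and stop reading until it is processed.
                    consumer.pause()
        ready = [(rec, c) for c, rec in heads.items() if rec is not None]
        if not ready:
            break  # both stub logs are drained
        # Process the older head; on a tie the units side goes first.
        rec, consumer = min(ready, key=lambda pair: pair[0][0])
        merged.append((consumer.name, rec))
        heads[consumer] = None
    return merged


units = StubConsumer("units", [(1, "unit-A created"), (4, "unit-B created")])
reservations = StubConsumer(
    "reservations",
    [(2, "reserve unit-A"), (3, "release unit-A"), (5, "reserve unit-B")],
)
merged = merge_in_order(units, reservations)
for name, (ts, payload) in merged:
    print(ts, name, payload)
```

The side holding the newer record stays paused until the other side has been read past that timestamp, which is the "keep the two streams in sync" behaviour described above.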

Depending on whether you use subscribe() or assign(), you may still need to keep polling the paused consumer so that you do not trigger a rebalance.

If the topics in question are not fully ordered (meaning some events are published late or out of order), you'd need local state (basically a windowed stream-to-stream join). It's possible to implement this yourself, but this is where stream processing frameworks start to come in handy.
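As a rough illustration of what "local state" means here, the following is a hand-rolled sketch of a windowed join in plain Python. It is not the API of any stream processing framework. WindowedJoin, its on_event method, the integer timestamps and the window size are all assumptions made for the example, and the state lives in memory only, so it would be lost on restart.

```python
from collections import defaultdict


class WindowedJoin:
    """Hypothetical in-memory join of unit and reservation events by key."""

    def __init__(self, window):
        self.window = window
        self._buffers = {
            "units": defaultdict(list),
            "reservations": defaultdict(list),
        }
        self._max_ts = 0  # highest timestamp seen so far on either side

    def on_event(self, side, key, ts, payload):
        """Buffer the event and return any (key, unit, reservation) matches."""
        other = "reservations" if side == "units" else "units"
        self._max_ts = max(self._max_ts, ts)
        self._evict()
        joined = []
        for other_ts, other_payload in self._buffers[other].get(key, []):
            if abs(ts - other_ts) <= self.window:
                if side == "units":
                    pair = (payload, other_payload)
                else:
                    pair = (other_payload, payload)
                joined.append((key, *pair))
        self._buffers[side][key].append((ts, payload))
        return joined

    def _evict(self):
        """Drop buffered events older than the window behind the newest one."""
        cutoff = self._max_ts - self.window
        for buf in self._buffers.values():
            for key in list(buf):
                buf[key] = [event for event in buf[key] if event[0] >= cutoff]
                if not buf[key]:
                    del buf[key]


join = WindowedJoin(window=10)
# The reservation arrives before the (older) unit event it refers to.
r1 = join.on_event("reservations", "unit-A", 105, "reserve")
r2 = join.on_event("units", "unit-A", 100, "created")
# Much later events push the old state out of the window.
r3 = join.on_event("units", "unit-B", 200, "created")
r4 = join.on_event("reservations", "unit-A", 201, "reserve")
print(r1, r2, r3, r4)
```

The window is a trade-off: a larger one tolerates later arrivals but keeps more state in memory, and anything that arrives after its counterpart has been evicted is silently left unmatched in this sketch.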

radai
  • I don't have a clear picture of how pause() and resume() would help with concurrency issues. The only scenario I can see is when using version comparison: after pausing, if I need to write new data, I would have to resume consuming to get the latest messages for that specific object. Is that correct? – Ahmed Alaa El-Din Sep 03 '19 at 08:02
  • Yes. You basically keep two ordered streams "in sync" by selectively pausing one and consuming the other. – radai Sep 03 '19 at 20:31