I am working on a project and i am stuck on following scenario.
I have a table: superMerge(id, name, salary)
and I have 2 other tables: table1 and table2
all the tables ( table1, table2 and superMerge) has same structure.
Now, my challenge is to insert/update superMerge table from table1 and table2. table1 is updated every 10mins and table2 every 20 mins therefore at time t=20mins i have 2 jobs trying to update same table(superMerge in this case.)
I want to understand how can i acheive this parallel insert/update/merge into superMerge table using Spark or any other hadoop application.