Questions tagged [oozie]

Oozie is a workflow/coordination system to manage Hadoop MapReduce jobs.

Oozie is a workflow scheduler system to manage Apache Hadoop jobs.

Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions.
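
As a rough illustration of what "a DAG of actions" implies for execution order, here is a minimal Python sketch. It is not Oozie's API: the action names and the `execution_order` helper are hypothetical, and it assumes Python 3.9+ for the standard-library `graphlib` module. Each action runs only after every action it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions it depends on.
workflow = {
    "import-data": set(),
    "clean-data": {"import-data"},
    "aggregate": {"clean-data"},
    "export-report": {"aggregate", "clean-data"},
}


def execution_order(dag):
    """Return one valid order in which the actions could run.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle,
    which is why a workflow must be acyclic.
    """
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

In this sketch the dependencies form a single chain, so only one order is possible; a workflow with independent branches would admit several valid orders.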

Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.
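
The two triggers can be sketched in Python as follows. This is a simplified model under stated assumptions, not how Oozie is implemented: the function names, the hourly dataset key format, and the in-memory set of available datasets are all hypothetical. A run is considered ready only when its scheduled time has been generated by the frequency and its input data is present.

```python
from datetime import datetime, timedelta


def nominal_times(start, end, frequency_minutes):
    """Yield each scheduled time in [start, end) at the given frequency."""
    step = timedelta(minutes=frequency_minutes)
    current = start
    while current < end:
        yield current
        current += step


def ready_runs(start, end, frequency_minutes, available_datasets):
    """Return the scheduled times whose input dataset is available.

    Datasets are identified by a hypothetical 'YYYY-MM-DD-HH' key; scheduled
    times whose key is missing are treated as still waiting for data.
    """
    return [
        t
        for t in nominal_times(start, end, frequency_minutes)
        if t.strftime("%Y-%m-%d-%H") in available_datasets
    ]


# Hourly schedule over three hours; data has arrived for hours 00 and 02 only.
runs = ready_runs(
    datetime(2024, 1, 1, 0),
    datetime(2024, 1, 1, 3),
    60,
    {"2024-01-01-00", "2024-01-01-02"},
)
print(runs)
```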

Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and DistCp) as well as system-specific jobs (such as Java programs and shell scripts).

Oozie is a scalable, reliable and extensible system.

1941 questions
27
votes
3 answers

DAG (directed acyclic graph) dynamic job scheduler

I need to manage a large workflow of ETL tasks whose execution depends on time, data availability or an external event. Some jobs may fail during execution of the workflow and the system should have the ability to restart a failed workflow branch…
22
votes
1 answer

Which one to choose: Apache Oozie or Apache Airflow? Need a comparison

I am new to job schedulers and was looking for one to run jobs on a big data cluster. I was quite confused by the available choices. Found Oozie to have many limitations as compared to the already existing ones such as TWS, Autosys, etc. Need…
Vishal786btc
14
votes
3 answers

What is the difference between Oozie workflow, coordinator and bundle?

What is the difference between Oozie workflow, coordinator and bundle? An Oozie workflow defines a sequence of actions, and we need to invoke it manually every time we want it to run, whereas the same workflow can be scheduled through a coordinator. Is this…
Kaushik Lele
13
votes
9 answers

Running shell script with Oozie

I am trying to run a sh script through Oozie, but I am facing a problem: Cannot run program "script.sh" (in directory "/mapred/local/taskTracker/dell/jobcache/job_201312061003_0001/attempt_201312061003_0001_m_000000_0/work"): …
user3072994
11
votes
1 answer

Ext JS library not installed correctly in Oozie

I'm getting the following message when I access the Oozie UI: Oozie web console is disabled. To enable Oozie web console install the Ext JS library. I'm using the HDP distribution and installed it through the Ambari service installer. I tried to follow…
JaviOverflow
11
votes
2 answers

IOException: Filesystem closed exception when running oozie workflow

We are running a workflow in Oozie. It contains two actions: the first is a map reduce job that generates files in HDFS and the second is a job that should copy the data in the files to a database. Both parts complete successfully but oozie…
user3660070
11
votes
3 answers

Oozie timezone settings

I am new to Oozie and am having a problem changing Oozie's default time zone. I am writing an Oozie coordinator job and have tried to specify the timezone like
Junaid
11
votes
2 answers

Can I submit an oozie job with multiple configuration files?

From the Oozie CLI I want to do something like this: oozie job -oozie http://host:port/oozie -config jobConfig.properties, baseConfig.properties -submit I have a lot of different jobs I'm running where a portion of the .properties file is…
Tim Goodman
10
votes
2 answers

Oozie: create a parameter with today's date

How can I create a parameter with today's date in the format yyyy-mm-dd in Oozie? I am passing this variable to a Hive script, which adds the partition for that date. I found the function to create a timestamp using:…
bigData
10
votes
2 answers

How to deploy and run an Oozie job?

I'm trying to run a simple job using Oozie. It will be one simple Pig action. I have a file, FirstScript.pig, containing: dual = LOAD 'default.dual' USING org.apache.hcatalog.pig.HCatLoader(); store dual into 'dummy_file.txt' using…
psmith
9
votes
1 answer

Difference between job, application, task, task attempt logs in Hadoop, Oozie

I'm running an Oozie job with multiple actions and there's a part I could not get to work. In the process of troubleshooting I'm overwhelmed with lots of logs. In YARN UI (yarn.resourcemanager.webapp.address in yarn-site.xml, normally on port…
oikonomiyaki
9
votes
2 answers

Hadoop job fails, Resource Manager doesn't recognize AttemptID

I'm trying to aggregate some data in an Oozie workflow. However, the aggregation step fails. I found two points of interest in the logs: The first is an error(?) that seems to occur repeatedly: After a container finishes, it gets killed but exits…
h2b
9
votes
3 answers

Oozie Job Error - java.io.IOException: configuration is not specified

I have created an Oozie workflow for a Hive script to load data into a table. My workflow.xml contains -
Sneha
9
votes
1 answer

How does Oozie handle dependencies?

I have several questions about oozie 2.3 share libraries: Currently, I defined the share libraries in our coordinator.properties: oozie.use.system.libpath=true oozie.libpath= Here are my questions: When share libraries are copied to…
Terminal User
8
votes
4 answers

Oozie: Output data exceeds its limit [2048]

I am trying to run a simple workflow executing a Hive script. This Hive script just performs a join (the tables are very large). Once the Hive script execution ends, I was expecting to see the workflow status change from RUNNING to successful, but this is…
jingtao