
I am trying to run a simple workflow executing a Hive script. The script just performs a join (the tables are very large). Once the Hive script execution ends, I expected the workflow status to change from RUNNING to SUCCEEDED, but this is not happening.

This is the content of the workflow log:

2016-05-31 15:52:34,590 WARN org.apache.oozie.action.hadoop.HiveActionExecutor: SERVER[hadoop02] USER[scapp] GROUP[-] TOKEN[] APP[wf-sqoop-hive-agreement] JOB[0000001-160531143657136-oozie-oozi-W] ACTION[0000001-160531143657136-oozie-oozi-W@hive-query-agreement] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveMain], exception invoking main(), Output data exceeds its limit [2048]
2016-05-31 15:52:34,591 WARN org.apache.oozie.action.hadoop.HiveActionExecutor: SERVER[hadoop02] USER[scapp] GROUP[-] TOKEN[] APP[wf-sqoop-hive-agreement] JOB[0000001-160531143657136-oozie-oozi-W] ACTION[0000001-160531143657136-oozie-oozi-W@hive-query-agreement] Launcher exception: Output data exceeds its limit [2048]
org.apache.oozie.action.hadoop.LauncherException: Output data exceeds its limit [2048]
    at org.apache.oozie.action.hadoop.LauncherMapper.getLocalFileContentStr(LauncherMapper.java:415)
    at org.apache.oozie.action.hadoop.LauncherMapper.handleActionData(LauncherMapper.java:391)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:275)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
JaggenSWE
jingtao
    Weird. That error would make sense for a Shell or Java action with the `<capture-output>` flag set but too much key/value data in the output, but this is a Hive action, with no output to capture and process in Oozie. Unless you run a plain SELECT that vomits results to StdOut -- which would be stupid for a batch job scheduled by Oozie (why would you want to flood the YARN logs with SELECT results that nobody will be able to access?) – Samson Scharfrichter May 31 '16 at 10:16

4 Answers


@BorderStark I don't think the property specifies its size in MB. The size is in "characters", i.e. bytes, according to the following entry in the oozie-default.xml file.

<property>
    <name>oozie.action.max.output.data</name>
    <value>2048</value>
    <description>
        Max size in characters for output data.
    </description>
</property>
rp1

I assume that you might have included a <capture-output> element in your Hive action or in another action of the workflow. Try removing that element from the workflow and running it again.

<capture-output> holds the STDOUT of the command (for example, an ssh action), and that captured output is limited to 2 KiB (2048 bytes) by default.

You can learn more about it here
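
For comparison, a plain Hive action carries no <capture-output> element at all. The sketch below is a minimal hypothetical example (the action name matches the one in the log, but the script name, transitions, and schema version are assumptions, not taken from the asker's workflow):

```xml
<action name="hive-query-agreement">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- The Hive script that performs the large join -->
        <script>agreement.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

If your action contains a <capture-output/> child, deleting that one line is enough; the rest of the action definition stays the same.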

Alex Raj Kaliamoorthy

It is related to the property below; can you increase its value and try again?

oozie-default.xml

<property>
    <name>oozie.action.max.output.data</name>
    <value>XXXX</value>
</property>

Ambari: add this in the Oozie service configuration: oozie.action.max.output.data=4096

Increase the value as much as necessary to fit the results of your query. Currently the results exceed 2048 bytes; try doubling the value.

BorderStark
    I can confirm that this worked for me when I got the same error (although I was using the ssh action). If you are using Cloudera, the parameter can be adjusted under Cloudera Manager > Oozie > Configuration > “Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml” . I set it to 8192, just to be sure. – Kim Moritz Oct 25 '18 at 12:35

I think the execution of your Hive query produces a huge output, and it is not being redirected anywhere.

I suggest that the output of your SELECT query should go somewhere in HDFS; for that, redirect the output of your SELECT query into an external/internal Hive table.

Refer: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataintothefilesystemfromqueries
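
For illustration, either of the following forms keeps the join result out of stdout. The table, column, and directory names here are hypothetical placeholders, since the asker's actual script is not shown:

```sql
-- Option 1: write the join result into an HDFS directory
INSERT OVERWRITE DIRECTORY '/user/scapp/output/agreement'
SELECT a.*
FROM table_a a
JOIN table_b b ON a.id = b.id;

-- Option 2: materialize the join result as a Hive table
CREATE TABLE agreement_result AS
SELECT a.*
FROM table_a a
JOIN table_b b ON a.id = b.id;
```

Either way the action produces no stdout for the Oozie launcher to capture, so the 2048-byte limit is never hit.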

shahjapan