
I am trying to run a simple workflow executing a Hive script. The script just performs a join (the tables are very large). Once the Hive script execution ends, I expected the workflow status to change from RUNNING to SUCCEEDED, but this is not happening.

This is the content of the workflow log:

2016-05-31 15:52:34,590 WARN org.apache.oozie.action.hadoop.HiveActionExecutor: SERVER[hadoop02] USER[scapp] GROUP[-] TOKEN[] APP[wf-sqoop-hive-agreement] JOB[0000001-160531143657136-oozie-oozi-W] ACTION[0000001-160531143657136-oozie-oozi-W@hive-query-agreement] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveMain], exception invoking main(), Output data exceeds its limit [2048]
2016-05-31 15:52:34,591 WARN org.apache.oozie.action.hadoop.HiveActionExecutor: SERVER[hadoop02] USER[scapp] GROUP[-] TOKEN[] APP[wf-sqoop-hive-agreement] JOB[0000001-160531143657136-oozie-oozi-W] ACTION[0000001-160531143657136-oozie-oozi-W@hive-query-agreement] Launcher exception: Output data exceeds its limit [2048]
org.apache.oozie.action.hadoop.LauncherException: Output data exceeds its limit [2048]
    at org.apache.oozie.action.hadoop.LauncherMapper.getLocalFileContentStr(LauncherMapper.java:415)
    at org.apache.oozie.action.hadoop.LauncherMapper.handleActionData(LauncherMapper.java:391)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:275)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
JaggenSWE
jingtao
    Weird. That error would make sense for a Shell or Java action with the `<capture-output>` flag set but too much key/value data in the output, but this is a Hive action, with no output to capture and process in Oozie. Unless you run a plain SELECT that vomits results to StdOut -- which would be stupid for a batch job scheduled by Oozie (why would you want to flood the YARN logs with SELECT results that nobody will be able to access?) – Samson Scharfrichter May 31 '16 at 10:16

4 Answers


@BorderStark I don't think the property specifies its size in MB. The size is in "characters", i.e. bytes, according to the following entry in the oozie-default.xml file.

<property>
    <name>oozie.action.max.output.data</name>
    <value>2048</value>
    <description>
        Max size in characters for output data.
    </description>
</property>
rp1

I assume that you might have included a <capture-output> element in your Hive action or in another action of the workflow. Try removing that element from the workflow and running it again.

<capture-output> holds the STDOUT of the command (for example, an ssh action), and that captured output is limited to 2 KiB (2048 bytes) by default.

You can learn more about it here
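
For comparison, a plain Hive action carries no <capture-output> element at all. The sketch below is a minimal hypothetical example (the action name matches the one in the log, but the script name, transitions, and schema version are assumptions, not taken from the asker's workflow):

```xml
<action name="hive-query-agreement">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- The Hive script that performs the large join -->
        <script>agreement.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

If your action contains a <capture-output/> child, deleting that one line is enough; the rest of the action definition stays the same.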

Alex Raj Kaliamoorthy

It is related to the property below; can you increase its value and try again?

oozie-default.xml

<property>
    <name>oozie.action.max.output.data</name>
    <value>XXXX</value>
</property>

Ambari: add this in the Oozie service configuration: oozie.action.max.output.data=4096

Increase the value as much as necessary to fit the results of your query. Currently the results exceed 2048 bytes; try doubling the value.

BorderStark
    I can confirm that this worked for me when I got the same error (although I was using the ssh action). If you are using Cloudera, the parameter can be adjusted under Cloudera Manager > Oozie > Configuration > “Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml” . I set it to 8192, just to be sure. – Kim Moritz Oct 25 '18 at 12:35

I think the execution of your Hive query produces a huge output, and it is not being redirected anywhere.

I suggest that the output of your SELECT query should go somewhere in HDFS; for that, redirect the output of your SELECT query into an external/internal Hive table.

Refer: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataintothefilesystemfromqueries
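
For illustration, either of the following forms keeps the join result out of stdout. The table, column, and directory names here are hypothetical placeholders, since the asker's actual script is not shown:

```sql
-- Option 1: write the join result into an HDFS directory
INSERT OVERWRITE DIRECTORY '/user/scapp/output/agreement'
SELECT a.*
FROM table_a a
JOIN table_b b ON a.id = b.id;

-- Option 2: materialize the join result as a Hive table
CREATE TABLE agreement_result AS
SELECT a.*
FROM table_a a
JOIN table_b b ON a.id = b.id;
```

Either way the action produces no stdout for the Oozie launcher to capture, so the 2048-byte limit is never hit.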

shahjapan