Questions tagged [apache-tez]

The Apache Tez project is aimed at building an application framework which allows for a complex directed-acyclic-graph of tasks for processing data. It is currently built atop Apache Hadoop YARN.

See Hive-on-Tez configuration properties.

172 questions
9
votes
2 answers

Why is the hive_staging file missing in AWS EMR?

Problem - I am running 1 query in AWS EMR. It is failing by throwing exception - java.io.FileNotFoundException: File…
devsda
  • 3,746
  • 9
  • 47
  • 82
9
votes
1 answer

could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation

I don't know how to fix this error: Vertex failed, vertexName=initialmap, vertexId=vertex_1449805139484_0001_1_00, diagnostics=[Task failed, taskId=task_1449805139484_0001_1_00_000003, diagnostics=[AttemptID:attempt_1449805139484_0001_1_00_000003_0…
Mona Jalal
  • 24,172
  • 49
  • 166
  • 311
8
votes
1 answer

Why would someone run Spark / Flink on Tez?

In the Tez paper from Saha et al., the following modular architecture of Hadoop 2 with Tez is shown: Why would someone run Spark/Flink on Tez? What are the advantages? Better utilization of YARN?
j9dy
  • 1,779
  • 1
  • 20
  • 35
7
votes
2 answers

Performance of Apache Drill

Are there any genuine performance benchmarks that compare Stinger vs Impala vs Drill? Also, which is preferred? My use case will be mainly ad-hoc interactive queries on top of Hive. Thanks.
Sai
  • 107
  • 1
  • 2
  • 8
7
votes
1 answer

How do WordCount MapReduce jobs run on a Hadoop YARN cluster with Apache Tez?

As the github page of tez says, tez is very simple and at its heart has just two components: The data-processing pipeline engine, and A master for the data-processing application, where-by one can put together arbitrary data-processing 'tasks'…
SonOfSun
  • 783
  • 2
  • 6
  • 25
7
votes
0 answers

Getting an error while running a query on Hive over Tez

I am getting an error while running a query on Hive over Tez. According to the logs, Hive fails while copying the Tez jars to an HDFS location at the start of the Tez session. Below is the complete log obtained from the Hive log file: 2015-06-19 01:23:52,289 INFO …
Saurabh
  • 169
  • 2
  • 6
6
votes
2 answers

How to reduce the number of files generated by SQL "Alter Table/Partition Concatenate" in Hive?

Hive version: 1.2.1 Configuration: set hive.execution.engine=tez; set hive.merge.mapredfiles=true; set hive.merge.smallfiles.avgsize=256000000; set hive.merge.tezfiles=true; HQL: ALTER TABLE `table_name` PARTITION (partion_name1 = 'val1',…
Po Zhou
  • 565
  • 1
  • 5
  • 16
5
votes
3 answers

Is Hive faster than Spark?

After reading What is hive, Is it a database?, a colleague yesterday mentioned that he was able to filter a 15B table, join it with another table after doing a "group by", which resulted in 6B records, in only 10 minutes! I wonder if this would be…
gsamaras
  • 66,800
  • 33
  • 152
  • 256
5
votes
0 answers

Hive index creation using TEZ

Is it possible to generate indexes using Tez instead of an MR job? When we set hive.execution.engine=Tez and try to generate an index, the index creation fails. Below is the list of commands that I have used: CREATE TABLE…
4
votes
1 answer

Difference between hive.tez.container.size and tez.task.resource.memory.mb

Could someone please explain the difference between these Tez settings: hive.tez.container.size and tez.task.resource.memory.mb? Thanks.
Ulky Igor
  • 178
  • 1
  • 11
4
votes
1 answer

Why is HDFS throwing LeaseExpiredException in a Hadoop cluster (AWS EMR)?

I am getting a LeaseExpiredException in a Hadoop cluster - tail -f /var/log/hadoop-hdfs/hadoop-hdfs-namenode-ip-172-30-2-148.log 2016-09-21 11:54:14,533 INFO BlockStateChange (IPC Server handler 10 on 8020): BLOCK* InvalidateBlocks: add…
devsda
  • 3,746
  • 9
  • 47
  • 82
4
votes
1 answer

Map-Reduce Logs on Hive-Tez

How should I interpret the Map-Reduce logs produced after running a query on Hive-Tez? What do the lines after INFO: convey? Here is a sample: INFO : Session is already open INFO : Dag name: SELECT a.Model...) INFO : Tez session was…
Flash
  • 45
  • 4
4
votes
0 answers

Managing input split sizes in Hive running the tez engine

I want to gain a better understanding of how the input splits are calculated in the Tez engine. I am aware that the hive.input.format property can be set to either HiveInputFormat (default) or to CombineHiveInputFormat (generally accepted for…
Nitin Kumar
  • 705
  • 1
  • 10
  • 26
3
votes
1 answer

Do Tez containers run inside of YARN containers, or instead of YARN containers?

I'm running Hive + Tez on EMR and I'd like some clarity on how Tez interacts with YARN. I read in this article: Set tez.am.resource.memory.mb to be the same as yarn.scheduler.minimum-allocation-mb (the YARN minimum container size) Set…
S.S.
  • 506
  • 2
  • 15
3
votes
2 answers

Apache Hive Not Returning YARN Application Results Correctly

I'm running a from-scratch cluster on AWS EC2. I have an external table (partitioned) defined with data on S3. I'm able to query this table and receive results to the console with a simple select * statement: hive> set…