Questions tagged [hfile]

File format for HBase: a file of sorted key/value pairs. Both keys and values are byte arrays.


In HBase 0.20, MapFile was replaced by HFile, an HBase-specific map file implementation. The idea is similar to MapFile, but HFile adds features beyond a plain key/value file, such as support for metadata, and the index is now kept in the same file.

In HBase 0.92, HFile v2 features improved speed, memory, and cache usage.
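The core idea above (sorted key/value records plus an index kept in the same file) can be sketched with a toy format. This is an illustrative layout only, not the real HFile binary format; the function names are made up for the example:

```python
import bisect, io, struct

def write_hfile_like(buf, items):
    """Write items (dict of bytes -> bytes) in sorted key order,
    followed by a (key, offset) index and a trailer pointing at it."""
    index = []
    for key in sorted(items):
        value = items[key]
        index.append((key, buf.tell()))
        buf.write(struct.pack(">II", len(key), len(value)))
        buf.write(key)
        buf.write(value)
    index_offset = buf.tell()
    buf.write(struct.pack(">I", len(index)))
    for key, off in index:
        buf.write(struct.pack(">IQ", len(key), off))
        buf.write(key)
    buf.write(struct.pack(">Q", index_offset))   # trailer: where the index lives

def read_index(buf):
    """Read the trailer, then the (key, offset) index it points to."""
    buf.seek(-8, io.SEEK_END)
    (index_offset,) = struct.unpack(">Q", buf.read(8))
    buf.seek(index_offset)
    (n,) = struct.unpack(">I", buf.read(4))
    index = []
    for _ in range(n):
        klen, off = struct.unpack(">IQ", buf.read(12))
        index.append((buf.read(klen), off))
    return index

def lookup(buf, index, key):
    """Binary-search the sorted index, then read one record."""
    keys = [k for k, _ in index]
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None
    buf.seek(index[i][1])
    klen, vlen = struct.unpack(">II", buf.read(8))
    buf.seek(klen, io.SEEK_CUR)
    return buf.read(vlen)
```

Because the keys are sorted, a reader only needs the (small) index in memory to locate any record with a binary search, which is the property both MapFile and HFile are built around.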

Blog: http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/

Class: https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/HFile.html

34 questions
10 votes · 2 answers

HBase FuzzyRowFilter: how does jumping between keys work?

I know that the fuzzy row filter takes two parameters, the first being the row key and the second being the fuzzy logic. What I understood from the corresponding Java class, FuzzyRowFilter, is that the filter evaluates the current row and tries to compute the next higher row…
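A simplified sketch of that jump, assuming fixed-length keys and a mask where `fixed[i]` marks pattern bytes that must match (the real FuzzyRowFilter handles more cases; these names are illustrative, not the HBase API):

```python
def fuzzy_match(row, pattern, fixed):
    """True if row matches pattern at every fixed byte position."""
    return all(not f or row[i] == pattern[i] for i, f in enumerate(fixed))

def next_matching(row, pattern, fixed):
    """Smallest candidate key >= row under the fuzzy pattern, or None.
    This is the 'jump' hint the scanner can seek to."""
    row = bytearray(row)
    n = len(pattern)
    for i in range(n):
        if not fixed[i] or row[i] == pattern[i]:
            continue                        # wildcard or already matching
        if row[i] < pattern[i]:
            row[i] = pattern[i]             # raise this byte to the fixed value
            carry_from = i + 1
        else:
            # Overshot a fixed byte: bump the nearest earlier wildcard byte.
            k = i - 1
            while k >= 0 and (fixed[k] or row[k] == 0xFF):
                k -= 1
            if k < 0:
                return None                 # no larger matching key exists
            row[k] += 1
            carry_from = k + 1
        # Reset everything after the changed byte to the minimum match.
        for j in range(carry_from, n):
            row[j] = pattern[j] if fixed[j] else 0
        return bytes(row)
    return bytes(row)                       # row already matches
```

For example, with a pattern whose last byte is fixed to `a`, the row `aab` does not match, and the next possible match is `aba` — the scanner can skip every key in between instead of evaluating them one by one.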
5 votes · 1 answer

HBase Scan based on specific HFile(s) as roots

Is there any Scan/Filter API with the following behavior? Given time range, I would like the scanner to include data from HFiles out of range, for row keys included in the HFiles which are in range. The idea is to scan in-memory indexes of all…
shay__
5 votes · 2 answers

Why does HBase need to store the Column Family for every Value?

Because HBase tables are sparse tables, HBase stores for every cell not only the value, but all the information required to identify the cell (often described as the Key, not to be confused with the RowKey). The Key looks as…
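The per-cell key the excerpt describes can be sketched as follows. The component order (row, family, qualifier, timestamp, type) follows the HBase KeyValue layout; the exact field widths here are illustrative, and the function is not an HBase API:

```python
import struct

KEY_TYPE_PUT = 4  # KeyValue type code for a Put

def encode_cell_key(row, family, qualifier, timestamp, key_type=KEY_TYPE_PUT):
    """Illustrative encoding of the full per-cell Key."""
    return (
        struct.pack(">H", len(row)) + row +          # row key
        struct.pack("B", len(family)) + family +     # column family, per cell
        qualifier +                                  # column qualifier
        struct.pack(">Q", timestamp) +               # version timestamp
        struct.pack("B", key_type)                   # Put/Delete marker
    )

# Two cells in the same row and family: each key still carries the
# family bytes, which is exactly the per-value overhead asked about.
k1 = encode_cell_key(b"row1", b"cf", b"q1", 1)
k2 = encode_cell_key(b"row1", b"cf", b"q2", 1)
```

Because an HFile is just a flat sequence of sorted key/value pairs with no table schema, each cell's key must be self-describing, so the family is repeated in every key (which is also why short family names are recommended).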
4 votes · 2 answers

How to move HBase tables to HDFS in Parquet format?

I have to build a tool that will move our data from HBase (HFiles) to HDFS in Parquet format. Please suggest the best way to move data from HBase tables to Parquet tables. We have to move 400 million records from HBase to Parquet.…
Pardeep Sharma
4 votes · 1 answer

HBase: How does data get written in a sorted manner into HFile?

I have a fairly basic question about HFiles. When a put/insert request is initiated, the value is first written to the WAL and then to the memstore. The values in the memstore are stored in the same sorted order as in the HFile. Once the memstore is…
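The mechanism the question describes can be sketched in a few lines: puts land in a structure that is kept sorted by key at all times (HBase uses a concurrent skip list internally), so a flush can stream the cells straight into an HFile already in sorted order. `ToyMemstore` is an illustrative name, not an HBase class:

```python
import bisect

class ToyMemstore:
    """Toy model of the memstore: always sorted, drained on flush."""

    def __init__(self):
        self._cells = []                          # kept sorted by key

    def put(self, key, value):
        bisect.insort(self._cells, (key, value))  # sorted insertion

    def flush(self):
        """Drain the memstore; the result is already HFile-ordered."""
        cells, self._cells = self._cells, []
        return cells
```

So no extra sort happens at flush time: the sorting cost is paid incrementally on each write, and the HFile is written sequentially from the already ordered cells.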
user3031097
2 votes · 4 answers

Massive inclusion of .h files in my code

I have been a programmer for several years. I was always told (and told others) that you should include in your .c files only the .h files that you need. Nothing more, nothing less. But let me ask: why? With today's compilers I can include the entire h…
user972014
2 votes · 1 answer

HBase Bulk Load MapReduce HFile exception (netty jar)

I am attempting to run a simple MapReduce process to write HFiles for later import into an HBase table. When the job is submitted (hbase com.pcoa.Driver /test /bulk pcoa), I get the following exception indicating that netty-3.6.6.Final.jar…
2 votes · 2 answers

Reduce job pending in HFileOutputFormat

I am using HBase 0.92.1-cdh4.1.2 and Hadoop 2.0.0-cdh4.1.2. I have a MapReduce program that loads data from HDFS into HBase using HFileOutputFormat in cluster mode. In that MapReduce program I'm using HFileOutputFormat.configureIncrementalLoad()…
user2742998
2 votes · 1 answer

Are there any libraries to work with the HFile format in C++?

Searching for "hfile cpp" was quite an experience that didn't end well. So I'm asking here: are there libraries that support HBase HFile manipulation (reading, writing, mapping to memory) in C++? HFile is an implementation of Google's SSTable…
ddinchev
1 vote · 4 answers

Basic ODR violation: member functions in .h files

Disclaimer: This is probably a basic question, but I'm a theoretical physicist by training trying to learn to code properly, so please bear with me. Let's say that I want to model a fairly involved physical system. In my understanding, one way of…
storluffarn
1 vote · 0 answers

How to fix 'No symbol could be loaded from org.apache.hbase.classification.InterfaceAudience'?

I'm trying to prepare a DataFrame to be stored in HFile format on HBase using Apache Spark. I'm using Spark 2.1.0, Scala 2.11 and HBase 1.1.2 Here is my code: val df = createDataframeFromRow(Row("mlk", "kpo", "opi"), "a b c") val cols =…
1 vote · 0 answers

Unable to insert more than 3 columns in a column family

I am new to HBase and have an issue that I am having trouble finding an answer to on Google. I am trying to bulk insert data from Hive into HBase using the salted table approach described in:…
1 vote · 1 answer

Cloudera CDH 5.7.2 / HBase: How to Set hfile.format.version?

With CDH 5.7.2-1.cdh5.7.2.po.18, I am trying to use Cloudera Manager to configure HBase to use visibility labels and authorizations, as described in the Cloudera Community post below: Cloudera Manager Hbase Visibility Labels Using Cloudera Manager,…
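For reference, visibility labels require HFile version 3, so the property in question is `hfile.format.version`. A sketch of the hbase-site.xml fragment, based on the HBase documentation (in Cloudera Manager this would typically go into the hbase-site.xml advanced configuration snippet / safety valve):

```xml
<!-- Visibility labels and cell-level tags need HFile v3 -->
<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>
```

A rolling restart of the RegionServers is needed before newly flushed HFiles are written in the new format.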
1 vote · 1 answer

Spark job failed due to non-serializable objects

I'm running a Spark job to generate HFiles for my HBase data store. It used to work fine with my Cloudera cluster, but when we switched to an EMR cluster, it fails with the following stacktrace: Serialization stack: - object not serializable…
Fisher Coder
1 vote · 1 answer

Cannot run Spark jobs for large datasets

I wrote a Spark job to read Hive data from S3 and generate HFiles. The job works fine when reading only one ORC file (about 190 MB); however, when I used it to read the entire S3 directory of about 400 ORC files (about 400 × 190 MB = 76 GB of data),…
Fisher Coder