Characteristics of Big Data
Application of Big Data Processing
Introduction to BIG DATA
Where to get Big Data?
Types of Big Data
Storage layer - HDFS (Hadoop Distributed File System)
MapReduce
YARN
How Hadoop works?
Hadoop Eco System
Hadoop Architecture
Hadoop Installation & Environment Setup
Setting Up A Single Node Hadoop Cluster
Ubuntu User Configuration
SSH Setup With Key Generation
Disable IPv6
Download and Install Hadoop 3.1.2
Working with Configuration Files
Start The Hadoop instances
Hadoop Distributed File System (HDFS)
HDFS Features and Goals
HDFS Architecture
Read Operations in HDFS
Write Operations In HDFS
HDFS Operations
YARN
YARN Features
YARN Architecture
Resource Manager
Node Manager
Application Master
Container
Application Workflow in Hadoop YARN
Hadoop MapReduce
How MapReduce Works?
MapReduce Examples with Python
Running The MapReduce Program & Storing The Data File To HDFS
Create A Python Script
Hadoop Environment Setup
Execute The Script
Apache Hive Definition
Why Apache Hive?
Features Of Apache Hive
Hive Architecture
Hive Metastore
Hive Query Language
SQL vs Hive
Hive Installation
Apache Pig Definition
MapReduce vs. Apache Pig vs. Hive
Apache Pig Architecture
Installation Process Of Apache Pig
Execute Apache Pig Script
Hadoop Eco Components
NoSQL Data Management
Apache HBase
Apache Cassandra
MongoDB
Introduction To Kafka
The Architecture of Apache Flume
Apache Spark Ecosystem
HBase is a column-oriented NoSQL database that gives the user a dynamic schema. It is called the Hadoop database because, although it is a NoSQL database, it runs on top of Hadoop. Because HBase stores its data in the Hadoop Distributed File System (HDFS), it combines Hadoop's scalability with real-time data access as a key/value store and with MapReduce's deep analytical capabilities. HBase also supports other high-level languages for data processing. Its distinguishing features include consistency and high availability, along with the others listed below.
HBase can store very large data sets, from terabytes to petabytes. Its tables can hold billions of rows and millions of columns. HBase is designed for low-latency operations and differs from traditional relational models in several ways.
Features of HBase:
HBase offers consistent reads and writes.
Reads and writes are atomic at the row level: an operation on a single row either completes entirely or not at all, and concurrent writers to the same row are serialized. This guarantee applies to one row at a time, not across rows.
HBase splits a region into smaller regions, automatically or manually, once the region reaches a threshold size. Splitting spreads the load and reduces I/O time and overhead.
It also supports failover and recovery across LAN and WAN. At its core is a master server, which monitors the region servers and manages the metadata for the cluster.
HBase supports both linear and modular scalability.
In addition to Hadoop/HDFS integration, HBase can operate on top of other file systems.
HBase supports data replication across clusters.
HBase supports failover and load sharing.
HBase supports MapReduce, which enables parallel processing of large volumes of data. It also supports backing Hadoop MapReduce jobs with HBase tables, that is, using tables as job input and output.
HBase stores row keys in lexicographical order, and searches run over ranges of rows. A request can therefore be optimized by building it around these sorted row keys and timestamps.
For real-time query processing, it supports a block cache and Bloom filters.
For faster lookups, HBase internally uses hash tables and offers random access, while storing the data in indexed HDFS files.
HBase supports both structured and semi-structured data.
HBase does not enforce a fixed column schema. A table definition specifies only column families; columns can be added within a family when data is written.
For non-Java front ends, HBase provides Thrift and REST APIs.
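The point about sorted row keys can be illustrated with a small sketch. It is a toy model in plain Python, not the HBase client API: the `scan` and `prefix_scan` helpers and the sample keys are hypothetical, and the only property borrowed from HBase is that row keys are compared as byte strings in lexicographical order.

```python
import bisect

# Toy model: row keys kept sorted as byte strings (lexicographical order).
row_keys = sorted(k.encode() for k in ["user2", "user10", "user1", "order7", "user3"])

def scan(keys, start_row, stop_row):
    """Return the keys in [start_row, stop_row), like a range scan over sorted rows."""
    lo = bisect.bisect_left(keys, start_row)
    hi = bisect.bisect_left(keys, stop_row)
    return keys[lo:hi]

def prefix_scan(keys, prefix):
    """Return all keys that start with prefix (assumes the last byte is below 0xFF)."""
    stop = prefix[:-1] + bytes([prefix[-1] + 1])
    return scan(keys, prefix, stop)

# Lexicographical order is not numeric order: b"user10" sorts before b"user2".
print(row_keys)
print(prefix_scan(row_keys, b"user"))

# Zero-padding the numeric part makes byte order agree with numeric order.
padded = sorted(f"user{n:04d}".encode() for n in (2, 10, 1, 3))
print(padded)
```

Because the keys are sorted, a range or prefix lookup only needs two binary searches and a contiguous slice. This is why row key design (for example, zero-padding numeric parts) matters so much for scan performance.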
Storage Mechanism in HBase:
HBase is a column-oriented database in which data is stored in tables. Each row is identified by a RowId and is made up of the column families present in the table. The tables are sorted by RowId.
Within a column family, data is stored as key-value pairs, and each column family can contain multiple columns. The column values are stored on disk. Each cell of the table carries its own metadata, such as a timestamp.
| Rowid | Column Family 1 |  |  | Column Family 2 |  |  | Column Family 3 |  |  |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
|       | Col 1 | Col 2 | Col 3 | Col 1 | Col 2 | Col 3 | Col 1 | Col 2 | Col 3 |
| 1     |       |       |       |       |       |       |       |       |       |
| 2     |       |       |       |       |       |       |       |       |       |
| 3     |       |       |       |       |       |       |       |       |       |
| 4     |       |       |       |       |       |       |       |       |       |
Fig: Storage Mechanism in HBase
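The layout in the figure can be sketched as nested dictionaries: row ID, then column family, then column. This is a minimal in-memory illustration under stated assumptions, not HBase itself; `ToyTable`, its methods, and the sample family names and values are all hypothetical.

```python
from collections import defaultdict

class ToyTable:
    """Hypothetical in-memory stand-in for an HBase table (not the HBase API)."""

    def __init__(self, families):
        self.families = set(families)  # column families are fixed at creation
        # row ID -> column family -> column qualifier -> value
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value):
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][family][qualifier] = value

    def get(self, row):
        # A row returns only the columns that were actually written to it.
        return {f"{fam}:{q}": v
                for fam, cols in self.rows.get(row, {}).items()
                for q, v in cols.items()}

    def scan(self):
        for row in sorted(self.rows):  # rows are visited in row ID order
            yield row, self.get(row)

table = ToyTable(["personal", "contact"])
table.put("2", "personal:name", "Ravi")
table.put("1", "personal:name", "Asha")
table.put("1", "personal:city", "Pune")
table.put("1", "contact:email", "asha@example.com")

for row_id, columns in table.scan():
    print(row_id, columns)
```

Row 2 has a single column while row 1 has three, which reflects the sparse, schema-flexible nature of the model: only column families are declared up front, and each row stores just the columns it uses.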
The following are the key terms representing the table schema of HBase:
Table: A collection of rows.
Row: A collection of column families, identified by a row ID.
Column Family: A set of columns.
Column: A set of key-value pairs within a column family.
Namespace: A logical grouping of tables.
Cell: A {row, column, version} tuple that exactly specifies a cell in HBase.
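To make the {row, column, version} definition of a cell concrete, the sketch below keeps several timestamped versions per (row, column) pair. It is a toy model and not the HBase API: `VersionedCells`, the `max_versions` parameter, and the sample timestamps are assumptions chosen for illustration.

```python
class VersionedCells:
    """Toy store: each (row, column) pair holds several timestamped versions."""

    def __init__(self, max_versions=2):
        self.max_versions = max_versions  # illustrative retention limit (>= 1)
        self.cells = {}                   # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        versions = self.cells.setdefault((row, column), {})
        versions[timestamp] = value
        # Drop the oldest versions beyond the retention limit.
        for old in sorted(versions)[:-self.max_versions]:
            del versions[old]

    def get(self, row, column, timestamp=None):
        """Return the value at the given version, or the newest one by default."""
        versions = self.cells.get((row, column), {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

store = VersionedCells(max_versions=2)
store.put("row1", "cf1:col1", 100, "a")
store.put("row1", "cf1:col1", 200, "b")
store.put("row1", "cf1:col1", 300, "c")

print(store.get("row1", "cf1:col1"))        # newest version
print(store.get("row1", "cf1:col1", 200))   # a specific version
print(store.get("row1", "cf1:col1", 100))   # evicted, so None
```

The same row and column can therefore hold different values over time, and the version (timestamp) is what distinguishes one cell from another.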