Characteristics of Big Data
Application of Big Data Processing
Introduction to Big Data
Where to get Big Data?
Types of Big Data
Storage layer - HDFS (Hadoop Distributed File System)
MapReduce
YARN
How Hadoop Works
Hadoop Ecosystem
Hadoop Architecture
Hadoop Installation & Environment Setup
Setting Up A Single Node Hadoop Cluster
Ubuntu User Configuration
SSH Setup With Key Generation
Disable IPv6
Download and Install Hadoop 3.1.2
Working with Configuration Files
Start The Hadoop instances
Hadoop Distributed File System (HDFS)
HDFS Features and Goals
HDFS Architecture
Read Operations in HDFS
Write Operations In HDFS
HDFS Operations
YARN
YARN Features
YARN Architecture
Resource Manager
Node Manager
Application Master
Container
Application Workflow in Hadoop YARN
Hadoop MapReduce
How MapReduce Works
MapReduce Examples with Python
Running The MapReduce Program & Storing The Data File To HDFS
Create A Python Script
Hadoop Environment Setup
Execute The Script
Apache Hive Definition
Why Apache Hive?
Features Of Apache Hive
Hive Architecture
Hive Metastore
Hive Query Language
SQL vs Hive
Hive Installation
Apache Pig Definition
MapReduce vs. Apache Pig vs. Hive
Apache Pig Architecture
Installation Process Of Apache Pig
Execute Apache Pig Script
Hadoop Ecosystem Components
NoSQL Data Management
Apache HBase
Apache Cassandra
MongoDB
Introduction To Kafka
The Architecture of Apache Flume
Apache Spark Ecosystem
The metastore stores metadata in a relational database (RDBMS) through an open-source Object-Relational Mapping (ORM) layer called DataNucleus, which converts the object representation of metadata into a relational schema and vice versa. An RDBMS is used instead of HDFS because metadata lookups need low latency. The metastore can be deployed in the following three configurations:
Embedded Metastore:
By default, the metastore service and the Hive service run in the same JVM, using an embedded Apache Derby database that stores metadata on the local disk. This is known as the embedded metastore configuration. In this mode, only one user can connect to the metastore database at a time, and starting a second instance of the Hive driver results in an error. This makes it suitable for unit testing but not for practical, multi-user deployments.
Fig: Hive Embedded Metastore
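The single-user restriction can be illustrated with a toy model. The sketch below is a simplified Python analogy, not Derby's actual locking code: the first session to open a hypothetical `EmbeddedStore` creates a lock file exclusively (`os.O_CREAT | os.O_EXCL`), and any second session fails until the first one releases it. The class name, the `store.lck` file name, and `MetastoreLockedError` are illustrative assumptions.

```python
import os
import tempfile


class MetastoreLockedError(RuntimeError):
    """Raised when a second session opens a store that is already in use."""


class EmbeddedStore:
    """Toy single-user store (hypothetical): the first opener holds a lock file."""

    def __init__(self, directory: str) -> None:
        # Hypothetical lock file name; Derby's real lock files differ.
        self.lock_path = os.path.join(directory, "store.lck")
        self._fd = None

    def open(self) -> "EmbeddedStore":
        try:
            # O_EXCL makes creation fail if the lock file already exists,
            # so only one session can hold the store at a time.
            self._fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            raise MetastoreLockedError(f"store already in use: {self.lock_path}") from None
        return self

    def close(self) -> None:
        # Release the lock so another session can open the store.
        if self._fd is not None:
            os.close(self._fd)
            os.remove(self.lock_path)
            self._fd = None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        first = EmbeddedStore(d).open()  # first Hive session succeeds
        try:
            EmbeddedStore(d).open()  # second session is rejected
        except MetastoreLockedError as err:
            print("second session failed:", err)
        first.close()
        EmbeddedStore(d).open().close()  # succeeds again once the lock is released
```

The exclusive-create approach keeps the example self-contained; it only mirrors the observable effect described above (one active session, an error for the next).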
Local Metastore:
This configuration supports multiple Hive sessions, so several users can access the metastore at the same time. It is achieved by using a JDBC-compliant database, such as MySQL, that runs in a separate JVM or on a separate machine, while the Hive service and the metastore service continue to run together in the same JVM. In practice, a MySQL server is the most popular choice for the metastore database.
Fig: Hive Local Metastore
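In the local configuration, Hive reaches the external database through JDBC connection properties in `hive-site.xml`. The sketch below generates those entries with Python's standard `xml.etree.ElementTree`. The property names (`javax.jdo.option.ConnectionURL` and related keys) are the standard Hive ones; the host, database name, user, and password are placeholders, and the driver class assumes MySQL Connector/J 8.x.

```python
import xml.etree.ElementTree as ET


def local_metastore_config(host: str, db: str, user: str, password: str) -> str:
    """Build hive-site.xml text for a MySQL-backed (local) metastore."""
    props = {
        "javax.jdo.option.ConnectionURL":
            f"jdbc:mysql://{host}:3306/{db}?createDatabaseIfNotExist=true",
        # Connector/J 8.x driver class; older releases use com.mysql.jdbc.Driver.
        "javax.jdo.option.ConnectionDriverName": "com.mysql.cj.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        # Each setting becomes <property><name>..</name><value>..</value></property>.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(local_metastore_config("db.example.internal", "metastore", "hive", "change-me"))
```

Because every client's `hive-site.xml` carries the database user name and password in this mode, the credentials end up distributed to each Hive user. That is the drawback the remote configuration addresses.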
Remote Metastore:
In the remote metastore configuration, the metastore service runs in its own separate JVM rather than in the Hive service JVM. Other processes communicate with the metastore server through the Thrift network API. For higher availability, more than one metastore server can be deployed. The main advantage of a remote metastore is that the JDBC login credentials for the metastore database do not need to be shared with every Hive user.
Fig: Hive Remote Metastore
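With a remote metastore, clients receive the Thrift address of the metastore server(s) instead of JDBC credentials. This is typically done through the `hive.metastore.uris` property, which accepts a comma-separated list such as `thrift://host1:9083,thrift://host2:9083` (9083 is the conventional metastore port). The sketch below parses such a value and tries each server in order to show how multiple metastore servers improve availability. It is a simplified model: the function names are hypothetical, the `connect` callback stands in for a real Thrift client, and the actual Hive client applies its own server selection and retry logic.

```python
from typing import Callable, List, Tuple
from urllib.parse import urlparse

DEFAULT_METASTORE_PORT = 9083  # conventional Hive metastore Thrift port


def parse_metastore_uris(value: str) -> List[Tuple[str, int]]:
    """Split a comma-separated hive.metastore.uris value into (host, port) pairs."""
    servers = []
    for uri in value.split(","):
        parsed = urlparse(uri.strip())
        if parsed.scheme != "thrift" or not parsed.hostname:
            raise ValueError(f"not a thrift URI: {uri!r}")
        servers.append((parsed.hostname, parsed.port or DEFAULT_METASTORE_PORT))
    return servers


def connect_with_failover(value: str, connect: Callable[[str, int], object]) -> object:
    """Try each metastore server in order and return the first successful connection."""
    errors = []
    for host, port in parse_metastore_uris(value):
        try:
            return connect(host, port)
        except OSError as err:  # e.g. connection refused or timed out
            errors.append(f"{host}:{port}: {err}")
    raise ConnectionError("all metastore servers failed: " + "; ".join(errors))


if __name__ == "__main__":
    uris = "thrift://ms1.example.internal:9083,thrift://ms2.example.internal"

    def fake_connect(host: str, port: int) -> str:
        # Stand-in for a real Thrift connection: pretend ms1 is down.
        if host.startswith("ms1"):
            raise ConnectionRefusedError("connection refused")
        return f"connected to {host}:{port}"

    print(connect_with_failover(uris, fake_connect))  # connected to ms2.example.internal:9083
```

Injecting the `connect` callback keeps the failover loop independent of any particular Thrift library and makes the sketch testable without a running metastore.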