Characteristics of Big Data
Application of Big Data Processing
Introduction to BIG DATA
Where to get Big Data?
Types of Big Data
Storage layer - HDFS (Hadoop Distributed File System)
MapReduce
YARN
How Hadoop works?
Hadoop Ecosystem
Hadoop Architecture
Hadoop Installation & Environment Setup
Setting Up A Single Node Hadoop Cluster
Ubuntu User Configuration
SSH Setup With Key Generation
Disable IPv6
Download and Install Hadoop 3.1.2
Working with Configuration Files
Start The Hadoop instances
Hadoop Distributed File System (HDFS)
HDFS Features and Goals
HDFS Architecture
Read Operations in HDFS
Write Operations In HDFS
HDFS Operations
YARN
YARN Features
YARN Architecture
Resource Manager
Node Manager
Application Master
Container
Application Workflow in Hadoop YARN
Hadoop MapReduce
How MapReduce Works?
MapReduce Examples with Python
Running The MapReduce Program & Storing The Data File To HDFS
Create A Python Script
Hadoop Environment Setup
Execute The Script
Apache Hive Definition
Why Apache Hive?
Features Of Apache Hive
Hive Architecture
Hive Metastore
Hive Query Language
SQL vs Hive
Hive Installation
Apache Pig Definition
MapReduce vs. Apache Pig vs. Hive
Apache Pig Architecture
Installation Process Of Apache Pig
Execute Apache Pig Script
Hadoop Eco Components
NoSQL Data Management
Apache HBase
Apache Cassandra
MongoDB
Introduction To Kafka
The Architecture of Apache Flume
Apache Spark Ecosystem
Hive was developed at Facebook and is one of the technologies used to meet Facebook's data-processing requirements. Before Hive was introduced, Facebook faced several challenges: the volume of generated data grew explosively and became very difficult to handle, and the traditional relational database could not cope with the load. Looking for better options, Facebook first tried the MapReduce framework, but its programming complexity and the mandatory Java knowledge made it an impractical approach. After a long effort, Apache Hive allowed them to overcome these challenges. They now rely on Hive for the following:
1. Schema flexibility and evolution
2. Tables can be partitioned and bucketed
3. Apache Hive tables are defined directly in HDFS
4. JDBC/ODBC drivers are available
Apache Hive saves developers from writing complex Hadoop MapReduce jobs for ad-hoc requirements. Hence, Hive provides analysis, summarization, and querying of data. Hive is scalable and fast. Because Hive's query language is SQL-like, SQL programmers find it straightforward to learn and to write Hive queries for data processing.
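To make the partitioning and bucketing item above more concrete, here is a minimal Python sketch of how rows might be grouped. It is an illustration only, not Hive's implementation: the function `layout_rows`, the sample rows, and the column names are hypothetical, and the bucket is chosen with a plain integer modulo, whereas Hive's actual hash function depends on the column type. The `column=value` naming mirrors how Hive names partition directories.

```python
from collections import defaultdict


def layout_rows(rows, partition_col, bucket_col, num_buckets):
    """Group rows by (partition directory, bucket number).

    Hypothetical helper for illustration; it does not talk to Hive or HDFS.
    """
    layout = defaultdict(list)
    for row in rows:
        # One sub-directory per distinct partition value, e.g. "country=US".
        partition_dir = f"{partition_col}={row[partition_col]}"
        # Assumption: the bucket column is an integer and the bucket is
        # picked by plain modulo. Real Hive hashing depends on the type.
        bucket = row[bucket_col] % num_buckets
        layout[(partition_dir, bucket)].append(row)
    return dict(layout)


# Made-up sample data.
rows = [
    {"user_id": 1, "country": "US", "clicks": 10},
    {"user_id": 2, "country": "US", "clicks": 7},
    {"user_id": 3, "country": "IN", "clicks": 4},
    {"user_id": 5, "country": "IN", "clicks": 9},
]

layout = layout_rows(rows, partition_col="country",
                     bucket_col="user_id", num_buckets=2)

for (partition_dir, bucket), members in sorted(layout.items()):
    print(partition_dir, "bucket", bucket, [r["user_id"] for r in members])
```

A query that filters on `country` only needs to read the matching partition group, and rows with the same `user_id` always land in the same bucket, which is the intuition behind both features.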
Hive removes much of MapReduce's complexity by offering an interface through which the user submits SQL queries. As a result, business analysts can work with Big Data using Hive and generate meaningful insights. It also provides access to files in various data stores such as HDFS and HBase. The most important feature of Apache Hive is that learning it does not require learning Java.
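The contrast between hand-written MapReduce logic and a single SQL query can be sketched in Python. This is a rough illustration under stated assumptions: the three phase functions are a toy in-memory imitation of MapReduce, the sample records are made up, and Python's built-in `sqlite3` module stands in for Hive purely so the example runs locally. A query of this simple shape would look much the same in Hive's query language.

```python
import sqlite3
from collections import defaultdict

# Made-up sample data: (country, clicks).
records = [("US", 10), ("IN", 4), ("US", 7), ("IN", 9), ("UK", 3)]


# --- MapReduce style: three explicit phases written by hand ---
def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for country, clicks in records:
        yield country, clicks


def shuffle_phase(pairs):
    """Group all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


mapreduce_totals = reduce_phase(shuffle_phase(map_phase(records)))

# --- SQL style: the same aggregation as one declarative query ---
# sqlite3 is only a local stand-in here; it is not Hive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (country TEXT, clicks INTEGER)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)", records)
sql_totals = dict(
    conn.execute("SELECT country, SUM(clicks) FROM clicks GROUP BY country")
)
conn.close()

print(mapreduce_totals == sql_totals)
```

Both approaches produce the same per-country totals, but the SQL version states only what result is wanted and leaves the execution plan to the engine.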