Characteristics of Big Data
Application of Big Data Processing
Introduction to BIG DATA
Where to get Big Data?
Types of Big Data
Storage layer - HDFS (Hadoop Distributed File System)
MapReduce
YARN
How Hadoop works?
Hadoop Eco System
Hadoop Architecture
Hadoop Installation & Environment Setup
Setting Up A Single Node Hadoop Cluster
Ubuntu User Configuration
SSH Setup With Key Generation
Disable IPv6
Download and Install Hadoop 3.1.2
Working with Configuration Files
Start The Hadoop instances
Hadoop Distributed File System (HDFS)
HDFS Features and Goals
HDFS Architecture
Read Operations in HDFS
Write Operations In HDFS
HDFS Operations
YARN
YARN Features
YARN Architecture
Resource Manager
Node Manager
Application Master
Container
Application Workflow in Hadoop YARN
Hadoop MapReduce
How MapReduce Works?
MapReduce Examples with Python
Running The MapReduce Program & Storing The Data File To HDFS
Create A Python Script
Hadoop Environment Setup
Execute The Script
Apache Hive Definition
Why Apache Hive?
Features Of Apache Hive
Hive Architecture
Hive Metastore
Hive Query Language
SQL vs Hive
Hive Installation
Apache Pig Definition
MapReduce vs. Apache Pig vs. Hive
Apache Pig Architecture
Installation Process Of Apache Pig
Execute Apache Pig Script
Hadoop Eco Components
NoSQL Data Management
Apache HBase
Apache Cassandra
MongoDB
Introduction To Kafka
The Architecture of Apache Flume
Apache Spark Ecosystem
Once we load big data into Hadoop, the first question is how to process it. Collecting vast amounts of unstructured data does not help unless there is an effective way to draw meaningful insights from it. There are several compelling options for analyzing the data, such as Hadoop MapReduce, Apache Pig, and Apache Hive. Each processes data in its own way, and each works effectively. In the section below, I have summarized their properties:
MapReduce is a programming model whose jobs are typically written in a compiled language such as Java, whereas Pig uses Pig Latin, a high-level scripting language, and Hive uses HiveQL, a SQL-like query language.
Pig and Hive provide a higher level of abstraction, whereas Hadoop MapReduce delivers a low level of abstraction.
Hadoop MapReduce requires more lines of code than Pig or Hive. Hive usually requires the fewest, because a few lines of SQL-like queries can replace a longer Pig script or MapReduce program.
MapReduce requires more development effort than Apache Pig and Hive.
Pig scripts and Hive queries are translated into generated jobs rather than hand-optimized ones, so they usually run slower than a fully tuned Hadoop MapReduce program.
When executing jobs in Pig and Hive, Hadoop developers need not worry about version mismatches.
There is little chance of a developer introducing Java-level bugs when coding in Pig or Hive.
Apache Pig has trouble with unstructured data such as images, video, audio, ambiguously delimited text, and log data.
Pig also copes poorly with badly designed XML or JSON and with flexible schemas.
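To make the difference in abstraction level and code length more concrete, here is a minimal sketch of a word count written in the MapReduce style in plain Python. It only simulates the map, shuffle, and reduce phases in memory and does not use Hadoop itself. The function names and the sample input are assumptions made for illustration. The comments at the top show roughly how the same task might look in HiveQL and Pig Latin, assuming a hypothetical table `docs` with a column `line` and a hypothetical input file `input.txt`.

```python
# Rough higher-level equivalents (hypothetical table and file names):
#
#   HiveQL:
#     SELECT word, COUNT(*) FROM docs
#     LATERAL VIEW explode(split(line, ' ')) t AS word
#     GROUP BY word;
#
#   Pig Latin:
#     lines   = LOAD 'input.txt' AS (line:chararray);
#     words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
#     grouped = GROUP words BY word;
#     counts  = FOREACH grouped GENERATE group, COUNT(words);

from collections import defaultdict


def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, a step the framework normally does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducer: sum the list of counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Chain the three phases the way a MapReduce job would run them."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data on hadoop", "pig and hive on hadoop"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

Even in this simplified form, the developer has to spell out the mapper, the grouping, and the reducer, while the Hive and Pig versions state only what result is wanted. That is the trade-off described above: more development effort and more code in exchange for finer control over how the job runs.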