Characteristics of Big Data
Application of Big Data Processing
Introduction to BIG DATA
Where to get Big Data?
Types of Big Data
Storage layer - HDFS (Hadoop Distributed File System)
MapReduce
YARN
How Hadoop works?
Hadoop Eco System
Hadoop Architecture
Hadoop Installation & Environment Setup
Setting Up A Single Node Hadoop Cluster
Ubuntu User Configuration
SSH Setup With Key Generation
Disable IPv6
Download and Install Hadoop 3.1.2
Working with Configuration Files
Start The Hadoop instances
Hadoop Distributed File System (HDFS)
HDFS Features and Goals
HDFS Architecture
Read Operations in HDFS
Write Operations In HDFS
HDFS Operations
YARN
YARN Features
YARN Architecture
Resource Manager
Node Manager
Application Master
Container
Application Workflow in Hadoop YARN
Hadoop MapReduce
How MapReduce Works?
MapReduce Examples with Python
Running The MapReduce Program & Storing The Data File To HDFS
Create A Python Script
Hadoop Environment Setup
Execute The Script
Apache Hive Definition
Why Apache Hive?
Features Of Apache Hive
Hive Architecture
Hive Metastore
Hive Query Language
SQL vs Hive
Hive Installation
Apache Pig Definition
MapReduce vs. Apache Pig vs. Hive
Apache Pig Architecture
Installation Process Of Apache Pig
Execute Apache Pig Script
Hadoop Eco Components
NoSQL Data Management
Apache HBase
Apache Cassandra
MongoDB
Introduction To Kafka
The Architecture of Apache Flume
Apache Spark Ecosystem
Resource Manager
It is the ultimate authority in the allocation of resources.
When it receives processing requests, it forwards parts of them to the corresponding Node Managers, where the actual processing takes place.
It is the arbitrator of cluster resources and decides how the available resources are allocated among competing applications.
It optimizes cluster utilization, that is, it tries to keep all resources in use all the time, subject to constraints such as capacity guarantees, fairness, and SLAs.
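One way to observe how fully the cluster is being used is the Resource Manager's REST endpoint `/ws/v1/cluster/metrics`. The following is a minimal sketch, not a monitoring tool: the sample payload is illustrative (the numbers are made up, and only a few of the fields are shown), and the helper names `fetch_metrics` and `memory_utilization` are hypothetical. The Resource Manager address is an assumption and has to be supplied by the reader.

```python
import json
from urllib.request import urlopen

# Illustrative payload only: the values are invented for this example.
SAMPLE_RESPONSE = """
{"clusterMetrics": {"totalMB": 16384, "allocatedMB": 12288,
                    "availableMB": 4096, "activeNodes": 2}}
"""


def fetch_metrics(rm_url):
    """Download the cluster metrics JSON from a running Resource Manager.

    rm_url is the base address of the Resource Manager web interface,
    supplied by the caller (an assumption, not a value from the text).
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/metrics") as resp:
        return resp.read().decode("utf-8")


def memory_utilization(payload):
    """Return the fraction of cluster memory currently allocated."""
    metrics = json.loads(payload)["clusterMetrics"]
    total = metrics["totalMB"]
    return metrics["allocatedMB"] / total if total else 0.0


if __name__ == "__main__":
    # Uses the sample payload so the sketch runs without a cluster.
    print(f"memory in use: {memory_utilization(SAMPLE_RESPONSE):.0%}")
```

Against a live cluster, the output of `fetch_metrics` can be passed to `memory_utilization` in place of the sample payload.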
Resource Manager has two major components:
a) Scheduler
1. The scheduler allocates resources to the running applications, subject to constraints such as capacities and queues.
2. It is a pure scheduler, which means it does not monitor or track the status of applications.
3. It offers no guarantee of restarting tasks that fail because of an application failure or a hardware failure.
4. It performs scheduling based on the resource requirements of the applications.
5. It has a pluggable policy that is responsible for partitioning the cluster resources among the different applications. Capacity Scheduler and Fair Scheduler are examples of such plug-ins used as Resource Manager schedulers.
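The idea of a pure scheduler with a pluggable policy can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the algorithm used by YARN: the names `App`, `fair_policy`, `capacity_policy`, and `schedule` are hypothetical, only memory is considered, and the real Capacity Scheduler and Fair Scheduler are considerably more elaborate (hierarchical queues, preemption, and so on).

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    queue: str
    demand_mb: int  # memory the application asks for


def fair_policy(apps, total_mb):
    """Give every application an equal share, capped at its demand.

    Memory left over by small applications is redistributed to the
    larger ones (max-min fairness).
    """
    grants = {}
    remaining = total_mb
    pending = sorted(apps, key=lambda a: a.demand_mb)
    while pending:
        share = remaining // len(pending)
        app = pending.pop(0)
        grants[app.name] = min(app.demand_mb, share)
        remaining -= grants[app.name]
    return grants


def capacity_policy(apps, total_mb, queue_share):
    """Reserve a fixed fraction of memory per queue, first come first served."""
    left = {q: int(total_mb * frac) for q, frac in queue_share.items()}
    grants = {}
    for app in apps:  # list order stands in for submission order
        grants[app.name] = min(app.demand_mb, left[app.queue])
        left[app.queue] -= grants[app.name]
    return grants


def schedule(apps, total_mb, policy, **policy_args):
    """Pure scheduler: it only allocates, it does not track or restart apps."""
    return policy(apps, total_mb, **policy_args)


if __name__ == "__main__":
    apps = [App("A", "dev", 2000), App("B", "prod", 6000), App("C", "prod", 6000)]
    print(schedule(apps, 12000, fair_policy))
    print(schedule(apps, 12000, capacity_policy,
                   queue_share={"prod": 0.75, "dev": 0.25}))
```

Swapping the `policy` argument changes how the cluster is partitioned without touching `schedule`, which is the point of the plug-in design. In a real cluster the scheduler implementation is selected through the `yarn.resourcemanager.scheduler.class` property in `yarn-site.xml`.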
b) Application Manager
Application Manager is responsible for accepting job submissions.
It negotiates the first container, in which the application-specific Application Master is executed.
It manages the Application Masters running in the cluster and provides the service for restarting the Application Master container on failure.
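The three responsibilities above (accepting submissions, obtaining the first container for the Application Master, and restarting it on failure) can be sketched as a small Python class. This is a simplified illustration, not YARN code: the class, its method names, the state strings, and the `allocate_container` callback are all hypothetical. The `max_attempts` limit loosely mirrors the YARN setting `yarn.resourcemanager.am.max-attempts`.

```python
import itertools


class ApplicationManager:
    """Toy model of the Application Manager's bookkeeping."""

    def __init__(self, allocate_container, max_attempts=2):
        # allocate_container stands in for the negotiation with the
        # scheduler; it takes an application id and returns a container id.
        self.allocate_container = allocate_container
        self.max_attempts = max_attempts
        self._ids = itertools.count(1)
        self.apps = {}

    def submit(self, name):
        """Accept a job submission and start its Application Master."""
        app_id = f"application_{next(self._ids):04d}"
        self.apps[app_id] = {"name": name, "attempts": 0,
                             "state": "ACCEPTED", "am_container": None}
        self._launch_master(app_id)
        return app_id

    def _launch_master(self, app_id):
        """Obtain a container and run the Application Master in it."""
        app = self.apps[app_id]
        app["attempts"] += 1
        app["am_container"] = self.allocate_container(app_id)
        app["state"] = "RUNNING"

    def on_master_failed(self, app_id):
        """Restart the Application Master unless attempts are exhausted."""
        app = self.apps[app_id]
        if app["attempts"] < self.max_attempts:
            self._launch_master(app_id)
        else:
            app["state"] = "FAILED"
            app["am_container"] = None
        return app["state"]


if __name__ == "__main__":
    containers = itertools.count(1)
    manager = ApplicationManager(lambda app_id: f"container_{next(containers)}")
    app_id = manager.submit("wordcount")
    print(app_id, manager.apps[app_id])
    print(manager.on_master_failed(app_id))  # second attempt is started
    print(manager.on_master_failed(app_id))  # attempts exhausted
```

Note that only the Application Master container is restarted here. Restarting the individual tasks of a job is left to the Application Master itself, which is consistent with the scheduler giving no restart guarantees.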