Characteristics of Big Data
Application of Big Data Processing
Introduction to BIG DATA
Where to get Big Data?
Types of Big Data
Storage layer - HDFS (Hadoop Distributed File System)
MapReduce
YARN
How Hadoop Works?
Hadoop Ecosystem
Hadoop Architecture
Hadoop Installation & Environment Setup
Setting Up A Single Node Hadoop Cluster
Ubuntu User Configuration
SSH Setup With Key Generation
Disable IPv6
Download and Install Hadoop 3.1.2
Working with Configuration Files
Start The Hadoop instances
Hadoop Distributed File System (HDFS)
HDFS Features and Goals
HDFS Architecture
Read Operations in HDFS
Write Operations In HDFS
HDFS Operations
YARN
YARN Features
YARN Architecture
Resource Manager
Node Manager
Application Master
Container
Application Workflow in Hadoop YARN
Hadoop MapReduce
How MapReduce Works?
MapReduce Examples with Python
Running The MapReduce Program & Storing The Data File To HDFS
Create A Python Script
Hadoop Environment Setup
Execute The Script
Apache Hive Definition
Why Apache Hive?
Features Of Apache Hive
Hive Architecture
Hive Metastore
Hive Query Language
SQL vs Hive
Hive Installation
Apache Pig Definition
MapReduce vs. Apache Pig vs. Hive
Apache Pig Architecture
Installation Process Of Apache Pig
Execute Apache Pig Script
Hadoop Ecosystem Components
NoSQL Data Management
Apache HBase
Apache Cassandra
MongoDB
Introduction To Kafka
The Architecture of Apache Flume
Apache Spark Ecosystem
Let us discuss the essential features of Hadoop in the section below:
Open source:
Apache Hadoop is an open-source project, which means its code can be modified according to business requirements.
Distributed Processing:
As data is stored in HDFS across the cluster in a distributed manner, it is processed in parallel on a cluster of nodes.
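As an illustration only (this is not Hadoop code), the idea of splitting data into blocks and processing them in parallel can be sketched in Python, with local worker processes standing in for cluster nodes:

```python
from multiprocessing import Pool

def count_words(chunk):
    """Process one 'block' of data independently, like a map task."""
    return sum(len(line.split()) for line in chunk)

if __name__ == "__main__":
    data = ["big data is big", "hadoop processes data in parallel",
            "each block is handled independently", "results are combined"]
    # Split the input into "blocks" of two lines each, as HDFS splits files.
    blocks = [data[i:i + 2] for i in range(0, len(data), 2)]
    with Pool(2) as pool:               # two worker "nodes"
        partial = pool.map(count_words, blocks)
    print(sum(partial))                 # combine the per-block results
```

Each block is processed independently and the partial results are combined at the end, which is the same shape of computation MapReduce performs across real cluster nodes.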
Fault Tolerance:
By default, Hadoop stores three replicas of each block across the cluster, and this replication factor can be changed as required. So if any node goes down, the data on that node can readily be retrieved from other nodes. The framework automatically recovers from node or task failures. This fault tolerance is one of the most important features of Hadoop.
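For reference, the replication factor mentioned above is controlled by the standard `dfs.replication` property in `hdfs-site.xml`:

```xml
<!-- hdfs-site.xml: the default replication factor is 3;
     change the value here to replicate each block more or fewer times -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

The replication factor of files that already exist in HDFS can also be changed with the `hdfs dfs -setrep` command.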
Fig: Features of Hadoop
Reliability:
Despite machine failures, data is stored reliably on the cluster thanks to replication. If any node in a Hadoop cluster fails, it will not affect the whole cluster: another node replaces the failed one, and the cluster continues functioning as if nothing had happened. This is possible because Hadoop has fault tolerance built in.
High availability:
Because multiple copies of the data are kept, the data remains highly available and accessible despite hardware failure. If a machine or a piece of hardware crashes, the data can still be accessed through another path.
Scalability:
Hadoop is exceptionally scalable: new hardware can readily be added to the cluster. Hadoop also offers horizontal scalability, which means that new nodes can be added on the fly without any downtime.
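As a rough sketch of what adding a node looks like on a Hadoop 3 cluster (assuming the new host, here the hypothetical name `node05`, already has Hadoop installed and passwordless SSH configured from the master):

```shell
# 1. On the master, register the new host in the workers file:
echo "node05" >> $HADOOP_HOME/etc/hadoop/workers

# 2. On the new node, start the worker daemons; they register with the
#    running NameNode and ResourceManager, so no cluster restart is needed:
hdfs --daemon start datanode
yarn --daemon start nodemanager
```

Because the new DataNode and NodeManager simply announce themselves to the master daemons, the cluster grows without any downtime, which is the horizontal scalability described above.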
Economical:
As Apache Hadoop runs on a cluster of commodity hardware, it is not very costly; no specialized machine is needed. Hadoop also offers enormous cost savings, since adding more nodes on the fly is very simple. So if demand rises, the cluster can grow without downtime or much pre-planning.
Easy to use:
Users do not need to handle distributed computing themselves; the framework takes care of everything. So Hadoop is simple to use.
Data Locality:
This is one of Hadoop's unique features and it makes Big Data easy to handle. Hadoop operates on the principle of data locality, which states that computation is moved to the data rather than the data to the computation. This minimizes network congestion, since large data blocks are not shuttled across the cluster.
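The scheduling idea behind data locality can be illustrated with a small Python sketch (this is not Hadoop's scheduler, just the principle): given the nodes that hold replicas of a block, prefer to run the task on one of them.

```python
# Illustrative sketch only: prefer a node that already stores a replica of
# the task's data block, so computation moves to the data, not vice versa.
def pick_node(block_replicas, free_nodes):
    """Return a free node holding the block locally, else any free node."""
    for node in free_nodes:
        if node in block_replicas:
            return node          # data-local: no network transfer needed
    return free_nodes[0]         # fall back to a node without a local replica

# block1 is replicated on nodes A, C, and F (a hypothetical cluster).
replicas = {"block1": {"A", "C", "F"}}
print(pick_node(replicas["block1"], ["B", "C", "D"]))  # → C (data-local)
print(pick_node(replicas["block1"], ["B", "D"]))       # → B (no local replica)
```

In a real cluster the fallback itself is tiered (same rack before a remote rack), but the principle is the same: move the work, not the data.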