Characteristics of Big Data
Application of Big Data Processing
Introduction to BIG DATA
Where to get Big Data?
Types of Big Data
Storage layer - HDFS (Hadoop Distributed File System)
MapReduce
YARN
How Hadoop works?
Hadoop Ecosystem
Hadoop Architecture
Hadoop Installation & Environment Setup
Setting Up A Single Node Hadoop Cluster
Ubuntu User Configuration
SSH Setup With Key Generation
Disable IPv6
Download and Install Hadoop 3.1.2
Working with Configuration Files
Start The Hadoop instances
Hadoop Distributed File System (HDFS)
HDFS Features and Goals
HDFS Architecture
Read Operations in HDFS
Write Operations In HDFS
HDFS Operations
YARN
YARN Features
YARN Architecture
Resource Manager
Node Manager
Application Master
Container
Application Workflow in Hadoop YARN
Hadoop MapReduce
How MapReduce Works?
MapReduce Examples with Python
Running The MapReduce Program & Storing The Data File To HDFS
Create A Python Script
Hadoop Environment Setup
Execute The Script
Apache Hive Definition
Why Apache Hive?
Features Of Apache Hive
Hive Architecture
Hive Metastore
Hive Query Language
SQL vs Hive
Hive Installation
Apache Pig Definition
MapReduce vs. Apache Pig vs. Hive
Apache Pig Architecture
Installation Process Of Apache Pig
Execute Apache Pig Script
Hadoop Ecosystem Components
NoSQL Data Management
Apache HBase
Apache Cassandra
MongoDB
Introduction To Kafka
The Architecture of Apache Flume
Apache Spark Ecosystem
Flume is a distributed, available, and reliable service for efficiently gathering, aggregating, and moving massive amounts of data. Its distributed design is what makes it an available and dependable service. In simple terms, it is a data ingestion tool that transfers data from one place to another and guarantees data delivery.
The Architecture of Flume:
Let's discuss the diagram of the Flume architecture below:
Fig: Architecture of Flume
1. First, an event is a single unit of data that is transported by Flume.
2. A client is an entity that creates events that are ingested via Flume.
3. A source is the point at which an event enters Flume. There are two types of sources in Flume:
4. A passively waiting source waits for events to be sent to it by the client.
5. An actively polling source repeatedly requests events from the client. In both cases, the source sends events on to the channel.
6. The channel is the bridge between a source and a sink. Channels act as buffers so that the sink is not overloaded by incoming events from the source.
7. Sinks are how Flume delivers data to its destination. While events sit in the channel, Flume batches them into transactions to help transmit data through the sink, which can write those transactions to a file system such as HDFS or pass the data on to another Flume agent.
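The source, channel, and sink described above are wired together in a Flume agent's properties file. A minimal sketch is shown below; the agent name `a1`, the netcat port, and the HDFS path are illustrative assumptions, not values from this text:

```properties
# Hypothetical agent "a1" with one source, one channel, and one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: a netcat source that passively waits for events on a TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: an in-memory buffer between the source and the sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: writes batched transactions of events into HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/flume/events
a1.sinks.k1.hdfs.fileType = DataStream

# Wire the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

With a configuration like this saved as, say, `example.conf`, the agent can be started with `flume-ng agent --conf conf --conf-file example.conf --name a1`.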
Apache Flume has a very straightforward architecture with few moving parts through which the data passes. It is great for taking streaming data from a source and writing it to HDFS in batches, and it can also write to HDFS in near real time.