Introduction to BIG DATA
Characteristics of Big Data
Application of Big Data Processing
Where to get Big Data?
Types of Big Data
Storage layer - HDFS (Hadoop Distributed File System)
MapReduce
YARN
How Does Hadoop Work?
Hadoop Ecosystem
Hadoop Architecture
Hadoop Installation & Environment Setup
Setting Up A Single Node Hadoop Cluster
Ubuntu User Configuration
SSH Setup With Key Generation
Disable IPv6
Download and Install Hadoop 3.1.2
Working with Configuration Files
Start the Hadoop Instances
Hadoop Distributed File System (HDFS)
HDFS Features and Goals
HDFS Architecture
Read Operations in HDFS
Write Operations in HDFS
HDFS Operations
YARN
YARN Features
YARN Architecture
Resource Manager
Node Manager
Application Master
Container
Application Workflow in Hadoop YARN
Hadoop MapReduce
How Does MapReduce Work?
MapReduce Examples with Python
Running the MapReduce Program & Storing the Data File to HDFS
Create a Python Script
Hadoop Environment Setup
Execute the Script
Apache Hive Definition
Why Apache Hive?
Features of Apache Hive
Hive Architecture
Hive Metastore
Hive Query Language
SQL vs Hive
Hive Installation
Apache Pig Definition
MapReduce vs. Apache Pig vs. Hive
Apache Pig Architecture
Installation Process of Apache Pig
Execute Apache Pig Script
Hadoop Ecosystem Components
NoSQL Data Management
Apache HBase
Apache Cassandra
MongoDB
Introduction To Kafka
The Architecture of Apache Flume
Apache Spark Ecosystem
In this section, we will create a Pig script and execute it.
Step 9: First, we will create a text file named employee. We can use any text editor to do this.
$ sudo nano employee
We will add some sample lines of data to it. The sample data file contains four columns, FirstName, LastName, ID, and Dept, separated by the tab key. Our goal is to read the content of this file from HDFS and display specific columns of these records.
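For illustration, the employee file might contain rows like these (the names and values are hypothetical sample data; the four fields are separated by tabs):

John	Smith	101	Sales
Maria	Garcia	102	HR
David	Lee	103	IT
Anna	Brown	104	Finance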
Step 10: Since we will work on HDFS, we have to start all the Hadoop services (start-all.sh). Then we will create a pig directory on HDFS and grant the necessary permissions on it.
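A minimal sketch of these commands, assuming we name the HDFS directory /pig and open its permissions broadly (the directory name and permission bits are assumptions, not fixed by the text):

$ start-all.sh
$ hdfs dfs -mkdir /pig
$ hdfs dfs -chmod -R 777 /pig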
Step 11: In this step, we will store the employee text file in the pig directory using the following command.
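The command would be along these lines, assuming the employee file is in the current local directory and the target is the /pig directory created above:

$ hdfs dfs -put employee /pig/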
Check that the file is in place from the Hadoop web UI (the NameNode file browser).
Step 12: Now, we will create a Pig script using an editor (nano). The following command will create an out.pig file inside the home directory of the hduser user.
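Assuming we are logged in as hduser, the command would look like this:

$ nano /home/hduser/out.pig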
Write the following Pig commands in the out.pig file.
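A sketch of the script, consistent with the description below; the HDFS path /pig/employee, the PigStorage delimiter, and the column types are assumptions:

-- load the tab-separated employee file from HDFS with an explicit schema
A = LOAD '/pig/employee' USING PigStorage('\t') AS (FName:chararray, LName:chararray, ID:int, Dept:chararray);
-- keep only the FName, ID, and Dept columns
B = FOREACH A GENERATE FName, ID, Dept;
-- print the content of B to the terminal
DUMP B;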
The first command loads the employee file into relation A with the schema (FName, LName, ID, and Dept). The second command projects the FName, ID, and Dept columns from relation A into relation B. The final line displays the content of relation B on the terminal.
Step 13: We will execute the out.pig file using the following command.
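Assuming the Pig bin directory is on the PATH, the script can be run in MapReduce mode (Pig's default execution mode) like this:

$ pig out.pig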
It will display the desired output in the terminal.