Characteristics of Big Data
Application of Big Data Processing
Introduction to BIG DATA
Where to get Big Data?
Types of Big Data
Storage layer - HDFS (Hadoop Distributed File System)
MapReduce
YARN
How Hadoop works?
Hadoop Eco System
Hadoop Architecture
Hadoop Installation & Environment Setup
Setting Up A Single Node Hadoop Cluster
Ubuntu User Configuration
SSH Setup With Key Generation
Disable IPv6
Download and Install Hadoop 3.1.2
Working with Configuration Files
Start The Hadoop instances
Hadoop Distributed File System (HDFS)
HDFS Features and Goals
HDFS Architecture
Read Operations in HDFS
Write Operations In HDFS
HDFS Operations
YARN
YARN Features
YARN Architecture
Resource Manager
Node Manager
Application Master
Container
Application Workflow in Hadoop YARN
Hadoop MapReduce
How MapReduce Works?
MapReduce Examples with Python
Running The MapReduce Program & Storing The Data File To HDFS
Create A Python Script
Hadoop Environment Setup
Execute The Script
Apache Hive Definition
Why Apache Hive?
Features Of Apache Hive
Hive Architecture
Hive Metastore
Hive Query Language
SQL vs Hive
Hive Installation
Apache Pig Definition
MapReduce vs. Apache Pig vs. Hive
Apache Pig Architecture
Installation Process Of Apache Pig
Execute Apache Pig Script
Hadoop Eco Components
NoSQL Data Management
Apache HBase
Apache Cassandra
MongoDB
Introduction To Kafka
The Architecture of Apache Flume
Apache Spark Ecosystem
Step 17: Configure ~/.bashrc: Add the following lines to the end of the ~/.bashrc file of the user hduser. First, we set the MapReduce home, HDFS home, YARN home, and Common home to HADOOP_HOME. Then we set JAVA_HOME, and finally add the Hadoop and Java binaries to the PATH variable.
# Set Hadoop Home
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

# Set Java Home
export JAVA_HOME=/home/ehsan/Downloads/jdk1.8.0_231
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin
Save and exit the file. Now we should apply all the changes to the current running system.
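The exported variables only take effect in new shells; a typical way to apply them to the current session and sanity-check the setup (assuming the paths above match your installation) is:

```shell
# Reload ~/.bashrc so the new variables are visible in this shell
source ~/.bashrc

# Sanity-check the environment
echo $HADOOP_HOME
hadoop version
```

If `hadoop version` prints the Hadoop 3.1.2 banner, the PATH and HADOOP_HOME settings are in effect.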
Step 18: Configure hadoop-env.sh: You will find all the Hadoop configuration files in the location "/usr/local/hadoop/etc/hadoop". You will usually need to adjust these configuration files to match your Hadoop infrastructure.
To develop and run Hadoop programs in Java, you have to set the Java environment variable in hadoop-env.sh by setting JAVA_HOME to the location of Java on your system.
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
export HADOOP_HOME_WARN_SUPPRESS="TRUE"
export JAVA_HOME=/home/ehsan/Downloads/jdk1.8.0_231
Exit and save the current configuration.
Step 19: Configure core-site.xml: The core-site.xml file generally contains information such as the memory allocated for the file system, the port number used by the Hadoop instance, the memory limit for storing data, and the size of read/write buffers. Go to core-site.xml and add the following properties between the <configuration> and </configuration> tags.
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
Exit and save the current configuration.
Now create the /app/hadoop/tmp directory and set permissions on it.
sudo mkdir -p /app/hadoop/tmp
sudo chown -R hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
Step 20: Configure mapred-site.xml: This file is used to specify which MapReduce framework we are using. Open the mapred-site.xml file and add the following properties between the <configuration> and </configuration> tags in this file.
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>localhost:10020</value>
</property>
Exit and save the current configuration.
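The mapreduce.jobhistory.address property above points at the MapReduce JobHistory server, which does not start automatically with the other daemons. A sketch of starting it once the cluster is up, using the Hadoop 3.x daemon syntax and assuming the environment from Step 17:

```shell
# Start the MapReduce JobHistory server as a background daemon
mapred --daemon start historyserver

# Confirm it is running; a JobHistoryServer process should be listed
jps
```

By default the JobHistory web UI is served on port 19888.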
Step 21: Configure yarn-site.xml: This file is used to configure YARN for the Hadoop cluster. Go to the yarn-site.xml file and add the following properties between the <configuration> and </configuration> tags in this file.
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Save and exit to apply the changes to yarn-site.xml.
Step 22: Configure hdfs-site.xml: The hdfs-site.xml file contains information such as the replication factor, the namenode path, and the datanode paths on your local file system. Add the following properties between the <configuration> and </configuration> tags.
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
</property>
Now create the two directories referenced in hdfs-site.xml and set permissions on them.
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
sudo chown -R hduser:hadoop /usr/local/hadoop_tmp/
If you forget to set the required ownership and permissions, you will receive a java.io.IOException when you try to format the namenode.
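With the configuration files in place, the usual next step is to format the namenode once and then launch the daemons. The commands below are a sketch, assuming the Hadoop 3.1.2 binaries are on the PATH as configured in Step 17 and that you are logged in as hduser:

```shell
# Format the HDFS namenode (run only once; this erases HDFS metadata)
hdfs namenode -format

# Start the HDFS daemons (NameNode, DataNode, SecondaryNameNode)
start-dfs.sh

# Start the YARN daemons (ResourceManager, NodeManager)
start-yarn.sh

# List the running Java processes to verify all daemons are up
jps
```

If a daemon is missing from the jps output, check its log file under $HADOOP_HOME/logs for the cause.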